Plot Per-Gene Expression in a Sample to Compare GeneSets

From Array Suite Wiki

MultiSampleGeneSetBoxPlot.png

How to visualize expression of all genes in GeneSets for one or more samples

Overview

It is easy to summarize the mean expression of GeneSets in an -Omic data expression matrix using Summarize -Omic Data. But what if you want to visually inspect the expression of each gene, in each GeneSet, within one or more samples?

In this procedure, we will use several Table manipulation functions to "massage" a matrix into a format that supports exactly this: a BoxPlot that also shows the expression value of each gene within each GeneSet, side-by-side, for multiple samples.

Input Data Requirements

In addition to an expression matrix (-Omic data or Table), one or more Lists containing Gene IDs will be required.

Procedure

The basic procedure is to generate a separate Samples X Genes Table object for each GeneSet, then concatenate these tables while generating a Source column that records the originating table of each row (gene). The resulting table can be used to create the GeneSet Comparison BoxPlots.

Step 1: Generate a Table for each GeneSet

With the expression matrix open, use the View Controller Variable Filter tab to filter for each list of genes:

FilterByGeneList.png

Step 1a: Generate Table from Visible Data

While filtered, click Generate Data From View in the Task tab, to generate a new Table object containing expression only for genes in the selected List:

GenerateDataFromView.png
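
For readers who think in code, the filter-then-generate step can be sketched outside Array Suite. This is a minimal illustration in plain Python, not Array Suite's own API: the matrix layout, the gene IDs, the sample names, and the helper `filter_by_gene_list` are all hypothetical.

```python
# Hypothetical expression matrix: one row per gene, one column per sample.
expression = {
    "GeneA": {"Sample1": 12.0, "Sample2": 0.5},
    "GeneB": {"Sample1": 3.2, "Sample2": 8.1},
    "GeneC": {"Sample1": 0.0, "Sample2": 4.4},
}

# Hypothetical gene List standing in for one GeneSet.
# GeneX is not in the matrix, so it simply contributes no row.
up_in_sample1 = ["GeneA", "GeneC", "GeneX"]


def filter_by_gene_list(matrix, gene_ids):
    """Return a new table holding only the rows whose gene ID is in gene_ids."""
    wanted = set(gene_ids)
    return {gene: dict(values) for gene, values in matrix.items() if gene in wanted}


geneset_table = filter_by_gene_list(expression, up_in_sample1)
print(sorted(geneset_table))
```

The result is a new, smaller table that leaves the original matrix untouched, which mirrors generating a new Table object from the filtered view.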

Step 1b: Rename new table

Rename the resulting table something meaningful for the GeneSet:

RenameTable.png

Step 1c: Repeat for remaining GeneSets

Repeat the filtering and Steps 1a and 1b for each remaining gene List to create one table per GeneSet, each containing a Samples X Genes matrix, and optionally a table containing the full expression matrix:

CompleteGeneSetTables.png

Step 2: Concatenate Tables

Select Table | Concatenate to combine all tables by column, while simultaneously generating a new column identifying the source of each row:

Table Concatenate Menu.png

Table Concatenate Window.png

Step 2a: Select Tables to Concatenate

Using the + symbol, open the Select Data window, and select the tables with expression data for each GeneSet:

Table Concatenate SelectData Window.png

Step 2b: Change Output Parameters

Make sure that Remove duplicate names is deselected, and that Create source column is selected. Specify an output table name.

Table Concatenate Window.png
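
The effect of these two options can be sketched in plain Python. This is an illustration of the resulting data shape only, under the assumption that concatenation appends rows and tags each with its table name; the table contents and the helper `concatenate_with_source` are hypothetical, and the table names borrow the style used later in this article.

```python
# Hypothetical per-GeneSet tables: each row has a gene ID and sample columns.
tables = {
    "UpInK562_1": [
        {"Gene": "GeneA", "Sample1": 12.0, "Sample2": 0.5},
        {"Gene": "GeneB", "Sample1": 3.2, "Sample2": 8.1},
    ],
    "UpInK562_2": [
        {"Gene": "GeneA", "Sample1": 12.0, "Sample2": 0.5},
    ],
}


def concatenate_with_source(named_tables):
    """Append the rows of every table, tagging each row with its table name."""
    merged = []
    for name, rows in named_tables.items():
        for row in rows:
            # Duplicate genes are kept on purpose: a gene shared by two
            # GeneSets must appear once per GeneSet.
            merged.append({**row, "Source": name})
    return merged


merged = concatenate_with_source(tables)
for row in merged:
    print(row)
```

GeneA appears twice in the output, once per GeneSet, which is why duplicate names must not be removed.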


Step 3: Create GeneSet BoxPlot

The output table should be an expression matrix of Samples X Genes with an added Source column, whose values are named after the source tables (e.g. UpInK562_1, UpInMCF7_2). Gene rows may be repeated if multiple GeneSets share the same gene:

MergedTable.png

Step 3a: Add Variable View

Right-click on the merged table, and select Add View | Variable View, which will generate a View that looks similar to this:

VariableView Default.png

Step 3b: Group Variables by GeneSet

The Source column indicates which table each row came from, i.e. which GeneSet the gene belongs to. To group genes by GeneSet, select Specify Profile Column in the View Controller, then select Source:

SelectProfile Source Window.png

Step 3c: Modify View

Optionally, change additional features of the plot:

  • Specify Variable Columns: Choose which Sample(s) should be plotted, one per chart.
  • Specify Transformation (log2+.1)
  • Change Profile Gallery (RBoxPlot)
  • Change Symbol Properties
    • Color: Source
    • Jitter: Increase to reduce overlap
    • Size: Increase to improve visibility
    • Shape: Circle
  • Change Fill Properties
    • Opacity: Reduce to make transparent BoxPlot
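
What the grouped, transformed view plots for a single sample can be sketched in plain Python. This assumes that the log2+.1 transformation means log2(x + 0.1); the rows, the sample name, and the helpers `log2_offset` and `group_by_source` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical merged table: one row per gene, with a Source column.
merged = [
    {"Gene": "GeneA", "Sample1": 7.9, "Source": "UpInK562_1"},
    {"Gene": "GeneB", "Sample1": 0.0, "Source": "UpInK562_1"},
    {"Gene": "GeneA", "Sample1": 7.9, "Source": "UpInMCF7_2"},
]


def log2_offset(value, offset=0.1):
    """Assumed form of the transformation: log2(value + offset).

    The offset keeps zero expression values finite.
    """
    return math.log2(value + offset)


def group_by_source(rows, sample):
    """Collect the transformed values of one sample column, keyed by Source."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Source"]].append(log2_offset(row[sample]))
    return dict(groups)


groups = group_by_source(merged, "Sample1")
for source, values in groups.items():
    print(source, [round(v, 2) for v in values])
```

Each key corresponds to one box in the chart, and each value in its list to one jittered point drawn over that box.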

Output Results

SingleSampleGeneSetBoxPlot.png

In this plot, the first column shows the (log2) expression of every gene for sample SRR521461 (K562 cells). Subsequent columns show the expression of genes that different methods identified as up-regulated in either K562 or MCF7 cells. Each element of the BoxPlot (quartiles, median, mean, etc.) could be calculated by MicroArray Summary Statistics, but it can be useful to see each data point's contribution.
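
As a rough illustration of the statistics a box summarizes, the sketch below computes quartiles, median, and mean for one group using Python's standard `statistics` module. The values are hypothetical, and the quartile convention used here ("inclusive" interpolation) is an assumption; Array Suite may use a different one, so small differences are expected.

```python
import statistics

# Hypothetical log2 expression values for the genes of one GeneSet in one sample.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]


def box_summary(data):
    """Return the quartiles, median and mean that a box plot summarizes."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "mean": statistics.fmean(data)}


summary = box_summary(values)
print(summary)
```

The summary reduces seven values to four numbers, which is exactly the detail that plotting each point alongside the box restores.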

Advanced usage: Compare multiple samples in a single chart

By performing one additional manipulation on the concatenated expression table, multiple samples can be viewed together in a single chart.

Step 1: Stack Table Data

With the merged expression table selected, click Table | Stack Table.

Table Stack Menu.png

Step 1a: Select Columns To Stack

In the left window, select all of the columns containing expression data, but not the Source column, then click the right-arrow button to move them to the right (Stack) window:

StackTable SelectColumns Window.png

Step 1b: Label Columns and Output Table

Leave Drop non-stacked columns deselected, so that the Source column is retained. Specify a Source label column name and Stacked data column name, as well as an Output table name:

StackTable SpecifyLabels Window.png

Step 2: Add Variable View

The output table will have one gene's expression, in one sample, per row. There should be a Sample column and a Source (GeneSet) column:

StackedTable.png
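
The wide-to-long reshaping that stacking performs can be sketched in plain Python. This is an illustration of the data shape only, not Array Suite's implementation; the rows, the column names "Sample" and "Expression", and the helper `stack_table` are hypothetical.

```python
# Hypothetical merged table in wide form: one row per gene, one column per sample.
merged = [
    {"Gene": "GeneA", "Sample1": 12.0, "Sample2": 0.5, "Source": "UpInK562_1"},
    {"Gene": "GeneB", "Sample1": 3.2, "Sample2": 8.1, "Source": "UpInMCF7_2"},
]


def stack_table(rows, stack_columns, label_column="Sample", data_column="Expression"):
    """Turn each stacked column into its own row, keeping the other columns."""
    stacked = []
    for row in rows:
        # Non-stacked columns (Gene, Source) are carried over to every new row.
        kept = {k: v for k, v in row.items() if k not in stack_columns}
        for column in stack_columns:
            stacked.append({**kept, label_column: column, data_column: row[column]})
    return stacked


stacked = stack_table(merged, ["Sample1", "Sample2"])
for row in stacked:
    print(row)
```

Two genes across two samples become four rows, each still carrying its Source value because the non-stacked columns were not dropped.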

Add a Variable View, as in Step 3a above.

Step 2a: Group Sample and GeneSet Columns

Using Specify Multiple Profile Columns, choose the Sample and Source columns. Each GeneSet will be grouped by Sample, or vice versa, depending on the order of the profile columns.

SpecifyMultipleProfileColumns.png
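
How the order of two grouping columns changes the layout can be sketched in plain Python. This assumes the first column acts as the outer grouping, which is an interpretation rather than documented Array Suite behavior; the rows and the helper `group_by_columns` are hypothetical.

```python
from collections import defaultdict

# Hypothetical stacked table: one gene's expression in one sample per row.
stacked = [
    {"Gene": "GeneA", "Sample": "Sample1", "Source": "UpInK562_1", "Expression": 12.0},
    {"Gene": "GeneB", "Sample": "Sample1", "Source": "UpInK562_1", "Expression": 3.2},
    {"Gene": "GeneA", "Sample": "Sample2", "Source": "UpInK562_1", "Expression": 0.5},
    {"Gene": "GeneC", "Sample": "Sample1", "Source": "UpInMCF7_2", "Expression": 1.1},
]


def group_by_columns(rows, profile_columns, value_column="Expression"):
    """Group values by the given columns; the first column is the outer grouping."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in profile_columns)
        groups[key].append(row[value_column])
    # Sorting the keys places groups that share the first column side by side.
    return {key: groups[key] for key in sorted(groups)}


by_sample = group_by_columns(stacked, ["Sample", "Source"])
by_geneset = group_by_columns(stacked, ["Source", "Sample"])
print(list(by_sample))
print(list(by_geneset))
```

With Sample first, all GeneSets for one sample sit next to each other; with Source first, the same GeneSet is shown across samples side by side.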

Step 2b: Modify Variable View

If necessary, hide the AllGenes GeneSet to improve performance (drawing 60,000 points for each sample may be taxing on your computer).

In Change Chart Properties, rotate the X-axis labels to improve visibility of groups. Perform the other chart manipulations above, until you achieve the desired chart:

MultiSampleGeneSetBoxPlot.png

In this chart, it is clear that both "UpIn____" GeneSets contain genes that are equivalently expressed, on average, in the expected samples.

However, the "UpIn___1" and "UpIn___2" GeneSets differ markedly in how low their average expression is in the contrasted samples. This is not surprising, as the UpIn___1 GeneSets were identified by P-value, while the UpIn___2 GeneSets were identified by Fold-Change.

