Plot Per-Gene Expression in a Sample to Compare GeneSets
From Array Suite Wiki
How to visualize expression of all genes in GeneSets for one or more samples
It is easy to summarize the mean expression of GeneSets in an -Omic data expression matrix using Summarize -Omic Data. But what if you want to visually inspect the expression of each gene, in each GeneSet, within one or more samples?
In this procedure, we will use multiple Table manipulation functions to "massage" a matrix into a format that will do exactly this, creating a BoxPlot that also contains expression values of each gene within each GeneSet, side-by-side, for multiple samples.
Input Data Requirements
In addition to an expression matrix (-Omic data or Table), one or more Lists containing Gene IDs will be required.
The basic procedure will be to generate a separate Samples X Genes Table object for each GeneSet, then concatenate these tables to generate a Source column for each row (gene). The resulting table can be used to create the GeneSet Comparison BoxPlots.
Step 1: Generate a Table for each GeneSet
With the expression matrix open, use the View Controller Variable Filter tab to filter for each list of genes:
Step 1a: Generate Table from Visible Data
While filtered, click Generate Data From View in the Task tab, to generate a new Table object containing expression only for genes in the selected List:
Step 1b: Rename new table
Rename the resulting table to something meaningful for the GeneSet:
Step 1c: Repeat for remaining GeneSets
Repeat Steps 1a and 1b to create a group of tables, each containing a Sample X Gene matrix for each GeneSet, and optionally, the full expression matrix:
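For readers who want to see the logic of Step 1 outside Array Studio, the filtering can be sketched in plain Python. This is only an illustration under stated assumptions: the gene IDs, sample names, GeneSet names, and the `filter_rows` helper are all hypothetical, and the sketch does not reflect how Array Studio implements Generate Data From View.

```python
# Toy expression matrix (hypothetical values): one row per gene, one column per sample.
expression = {
    "GENE_A": {"SampleK": 120.0, "SampleM": 3.0},
    "GENE_B": {"SampleK": 0.0, "SampleM": 45.0},
    "GENE_C": {"SampleK": 15.5, "SampleM": 16.0},
    "GENE_D": {"SampleK": 7.0, "SampleM": 7.5},
}

# Gene ID lists, one per GeneSet (hypothetical names).
gene_lists = {
    "UpInK": ["GENE_A", "GENE_C"],
    "UpInM": ["GENE_B", "GENE_C", "GENE_X"],  # GENE_X is not in the matrix
}


def filter_rows(matrix, gene_ids):
    """Return a new table holding only the rows whose gene ID is in gene_ids."""
    wanted = set(gene_ids)
    return {gene: dict(values) for gene, values in matrix.items() if gene in wanted}


# One table per GeneSet, plus the unfiltered matrix as an optional reference set.
tables = {name: filter_rows(expression, ids) for name, ids in gene_lists.items()}
tables["AllGenes"] = filter_rows(expression, expression.keys())

for name, table in tables.items():
    print(name, sorted(table))
```

Gene IDs that appear in a list but not in the matrix are simply dropped, and a gene present in several lists ends up in several tables.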
Step 2: Concatenate Tables
Select Table | Concatenate to combine all tables by column, while simultaneously generating a new column identifying the source of each row:
Step 2a: Select Tables to Concatenate
Using the + symbol, open the Select Data window, and select the tables with expression data for each GeneSet:
Step 2b: Change Output Parameters
Make sure that Remove duplicate names is deselected, and that Create source column is selected. Specify an output table name.
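The effect of these two options can be sketched in plain Python. The table names, gene IDs, values, and the `concatenate` helper below are hypothetical. The sketch only mirrors the behavior described here (rows appended, duplicates kept, a Source column added), not Array Studio's own implementation.

```python
# Hypothetical per-GeneSet tables: gene ID -> {sample: expression value}.
tables = {
    "UpInK_1": {
        "GENE_A": {"SampleK": 120.0, "SampleM": 3.0},
        "GENE_C": {"SampleK": 15.5, "SampleM": 16.0},
    },
    "UpInK_2": {
        "GENE_C": {"SampleK": 15.5, "SampleM": 16.0},
    },
}


def concatenate(tables, source_column="Source"):
    """Append the rows of every table, recording each row's table name.

    Rows are never merged, so a gene shared by two tables appears twice.
    """
    merged = []
    for table_name, table in tables.items():
        for gene, values in table.items():
            merged.append({"Gene": gene, source_column: table_name, **values})
    return merged


merged = concatenate(tables)
for row in merged:
    print(row)
```

Keeping the duplicate GENE_C row matters here: if duplicates were removed, the gene would be plotted under only one of its GeneSets.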
Step 3: Create GeneSet BoxPlot
The output table should be an expression matrix of Samples X Genes with a Source column whose values are the names of the source tables (e.g. UpInK562_1, UpInMCF7_2). Gene rows may be duplicated if multiple GeneSets share the same gene:
Step 3a: Add Variable View
Right-click on the merged table, and select Add View | Variable View, which will generate a View that looks similar to this:
Step 3b: Group Variables by GeneSet
The Source column indicates which table each row came from, i.e. which GeneSet the gene belongs to. To group each gene to its GeneSet, in the View Controller, select Specify Profile Column, and select Source:
Step 3c: Modify View
Optionally, change additional features of the plot:
- Specify Variable Columns: Choose which Sample(s) should be plotted, one per chart.
- Specify Transformation (log2+.1)
- Change Profile Gallery (RBoxPlot)
- Change Symbol Properties
  - Color: Source
  - Jitter: Increase to reduce overlap
  - Size: Increase to improve visibility
  - Shape: Circle
- Change Fill Properties
  - Opacity: Reduce to make transparent BoxPlot
In this plot, the first column shows the (log2) expression of every gene for sample SRR521461 (K562 cells). Subsequent columns show the expression of genes that were identified by different methods as being up-regulated in either K562 or MCF7 cells. Each aspect of the BoxPlot (quartile, median, mean, etc.) could be calculated by MicroArray Summary Statistics, but it can be useful to see each data point's contribution.
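As a rough illustration of what each box summarizes, the sketch below computes the quartiles, median, and mean per GeneSet for a single sample using only the Python standard library. Two assumptions are made: the log2+.1 transformation is read as log2(value + 0.1), and quartiles use the "inclusive" method of `statistics.quantiles`, which may not match the definition Array Studio uses. The data values and the `log2_offset` helper are hypothetical.

```python
import math
import statistics

# Hypothetical merged rows for one sample: (Source, raw expression value).
rows = [
    ("UpInK_1", 120.0), ("UpInK_1", 15.5), ("UpInK_1", 64.0), ("UpInK_1", 31.9),
    ("UpInM_1", 0.0), ("UpInM_1", 3.0), ("UpInM_1", 0.9), ("UpInM_1", 7.9),
]


def log2_offset(value, offset=0.1):
    """Assumed form of the log2+.1 transformation: log2(value + 0.1)."""
    return math.log2(value + offset)


# Group the transformed values by GeneSet.
groups = {}
for source, value in rows:
    groups.setdefault(source, []).append(log2_offset(value))

# Summarize the statistics a box plot draws for each group.
summary = {}
for source, values in groups.items():
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    summary[source] = {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.mean(values),
    }

for source, stats in summary.items():
    print(source, {name: round(value, 2) for name, value in stats.items()})
```

The small offset keeps genes with zero expression on the plot, since log2(0) is undefined.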
Advanced usage: Compare multiple samples in a single chart
By performing one additional manipulation on the concatenated expression table, multiple samples can be viewed together in a single chart.
Step 1: Stack Table Data
With the merged expression table selected, click Table | Stack Table.
Step 1a: Select Columns To Stack
In the left window, select all of the columns containing expression data, but not the Source column, then click the Right Button arrow to move these to the right (Stack) window:
Step 1b: Label Columns and Output Table
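Do not select Drop non-stacked columns, so that the existing Source (GeneSet) column is kept. Specify a Source label column name (for example Sample, so that it does not clash with the existing Source column) and a Stacked data column name, as well as an Output table name: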
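Stacking is a wide-to-long reshape. A minimal Python sketch of the idea follows. The `stack` helper, the column names Sample and Expression, and the data are hypothetical and chosen only for illustration.

```python
# Hypothetical merged table: one row per gene, one column per sample, plus Source.
merged = [
    {"Gene": "GENE_A", "Source": "UpInK_1", "SampleK": 120.0, "SampleM": 3.0},
    {"Gene": "GENE_B", "Source": "UpInM_1", "SampleK": 0.0, "SampleM": 45.0},
]


def stack(rows, stack_columns, label_column="Sample", value_column="Expression"):
    """Turn each stacked column into its own row, keeping the other columns."""
    stacked = []
    for row in rows:
        # Non-stacked columns (Gene, Source) are carried over to every new row.
        kept = {key: value for key, value in row.items() if key not in stack_columns}
        for column in stack_columns:
            stacked.append({**kept, label_column: column, value_column: row[column]})
    return stacked


long_table = stack(merged, stack_columns=["SampleK", "SampleM"])
for row in long_table:
    print(row)
```

Two genes in two samples become four rows, each holding one gene's expression in one sample, with the GeneSet label carried along.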
Step 2: Add Variable View
The output table will have one gene's expression, in one sample, per row. There should be a Sample column and a Source (GeneSet) column:
As performed above, add a Variable View.
Step 2a: Group Sample and GeneSet Columns
Using Specify Multiple Profile Columns, choose the Sample and Source columns. Each GeneSet will be grouped by Sample, or vice versa, depending on the order of profiled rows.
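How the ordering changes the grouping can be sketched in plain Python. The assumption here is that the first grouping key forms the outer group and the second the inner group. The `group_by` helper and the data are hypothetical, and Array Studio's actual ordering rules may differ.

```python
from collections import defaultdict

# Hypothetical stacked rows: one gene in one sample per row.
long_table = [
    {"Sample": "SampleK", "Source": "UpInK_1", "Expression": 120.0},
    {"Sample": "SampleM", "Source": "UpInK_1", "Expression": 3.0},
    {"Sample": "SampleK", "Source": "UpInM_1", "Expression": 0.5},
    {"Sample": "SampleM", "Source": "UpInM_1", "Expression": 45.0},
]


def group_by(rows, profile_columns, value_column="Expression"):
    """Collect values per combination of profile columns, sorted by that key."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in profile_columns)
        groups[key].append(row[value_column])
    return dict(sorted(groups.items()))


# Sample first: all GeneSets for SampleK sit together, then all for SampleM.
by_sample = group_by(long_table, ["Sample", "Source"])
# Source first: each GeneSet shows its samples side by side.
by_geneset = group_by(long_table, ["Source", "Sample"])

print(list(by_sample))
print(list(by_geneset))
```

Putting the GeneSet first places the samples for one GeneSet next to each other, which is usually the more convenient layout for comparing the same genes across samples.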
Step 2b: Modify Variable View
If necessary, hide the AllGenes GeneSet to improve performance (drawing 60,000 points for each sample may be taxing on your computer).
In Change Chart Properties, rotate the X-axis labels to improve visibility of groups. Perform the other chart manipulations above, until you achieve the desired chart:
In this chart, it is clear that both "UpIn____" GeneSets contain genes that are, on average, expressed at equivalent levels in the expected samples.
However, "UpIn___1" and "UpIn___2" differ markedly in how low their average expression is in the contrasted samples. This is not surprising, as the UpIn___1 GeneSets were identified by P-value, while the UpIn___2 GeneSets were identified by Fold-Change.