Create ComparisonLand from External Inference Reports

From Array Suite Wiki


If you would like to import your externally-generated expression and inference data into your ArrayLand, you can easily create Land-compatible .alv and .tlv files. With proper design of your metadata tables, you will be able to take full advantage of rich ArrayLand visualization and analysis functions, integrate your expression and inference data, and seamlessly combine your private data with expression and inference data from other Lands, such as DiseaseLand.

For a more detailed walkthrough of these steps, follow our ComparisonLand from External Data tutorial, which includes a small example data set. You can quickly build your own small ComparisonLand, then add or remove metadata columns in the Sample Metadata and Comparison Metadata files, rebuild, and test how this affects the resulting Land Metadata.

Input Data Requirements

This workflow requires four data files:

  1. A table of expression values (Expression Data)
  2. A table of sample metadata
  3. A table of statistical inferences based on sample groupings
  4. A table of comparison metadata

The expression data will be added into Array Studio as an -Omic data object, and saved to generate an osobj file; this osobj file will be converted to ALV.

The inference data, along with sample and comparison metadata, will be directly converted from tab-delimited file to TLV.

Expression Data File

Microarray Data

You should import your microarray Expression data file into a distributed Array Studio project, so that the objects are saved as separate files. You can either create a new Array Studio project, or import your data into an existing project.

Expression data should be imported as, or converted to, a file recognized by Array Studio as expression data, which will be saved as an -Omic data type in an Array Studio project. You will be prompted to attach Annotation and Design metadata during the import process; for consistency, your attached Design table should be the same as the Sample MetaData file (below).

Alternatively, you can Add Table Data, then convert the Table object to Omic data using the Convert to MicroArray function.

RNAseq Data

RNAseq data should be in a tab-delimited matrix format, with each column containing FPKM values for each sample, and each row containing all FPKM values for one gene. The first row should contain sample names matching the sample metadata file, and the first column should contain geneIDs, matching the geneIDs of the specified gene model.
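
Before converting, it can help to confirm that the sample names in the matrix header agree with the SampleIDs in the sample metadata file. The following is a minimal Python sketch of such a check, not part of Array Studio; the function names are hypothetical, and it assumes both files are tab-delimited with a single header row and with the gene IDs and SampleIDs in the first column.

```python
import csv


def read_header_samples(matrix_lines):
    """Return the sample names from the first row of a tab-delimited matrix.

    The first cell of the header row labels the gene ID column, so it is skipped.
    """
    reader = csv.reader(matrix_lines, delimiter="\t")
    header = next(reader)
    return [name.strip() for name in header[1:]]


def read_sample_ids(metadata_lines):
    """Return the first-column values (SampleIDs) of a tab-delimited metadata table."""
    reader = csv.reader(metadata_lines, delimiter="\t")
    next(reader)  # skip the header row
    return [row[0].strip() for row in reader if row]


def compare_sample_names(matrix_lines, metadata_lines):
    """Report sample names that appear in only one of the two files."""
    matrix_samples = set(read_header_samples(matrix_lines))
    metadata_samples = set(read_sample_ids(metadata_lines))
    return {
        "missing_from_metadata": sorted(matrix_samples - metadata_samples),
        "missing_from_matrix": sorted(metadata_samples - matrix_samples),
    }
```

Both functions accept any iterable of lines, so they can be called with open file handles (opened with newline="") or with lists of strings. Two empty lists in the result mean the names agree.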

Sample Metadata File

SampleMetaDataFile.png

Sample Metadata should contain useful information about each sample in the expression data set. The first column must have SampleIDs, matching the Expression Data sample names. ComparisonLand requires three additional Sample Metadata columns: Treatment, CellType, and Tissue (but not DiseaseCategory or TissueCategory).

WARNING: Please see this page for restrictions on metadata column names.


OLandDiseaseState should also be included, using one of the Land Controlled Vocabulary terms. Besides these columns, you can include any additional sample Metadata, such as processing data, treatment dosage, time after treatment, etc.
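
The required columns can be checked before building the Land. This is a small Python sketch under stated assumptions: the file is tab-delimited with one header row, column names are matched exactly (case-sensitive), and the helper name is hypothetical rather than an Array Studio function.

```python
import csv

# Columns that ComparisonLand requires in addition to the SampleID column.
REQUIRED_COLUMNS = ("Treatment", "CellType", "Tissue")


def missing_required_columns(metadata_lines, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the header row."""
    reader = csv.reader(metadata_lines, delimiter="\t")
    header = next(reader)
    # The first column holds the SampleIDs, so only the remaining names are checked.
    present = {name.strip() for name in header[1:]}
    return [name for name in required if name not in present]
```

An empty list means all three required columns are present. The check says nothing about the restrictions on column names mentioned in the warning above.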



Comparison Data File

ComparisonDataFile.png

The Comparison Data File contains statistical inference values, including fold change (log2-transformed), raw and FDR-adjusted P-values, and the mean values for your case and control samples. You can include multiple comparisons in a single file by repeating the set of five columns for each comparison.

Tip: The rawPValue and log2FC columns are required; the other columns are optional.


For each comparison name (e.g. treatment1 vs control), data columns must follow these formats to be recognized:

Column title                 Contents                         Required?
ComparisonName.log2FC        Fold change (log2-transformed)   Required
ComparisonName.rawPValue     Raw P-value                      Required
ComparisonName.adjPValue     Adjusted P-value                 Optional
ComparisonName.caseMean      Test group mean                  Optional
ComparisonName.controlMean   Control group mean               Optional
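
The column naming rules above can be checked on the header row of a Comparison Data File. The sketch below is illustrative Python, not Land tooling. It assumes suffixes are matched exactly and that the comparison name is everything before the last dot, so names that themselves contain dots are still grouped correctly. Any column without a recognized suffix, such as an ID column, is reported separately.

```python
REQUIRED_SUFFIXES = ("log2FC", "rawPValue")
OPTIONAL_SUFFIXES = ("adjPValue", "caseMean", "controlMean")


def group_comparison_columns(header):
    """Group column titles by comparison name.

    Returns (comparisons, unrecognized), where comparisons maps each
    comparison name to the set of recognized suffixes found for it.
    """
    known = REQUIRED_SUFFIXES + OPTIONAL_SUFFIXES
    comparisons = {}
    unrecognized = []
    for column in header:
        # Split on the last dot: "Name.With.Dots.log2FC" -> ("Name.With.Dots", "log2FC")
        name, dot, suffix = column.rpartition(".")
        if dot and name and suffix in known:
            comparisons.setdefault(name, set()).add(suffix)
        else:
            unrecognized.append(column)
    return comparisons, unrecognized


def missing_required(comparisons):
    """Map each incomplete comparison to the required suffixes it lacks."""
    problems = {}
    for name, found in comparisons.items():
        missing = [s for s in REQUIRED_SUFFIXES if s not in found]
        if missing:
            problems[name] = missing
    return problems
```

To use it, split the first line of the file on tabs and pass the resulting list to group_comparison_columns. An empty dictionary from missing_required means every comparison has both its log2FC and rawPValue columns.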


Comparison MetaData File

ComparisonMetaDataFile.png

The Comparison MetaData File contains information about each comparison. This includes both information about the sample groups (derived from the Sample MetaData File, as specified by the MetaColumns column of the Comparison MetaData File) and information about the comparison as a whole, as specified by additional columns in this file.

The TlvID column should be unique among all comparisons in your Land. If you have both microarray and RNA-seq data for the same samples, statistical tests from both data sets can be published, by providing unique TlvIDs, such as Treatment1.Microarray and Treatment1.RNAseq.
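
Duplicate TlvIDs are easy to introduce when combining several data sets. A short Python sketch of a duplicate check follows; the helper name is hypothetical, and it assumes a tab-delimited file with a header row containing a column named exactly TlvID. It only checks the rows it is given, so IDs already published to the Land would need to be checked separately.

```python
import csv
from collections import Counter


def duplicate_tlv_ids(comparison_meta_lines, id_column="TlvID"):
    """Return the TlvID values that occur more than once, sorted alphabetically."""
    reader = csv.DictReader(comparison_meta_lines, delimiter="\t")
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        raise ValueError(f"column {id_column!r} not found in header")
    counts = Counter(row[id_column] for row in reader)
    return sorted(value for value, n in counts.items() if n > 1)
```

An empty list means every TlvID in the supplied rows is unique.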



Step 1: Convert quantitative data to .alv files

Microarray Expression Data

Once you have imported your quantitative data as an -Omic data type in Array Studio, you can convert the resulting .osobj file into .alv files. You will find your .osobj file in the subdirectory of your Array Studio project that has the same name as your project.

Convert the .osobj file with ConvertExpressionOsobj. One .alv file will be created for each sample and for each data type (e.g. microarray expression data, RNA-seq data, protein chip levels).

Example:

Begin LandTools /Namespace=NgsLib;
Files "Z:\Users\Joe\Tutorial\ComparisonLand\Final\ComparisonLandFromExternal\Brawndocin_Slurmycin.Expression.osobj";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertExpressionOsobj
/SampleIDColumn="Observation" 
/MappingID=Affymetrix.HG-U133A_Human.B37.3
/IsRatio=False /MedianNormalization=False 
/TargetMedian=0 
/OutputFolder="Z:\Users\Joe\Tutorial\ComparisonLand\Final\ALV";
End;

In the tutorial example, this script converts one .osobj file, containing expression data for 56 samples, into 56 .alv files.

RNAseq Expression Data

RNAseq data will be converted directly from tab-delimited files into .alv files, using ConvertRnaSeqGeneTxt.

Tip: If you have both gene-level and transcript-level data, you should instead use ConvertRnaSeq2Txt.


Example:

Begin LandTools /Namespace=NgsLib;
Files "C:\ComparisonLand\InputFiles\RNAseq\Brawndocin_Slurmycin.V2.RNAseq.txt";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertRnaSeqGeneTxt
/SampleIDMappingFileName="C:\ComparisonLand\InputFiles\Brawndocin_Slurmycin.V2.Expression.Design.txt"
/SampleIDColumnInMappingFile="Observation"
/OutputFolder="C:\ComparisonLand\ALV\RNAseq\"
/UseGeneIDForMapping=True;
End;


Step 2: Convert inference report to .tlv files

To generate ComparisonLand Vector (.tlv) files from your external inferences that have been properly formatted, use ExtractExternalInferenceReport. For each comparison, one .tlv file will be created.

Tip: The inference report must have raw P-value (ComparisonName.rawPValue) and log2-transformed fold-change (ComparisonName.log2FC) columns.


Using the tutorial data set, the scripts below will generate one .tlv file per comparison (e.g. Brawndocin_10uM_HTB-57).

Microarray Expression Data

The MappingID should be in the format AnnotationID_ReferenceLibraryID, and match a .mapping file, such as those included at [omicsoft.com/downloads/mapping].

Example:

Begin ComparisonLandTools /Namespace=NgsLib;
Files "";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;      
Options 
/Action=ExtractExternalInferenceReport
/ComparisonDataFile= "Z:\Users\Joe\Tutorial\ComparisonLand\Final\InputFiles\Brawndocin_Slurmycin.V2.Comparison.Data.txt"
/SampleMetaDataFile= "Z:\Users\Joe\Tutorial\ComparisonLand\Final\InputFiles\Brawndocin_Slurmycin.V2.Expression.Design.txt"
/ComparisonMetaFile= "Z:\Users\Joe\Tutorial\ComparisonLand\Final\InputFiles\Brawndocin_Slurmycin.V2.Comparison.MetaFile.txt"
/OutputFolder="Z:\Users\Joe\Tutorial\ComparisonLand\Final\TLV" 
/MappingID="Affymetrix.HT_HG-U133A_Human.B37.3" ;
End;

RNAseq Expression Data

RNA-seq data should have a mapping ID starting with NGS, followed by any name that would be useful, e.g. NGS.Human.B37.3-OmicSoft2013.

Example:

Begin ComparisonLandTools /Namespace=NgsLib;
Files "";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;      
Options 
	/Action=ExtractExternalInferenceReport 
	/ComparisonDataFile="C:\ComparisonLand\InputFiles\RNAseq\Brawndocin_Slurmycin.V2.Comparison.RNAseq.txt"
	/SampleMetaDataFile="C:\ComparisonLand\InputFiles\RNAseq\Brawndocin_Slurmycin.V2.Expression.Design.RNAseq.txt"
	/ComparisonMetaFile="C:\ComparisonLand\InputFiles\RNAseq\Brawndocin_Slurmycin.V2.Comparison.Meta.RNAseq.txt"
	/OutputFolder="C:\ComparisonLand\TLV\RNAseq\"
	/MappingID="NGS.Human.B37.3" ;
End;
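
The two MappingID conventions described in this step can be checked with a few lines of Python before running the scripts. This is only a rough sketch of the naming rules as stated on this page; the function name is hypothetical, and it does not verify that a matching .mapping file actually exists.

```python
def looks_like_valid_mapping_id(mapping_id, reference, is_rnaseq):
    """Check a MappingID against the naming conventions described above.

    Microarray IDs are expected to be AnnotationID_ReferenceLibraryID,
    so they must end with "_" plus the reference name and have a
    non-empty annotation part. RNA-seq IDs must start with "NGS" and
    be followed by some further name.
    """
    if is_rnaseq:
        return mapping_id.startswith("NGS") and len(mapping_id) > len("NGS")
    suffix = "_" + reference
    return mapping_id.endswith(suffix) and len(mapping_id) > len(suffix)
```

For example, looks_like_valid_mapping_id("Affymetrix.HT_HG-U133A_Human.B37.3", "Human.B37.3", False) and looks_like_valid_mapping_id("NGS.Human.B37.3", "Human.B37.3", True) both pass the check.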


Step 3: Publish Files to Land

You can either add your Comparison data to an existing Land, or you can first Create a new Land.

To publish .alv and .tlv files created in previous steps, first use the File Browser to upload your files to your ArrayServer, in a location where you will be able to find them if you need to add or replace your data.

ComparisonLand Server AlvFiles.png ComparisonLand Server TlvFiles.png

After uploading your files to your ArrayServer, switch to the Land tab, and click Tools | Publish To Land.

ComparisonLand PublishToLand.png

You should also Add Sample Metadata at this time.



Step 4: Explore your Land

You may need to log off your Array Server and log on again to see your new Land data.

You will see that both expression and inference data are available in your Views. For example, the default Gene-Level View will likely be the Treatment vs Control Comparison View.

ComparisonLand GeneView.png

In this View, the log2 fold change of each comparison's case samples, as specified in your Comparison Data file, is plotted on the X-axis. The size of each circle representing a comparison reflects the P-value, also from the Comparison Data file.

Selecting a Comparison will display details in the Details Window, including a Scatter Plot of the expression data for each sample, from the Expression Data file.

ArrayLands contain many different Views for exploring your data at different levels, as well as analytic modules to query your data. For a more detailed walkthrough of ArrayLand views and analytics for Expression and Comparison data, please see the OncoLand and ImmunoLand tutorials, respectively.


