RNASeq Comparison Land with PScript

From Array Suite Wiki

Introduction

Land users are often interested in exploring data across multiple Lands, or in comparing internal data with public OmicSoft Lands. A Comparison Land is a type of ArrayLand customized to store, integrate, and display both expression and comparison data from omics projects. With a Comparison Land, users can search for genes and compare their differential expression across comparisons, including fold change and significance values.

This page presents a PScript that automatically generates a Comparison Land from raw FASTQ files.

Input Dataset

For this demonstration, we will use the RNASeq_Tutorial dataset, starting from the FASTQ files. The full dataset is available from the SRA under GSM958729 (SRR521461-521463) and GSM958745 (SRR521522-521524).
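
The accession ranges above can be expanded into individual run IDs before downloading. The following is a minimal sketch, assuming each hyphen denotes an inclusive range of SRR numbers; the helper name `expand_run_range` is hypothetical and not part of any OmicSoft tool.

```python
def expand_run_range(prefix: str, first: int, last: int) -> list[str]:
    """Return every run accession from prefix+first to prefix+last, inclusive."""
    return [f"{prefix}{number}" for number in range(first, last + 1)]


# Accession ranges as listed on this page, keyed by GEO sample accession.
RUNS_BY_SAMPLE = {
    "GSM958729": expand_run_range("SRR", 521461, 521463),
    "GSM958745": expand_run_range("SRR", 521522, 521524),
}

# Flat list of all runs, in the order given above.
ALL_RUNS = [run for runs in RUNS_BY_SAMPLE.values() for run in runs]

print(len(ALL_RUNS))  # prints 6
```

Under this reading the dataset has six runs, which agrees with the six samples mentioned in the parameter description further down.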

Step 1: Create a DiseaseLand

Create a Test Land in the GUI

Create a DiseaseLand by following the instructions here: Create_Land

Create comparison land.png
Name=0502_TestDiseaseLand
ReferenceLibraryID=Human.B37.3
GeneModelID=OmicsoftGene20130723
PrimaryGrouping=DiseaseCategory
SecondaryGrouping=TissueCategory
MutationGeneModelID=Uniprot.Ensembl75
VariantClassifiers=ClinVar_20160815,FunctionalMutation_20160815,1000GenomesSimple_20160815,ExAC_20160815,ESP6500_20160815,UK10K_20160815,RegulomeDB_20160815
Description=Disease data from human studies have been carefully curated and incorporated from GEO (Gene Expression Omnibus), SRA (Sequence Read Archive), ArrayExpress, and more.
EnableLandNumericCache=True
EnableLandAnalysisCache=True
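
The pipeline script at the bottom of this page hard-codes Reference Human.B37.3 and GeneModel OmicsoftGene20130723, so it seems sensible to confirm that the Land settings use the same values before running anything. The sketch below parses the KEY=VALUE lines shown above and reports differences; it is an illustrative helper only, and the function names are hypothetical.

```python
# Excerpt of the Land settings listed above.
LAND_SETTINGS = """\
Name=0502_TestDiseaseLand
ReferenceLibraryID=Human.B37.3
GeneModelID=OmicsoftGene20130723
PrimaryGrouping=DiseaseCategory
SecondaryGrouping=TissueCategory
"""

# Values hard-coded in the Reference and GeneModel lines of the PScript.
SCRIPT_REFERENCE = "Human.B37.3"
SCRIPT_GENE_MODEL = "OmicsoftGene20130723"


def parse_settings(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines into a dict, skipping blank or malformed lines."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            settings[key.strip()] = value.strip()
    return settings


def find_mismatches(settings: dict[str, str]) -> list[str]:
    """Report Land settings that differ from the values the script uses."""
    expected = {
        "ReferenceLibraryID": SCRIPT_REFERENCE,
        "GeneModelID": SCRIPT_GENE_MODEL,
    }
    return [
        f"{key}: Land has {settings.get(key)!r}, script uses {want!r}"
        for key, want in expected.items()
        if settings.get(key) != want
    ]


print(find_mismatches(parse_settings(LAND_SETTINGS)))  # prints []
```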

Step 2: Run the Pscript

Workflow of the Pipeline Script

The PScript is listed at the bottom of this page. It performs the following steps; the sections after this one describe how to install and run it.

  1. Generate ALVs (ArrayLand Vector files) from raw FASTQ data
  2. Convert ALVs to an osprj (OmicSoft Project file)
  3. Generate TLVs (ArrayLand Comparison files) from the osprj
  4. Fetch metadata tables
  5. Publish the ALVs and TLVs to the target Land
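
Each step of the script reads from or writes to a fixed location under the chosen output folder. The sketch below derives those locations from the paths that appear in the script listing at the bottom of this page; the function name and the example folder and project name are hypothetical.

```python
from pathlib import PurePosixPath


def expected_outputs(output_folder: str, project_name: str) -> dict[str, PurePosixPath]:
    """Map each pipeline artifact to the path the script uses for it."""
    root = PurePosixPath(output_folder)
    return {
        # Folder the script searches for *.alv files and publishes from.
        "alv_folder": root / "ALV" / "RnaSeq_Transcript",
        # Project file the script passes to the TLV conversion step.
        "project_file": root / "TLV" / f"{project_name}.osprj",
        # Folder that receives the TLV files and is published to the Land.
        "tlv_folder": root / "TLV" / "RnaSeq",
    }


# Hypothetical values, for illustration only.
paths = expected_outputs("/data/out", "DemoProject")
for label, path in paths.items():
    print(f"{label}: {path}")
```

Checking these locations after a run is one way to see which step a failed job reached.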

Add PScript to ArrayServer

To add a PScript to ArrayServer, a server administrator can go to Server -> Manage -> Manage Scripts -> Manage Scripts.

Manage server scripts.png


In the Manage Server Scripts window, click Add, choose Load from file, navigate to the *.pscript file, and click Open. The PScript will be loaded to ArrayServer.

Load pscript to server.png


After the PScript has been added to ArrayServer, users can run the customized RNASeq_Pipeline to create ALV files (the input data type for Lands) and TLV files (for comparisons), and to publish all files to the target Land.

Customized RNA Pscript.png

Prepare inputs for the Pscript

This pipeline script executes a number of potentially complicated steps, including a full statistical analysis. To run it successfully, care must be taken to format each input file properly.

Add Comparison Mapping and Project Meta files

The two Excel files and the BamFileName.txt file for this tutorial dataset can be found below:

File:0502ContrastMapping.xlsx

File:0502Raw2ComLand GPL11154.xlsx

File:BamFileName.txt

Submit a PScript job to ArrayServer

After downloading the raw RNA-Seq FASTQ files to ArrayServer and placing all required text and Excel files in the working directory, follow the setup shown below to submit a server job:

ServerJob setup RNA.png

Parameters setup

This PScript has seven parameters to set up: OutputFolder, StandaloneProjectName, ComparisonMetadata, SampleProjectMetadata, SampleBam, File_Name_Prefix, and ComparisonLandName.

  1. OutputFolder: the output folder for all ALVs, TLVs, the osprj, and the metadata files.
  2. StandaloneProjectName: the project name for the *.osprj file. It must be identical to the project name in the Comparison Meta Data file and in the Sample Project Meta Data file.
  3. ComparisonMetadata: an Excel file containing comparison information. This file is essential for generating TLVs.
  4. SampleProjectMetadata: an Excel file with at least 3 sheets: sheet_1 contains project information, sheet_2 contains sample information, and sheet_3 contains the comparison setup. This file is essential for generating the *.osprj file.
  5. SampleBam: a text file with BAM file information for all input samples. This demo project has 6 samples, so the file lists 6 BAM files, one per sample.
  6. File_Name_Prefix: the prefix for the output metadata files, used to avoid name collisions between different projects.
  7. ComparisonLandName: the target Comparison Land to which all ALVs and TLVs are published. It can be an existing Land or a newly created one. In this demo, the test Land 0502_TestDiseaseLand was created in Step 1, so all Land files are published directly to it.
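
Because a missing or mistyped parameter may only surface after a long mapping step, it can help to check the seven values before submitting the job. The following is a minimal sketch, assuming the three input files are visible from the machine where the check runs (on a remote ArrayServer they may not be); `check_parameters` is a hypothetical helper, not an OmicSoft function.

```python
from pathlib import Path

# Parameters that must point to an existing file.
FILE_PARAMS = ("ComparisonMetadata", "SampleProjectMetadata", "SampleBam")
# Parameters that only need a non-empty value.
TEXT_PARAMS = (
    "OutputFolder",
    "StandaloneProjectName",
    "File_Name_Prefix",
    "ComparisonLandName",
)


def check_parameters(params: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    for name in TEXT_PARAMS:
        if not params.get(name, "").strip():
            problems.append(f"{name} is empty")
    for name in FILE_PARAMS:
        value = params.get(name, "").strip()
        if not value:
            problems.append(f"{name} is empty")
        elif not Path(value).is_file():
            problems.append(f"{name} does not point to an existing file: {value}")
    return problems


# With nothing filled in, all seven parameters are reported.
print(len(check_parameters({})))  # prints 7
```

This check does not verify that StandaloneProjectName matches the project name inside the two Excel files; that still has to be confirmed by opening them.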

When the server job is done, the ALVs, TLVs, metadata files, and some other files will be available under the output folder:

Serverjob done.png

All ALVs and TLVs will be published to the target ComparisonLand automatically. Sample and project metadata have to be loaded manually.

Add Sample MetaData and Project MetaData to the Test Comparison Land

WARNING: Please see this page for restrictions on metadata column names.

Once the ALV files are published, users need to register sample metadata in the Land for visualization. Data cannot be viewed in the Sample Distribution view if metadata have not been registered in the Land. Add sample metadata in Land Tab | Manage | Samples | Manage Sample Meta Data. If Manage Sample Meta Data is greyed out (as in the screenshot below), ask an administrator to change the Land permissions for you.

Manage samplemeta.png

The metadata table is required to have Sample ID, Subject ID, Primary Grouping, and Secondary Grouping columns, named exactly as they were defined in the land.cfg file. Users can view the information that was provided in the land.cfg file by going to Manage | Show Land Statistics:

Showlandstatistics.png

Below is an example of a Sample Metadata file that would match the test Land from above.

Design metadata bam.png
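
A quick header check can catch a missing required column before the table is loaded. The sketch below assumes a tab-delimited text table and uses DiseaseCategory and TissueCategory as the grouping columns, taken from the PrimaryGrouping and SecondaryGrouping settings of the test Land. The exact spelling of the sample and subject ID columns is an assumption, and the example row is made up for illustration.

```python
import csv
import io


def missing_columns(table_text: str, required: list[str]) -> list[str]:
    """Return the required column names that are absent from the header row."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Column names are assumptions; adjust them to match your land.cfg.
REQUIRED = ["Sample ID", "Subject ID", "DiseaseCategory", "TissueCategory"]

# Made-up table that lacks the secondary grouping column.
EXAMPLE = (
    "Sample ID\tSubject ID\tDiseaseCategory\tBamFileName\n"
    "S1\tP1\tnormal\tS1.bam\n"
)

print(missing_columns(EXAMPLE, REQUIRED))  # prints ['TissueCategory']
```

The comparison is case-sensitive on purpose, since the column names have to match the Land configuration exactly.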

In the Manage Sample Meta Data page, users can click the Add/Replace tab to add a metadata table:

Load SampleMeta fromOSOBJ.png

If you want users to be able to open RNA-Seq BAM files in the Genome Browser directly from Land Views (Browse Selected Samples), the metadata table needs a "BamFileName" column and the Land configuration needs a few BAM/BAS configuration options.

Explore the Comparison Land created

After uploading metadata, the land views should be available.

Explor comparison land.png

The PScript

<Info>
Label=RNA-Seq Custom Pipeline Raw2ComparisonLand
Description=This pipeline will map fastqs to human genome, generate ALVs, create osprj, and convert to TLVs, then publish them to a target comparison Land
Category=NGS\RNA-Seq\Illumina
 
<Input>
ExternalScriptInputType=Files
ExternalScriptMenuText= RNASeq_Pipeline_DESeq
ExternalScriptMenuStructure=NGS\RNA_raw2ComparisonLand
ExternalScriptFileFilter=TXT files|*.fastq|*.gz|*.txt|*.fastq.gz
 
@PairedSamples@=True
~@PairedSamples@=Data is paired. Options are True or False (True by default)
~@PairedSamples@Levels=True,False
~@PairedSamples@ExclusiveLevels=True
 
@ThreadNumberPerJob@=8
~@ThreadNumberPerJob@=Number of threads to run for each of the steps.
~@ThreadNumberPerJob@Levels=1,2,3,4,5,6,7,8
~@ThreadNumberPerJob@ExclusiveLevels=False
 
@ParallelJobNumber@=8
~@ParallelJobNumber@=Number of parallel jobs to run for each of the steps
 
@Gzip@=Gzip
~@Gzip@=Set to Gzip if input files are gzipped or None
~@Gzip@Levels=Gzip,None
~@Gzip@ExclusiveLevels=True
 
@OutputFolder@=
~@OutputFolder@Type=FilePath
~@OutputFolder@=Output folder for results and BAM files
 
@StandaloneProjectName@=
~@StandaloneProjectName@Type=String
~@StandaloneProjectName@=Input the name for the server project
 
@ComparisonMetadata@=
~@ComparisonMetadata@Type=FileName
~@ComparisonMetadata@=Comparison Meta Data
 
@SampleProjectMetadata@=
~@SampleProjectMetadata@Type=FileName
~@SampleProjectMetadata@=Sample Design for Server Project
 
@SampleBam@=
~@SampleBam@Type=FileName
~@SampleBam@=Sample Bam Files
 
@File_Name_Prefix@=
~@File_Name_Prefix@Type=String
~@File_Name_Prefix@=Prefix for the output metadata file names
 
@ComparisonLandName@=
~@ComparisonLandName@Type=String
~@ComparisonLandName@=Input the name for the Comparison Land
 
<Script>
//Mapping Section
Begin RnaSeqPipeline /Namespace=NgsLib /RunOnServer=True;
Files
"@FileNames@"; 
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
NgsQCWizard /Run=False;
FilterNgsFiles /Run=True;
Count /Run=True /AutoTrimUtr=True;
RnaSeqQCMetrics /Run=False;
SummarizeExonJunction /Run=False;
SummarizeMutation2Snp /Run=False;
CombinedFusionAnalysis /Run=False;
GenerateBas /Run=False;
GenerateLandAlv /Run=True;
Options /ParallelJobNumber=@ParallelJobNumber@ /ThreadNumberPerJob=@ThreadNumberPerJob@ /PairedEnd=@PairedSamples@ /FileFormat=AUTO /OutputFolder="@OutputFolder@" /CompressionMethod=@Gzip@ /Gzip=True /Replace=False;
Output ;
End;
 
 
 
//Save project
Begin SaveProject;
End;
 
 
//Convert ALV files to simple Omicsoft project, *****DESeq2 is used for DEG identification*****
Begin ComparisonLandTools /Namespace=NgsLib;
SearchFiles "@OutputFolder@/ALV/RnaSeq_Transcript" /Pattern=*.alv /Recursive=True;
    Reference Human.B37.3;
    GeneModel OmicsoftGene20130723;
    Options 
        /Action=AnalyzeRnaSeqData   
        /ProjectName="@StandaloneProjectName@"
        /ProjectDesignFileName="@SampleProjectMetadata@" 
        /ParallelJobNumber=1  
        /ThreadNumber=1 
        /OutputFolder="@OutputFolder@/TLV";
End;
 
 
 
//Save project
Begin SaveProject;
End;
 
 
 
//Convert Omicsoft project to TLV file
Begin ComparisonLandTools /Namespace=NgsLib;
Files "@OutputFolder@/TLV/@StandaloneProjectName@.osprj";
    Reference Human.B37.3;
    GeneModel OmicsoftGene20130723; 
   Options 
       /Action=ConvertInferenceReportToTlv                   
       /ComparisonMetaDataFileName="@ComparisonMetadata@"
       /DataFormat="xls" 
       #/ExportAllUniqueColumn=True                         
       /OutputFolder="@OutputFolder@/TLV/RnaSeq";
End;
 
 
Begin SaveProject;
End;
 
 
 
//Extract metadata from information file 
Begin ComparisonLandTools /Namespace=NgsLib;
Files "@SampleProjectMetadata@";
Options 
    /Action=ExtractMetaData                        
    /BamPropertyFileName="@SampleBam@"
    /ColumnNameMappingFileName=""
    /SkipSampleList=""
    /IncludeSampleList=""
    /FileNamePrefix="@File_Name_Prefix@"
    /OutputFolder="@OutputFolder@/@File_Name_Prefix@_metadata.txt";
End; 
 
 
Begin SaveProject;
End;
 
 
//publish ALVs and TLVs to target Comparison Land
Begin PublishAlv /Namespace=Land;
Land @ComparisonLandName@;
InputFolder "@OutputFolder@/ALV/RnaSeq_Transcript";
Exclude "";
Options /UseSgen=True /Memory=8GB /Recursive=True /PublishMode=Auto /ParallelJobNumber=4 /DataTypes=(all);
End;
 
Begin PublishAlv /Namespace=Land;
Land @ComparisonLandName@;
InputFolder "@OutputFolder@/TLV/RnaSeq";
Exclude "";
Options /UseSgen=True /Memory=8GB /Recursive=True /PublishMode=Auto /ParallelJobNumber=4 /DataTypes=(all);
End;
 
Begin SaveProject;
End;