LandTools

From Array Suite Wiki


Convert result files to alv (ArrayLand Vector) files

Syntax

Begin LandTools /Namespace=NgsLib;

Files "/path-to-file/file.extension";

Reference Human.B37.3;

GeneModel OmicsoftGene20130723;

Options /Action=ConvertAction /OutputFolder="/path-to-output-folder" /MORE-Options-See-examples;

End;


Note:

  • OutputFolder is required for all land tools.
  • Lift-over is not supported. The coordinates in the input files must be consistent with the reference library.
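
When many conversions have to be scripted, it can help to generate the script text programmatically. The following is a minimal Python sketch, not part of LandTools: the function name render_landtools_script and its quoting rule (values containing a slash, backslash or space are quoted) are assumptions made for illustration. It only reproduces the Begin/Files/Reference/GeneModel/Options/End layout shown above and enforces that OutputFolder is present.

```python
def render_landtools_script(files, reference, gene_model, options):
    """Render a LandTools script in the layout shown in the syntax section.

    Hypothetical helper: it only builds text and does not run Oshell.
    """
    # OutputFolder is required for all land tools; Action selects the conversion.
    if "Action" not in options or "OutputFolder" not in options:
        raise ValueError("options must include Action and OutputFolder")

    def fmt(value):
        # Assumption: quote anything that looks like a path or contains a space.
        text = str(value)
        needs_quotes = any(ch in text for ch in "/\\ ")
        return f'"{text}"' if needs_quotes else text

    option_text = " ".join(f"/{name}={fmt(value)}" for name, value in options.items())

    if len(files) == 1:
        files_text = f'Files "{files[0]}";'
    else:
        # Multiple files are listed one per line inside a single quoted block.
        files_text = 'Files\n"\n' + "\n".join(files) + '\n";'

    lines = [
        "Begin LandTools /Namespace=NgsLib;",
        files_text,
        f"Reference {reference};",
        f"GeneModel {gene_model};",
        f"Options {option_text};",
        "End;",
    ]
    return "\n".join(lines)


example = render_landtools_script(
    ["/InternalData/DNASeqMutation.maf"],
    "Human.B37.3",
    "OmicsoftGene20130723",
    {"Action": "ConvertMafFile", "DataMode": "DnaSeq_Mutation", "OutputFolder": "/output/Mutation"},
)
print(example)
```

The option names passed in are taken from the examples on this page; the helper does not check that an option is valid for the chosen action.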

Examples of converting data

ConvertMafFile converts MAF mutation file

Begin LandTools /Namespace=NgsLib;
Files "/InternalData/DNASeqMutation.maf";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertMafFile /DataMode=DnaSeq_Mutation /OutputFolder="/output/Mutation";
End;

Notes:

  • DataMode is required for ConvertMafFile.

ConvertMutationTxtFile converts mutation text file

Begin LandTools /Namespace=NgsLib;
Files "/InternalData/Mutation.txt";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertMutationTxtFile /DataMode=DnaSeq_Mutation /SampleIDColumn=SampleID /ChromosomeColumn=CHROM
/PositionColumn=POS /RefColumn=REF /AltColumn=ALT /RefCountColumn=RefCount /AltCountColumn=AltCount
/GenotypeColumn=Genotype /OutputFolder="/output/Mutation" /LiftOverB36=False;
End;

Notes:

  • DataMode is required for this action mode. Possible values are DnaSeq_Mutation, DnaSeq_SomaticMutation, DnaSeq_Mutation_Exome, DnaSeq_Mutation_Target, RnaSeq_Mutation, RnaSeq_SomaticMutation
  • Alternatively, use /CoverageColumn=Coverage and /FrequencyColumn=Frequency instead of /RefCountColumn and /AltCountColumn
  • For the Genotype column, the value must be two alleles separated by "/". If it does not contain two alleles separated by "/", the value is ignored and a missing genotype is stored. Each allele must be either the actual allele (equal to RefAllele or AltAllele) or 0/1, where 0 means the reference allele and 1 the alternate allele; any other value throws an exception. RefAllele/RefAllele (or 0/0) is currently not allowed: a record listed as a mutation cannot be homozygous reference.
  • LiftOverB36: this option is for Omicsoft internal use only.
  • The positions and Ref/Alt alleles should follow the VCF definition. Here are examples for substitutions, deletions, insertions and indels:
SampleID	Chr	Pos	Ref	Alt	RefCount	AltCount	Genotype
Sample01	17	7578275	G	A	26	9	0/1
Sample02	17	7579472	G	C	182	6	0/1
Sample03	17	7572986	G	A	142	6	1/1
Sample04	17	7577095	GT	G	8	36	1/1
Sample05	17	7574002	CG	C	170	27	0/1
Sample06	17	7578413	CA	C	136	96	0/1
Sample07	17	7577127	C	CA	3	26	1/1
Sample08	17	7579458	G	GTGCA	4	44	1/1
Sample09	17	7579569	T	TCCATCCAG	25	23	0/1
Sample10	17	7578181	G	GGCGGCTC	12	13	0/1
Sample11	17	7577141	CCC	CAA	13	8	0/1
Sample12	17	7578448	GCC	GAA	70	6	0/1
Sample13	17	7578418	TCC	TAA	26	16	0/1
Sample14	17	7578548	GGG	GAA	11	10	0/1
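
The Genotype rules above can be checked before conversion. The following is a minimal Python sketch of those rules as described in the notes; the function name normalize_genotype is hypothetical, and raising an error for a homozygous reference genotype is an assumption (the notes only say it is not allowed).

```python
def normalize_genotype(value, ref, alt):
    """Return the genotype as a pair of 0/1 codes, or None if it is treated as missing.

    Hypothetical pre-check that mirrors the Genotype notes; it is not LandTools code.
    """
    parts = value.split("/") if value else []
    if len(parts) != 2:
        # Not two alleles separated by "/": ignored, stored as missing genotype.
        return None

    codes = []
    for allele in parts:
        if allele in ("0", ref):
            codes.append(0)  # reference allele
        elif allele in ("1", alt):
            codes.append(1)  # alternate allele
        else:
            raise ValueError(f"allele {allele!r} is neither Ref/Alt nor 0/1")

    if codes == [0, 0]:
        # Assumption: reject homozygous reference, since it is not allowed.
        raise ValueError("homozygous reference genotype is not allowed")
    return tuple(codes)


print(normalize_genotype("0/1", "G", "A"))
print(normalize_genotype("GT/G", "GT", "G"))
```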

ConvertCnvSegmentFile converts a text file containing segment information

Begin LandTools /Namespace=NgsLib;
Files "/InternalData/CnvSegment.txt";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertCnvSegmentFile  /SampleIDColumn="SampleID" /bamFileNameColumn="BamFileNameColumn" /ChromosomeColumn=Chromosome
/StartColumn=Start /EndColumn=End /Log2RatioColumn=Segment_Mean /TrimChr=False /OutputFolder="/output/CNV" /LiftOverB36=False 
/SampleIDMappingFileName="/input/CNV/IDMapping.txt" /SampleIDColumnInMappingFile="SampleID";
End;
  • The segment file can be compressed as .gz, .gzip or .zip. Compression is detected from the file extension; there is no option for it.
  • If multiple segment files are provided, multithreading is used to generate the files.
  • The segment file does not need to contain a sample ID. Optionally, a sample ID mapping file can be provided using the SampleIDMappingFileName option. If SampleIDMappingFileName is provided, its first column must be the ID referenced in the segment file, and it must contain the sample ID column specified by "SampleIDColumnInMappingFile".

ConvertCnvSegmentOsobj converts Copy Number in osobj format

The object file can be easily generated from the segment file in ArrayStudio.

Begin LandTools /Namespace=NgsLib;
Files "/InternalData/SegmentData.osobj";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertCnvSegmentOsobj /OutputFolder="/output/Mutation";
End;

Notes:

  • Input file is OmicSoft Object file. This object file can be easily generated from segment files in ArrayStudio.

ConvertAffymetrixCel converts Affymetrix .CEL files

This action can convert Affymetrix expression values in CEL files directly to land ALV Expression_Intensity_Probes files.

Begin LandTools /Namespace=NgsLib;
SearchFiles "/InternalData/CELFolder" /Pattern=*.CEL /Recursive=True;
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertAffymetrixCel /SampleIDColumn="SampleID" /CelFileMappingFileName="/temp/MetaData_CellLine.txt"
/OutputFolder="/output/Affy_Expression_LinearSum";
End;

Notes:

  • CelFileMappingFileName is a design file. Its first column must be the chip ID or the CEL file name without the .cel suffix.

Example:

ChipID	SampleID
A0028420	OmicSample_1
A0028423	OmicSample_2
A0030245	OmicSample_3
  • By default, the Omicsoft Affymetrix Microarray Preprocessing (OSE) method is used. Adding /Method=Rma to the Options switches to the RMA processing method.
  • After the OSE/RMA signal extraction, a secondary median normalization (target median of 500) is applied so that the Affymetrix CEL-based signals are comparable to the GEO-based processed signals. A similar normalization is applied to GEO-derived signals.
  • For each ALV, a gene stores 0 to N probes, and each probe contains 6 values:
    • Standard signal (could be RMA, OSE, or GEO processed)
    • Second signal (e.g. MAS5 – we always calculate MAS5 for CEL based signal extraction)
    • Detection p-value
    • Percentile (0-100): the rank of the gene among all genes in the same sample. This is used for cross-study and cross-platform comparison.
    • Platform index
    • Probe index
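
To illustrate the secondary median normalization mentioned above, here is a minimal Python sketch. It assumes linear-scale signals and a simple multiplicative rescaling so that the sample median becomes 500; the exact procedure LandTools uses is not documented here, and the function name median_normalize is hypothetical.

```python
from statistics import median


def median_normalize(signals, target_median=500.0):
    """Rescale linear-scale signals so that their median equals target_median.

    Assumption: a single multiplicative factor per sample; not LandTools code.
    """
    current = median(signals)
    if current <= 0:
        raise ValueError("median must be positive for multiplicative scaling")
    factor = target_median / current
    return [value * factor for value in signals]


normalized = median_normalize([100.0, 200.0, 400.0])
print(normalized)
```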

ConvertSst converts .sst.txt files

This action can convert SST (Sample Signal Textdump) expression values in .sst.txt files directly to land ALV files.

Begin LandTools /Namespace=NgsLib;
Files 
"
\\TSDATA2\TSData2\GeoRawDownload\SST\GPL10558\GSM1295890.sst.txt
\\TSDATA2\TSData2\GeoRawDownload\SST\GPL10558\GSM1295762.sst.txt
\\TSDATA2\TSData2\GeoRawDownload\SST\GPL10558\GSM1295853.sst.txt
";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertSst /SampleIDColumn="SampleID" /ValueColumn="VALUE"
/SstFileMappingFileName="\\ws03\\SST_MetaData.txt" 
/OutputFolder="\\ws03\SST_Alv";
End; 

Notes:

  • SstFileMappingFileName is a sample mapping file (the first column is sst file name, like GSM1295890)

Example:

ID	SampleID
GSM1295890	OmicSample_1
GSM1295762	OmicSample_2
GSM1295853	OmicSample_3
  • If SampleIDColumn is set to "SampleID" as above, the generated ALV files are named like Expression_Intensity_Probes.OmicSample_1.alv

XXXX.sst.txt data format is tab-delimited:

[Image: SstFileFormat.jpg]

The annotation part starts with "#". Platform and ValueScale (log2/intensity/loge) must be defined in the header of the SST file. ValueColumn must be the name of one of the columns in the SST file.

ConvertAffymetrixCnvCel converts Affymetrix SNP/CNV .CEL files

This action can convert Affymetrix CNV/SNP values in CEL files directly to land ALV files.

Begin LandTools /Namespace=NgsLib;
SearchFiles "/InternalData/CELFolder" /Pattern=*.CEL /Recursive=True;
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertAffymetrixCnvCel /SampleIDColumn=SampleID /CelFileMappingFileName="/temp/MetaData_CellLine.txt"
/OutputFolder="/output/Affy_CNV";
End;

Notes:

  • This action cannot be used to combine 250K CNV CEL files; ConvertAffymetrix500KCnvCel is the action for that.
  • CelFileMappingFileName is a design file as described above.

ConvertAffymetrix500KCnvCel converts/combines Affymetrix SNP/CNV .CEL files

This action can convert/combine Affymetrix CNV/SNP values in 250k CEL files directly to land ALV files.

Begin LandTools /Namespace=NgsLib;
Files 
"
/CNV/dna_16/EA06028_0199-08A_Sty250_SS356912@PCRS.CEL
/CNV/dna_15/EA06028_0198-08A_Nsp250_SS356912@PCRN.CEL
";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertAffymetrix500KCnvCel /SampleIDColumn=SampleID /CelFileMappingFileName="/temp/MetaData_CellLine.txt"
/OutputFolder="/output/Affy_CNV";
End;

Notes:

  • CelFileMappingFileName is a design file with information for pairing and sampleID
FileName	SampleID
EA06028_0199-08A_Sty250_SS356912@PCRS	test1
EA06028_0198-08A_Nsp250_SS356912@PCRN	test1

The files must be listed in pairs (files 1 and 2 are a pair, files 3 and 4 are a pair, etc.)
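
The pairing requirement can be checked before running the conversion. The following is a minimal Python sketch; the function name check_cel_pairs is hypothetical, and requiring both files of a pair to map to the same SampleID is an assumption based on the example design file above.

```python
from pathlib import PurePosixPath


def check_cel_pairs(cel_files, mapping):
    """Group CEL files into consecutive pairs and look up their sample IDs.

    cel_files: paths in the order they appear in the Files statement.
    mapping:   FileName (without .CEL suffix) -> SampleID, as in the design file.
    Hypothetical pre-check; not LandTools code.
    """
    if len(cel_files) % 2 != 0:
        raise ValueError("CEL files must be listed in pairs")

    pairs = []
    for i in range(0, len(cel_files), 2):
        first, second = cel_files[i], cel_files[i + 1]
        ids = []
        for path in (first, second):
            name = PurePosixPath(path).stem  # file name without the .CEL suffix
            if name not in mapping:
                raise KeyError(f"{name} is missing from the design file")
            ids.append(mapping[name])
        # Assumption: both members of a pair share one sample ID.
        if ids[0] != ids[1]:
            raise ValueError(f"{first} and {second} map to different sample IDs")
        pairs.append((ids[0], first, second))
    return pairs


design = {
    "EA06028_0199-08A_Sty250_SS356912@PCRS": "test1",
    "EA06028_0198-08A_Nsp250_SS356912@PCRN": "test1",
}
files = [
    "/CNV/dna_16/EA06028_0199-08A_Sty250_SS356912@PCRS.CEL",
    "/CNV/dna_15/EA06028_0198-08A_Nsp250_SS356912@PCRN.CEL",
]
print(check_cel_pairs(files, design))
```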

ConvertExpression2Txt converts expression values from TWO text files

This action converts expression files to ALV based on expression values and present/absent calls. It requires two input files:

  1. Expression data with probesets/probes in rows;
  2. Present/Absent text file with 1 indicating present and 0 indicating absent.

Begin LandTools /Namespace=NgsLib;
Files
"
/path/Expression.txt 
/path/PresentAbsent.txt
";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertExpression2Txt /SampleIDColumn=SampleID /MappingID=Affymetrix.HG-U133_Plus_2_Human.B37.3
/IsRatio=False /MedianNormalization=False /TargetMedian=0 /IsLog10=False /OutputFolder="/output/Expression";
End;

ConvertExpressionTxt converts expression values from a text file with probe/probeset mapping file

This action can convert expression value file to ALV.

Begin LandTools /Namespace=NgsLib;
Files
"
/path/Expression.txt 
";
Reference Mouse.B38;
GeneModel Ensembl.R78; 
Options 
  /Action=ConvertExpressionTxt 
  /SampleIDMappingFileName="/test/expression_data_with_reportorID.txt"
  /SampleIDColumnInMappingFile=SampleID
  /ProbeMappingFileName="/test/probeInfo.mapping"
  /RowsAreObservations=True
  /IsRatio=True
  /IsLog10=True
  /MedianNormalization=False
  /TargetMedian=0
  /ThreadNumber=1 
  /OutputFolder="/test/ALV";
End;
  • ProbeMappingFileName can be generated by LandTools BuildProbeMapping action.
  • If a sample contains all missing values, ALV file will not be generated and a line will be appended to the log.

ConvertExpressionTxt converts expression values from a text file with SynonymousID

Begin LandTools /Namespace=NgsLib;
Files "/test/ExpressionData.txt";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723; 
Options 
      /Action=ConvertExpressionTxt 
      /SynonymousID=Entrez_Human.B37.3_OmicsoftGene20130723_20150416
      /SkipColumnCount=1
      /RowsAreObservations=False
      /IsRatio=False
      /IsLog10=False
      /MedianNormalization=False
      /TargetMedian=0
      /ThreadNumber=1 
      /OutputFolder="/test/ALV";
End;
  • If SynonymousID is specified, the mapping ID and mapping file are not needed; the synonymous ID is assumed to contain the mapping from the IDs in the data (in this case Entrez IDs) to the target gene model IDs. For converting Nanostring data, use SynonymousID=HumanGenes_20150416. The Omicsoft synonymous mapping files are located at http://omicsoft.com/downloads/synonymous/
  • SkipColumnCount=1 skips the unused second column in the data. This number can be more than 1.
  • If IsRatio=True, the alv is in Expression_Probes data mode; if IsRatio=False, the alv is in Expression_Intensity_Probes data mode.

ConvertPgdSomaticMutation converts somatic mutations from a PGD Excel report file

Begin LandTools /Namespace=NgsLib;
Files "/test/PGD_CancerXome_Report.xlsx";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723; 
Options 
      /Action=ConvertPgdSomaticMutation
      /SampleIDMappingFileName="/test/SampleMappingFile.txt"
      /SampleID=SampleID
      /OutputFolder="/test/ALV";
End;
  • The input file is a PGD (Personal Genome Diagnostics, Inc.) report (such as a CancerXome report) in Excel format.
  • SampleIDMappingFileName is a text file with two columns. One column lists the file names without the .xlsx extension; the other lists the sample IDs. Column headers are required.
  • SampleID should be the header of the sample ID column in SampleIDMappingFileName.

Example:

FileName	SampleID
PGD_CancerXome_Report	Sample01
PGD_CancerXome_Report_2	Sample02

ConvertPgdCopyNumber converts copy number variants from a PGD Excel report file

Begin LandTools /Namespace=NgsLib;
Files "/test/PGD_CancerXome_Report.xlsx";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723; 
Options 
      /Action=ConvertPgdCopyNumber
      /SampleIDMappingFileName="/test/SampleMappingFile.txt"
      /SampleID=SampleID
      /OutputFolder="/test/ALV";
End;
  • The input file is a PGD (Personal Genome Diagnostics, Inc.) report (such as a CancerXome report) in Excel format.
  • SampleIDMappingFileName is a text file with two columns. One column lists the file names without the .xlsx extension; the other lists the sample IDs. Column headers are required.
  • SampleID should be the header of the sample ID column in SampleIDMappingFileName.

Example:

FileName	SampleID
PGD_CancerXome_Report	Sample01
PGD_CancerXome_Report_2	Sample02

ConvertExpressionOsobj converts Expression OmicData in osobj format

This action can convert expression intensity or expression ratio (probe level or gene level).

Begin LandTools /Namespace=NgsLib;
Files "/InternalData/Expression_MicroArrayData.osobj";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertExpressionOsobj /SampleIDColumn=SampleID /MappingID=Affymetrix.HG-U133_Plus_2_Human.B37.3
/IsRatio=False /MedianNormalization=False /TargetMedian=0 /OutputFolder="/output/Expression";
End;

Notes:

  • The input file is an OmicSoft Object file. This object file can be easily generated from cel/other expression files in ArrayStudio.
  • SampleIDColumn names a design column of the OmicData. Its values are used to name the .alv files (for example, the cell line name instead of the expression file name).
  • MappingID defaults to AnnotationID_ReferenceLibraryID, so it usually does not need to be set. However, if the input data does not have managed annotation (or the annotation is not attached), setting this value tells Oshell/ArrayServer how to map the probes to genes. If the input file is already gene-level data, do not set this value.
  • Use IsRatio=True for Agilent ratio data.
  • Users can choose to normalize values to have median=0 (see the MedianNormalization and TargetMedian options). This is intended to guard against batch effects.

ConvertNgsmut3 converts RNA-Seq/DNA-Seq mutation files

Begin LandTools /Namespace=NgsLib;
SearchFiles "/InternalData/RnaSeq/Mutation" /Pattern=*.ngsmut3 /Recursive=True;
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertNgsmut3 /ThreadNumber=12 /BamFileMappingFileName="/InternalData/RnaSeq/BamFileMapping.txt" 
/SampleIDColumn="SampleID" /bamFileNameColumn="BamFileNameColumn" /MinimalTotalHit=10 /MinimalMutationHit=5 /MinimalMutationFrequency=0.20 
/OutputFolder="/output/RnaSeq_Mutation" /RnaMode=False /DataMode=DnaSeq_Mutation;
End;

Notes:

  • A BAM file mapping file and SampleIDColumn must be provided to map each BAM file name (first column) to a sample ID. The same sample ID will be used to link CNV, Affy, and DNA mutation data.

Example:

BamFileName	SampleID
FullPathToBAM\xxxx1.bam	OS001
FullPathToBAM\xxxx2.bam	OS002
FullPathToBAM\xxxx3.bam	OS003
  • Note the SearchFiles syntax. It is used when a Files statement is not present, and is a convenient way to search a folder for files matching a certain pattern.
  • ThreadNumber accelerates the process here because the input consists of multiple files.
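
As a rough illustration of the three thresholds in the Options line above, the Python sketch below applies them to a single site. This is an interpretation based on the option names only: it assumes "total hit" is the total read count at the position, "mutation hit" is the count supporting the mutation, the frequency is their ratio, and all thresholds are inclusive. The function name passes_mutation_filters is hypothetical.

```python
def passes_mutation_filters(
    total_hit,
    mutation_hit,
    minimal_total_hit=10,
    minimal_mutation_hit=5,
    minimal_mutation_frequency=0.20,
):
    """Return True if a site meets all three thresholds (assumed semantics).

    Not LandTools code; the defaults mirror the values used in the example script.
    """
    if total_hit <= 0:
        return False
    frequency = mutation_hit / total_hit
    return (
        total_hit >= minimal_total_hit
        and mutation_hit >= minimal_mutation_hit
        and frequency >= minimal_mutation_frequency
    )


print(passes_mutation_filters(total_hit=35, mutation_hit=9))
print(passes_mutation_filters(total_hit=188, mutation_hit=6))
```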

ConvertNgsm2s converts Mutation+SNP files

Begin LandTools /Namespace=NgsLib;
SearchFiles "/InternalData/RnaSeq/Mutation2Snp" /Pattern=*.ngsm2s /Recursive=True;
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertNgsm2s /ThreadNumber=12 /BamFileMappingFileName="/InternalData/RnaSeq/BamFileMapping.txt" 
/SampleIDColumn="SampleID" /bamFileNameColumn="BamFileNameColumn" /MinimalTotalHit=10 /MinimalMutationHit=5 /MinimalMutationFrequency=0.20 
/OutputFolder="/output/RnaSeq_Mutation" /RnaMode=False /DataMode=RnaSeq_Mutation;
End;

Notes:

  • A BAM file mapping file and SampleIDColumn must be provided to map each BAM file name (first column) to a sample ID. The same sample ID will be used to link CNV, Affy, and DNA mutation data.

Example:

BamFileName	SampleID
FullPathToBAM\xxxx1.bam	OS001
FullPathToBAM\xxxx2.bam	OS002
FullPathToBAM\xxxx3.bam	OS003

ConvertExonJunction converts ExonJunction ngsexj2 files

Begin LandTools /Namespace=NgsLib /RunOnServer=True;
SearchFiles "/IData/ExonJunction" /Pattern=*.ngsexj2 /Recursive=True;
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options 
    /Action=ConvertExonJunction 
    /ThreadNumber=2 
    /BamFileMappingFileName="/IData/ExonJunction/BamFileMappingFile.txt" 
    /SampleIDColumn="SampleID" 
    /BamFileNameColumn="BamFilePath" 
    /OutputFolder="/IData/Output/LandALV";
End;

Notes:

  • A BAM file mapping file and SampleIDColumn must be provided to map each BAM file name (first column) to a sample ID. The same sample ID will be used to link CNV, Affy, and DNA mutation data.

Example:

BamFileName	SampleID
FullPathToBAM\xxxx1.bam	OS001
FullPathToBAM\xxxx2.bam	OS002
FullPathToBAM\xxxx3.bam	OS003

ConvertNgsmpv converts RNA-Seq/DNA-Seq Matched variation files

Begin LandTools /Namespace=NgsLib;
SearchFiles "/InternalData/RnaSeq/Mpv" /Pattern=*.ngsmpv /Recursive=True;
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertNgsmpv /ThreadNumber=12 /BamFileMappingFileName="/InternalData/RnaSeq/BamFileMapping.txt" 
/SampleIDColumn="SampleID" /bamFileNameColumn="BamFileNameColumn"  /OutputFolder="/output/RnaSeq_SomaticMutation" /IsRnaSeq=True;
End;


When ngsmpv files are generated using Summarize Matched Pair Variation, a design file with Tumor/Normal sample pair columns is used, and the tumor sample name is used as the .ngsmpv file name. Internally, we use the tumor sample bam file name in the Tumor/Normal pair columns. Therefore, the resulting ngsmpv files usually follow the pattern "abcde.bam.ngsmpv". The ConvertNgsmpv action looks up each file in the BamFileMappingFile and converts it to an alv file named by sample ID:

BamFileName	SampleID
FullPathToBAM\xxxx1.bam	OS001
FullPathToBAM\xxxx2.bam	OS002
FullPathToBAM\xxxx3.bam	OS003

If you have generated ngsmpv file with abcde.ngsmpv filename (without "bam" in the middle), you can use a fake BamFileMappingFile:

BamFileName	SampleID
abcde	OS001
xxxx2	OS002
xxxx3	OS003

BamFileMappingFile is only a mapping of your file to sample ID.
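
The lookup described above can be sketched in Python as follows. This is only an illustration of the idea, under the assumption that the .ngsmpv suffix is removed and the remainder is compared with the file-name part of each BamFileName entry; the matching LandTools actually performs may differ, and the function name sample_id_for_ngsmpv is hypothetical.

```python
from pathlib import PureWindowsPath


def sample_id_for_ngsmpv(ngsmpv_file, bam_mapping):
    """Find the sample ID for an .ngsmpv file using a BamFileName -> SampleID mapping.

    Assumed matching rule, for illustration only; not LandTools code.
    PureWindowsPath is used because it accepts both "\\" and "/" separators.
    """
    name = PureWindowsPath(ngsmpv_file).name
    suffix = ".ngsmpv"
    if not name.endswith(suffix):
        raise ValueError(f"{ngsmpv_file} is not an .ngsmpv file")
    key = name[: -len(suffix)]  # "abcde.bam.ngsmpv" -> "abcde.bam"

    for bam_file_name, sample_id in bam_mapping.items():
        if PureWindowsPath(bam_file_name).name == key:
            return sample_id
    raise KeyError(f"no mapping entry matches {key}")


# Real BAM names in the mapping file.
print(sample_id_for_ngsmpv("/out/xxxx1.bam.ngsmpv", {"FullPathToBAM\\xxxx1.bam": "OS001"}))
# "Fake" mapping for files named without "bam" in the middle.
print(sample_id_for_ngsmpv("abcde.ngsmpv", {"abcde": "OS001"}))
```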

ConvertGeneBas converts RNA-Seq gene BAS file

The land files are for gene level genome browser support.

Begin LandTools /Namespace=NgsLib;
SearchFiles "/InternalData/RnaSeq/LandBas" /Pattern=*.bas /Recursive=True;
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertGeneBas /ThreadNumber=12 /BamFileMappingFileName="/InternalData/RnaSeq/BamFileMapping.txt" /SampleIDColumn="SampleID" /bamFileNameColumn="BamFileNameColumn" 
/OutputFolder="/output/RnaSeq_GeneBas";
End;


ConvertRnaSeqGeneTxt converts RNA-Seq gene level FPKM files

Begin LandTools /Namespace=NgsLib;
Files "
/IData_Users/gary/TestCases/20160408_ConvertRnaSeqGeneTxt/20Samples.FPKM.txt";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertRnaSeqGeneTxt /SampleIDMappingFileName="/IData_Users/gary/TestCases/20160408_ConvertRnaSeqGeneTxt/20Samples.FPKM_Design.txt"  
/SampleIDColumnInMappingFile="SampleID" /OutputFolder="/IData_Users/gary/TestCases/20160408_ConvertRnaSeqGeneTxt/ALV" /UseGeneIDForMapping=True;
End;

This action allows users to convert RNA-Seq gene level expression text files into Land.

  • It assumes input values are FPKM/RPKM in linear scale
  • SampleIDMappingFileName: design file used to rename the columns of the input file to the values in SampleIDColumnInMappingFile
  • UseGeneIDForMapping: if true, row IDs in the input file are matched to the Gene Model's geneID. If false, the Gene Symbol (Gene Name) is used as the gene mapping ID.

If you have both transcript level and gene level files, please use the ConvertRnaSeq2Txt action.

ConvertRnaSeq2Txt converts RNA-Seq gene/transcript level FPKM files

Begin LandTools /Namespace=NgsLib;
Files "
/Users/test/fpkm.transc_modified.txt
/Users/test/fpkm.gene.txt";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertRnaSeq2Txt /SampleIDMappingFileName="/Users/test/DesignTableData.txt" 
/SampleIDColumnInMappingFile=ID /OutputFolder="/Users/test/alv_expr";
End;

ConvertNgs2tex converts RNA-Seq transcript/gene level counting files

Begin LandTools /Namespace=NgsLib;
SearchFiles "/InternalData/RnaSeq/TxCount" /Pattern=*.ngs2tex /Recursive=True;
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertNgs2tex /ThreadNumber=12 /BamFileMappingFileName="/InternalData/RnaSeq/BamFileMapping.txt" 
/SampleIDColumn="SampleID" /BamFileNameColumn="BamFileNameColumn" 
/PerformNormalization=True /TargetThirdQuantile=10  /OutputFolder="/output/RnaSeq_Transcript";
End;

ConvertFusion converts RNA-Seq fusion results

Begin LandTools /Namespace=NgsLib;
SearchFiles "/InternalData/RnaSeq/FusionSE" /Pattern=*.ngsfse2 /Recursive=True;
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertFusion /ThreadNumber=12 /BamFileMappingFileName="/InternalData/RnaSeq/BamFileMapping.txt" /SampleIDColumn="SampleID" /bamFileNameColumn="BamFileNameColumn"
/AlignmentReportFileName="/InternalData/RnaSeq/AlignmentReport.txt" /TargetThirdQuantile=10 /BamFolder="/InternalData/RnaSeq/BAMFiles"
/OutputFolder="/output/RnaSeq_Fusion" /FusionSEPattern="FusionSE" /FusionPEPattern="FusionPE";
End;

Notes:

  • Alignment report file is required to calculate fusion RPKM. It contains read length and the total number of mapped reads (Read# column) to calculate fusion RPKM. Example:
BamFile	ReadLength	Read#
G25201.MKN74.1	101	165701329
G25207.NCI-H524.1	101	137641784
G25209.NCI-H1299.1	101	158652692
G25211.NCI-H1963.1	101	146010669
G25212.NCI-H661.1	101	107848960
  • Alignment report file can be ignored if all BAM files are in the same location and BamFolder option is provided.
  • To incorporate paired end results, the matching .ngsfpe2 file must be in a folder parallel to the single end fusion folder, with the FusionSE part of the path changed to FusionPE by default (in this case it has to be in /InternalData/RnaSeq/FusionPE). If your folder names follow a different pattern, specify it using the FusionSEPattern and FusionPEPattern options.

ConvertPairedEndFusion converts RNA-Seq paired end fusion results

Convert .ngsfpe2 paired end fusion results to ALV files

Begin LandTools /Namespace=NgsLib;
SearchFiles "/InternalData/RnaSeq/FusionPE" /Pattern=*.ngsfpe2 /Recursive=True;
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertPairedEndFusion /ThreadNumber=12 /BamFileMappingFileName="/InternalData/RnaSeq/BamFileMapping.txt"
/SampleIDColumn="SampleID" /bamFileNameColumn="BamFileNameColumn" /OutputFolder="/output/RnaSeq_PEFusion";
End;

ConvertCombinedFusion converts combined RNA-Seq fusion results

Begin LandTools /Namespace=NgsLib;
SearchFiles "/InternalData/RnaSeq/Fusion" /Pattern=*.ngsfspe2 /Recursive=True;
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertCombinedFusion /ThreadNumber=12 /BamFileMappingFileName="/InternalData/RnaSeq/BamFileMapping.txt" /SampleIDColumn="SampleID" /bamFileNameColumn="BamFileNameColumn"
/AlignmentReportFileName="/InternalData/RnaSeq/AlignmentReport.txt" /TargetThirdQuantile=10 /BamFolder="/InternalData/RnaSeq/BAMFiles"
/OutputFolder="/output/RnaSeq_CombinedFusion";
End;

Notes:

  • Alignment report file is required to calculate fusion RPKM. It contains read length and the total number of mapped reads (Read# column) to calculate fusion RPKM. Example:
BamFile	ReadLength	Read#
G25201.MKN74.1	101	165701329
G25207.NCI-H524.1	101	137641784
G25209.NCI-H1299.1	101	158652692
G25211.NCI-H1963.1	101	146010669
G25212.NCI-H661.1	101	107848960
  • Alignment report file can be ignored if all BAM files are in the same location and BamFolder option is provided.


ConvertBamToXxxx converts .BAM to .ALV directly

These actions simplify/streamline the RNA-Seq processing by converting .BAM to .ALV directly.

ConvertBamToExonJunction

Begin LandTools /Namespace=NgsLib;
Files "/IData/Test.bam";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723; 
Options /Action=ConvertBamToExonJunction /BamFileMappingFileName="/IData/Test/BamFileMapping.txt" 
/SampleIDColumn=Barcode /BamFileNameColumn=BamFileName /Bam99Folder="/IData/Test/ngsexj" 
/ThreadNumber=1 /OutputFolder="/IData/Test/output/junctions";
End;
  • Exon junctions and novel exon junctions are converted to ALV files directly.
  • Bam99Folder is optional. When Bam99Folder is set, additional novel exon junction results from ngsexj files are included in the ALV files. The ngsexj files can be generated by re-scanning BAM files. This option is also required to generate land BAS from .BAM+.BAM99 using BamTools /Action=LandBas.

ConvertBamToMutation

Convert BAM to Mutation ALV (either DNA-Seq mutation or RNA-Seq Mutation):

Begin LandTools /Namespace=NgsLib;
Files "/IData/Test.bam";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723; 
Options 
              /Action=ConvertBamToMutation 
              /BamFileMappingFileName="/IData/Test/BamFileMapping.txt"  
              /SampleIDColumn=Barcode 
              /MinimalTotalHit=10 
              /MinimalMutationHit=5 
              /MinimalMutationFrequency=0.20
              /ExcludeSingltones=True
              /ExcldueMultipleReads=True
               /DataMode=RnaSeq_Mutation 
              /BamFileNameColumn=BamFileName 
              /ThreadNumber=1 
              /OutputFolder="/IData/Test/mutation";
End;
  • Mutation results are summarized using default parameters as shown in genome browser track properties (same as LandBas and Bas).
  • DataMode is either DnaSeq_Mutation or RnaSeq_Mutation.
  • For DnaSeq_Mutation and DnaSeq_SomaticMutation, mutations in intronic regions and mutations that are within 1000 bp of either end of any transcript region are also stored. Inter-transcript and inter-gene mutations (except for those that are within 1000 bp) are still ignored. When calling mutations, duplicates marked in the BAM file are excluded by default.
  • RNA-Seq and DNA-Seq somatic mutation calling requires two BAM files as input, so it is not streamlined.

ConvertBamToCount

Convert BAM to Count ALV (RNA-Seq)

Begin LandTools /Namespace=NgsLib;
Files "/IData/Test.bam";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723; 
Options /Action=ConvertBamToCount /BamFileMappingFileName="/IData/Test/BamFileMapping.txt" /SampleIDColumn=Barcode /PerformNormalization=True 
/TargetThirdQuantile=10 /BamFileNameColumn=BamFileName /ThreadNumber=1 /OutputFolder="/IData/Test/output/Count";
End;
  • Count, RPKM and normalized RPKM are summarized.

ConvertBamToFusion

Begin LandTools /Namespace=NgsLib;
Files "/IData/Test.bam";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723; 
Options /Action=ConvertBamToFusion /MinimalFusionAlignmentLength=0 /BamFileMappingFileName="/IData/Test/BamFileMapping.txt"
/SampleIDColumn=Barcode /BamFileNameColumn=BamFileName /ThreadNumber=1 /OutputFolder="/IData/Test/output/Fusion";
End;

Given a bam file, the module first infers whether it is single end or paired end.

  • For single end input files, the following files will be generated:
  1. Single end fusion ALV file
  2. Single end fusion bam file, along with the .bim
  3. Single end fusion bas file, along with the .bim
  4. Single end fusion ngsfse2 file
  • For paired end input files, the following files will be generated
  1. Single end fusion ALV file (this automatically contains paired end fusion info; it will also automatically infer the MappedReadCount and ReadLength to normalize fusion RPKM)
  2. Single end fusion bam file, along with the .bim
  3. Single end fusion bas file, along with the .bim
  4. Single end fusion ngsfse2 file
  5. Paired end fusion ALV file
  6. Paired end fusion bam file, along with the .bim
  7. Paired end fusion bas file, along with the .bim
  8. Paired end fusion ngsfpe2 file
  • Output files will be stored in OutputFolder\SubFolder, where SubFolder is one of SingleEndFusionBam, SingleEndFusionBas, RnaSeq_Fusion, ngsfse2, PairedEndFusionBam, PairedEndFusionBas, RnaSeq_PairedEndFusion, ngsfpe2.
  • In practice, MinimalFusionAlignmentLength should be set to 0 (the recommended formula is then used).
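
For downstream scripting, the expected output locations can be listed ahead of time. The following is a minimal Python sketch based on the subfolder names above; the function name expected_fusion_subfolders is hypothetical, forward slashes are used as in the example script paths, and the split between single end and paired end subfolders is inferred from the folder names.

```python
# Subfolder names as listed above; assignment to each mode is inferred from the names.
SINGLE_END_SUBFOLDERS = ["RnaSeq_Fusion", "SingleEndFusionBam", "SingleEndFusionBas", "ngsfse2"]
PAIRED_END_SUBFOLDERS = ["RnaSeq_PairedEndFusion", "PairedEndFusionBam", "PairedEndFusionBas", "ngsfpe2"]


def expected_fusion_subfolders(output_folder, paired_end):
    """List the subfolders ConvertBamToFusion is expected to create.

    Single end input yields the single end subfolders only; paired end input
    yields both sets. Hypothetical helper; not LandTools code.
    """
    names = list(SINGLE_END_SUBFOLDERS)
    if paired_end:
        names += PAIRED_END_SUBFOLDERS
    base = output_folder.rstrip("/")
    return [f"{base}/{name}" for name in names]


for folder in expected_fusion_subfolders("/IData/Test/output/Fusion", paired_end=True):
    print(folder)
```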

ConvertBamToPairedEndFusion

BAM->PairedEnd fusion. This action should not be commonly needed unless users want only paired end fusion results from BAM files.

Begin LandTools /Namespace=NgsLib;
Files "/IData/Test.bam";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723; 
Options /Action=ConvertBamToPairedEndFusion /BamFileMappingFileName="/IData/Test/BamFileMapping.txt" 
/SampleIDColumn=Barcode /BamFileNameColumn=BamFileName /ThreadNumber=1 /OutputFolder="/IData/Test/output/PairedEndFusion";
End;

ConvertRnaSeqBamToAlv converts RNA-Seq BAM files

This action converts an RNA-Seq BAM file to several different .alv files in one run, which can significantly improve RNA-Seq processing performance:

Begin LandTools /Namespace=NgsLib;
      Files "/IData/test.bam";
      Reference Human.B37.3;
      GeneModel OmicsoftGene20130723;
      Options /Action=ConvertRnaSeqBamToAlv 
              /BamFileMappingFileName="/IData/Design.txt"
              /SampleIDColumn="SampleID"
              /BamFileNameColumn="BamFileName" 
              /CopyToLocal=False
              /PerformAlignmentQC=True
              /ConvertExonJunction=True 
              /ConvertMutation=True
              /ConvertCount=True 
              /ConvertFusion=True 
              /ConvertPairedEndFusion=True 
              /ConvertBas=True
              /ConvertExon=True
              /AutoTrimUtr=True  
              /LeftExclusion=3 
              /RightExclusion=3
              /MinimalTotalHit=10
              /MinimalMutationHit=5 
              /MinimalMutationFrequency=0.20 
              /MinimalFusionAlignmentLength=0 
              /PerformNormalization=True
              /TargetThirdQuantile=10 
              /ExcludeSingltones=True
              /ExcldueMultipleReads=True
              /ThreadNumber=1 
              /OutputFolder="/IData/Alv";
End;

Notes:

  • The default value for CopyToLocal is set to be false; set this to True if the BAM file is located on a remote drive (such as NAS).
  • If the output folder is "/IData/Alv", then when the analysis is done there will be sub-folders under the output folder, one for each data type.
  • The possible data types are:
  1. RnaSeq_QC\AlignmentQC
  2. RnaSeq_ExonJunction
  3. RnaSeq_Mutation
  4. RnaSeq_Transcript
  5. RnaSeq_Fusion (subfolders for different types of results: ngsfpe2, PairedEndFusionBam, PairedEndFusionBas, RnaSeq_PairedEndFusion, etc. )
  6. RnaSeq_PairedEndFusion (just paired end alv)
  7. Bas
  8. RnaSeq_GeneBas

ConvertBamToMirnaCount converts miRNAseq Bam files

Begin LandTools /Namespace=NgsLib;
Files "/IData/test.bam";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertBamToMirnaCount /MirnaGeneModelID=miRBase.R20.Mature /TranscriptLevel=True
/BamFileMappingFileName="/test/BamFileMapping_miRNA.txt"  /SampleIDColumn="SampleID" /bamFileNameColumn="BamFileNameColumn"
/BamFileNameColumn="BamFileName" /ThreadNumber=8 /CountStrandedReads=True /CountReverseStrandedReads=False
/OutputFolder="/test/MirnaSeq_Count";
End;

ConvertMethylation450 converts Methylation report files to ALV

Begin LandTools /Namespace=NgsLib;
Files 
"/Users/test/MethylationData/XX002_FinalReport.txt	
/Users/test/MethylationData/XX003_FinalReport.txt";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertMethylation450 /MethylationFileMappingFileName="/Users/test/SampleMappingFile.txt"
/SampleIDColumn=SampleID /OutputFolder="/Users/test/ConvertMethylation450/output" /ThreadNumber=5;
End;
  • The input file given by MethylationFileMappingFileName is tab-delimited with a header row and two columns: the first is the methylation file name without its extension, and the second is the SampleID.
  • e.g.
MethylationFileName	SampleID
XX002_FinalReport	XX002
XX003_FinalReport	XX003
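
A mapping file of this shape can be produced with a short script. The Python sketch below is illustrative only: the function name methylation_mapping is hypothetical, and it assumes the SampleID can be derived by stripping a _FinalReport suffix from the file name, which may not hold for other naming schemes.

```python
from pathlib import PurePosixPath


def methylation_mapping(report_paths, suffix="_FinalReport"):
    """Build the two-column, tab-delimited mapping text for report files.

    Assumption (hypothetical naming rule): the SampleID is the file name
    with its extension and the trailing `suffix` removed.
    """
    lines = ["MethylationFileName\tSampleID"]
    for path in report_paths:
        stem = PurePosixPath(path).stem  # file name without extension
        sample_id = stem[: -len(suffix)] if stem.endswith(suffix) else stem
        lines.append(f"{stem}\t{sample_id}")
    return "\n".join(lines) + "\n"


print(methylation_mapping([
    "/Users/test/MethylationData/XX002_FinalReport.txt",
    "/Users/test/MethylationData/XX003_FinalReport.txt",
]), end="")
```

The returned text can be written to the file passed as /MethylationFileMappingFileName.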

ConvertVcfFile converts Vcf files to ALV

For one-sample-one-data Vcf

Begin LandTools /Namespace=NgsLib;
Files 
"/Users/test/data1/1_SNPs.vcf
/Users/test/data1/2_SNPs.vcf
/Users/test/data1/3_SNPs.vcf
/Users/test/data1/4_SNPs.vcf
/Users/test/data1/5_SNPs.vcf
/Users/test/data1/6_SNPs.vcf
";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertVcfFile
	/DataMode=DnaSeq_Mutation
	/VcfFormat=FirstSample
	/VcfFileMappingFileName="/Users/test/data/TestMapping1.txt"	
	/VcfFileNameColumn="VcfFileName"  
	/SampleIDColumn="SampleID"  
	/ThreadNumber=1 
	/OutputFolder="/Users/test/data/alv1";
End;

For one-file-multiple-samples Vcf

Begin LandTools /Namespace=NgsLib;
Files 
"/Users/test/data/test_vcf.vcf";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723;
Options /Action=ConvertVcfFile
	/DataMode=DnaSeq_Mutation
	/VcfFormat=AllSamples
	/VcfFileMappingFileName="/Users/test/data/TestMapping2.txt"	
	/VcfSampleNameColumn="SampleName"  
	/SampleIDColumn="SampleID"  
	/ThreadNumber=3 
	/OutputFolder="/Users/test/data/alv2";
End;

Notes:

Required parameters:

  • DataMode must be specified.
  • VcfFormat must be specified as "FirstSample" or "AllSamples". Note that if "AllSamples" is specified, the optional column for sample ID mapping is "VcfSampleNameColumn" instead of "VcfFileNameColumn".

Optional parameters:

  • VcfFileMappingFileName (for VcfFormat="FirstSample"): VcfFileNameColumn(\tab)SampleIDColumn
  • VcfFileMappingFileName (for VcfFormat="AllSamples"): VcfSampleNameColumn(\tab)SampleIDColumn
  • QualityCutoff (default = 0, /QualityCutoff=0): can be used to filter out low-quality variations
  • RemoveChrM (default = False, /RemoveChrM=False): can be used to filter out chrM mutations (e.g. when the M chromosome comes from hg19 but the target reference is Human.B37.3)
  • StoreAllVariants (default = False, /StoreAllVariants=False): if set to True, LandTools tries hard to assign each variation to a gene even when it lies far from the gene (by default, only variations within 1000 bp upstream/downstream are assigned).
  • The input file given by VcfFileMappingFileName is tab-delimited with a header row and two columns: the first is the Vcf file name without its extension (for one-sample-one-data Vcf) or the SampleName (for one-file-multiple-samples Vcf), and the second is the SampleID.
  • e.g.
VcfFileName	SampleID
1_SNPs.vcf	Test_001
2_SNPs.vcf	Test_002
3_SNPs.vcf	Test_003
or 

SampleName	SampleID
Sample1	Test_001
Sample2	Test_002
Sample3	Test_003
Sample4	Test_004

Note: The FirstSample option is deprecated because it requires AF in the INFO field. If FirstSample causes problems, users can instead run VcfFormat="AllSamples" separately on each single-sample VCF.
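
For the AllSamples case, the sample names to list in the mapping file can be read from the VCF itself: in the VCF format, the #CHROM header line carries nine fixed columns (ending with FORMAT) followed by one column per sample. The Python sketch below shows one way to do this. The function names and the name-to-SampleID dictionary are hypothetical, and the column headers SampleName and SampleID are taken from the example above.

```python
def vcf_sample_names(vcf_lines):
    """Return the sample names from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\r\n").split("\t")
            return fields[9:]  # the 9 fixed columns end with FORMAT
        if not line.startswith("#"):
            break  # reached data lines without seeing a #CHROM header
    raise ValueError("no #CHROM header line found")


def all_samples_mapping(sample_names, sample_ids):
    """Format a SampleName/SampleID mapping as tab-delimited text.

    `sample_ids` is a dict from VCF sample name to the SampleID wanted
    in the land (supplied by the user; not derived from the VCF).
    """
    lines = ["SampleName\tSampleID"]
    for name in sample_names:
        lines.append(f"{name}\t{sample_ids[name]}")
    return "\n".join(lines) + "\n"


header = ("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
          "\tSample1\tSample2\n")
names = vcf_sample_names(["##fileformat=VCFv4.2\n", header])
print(all_samples_mapping(names, {"Sample1": "Test_001",
                                  "Sample2": "Test_002"}), end="")
```

In practice `vcf_lines` would be an open file handle for the VCF, and the result would be saved as the file named by /VcfFileMappingFileName.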

MergeVirusCounts and Merge16SMicrobialCounts generate Data Matrix file for land

Begin LandTools /Namespace=NgsLib;
SearchFiles "/Virus/Quantification" /Pattern="*.ngsexp";
Reference Virus.RefSeq20140619;
Options 
               /Action=MergeVirusCounts 
               /MetaDataObjectFileName="/Virus/MetaData.osobj"
               /GenerateServerMatrixFile=True 
               /SortByMaxValue=True
               /OutputFolder="/Virus/DataMatrix";
End;
  • The action generates both a normalized .osobj file and a normalized .mat file.
  • Action: can be MergeVirusCounts or Merge16SMicrobialCounts.
  • By default (SortByMaxValue=True), the viruses are also sorted automatically by maximal value.
  • MetaDataObjectFileName: should be a metadata table (sample IDs in the first column), saved as an .osobj file.
  • Example of MetaDataObjectFileName:

MyLandTools1.png

or

MyLandTools2.png

  • MappedReads: A column labeled MappedReads or RNASeq Mapped Read Count must be present in the metadata file. It should hold the total number of reads mapped to the host genome (human, mouse, etc.), not the number of reads mapped to the virus or bacteria genome.
  • BamFileName: A column labeled as BamFileName must be present in the meta data file. The content should just show the name of the bam file, without the file path.
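
The two column requirements above can be checked before the metadata table is saved as an .osobj file. The Python sketch below is a minimal illustration that assumes the table is first available as tab-delimited text with a header row; the function name check_metadata is hypothetical, and column names are matched exactly (case-sensitive), which may be stricter than LandTools itself.

```python
import csv
import io

# Either of these labels satisfies the mapped-read requirement.
MAPPED_READ_COLUMNS = ("MappedReads", "RNASeq Mapped Read Count")


def check_metadata(tsv_text):
    """Return a list of problems found in a tab-delimited metadata table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    columns = reader.fieldnames or []
    problems = []
    if not any(c in columns for c in MAPPED_READ_COLUMNS):
        problems.append("missing MappedReads / RNASeq Mapped Read Count column")
    if "BamFileName" not in columns:
        problems.append("missing BamFileName column")
        return problems
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        name = row["BamFileName"] or ""
        if "/" in name or "\\" in name:
            problems.append(
                f"line {line_number}: BamFileName contains a path: {name}")
    return problems


table = "SampleID\tMappedReads\tBamFileName\nS1\t1000\tS1.bam\n"
print(check_metadata(table))
```

An empty list means both required columns are present and no BamFileName value includes a directory component.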


After generating the .mat file, place it in the LandName/DataMatrix subfolder within your LandDirectory. If no DataMatrix directory exists, create one.

The ArrayServer admin user should also add the following lines to the appropriate land cfg file (change the AnnotationID to match the viral/bacterial reference used)

AnnotationID.Virus=Virus.RefSeq20150614.Sequences
AnnotationID.16SMicrobial=16SMicrobial.Ncbi20150127.Sequences
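
The file placement step described above can be scripted. The following Python sketch copies a .mat file into LandName/DataMatrix and creates the directory when it does not exist. The function name install_matrix and the example paths in the comment are hypothetical; editing the land cfg file is not covered here.

```python
import shutil
from pathlib import Path


def install_matrix(mat_file, land_directory, land_name):
    """Copy a .mat file into <LandDirectory>/<LandName>/DataMatrix.

    The DataMatrix directory is created if it is missing. Returns the
    path of the copied file.
    """
    target_dir = Path(land_directory) / land_name / "DataMatrix"
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(mat_file, target_dir))


# Example call (hypothetical paths):
# install_matrix("/Virus/DataMatrix/Virus.mat", "/LandDirectory", "LandName")
```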

BuildServerMatrix generates Data Matrix file for land

Begin LandTools /Namespace=NgsLib;
Files
"file_to_osobj_file";
Options /Action=BuildServerMatrix /Name=Virus /Description="Virus Description";
End;

The .mat file should be placed in the LandName/DataMatrix folder of each land. BuildServerMatrix is a general-purpose function for building Data Matrix files.

Others

GenerateGistic2Markers generates marker files for GISTIC analysis

Begin LandTools /Namespace=NgsLib;
Files "/Users/test/CNV_CBSmod.hg19.txt";
Options /Action=GenerateGistic2Markers /OutputFolder="/Users/test" /ChromosomeColumn=chrom /StartColumn=loc.start /EndColumn=loc.end /ThreadNumber=8;
End;

RenameAlv to change ALV SampleID

Begin LandTools /Namespace=NgsLib;
Files "/Users/test/Sample1.alv";
Options /Action=RenameAlv /SampleIDMappingFileName="/Users/test/RenameAlvInput.txt" 
/ThreadNumber=12 /OutputFolder="/Users/test/";
End;
  • Input file “SampleIDMappingFileName” should have two columns (no column header) – the first column is old SampleID, and the second column is new SampleID
  • If OutputFolder is not specified, the original ALV files themselves are renamed.

BuildProbeMapping to build mapping files for Affy/Agilent probes

Begin LandTools /Namespace=NgsLib;
Files "/Users/test/probeInfoFromVendor.txt";
Reference Mouse.B38;
GeneModel Ensembl.R79;
Options /Action=BuildProbeMapping /ProbeIDColumn=Input /ProbeSequenceColumn="Probe Sequence" /ExportTable=True /ThreadNumber=12 /OutputFolder="/Users/test";
End;
  • probeInfoFromVendor.txt is a tab-delimited text file that contains a ProbeID column and a Sequence column. Here is an example:
ProbeID	Sequence
ID1_1	CCAGTGGATGACACAACGGACTGAACACAACAAAGAAAAAACAGAGTCTGGGACTCATC
ID1_2	CACATGTCCAGGCCCAAGGCCTCAATGTAGGCTTCTGTGAGCAGGAGTTTGAACAGACCT
ID1_3	CCTTACCTCTTGGAAGCCAGGGGCATTTTAGGATTAAAGAGAAACAAGGAAACCCGTTT
ID1_4	AAAGTGCTCTCAAGGGCCAGTTTCTGGATTTCTCCACAATGCAAGCTACAGAGCTGGGG
  • ExportTable=True exports a text table (.mapping.txt) to the same output folder. Note that this option is True by default.
  • The gene model used to generate the mapping file is the same as the one used for RNA-Seq mapping.
  • The alignment penalty is set to 7% of the probe length. For 25 bp Affymetrix probes the maximal penalty is therefore 1 (consistent with previous mapping for Affymetrix), while for 60 bp Agilent probes the maximal penalty is 4.
  • We allow both novel exon junction and long deletion (10,000 bp)
  • We always choose the best hit based on alignment. If there are multiple locations (mapped to multiple genes) with equally best score (tie), we keep all of them.
  • Note: there are gene level view and also probeset (in Affy)/Probe (in Agilent) view in land.
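
The penalty figures quoted above (1 for 25 bp probes, 4 for 60 bp probes) are consistent with taking 7% of the probe length and rounding down. The short Python sketch below reproduces that arithmetic; the rounding rule is an assumption inferred from the two quoted values, not a documented behavior, and the function name is hypothetical.

```python
def max_penalty(probe_length, percent=7):
    """Maximal alignment penalty, assuming the 7% budget is rounded down.

    Integer arithmetic avoids floating-point surprises near whole numbers.
    """
    return (probe_length * percent) // 100


for length in (25, 60):
    print(f"{length} bp probe: maximal penalty {max_penalty(length)}")
```

Under this assumption 25 bp gives 1 (7% of 25 is 1.75) and 60 bp gives 4 (7% of 60 is 4.2), matching the values in the notes.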

FPKM or TPM in land

By default, the actions ConvertNgs2tex, ConvertBamToCount, and ConvertRnaSeqBamToAlv generate upper quantile normalized FPKM/RPKM values for the land. To generate TPM values instead, add the option /PerformTpmNormalization=True. For example:

Begin LandTools /Namespace=NgsLib;
      Files "/IData/test.bam";
      Reference Human.B37.3;
      GeneModel OmicsoftGene20130723;
      Options /Action=ConvertRnaSeqBamToAlv 
              /BamFileMappingFileName="/IData/Design.txt"
              /SampleIDColumn="SampleID"
              /BamFileNameColumn="BamFileName" 
              /CopyToLocal=False
              /PerformAlignmentQC=True
              /ConvertExonJunction=True 
              /ConvertMutation=True
              /ConvertCount=True 
              /ConvertFusion=True 
              /ConvertPairedEndFusion=True 
              /ConvertBas=True
              /AutoTrimUtr=True  
              /LeftExclusion=3 
              /RightExclusion=3
              /MinimalTotalHit=10
              /MinimalMutationHit=5 
              /MinimalMutationFrequency=0.20 
              /MinimalFusionAlignmentLength=0 
              /PerformTpmNormalization=True
              /TargetThirdQuantile=10 
              /ThreadNumber=1 
              /OutputFolder="/IData/Alv";
End;
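
For readers unfamiliar with the difference between the two units, the Python sketch below computes FPKM and TPM from raw counts using the standard textbook definitions. It is not a reimplementation of what LandTools does: the upper quantile scaling and the TargetThirdQuantile setting are not reproduced, and the function names and input values are hypothetical.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    return [c * 1e9 / (total * length)
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = 1e6 / sum(rates)
    return [r * scale for r in rates]


# Two hypothetical genes: counts and lengths in base pairs.
counts = [10, 20]
lengths = [1000, 2000]
print(fpkm(counts, lengths))
print(tpm(counts, lengths))
```

TPM values always sum to one million within a sample, which makes proportions comparable across samples; FPKM values do not have a fixed sum.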

BuildClinical to build clinical subsystem for a land

Begin LandTools /Namespace=NgsLib;
Files 
"/test/ClinicalTriplets.txt
 /test/ClinicalCategory.txt";
Options 
 /Action=BuildClinical
 /IsTriplets=True
 /LandName=TCGA2015
 /OutputFolder="/test/output";
End;
  • The input file can be table data (each row is a sample, and each column is a clinical variable) or triplets (no column header – always assume Sample, Variable, Value). This is controlled by IsTriplets (default is false).
  • A category file must be provided. It has two columns (no column header): variable name and category. Category levels can be separated by “\” to create a multi-level structure.
  • LandName must be specified so that the output .cli file has a correct name.
  • Sample IDs must match those in the Land's Sample MetaData or the identifier in the Integration column of the Land's Sample MetaData.
  • See Sample MetaData details for a list of valid characters in variable names and sample IDs.
  • An example of standard table data format: table.txt
  • An example of triplet file: triplet.txt
  • An example of category file: category.txt

With this category file, your clinical data will have a tree structure like this:

Clinical tree.png
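
To preview how a category file turns into such a tree, the Python sketch below nests each variable under its category path, splitting levels on the backslash. It is an illustration only: the function name, the "_variables" key, and the example variables are hypothetical, and it assumes the two columns are tab-separated.

```python
def build_category_tree(category_lines):
    """Nest variables under their category path.

    Each line is "<variable>\t<category>", where category levels are
    separated by a backslash. Variables are collected under the
    "_variables" key of their deepest level (this would clash with a
    category level literally named "_variables").
    """
    tree = {}
    for line in category_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        variable, category = line.split("\t", 1)
        node = tree
        for level in category.split("\\"):
            node = node.setdefault(level, {})
        node.setdefault("_variables", []).append(variable)
    return tree


# Hypothetical category file content.
example = [
    "Age\tDemographics",
    "Gender\tDemographics",
    "Stage\tTumor\\Staging",
]
print(build_category_tree(example))
```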

MergeGeneLevelNumericData generates gene level data from ALV files

Begin LandTools /Namespace=NgsLib;
ListFiles "/LandFolder/RawData/LandName_AlvList.txt";
Reference Human.B37.3;
GeneModel OmicsoftGene20130723; 
Options 
               /Action=MergeGeneLevelNumericData
               /DataMode=RnaSeq_Transcript
               /MetaDataObjectFileName="/LandFolder/MetaData.osobj"
               /GenerateLargeMicroArrayData=False
               /OutputFolder="/LandFolder/Output";
End;
  • Note the new syntax ListFiles – it will load the content in the file and use it for Files.
  • If GenerateLargeMicroArrayData=True, large microarray data and .marray files will be generated in the folder.
  • For RnaSeq_Transcript, it will generate RnaSeq_Transcript.count, RnaSeq_Transcript.rpkm, and RnaSeq_Transcript.rpkm_log2, which hold expression values at the gene level.
  • For RPPA and RPPA_RBN, it will generate expression values at the gene level.
  • For General_Expression, it will generate a General_Expression.osobj file.
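
The list file passed to ListFiles can be generated rather than written by hand. The Python sketch below collects every .alv file under a folder and formats the paths one per line. The one-path-per-line layout is an assumption (the notes only say the file content is used for Files), and the function name is hypothetical.

```python
from pathlib import Path


def alv_list_text(folder):
    """Return the paths of all .alv files under `folder`, one per line.

    Assumption: ListFiles expects one file path per line.
    """
    paths = sorted(str(p) for p in Path(folder).rglob("*.alv"))
    return "\n".join(paths) + ("\n" if paths else "")


# Example call (hypothetical folder):
# Path("/LandFolder/RawData/LandName_AlvList.txt").write_text(
#     alv_list_text("/LandFolder/RawData"))
```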

Also read