From Array Suite Wiki

In all NGS modules, including raw data QC and alignment, a Grouping File lets the user supply multiple input files for the same sample.

For example (tab delimited),

/filepath/MyData_1.fastq.gz	TestDataA
/filepath/MyData_2.fastq.gz	TestDataA
/filepath/MyTest_1.fastq.gz	TestDataA
/filepath/MyTest_2.fastq.gz	TestDataA
/filepath/SRR065521.1.fastq.gz	TestDataB
/filepath/SRR065521.2.fastq.gz	TestDataB

Based on this grouping file:

  • MyData_1+MyTest_1 will be read1 files for sample TestDataA
  • MyData_2+MyTest_2 will be read2 files for sample TestDataA
  • Files listed for the same read of a sample are read one after another during the analysis
  • The output file will use observation name "TestDataA"
  • Although sequencing run SRR065521 does not have multiple files per read, its output files, including the BAM file, will use observation name "TestDataB".
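
The grouping described above can be sketched in Python. This is only an illustration, not Array Suite's own logic: the helper `group_reads` and the rule that the mate number is the digit just before `.fastq.gz` (after `_` or `.`) are assumptions made for this example.

```python
import csv
import io
import re
from collections import defaultdict

# The example grouping file from above (tab delimited, no header).
GROUPING_TEXT = (
    "/filepath/MyData_1.fastq.gz\tTestDataA\n"
    "/filepath/MyData_2.fastq.gz\tTestDataA\n"
    "/filepath/MyTest_1.fastq.gz\tTestDataA\n"
    "/filepath/MyTest_2.fastq.gz\tTestDataA\n"
    "/filepath/SRR065521.1.fastq.gz\tTestDataB\n"
    "/filepath/SRR065521.2.fastq.gz\tTestDataB\n"
)

# Assumption: the mate number is the digit right before ".fastq.gz",
# preceded by "_" or ".". The real module may use a different rule.
MATE_PATTERN = re.compile(r"[_.]([12])\.fastq\.gz$")


def group_reads(text):
    """Return {sample: {"read1": [...], "read2": [...]}}, keeping file order."""
    groups = defaultdict(lambda: {"read1": [], "read2": []})
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        path, sample = row[0], row[1]
        match = MATE_PATTERN.search(path)
        if match is None:
            raise ValueError(f"cannot infer mate number from {path!r}")
        groups[sample]["read" + match.group(1)].append(path)
    return dict(groups)


groups = group_reads(GROUPING_TEXT)
for sample, reads in groups.items():
    print(sample, reads["read1"], reads["read2"])
```

Each sample name in column 2 becomes one observation, and the files listed for it are collected per read in the order they appear in the file.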
Tip: When specifying a GroupingFile as part of an OScript, you may simplify the first column to contain only the file name, without the folder path, because the path will come from the Files statement (or alternatives such as SearchFiles or ListFiles). This assumes that all file names in the column are unique. If they are not, you will need to specify the full paths to ensure each file is attributed to the correct sample.
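
Before dropping the folder paths, it can help to check that the file names really are unique. The following Python sketch shows one way to do that. The helper `find_ambiguous_names` and the example paths are hypothetical and are not part of Array Suite or OScript.

```python
import posixpath
from collections import defaultdict


def find_ambiguous_names(paths):
    """Return {file name: [paths]} for names shared by more than one path."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[posixpath.basename(path)].append(path)
    return {name: group for name, group in by_name.items() if len(group) > 1}


# Hypothetical paths: two folders that both contain "sample_1.fastq.gz".
paths = [
    "/run1/sample_1.fastq.gz",
    "/run2/sample_1.fastq.gz",
    "/run1/sample_2.fastq.gz",
]
ambiguous = find_ambiguous_names(paths)
print(ambiguous)
```

If the returned dictionary is empty, the first column can safely hold file names only. Otherwise, the listed files need full paths.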

In Sentieon TNseq Analysis

The Sentieon TNseq pipeline requires a special Grouping File with three columns and no header:

FilePath PatientID TumorNormalStatus

That is, for a given subject (patient, labeled in column 2), all files from the Tumor sample should be marked "Tumor" in column 3, and all files from the Normal sample should be labeled "Normal".


/filepath/MyData_1.fastq.gz	TestDataA	Tumor
/filepath/MyData_2.fastq.gz	TestDataA	Tumor
/filepath/MyTest_1.fastq.gz	TestDataA	Tumor
/filepath/MyTest_2.fastq.gz	TestDataA	Tumor
/filepath/SRR065521.1.fastq.gz	TestDataA	Normal
/filepath/SRR065521.2.fastq.gz	TestDataA	Normal
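
A three-column grouping file like this can be sanity-checked before it is submitted. The Python sketch below is a hypothetical helper, not part of Array Suite or Sentieon. It assumes that paths contain no whitespace and that every patient should have both Tumor and Normal files, so adjust it if your analysis differs.

```python
from collections import defaultdict

VALID_STATUS = {"Tumor", "Normal"}


def check_tnseq_grouping(lines):
    """Return a list of problems found in three-column grouping lines."""
    problems = []
    status_by_patient = defaultdict(set)
    for number, line in enumerate(lines, start=1):
        # Split on any whitespace so both tabs and spaces are accepted.
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            problems.append(f"line {number}: expected 3 columns, got {len(fields)}")
            continue
        _path, patient, status = fields
        if status not in VALID_STATUS:
            problems.append(f"line {number}: unknown status {status!r}")
            continue
        status_by_patient[patient].add(status)
    # Assumption: each patient needs both a Tumor and a Normal sample.
    for patient, seen in sorted(status_by_patient.items()):
        for missing in sorted(VALID_STATUS - seen):
            problems.append(f"patient {patient}: no {missing} files")
    return problems


example = [
    "/filepath/MyData_1.fastq.gz\tTestDataA\tTumor",
    "/filepath/MyData_2.fastq.gz\tTestDataA\tTumor",
    "/filepath/SRR065521.1.fastq.gz\tTestDataA\tNormal",
    "/filepath/SRR065521.2.fastq.gz\tTestDataA\tNormal",
]
problems = check_tnseq_grouping(example)
print(problems)
```

An empty list means every line has three columns, a recognised status, and each patient has both sample types.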

Also Read

How to use multiple sequence files for one sample?