From Array Suite Wiki
The DNA-Seq Pipeline module allows users to complete an entire DNA-Seq analysis in a single click. Based on the user's selections, it runs the corresponding pipeline steps.
After alignment, it loads the BAM files once and completes all selected analyses.
The pipeline module is particularly advantageous over running individual modules when cloud-based analysis is enabled, because it requires fewer file transfers and saves EC2 machine setup time.
Accepted file format options are FASTQ, FASTA, QSEQ, BAM, and AUTO.
If BAM files are used as input, the module uses AddGenomeMappedDnaSeqReads to add the alignment files directly as NgsData for all downstream analyses. The pipeline has been only briefly tested with external BAM files generated by other aligners (outside of Omicsoft), so we recommend starting from raw (FASTQ/FASTA) files. The ConvertNgsFiles module can be used to convert BAM files back to fastq.gz.
In the Basic section, the user has a number of options:
- Choose whether this is a paired-end sequencing analysis; if so, the reads are automatically paired using a numbering logic (e.g. _1, _2 or .1, .2).
- Choose a Genome and Gene model
- File compression format: Gzip, Bzip2, or None
- Replace existing BAM files in the output folder, or skip the alignment step for samples that already have BAM files in the output folder
- Number of threads for each alignment and number of jobs/samples to run in parallel.
- Additional alignment settings can be added using Oscript syntax. Users can find all available alignment parameter options under MapDnaSeqReads.
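The pairing and skip-existing options above can be illustrated with a short Python sketch. This is not Omicsoft's implementation; it only shows one plausible reading of the "numbering logic" (_1/_2 or .1/.2 before the file extension). The function names `pair_reads` and `samples_to_align`, and the assumption that each sample's BAM file is named `<sample>.bam`, are hypothetical and chosen for illustration only.

```python
import re
from pathlib import Path

# Assumed naming convention: <sample>_1.<ext> / <sample>_2.<ext>
# or <sample>.1.<ext> / <sample>.2.<ext> (extension may be compound, e.g. .fastq.gz).
_MATE = re.compile(r"^(?P<sample>.+)[_.](?P<mate>[12])(?P<ext>(\.[A-Za-z0-9]+)+)$")


def pair_reads(filenames):
    """Group read files into (mate1, mate2) tuples keyed by sample name."""
    mates = {}
    for name in filenames:
        m = _MATE.match(Path(name).name)
        if m is None:
            raise ValueError(f"no _1/_2 or .1/.2 suffix in {name!r}")
        mates.setdefault(m["sample"], {})[m["mate"]] = name

    pairs = {}
    for sample, found in mates.items():
        if set(found) != {"1", "2"}:
            raise ValueError(f"incomplete pair for sample {sample!r}")
        pairs[sample] = (found["1"], found["2"])
    return pairs


def samples_to_align(pairs, output_dir, replace_existing=False):
    """Return the samples that still need alignment.

    Hypothetical rule: a sample is skipped when <output_dir>/<sample>.bam
    already exists, unless replace_existing is True.
    """
    out = Path(output_dir)
    return [
        sample
        for sample in sorted(pairs)
        if replace_existing or not (out / f"{sample}.bam").exists()
    ]
```

With this sketch, `pair_reads(["s1_1.fastq.gz", "s1_2.fastq.gz"])` groups both files under the sample name `s1`, and `samples_to_align` drops `s1` from the work list if `s1.bam` is already present in the output folder and replacement is not requested.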
Check the box for each individual pipeline module that should be run.
Users are required to specify an output name and folder.
The pipeline creates multiple result data objects in the Omicsoft project:
- NgsData object for DNA-Seq alignment
- Table reports for raw data QC, alignment report, alignment QC, and mutation detection.
All result files are also saved in the output folder.