From Array Suite Wiki
NGS Raw Data QC -- Basic Statistics
The Basic Statistics module generates simple composition statistics for the analyzed files.
To access the module, choose Analysis | NGS | Raw Data QC | Basic Statistics.
Input Data Requirements
The "Input format" options are FASTQ, QSEQ, SFF, SAM, BAM and AUTO (AUTO allows any combination of the listed file types).
Add files to menu
- Add will add the sample files that the user selects
- Add Folder will add all samples in the selected folder (local project only)
- Search will find files based on sample registration (server project only)
- Add List will add files from a list (a grouping file for alignment functions can also be added this way).
- The Adapter Stripping window appears after clicking the "Customize" button.
- The adapter stripping section offers three choices: no adapter stripping, stripping adapters from the 3’ end of the read, or stripping right adapters (located in the middle or at the end of the reads) by specifying the adapter sequence.
- The user can choose to exclude unmatched reads from the Basic Statistics.
- Job number - The total number of parallel jobs to run.
- Zip format - Select which format is used in compressing the files (default is "None").
- Output Name - The name given to the output file.
- Output folder - The folder in which the output files are stored.
- Include only mapped entries - When selected, only sequences that have been mapped to the reference sequence are included.
- Preview mode (test first 1M reads) - Runs the module on the first 1 million reads only. Use it as a quick indicator of quality for especially large raw data files.
- Calculate per sequence GC distribution - Reports the %GC of each sequence as a distribution over all sequences, alongside the overall %GC of all bases in all sequences.
- Calculate sequence length distribution - Provides the lengths of the shortest and longest sequences in the set. If all sequences are the same length, only one value is reported.
- Calculate overall quality distribution - Provides the distribution of overall quality.
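To make the statistics options above more concrete, the following is a minimal Python sketch of how per sequence GC, sequence length and quality distributions could be computed from a FASTQ file, with an optional read limit similar to Preview mode. It is not the module's actual implementation. The function names (`read_fastq`, `basic_statistics`), the four-line FASTQ record layout, the Phred+33 quality encoding and the choice to count quality scores per base are all assumptions made for illustration.

```python
import io
from collections import Counter
from itertools import islice


def read_fastq(handle):
    """Yield (sequence, quality) pairs, assuming four lines per FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        yield seq, qual


def basic_statistics(handle, max_reads=None, phred_offset=33):
    """Collect simple composition statistics (hypothetical helper).

    max_reads mimics Preview mode: only the first max_reads records are read.
    phred_offset=33 is an assumption (Sanger / Illumina 1.8+ encoding).
    """
    gc_hist = Counter()       # rounded %GC of a read -> number of reads
    length_hist = Counter()   # read length -> number of reads
    quality_hist = Counter()  # Phred score -> number of bases
    n_reads = 0
    for seq, qual in islice(read_fastq(handle), max_reads):
        n_reads += 1
        length_hist[len(seq)] += 1
        if seq:
            gc = sum(base in "GCgc" for base in seq)
            gc_hist[round(100 * gc / len(seq))] += 1
        quality_hist.update(ord(ch) - phred_offset for ch in qual)
    return {
        "reads": n_reads,
        "min_length": min(length_hist) if length_hist else None,
        "max_length": max(length_hist) if length_hist else None,
        "length_hist": dict(length_hist),
        "gc_hist": dict(gc_hist),
        "quality_hist": dict(quality_hist),
    }


# Tiny in-memory example; a real run would pass an opened FASTQ file instead.
EXAMPLE = "@r1\nACGT\n+\nIIII\n@r2\nGGCCAT\n+\nIIII##\n@r3\nAT\n+\n!!\n"
example_stats = basic_statistics(io.StringIO(EXAMPLE))
```

Passing `max_reads=1_000_000` would restrict the calculation to the first 1 million reads, which is the idea behind Preview mode. If `min_length` and `max_length` are equal, all reads have the same length and a single value is enough to describe the set.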
The results include the BasicStatistics Report, PerSequenceGCReport, SequenceLength and Overall Quality plot:
- BasicStatistics Report