Ngs NgsBasicStatistics.pdf

From Array Suite Wiki



NGS Raw Data QC -- Basic Statistics


The Basic Statistics module generates some simple composition statistics for the files analyzed.

To access the module, choose Analysis | NGS | Raw Data QC | Basic Statistics.

BasicStats Menu.png

Input Data Requirements

The "Input format" options include FASTQ, QSEQ, SFF, SAM, BAM and AUTO (AUTO allows any combination of the listed file types to be used).


General Options

Basic Stats.png

Add file

Add files to menu

  • The Add button adds samples by selection.
  • Add Folder adds all samples in the selected folder (local projects only).
  • Search finds files based on sample registration (server projects only).
  • Add List lets users add files from a list (it can also add a grouping file for alignment functions).

Adapter Stripping

  • The Adapter Stripping window appears after selecting the "Customize" button.
  • The adapter stripping section lets the user choose no adapter stripping, stripping of adapters from the 3’ end of the read, or stripping of right adapters (in the middle or at the end of the reads); the adapter sequence must be specified.
  • The user can choose to exclude unmatched reads from the Basic Statistics.
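
As an illustration of the idea behind right-adapter stripping, the following Python sketch trims a read at the first occurrence of the adapter, or at a partial adapter match at the 3’ end, and optionally drops reads with no match. It is not the module's implementation: the function names, the `min_overlap` parameter, and the use of exact (mismatch-free) matching are assumptions made for this example only.

```python
from typing import Iterable, Iterator, Optional


def strip_right_adapter(read: str, adapter: str, min_overlap: int = 3) -> Optional[str]:
    """Return the read trimmed at the adapter, or None if no match is found.

    Assumption: exact matching only; real tools usually tolerate mismatches.
    """
    # Case 1: the full adapter occurs in the middle or at the end of the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: only a prefix of the adapter is present at the 3' end of the read.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return None


def strip_reads(reads: Iterable[str], adapter: str,
                exclude_unmatched: bool = False) -> Iterator[str]:
    """Yield trimmed reads; unmatched reads are kept unless exclude_unmatched is set."""
    for read in reads:
        trimmed = strip_right_adapter(read, adapter)
        if trimmed is not None:
            yield trimmed
        elif not exclude_unmatched:
            yield read
```

With `exclude_unmatched=True`, only reads in which the adapter was found contribute to any statistics computed afterwards, which mirrors the option described above.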


  • Job number - The total number of parallel jobs to run.
  • Zip format - Select which format is used in compressing the files (default is "None").
  • Output Name - The user can choose to name the output file.
  • Output folder - The folder in which the output files will be stored.
  • Include only mapped entries - Selecting this option incorporates only sequences that have been mapped to the reference sequence.
  • Preview mode (test first 1M reads) - Runs the module on the first 1 million reads only; it should be used as a quick indicator of quality for very large raw data files.
  • Calculate per sequence GC distribution - The overall %GC of all bases in all sequences.
  • Calculate sequence length distribution - Provides the length of the shortest and longest sequence in the set. If all sequences are the same length only one value is reported.
  • Calculate overall quality distribution - Provides the distribution of overall quality.
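
To make the statistics above concrete, here is a minimal Python sketch that computes a per-sequence %GC histogram, a sequence length distribution, and a base quality histogram from FASTQ text, with an optional read limit similar in spirit to preview mode. It is not the module's code: the function names, the 4-line FASTQ layout, the Phred+33 quality encoding, and the choice of a per-base quality histogram are all assumptions made for illustration.

```python
import io
from collections import Counter
from typing import Dict, Iterator, Optional, TextIO, Tuple


def read_fastq(handle: TextIO, limit: Optional[int] = None) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from 4-line FASTQ records, up to `limit` reads."""
    count = 0
    while limit is None or count < limit:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield seq, qual
        count += 1


def basic_stats(handle: TextIO, limit: Optional[int] = None,
                phred_offset: int = 33) -> Dict[str, object]:
    """Collect simple composition statistics (assumes Phred+33 qualities)."""
    gc_hist: Counter = Counter()      # rounded %GC of a read -> number of reads
    length_hist: Counter = Counter()  # read length -> number of reads
    qual_hist: Counter = Counter()    # Phred score -> number of bases
    n_reads = 0
    for seq, qual in read_fastq(handle, limit):
        if not seq:
            continue  # skip empty records to avoid division by zero
        n_reads += 1
        gc = sum(base in "GCgc" for base in seq)
        gc_hist[round(100 * gc / len(seq))] += 1
        length_hist[len(seq)] += 1
        qual_hist.update(ord(ch) - phred_offset for ch in qual)
    return {
        "reads": n_reads,
        "min_length": min(length_hist) if length_hist else 0,
        "max_length": max(length_hist) if length_hist else 0,
        "gc_hist": dict(gc_hist),
        "length_hist": dict(length_hist),
        "qual_hist": dict(qual_hist),
    }


# Tiny made-up example input (three reads); not real sequencing data.
SAMPLE = "@r1\nGGCC\n+\nIIII\n@r2\nACGTAC\n+\n!!!!II\n@r3\nAATT\n+\n####\n"
stats = basic_stats(io.StringIO(SAMPLE))
preview = basic_stats(io.StringIO(SAMPLE), limit=2)  # analogous to preview mode
```

For a real file, a preview-style run could pass `limit=1_000_000` so that only the first million reads are examined.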

Output Results

The results include the BasicStatistics Report, PerSequenceGCReport, SequenceLength and Overall Quality plot:

  • BasicStatistics Report


Basic Statistics Table

  • PerSequenceGCReport


Per Sequence GC Table

  • SequenceLength


Sequence Length Table

  • OverallQualityReport

BasicStats OverallQuality.png

Overall Quality Table
