CompressBam

From Array Suite Wiki


If set to True, a compressed BAM file is generated, saving 40-60% of storage space:

  • Base qualities are binarized (low/high, split at CompressBamQualityCutoff): any base with a quality above CompressBamQualityCutoff is assigned a Phred quality score of 30. Base qualities take up a large share of the bytes in a BAM file, so this step alone reduces the .BAM file size by a factor of 2-3. It does not affect most RNA-Seq analyses (at least those provided by Omicsoft), and it makes datasets significantly smaller for delivery.
  • Read names are replaced with integers, which can significantly decrease the file size.
  • In a compressed BAM generated by Bam tools, the integer read names follow the alphabetical order of the original read names. In a compressed BAM generated during alignment, they follow the order of the reads in the FASTQ file.
  • A restore utility can rebuild the full BAM file from a compressed BAM generated during alignment, using the original FASTQ files. A full BAM cannot be restored from a compressed BAM generated by Bam tools.
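The quality binarization and integer renaming described above can be illustrated with a minimal Python sketch. This is not Omicsoft's implementation, and several details are assumptions because the page does not state them: the Phred value used for "low" bases (LOW_QUAL = 2 here), that a base exactly at the cutoff counts as low (the page only says "above"), that qualities use Phred+33 text encoding as in SAM, and that integer names start at 1. All function names are hypothetical.

```python
# Hypothetical sketch of the CompressBam transformations; not Omicsoft code.

HIGH_QUAL = 30     # from the page: bases above the cutoff get Phred 30
LOW_QUAL = 2       # ASSUMPTION: the page does not state the "low" value
PHRED_OFFSET = 33  # ASSUMPTION: Phred+33 ASCII encoding, as in SAM text


def binarize_qualities(qual: str, cutoff: int = 12) -> str:
    """Map each base quality to HIGH_QUAL if above cutoff, else LOW_QUAL."""
    out = []
    for ch in qual:
        q = ord(ch) - PHRED_OFFSET
        q_new = HIGH_QUAL if q > cutoff else LOW_QUAL  # ASSUMPTION: q == cutoff is low
        out.append(chr(q_new + PHRED_OFFSET))
    return "".join(out)


def integer_names_fastq_order(fastq_names: list[str]) -> dict[str, str]:
    """Alignment-style renaming: integer = position of the read in the FASTQ file."""
    return {name: str(i) for i, name in enumerate(fastq_names, start=1)}


def integer_names_alphabetical(names: list[str]) -> dict[str, str]:
    """Bam-tools-style renaming: integer = rank of the original name alphabetically."""
    return {name: str(i) for i, name in enumerate(sorted(set(names)), start=1)}


def restore_from_fastq(int_name: str,
                       fastq_records: list[tuple[str, str]]) -> tuple[str, str]:
    """Recover (original name, original quality) for a FASTQ-order integer name.

    Illustration only: ignores reverse-strand reads, paired-end files, etc.
    """
    name, qual = fastq_records[int(int_name) - 1]
    return name, qual


if __name__ == "__main__":
    print(binarize_qualities("II#5", cutoff=12))
```

With the default cutoff of 12, the quality string "II#5" (Phred 40, 40, 2, 20) becomes "??#?" (Phred 30, 30, 2, 30). The sketch also shows why FASTQ-order numbering lends itself to restoration: the integer name is a direct index back into the original FASTQ records.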

For Oscript Option

Add the following two options to your alignment:

/CompressBam=True /CompressBamQualityCutoff=12

For Array Server Configuration Option

The GUIs of the Map RNA-Seq to Genome and Map Long RNA-Seq Reads modules have an option, Optimize bam files for storage. When added to the server configuration file, the CompressBam option specifies whether Optimize bam files for storage is checked by default in the GUI.

[Option]
CompressBam=True

If set to True, Optimize bam files for storage is checked by default when users run server projects.

[Figure: CompressBam.jpg]

  • The option is read from each analytic server's configuration (not the master server's), so it should be added to the AnalyticServer.cfg file. If there is only one server (not a distributed setup), it can be added to the ArrayServer.cfg file.
  • There is no GUI option to set CompressBamQualityCutoff; it is 12 by default.
  • Requires server/studio 7.0.2.79 or above.