Convert Expression Table to ALV

From Array Suite Wiki

An Escript to convert Expression tables to Land ALV files

This script was contributed by Revonda Mehovic for OmicSoft community use. Designed to import Nanostring data from a text file, this can be used to import any expression matrix as log2-scale "General Expression" Land files (ALV), to be published in an Internal Land.

It was written to be exposed in the GUI, so that other Array Studio users can convert expression data to ALV files.

This script demonstrates how to call Mageck with command lines wrapped in a PScript. The external command is a Linux command, and its output is two tables. After the PScript is integrated into ArrayServer, users can load input files with a mouse click and have the output tables loaded directly into the Omic server project.

An alternative approach would be to (1) run Mageck in Linux OS; (2) import the data matrices; (3) download the matrices to local disk; (4) use Array Studio add table functions to import the data.

Overall Design

  • If “Log2Transform=True” (meaning input data are linear scale), then a bash script will log2-transform the data and create a new file (“InputFile.mod”)
    • Two requirements:
      • The input file needs to be in a location where ArrayServer can write, so that the “.mod” file can be made
      • You need to set the full physical path of “Log2Transform.bsh” in the pscript section (and then not move “Log2Transform.bsh”)
  • If “Log2Transform=False” (meaning it’s already log2 data) then the input data will be copied to a new file (“InputFile.mod”)
  • The "InputFile.mod" will be used as input to create the ALV files
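
The branching described above can be sketched as a small shell function. This is only an illustration of the logic, not the PScript itself: the function name make_mod_file and its arguments are hypothetical, and the path to the transform script is passed in rather than hard-coded.

```shell
# Hypothetical helper mirroring the design above (not part of the PScript).
# Usage: make_mod_file <True|False> <input.txt> [/full/path/to/Log2Transform.bsh]
make_mod_file() {
    if [ "$1" = "True" ]; then
        # Linear-scale input: write the log2-transformed table to InputFile.mod
        bash "$3" "$2" > "$2.mod"
    else
        # Already log2 data: copy it unchanged to InputFile.mod
        cp "$2" "$2.mod"
    fi
}
```

In both branches the result sits next to the input file, which is why the input folder has to be writable.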

Log2TransformExpressionToALV.pscript

<Info>
//Converts nanostring or other expression txt data into omicsoft alv files, with option to Log2-transform linear-scale data
//Original version by R. Mehovic
//last updated 4/2/2018 by J.Pearson
//updated to support log2 transformation of data
//Be sure to set proper full path to bash script
//Requires that input files be in a writable location.
 
Label=Expression_to_ALV
Description=Converts expression table data (e.g. nanostring) into Land ALV files, optionally after log2 transforming with a Bash script
Category=Pipeline\TestScripts\Expression
 
<Input>
//Input Files--allow script to be run on samples in array studio
ExternalScriptInputType=Files
ExternalScriptMenuText=Optionally Log2 Transform, then convert omicsoft formatted txt files into Land ALV files
ExternalScriptMenuStructure=Pipeline\Expression
ExternalScriptFileFilter= txt files|*.txt
 
//Should data be log2 transformed?
@Log2Transform@=True
~@Log2Transform@=Specify if the data needs to be log2 transformed
~@Log2Transform@Levels=True,False
 
//Sample ID Mapping FileName
@SampleIDMappingFileName@=
~@SampleIDMappingFileName@= Specify the sample name mapping file for expression data
~@SampleIDMappingFileName@Type=FileName
 
//Specify sample id column
@SampleIDColumn@=SampleID
~@SampleIDColumn@=Specify column name of Sample ID in mapping file
 
//Output Location 
@OutputFolderName@=
~@OutputFolderName@Type=FilePath
~@OutputFolderName@=Output folder for generated ALV files 
 
//Running options
@ThreadNumber@=8
~@ThreadNumber@=Number of threads to use
~@ThreadNumber@Levels=1,2,3,4,5,6,7,8
 
@ReferenceName@=Human.B37.3
~@ReferenceName@=Genome Reference for Land data
~@ReferenceName@Levels=Human.B37.3,Human.B38,Mouse.B38
 
@GeneModelName@=OmicsoftGene20130723
~@GeneModelName@=Gene model for Land data. Select a compatible gene model for the Reference!
~@GeneModelName@Levels=OmicsoftGene20130723,Ensembl.R75,Ensembl.R76,OmicsoftGenCode.V27,OmicsoftGenCode.V24,OmicsoftGenCode.V15,OmicsoftGenCode.V13
 
<Script>
//log-transform by escript
Begin RunEScript /Run=(@Log2Transform@);
EScriptName logTransform;
Files 
"@FileNames@";
Command echo "Log-transforming %FilePath% to %FilePath%.mod";
Command bash "/Path/To/Log2Transform.bsh" "%FilePath%" > "%FilePath%.mod";
Options "";
End;
 
//If already log-transformed, just copy the file
Begin RunEScript /Run=(!@Log2Transform@);
EScriptName NoLogTransform;
Files 
"@FileNames@";
Command echo "copying %FilePath% to %FilePath%.mod, no transformation";
Command cp "%FilePath%" "%FilePath%.mod";
Options "";
End;
 
//Convert to ALV using LandTools
Begin LandTools /Namespace=NgsLib;
Files
"@FileNames@.mod";
Reference @ReferenceName@;
GeneModel @GeneModelName@;
Options
/Action=ConvertExpressionTxt
/SampleIDMappingFileName="@SampleIDMappingFileName@"
/SampleIDColumnInMappingFile="@SampleIDColumn@"
/RowsAreObservations=False
/IsRatio=True
/IsLog10=False
/MedianNormalization=False
/TargetMedian=0
/ThreadNumber=@ThreadNumber@
/OutputFolder="@OutputFolderName@";
End;

Log2Transform.bsh

Be sure to set /Path/To/Log2Transform.bsh in the pscript!
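
Both requirements (a correct script path and a writable input location) can be checked before running the pipeline. The following is a minimal sketch. The function name check_log2_setup is hypothetical, and the paths are whatever applies on your server. It is assumed here that the check is run under the same account that ArrayServer uses, since that is the account that needs write access.

```shell
# Hypothetical pre-flight check (not part of the original scripts).
# Usage: check_log2_setup /full/path/to/Log2Transform.bsh /path/to/input.txt
check_log2_setup() {
    if [ ! -f "$1" ]; then
        echo "Transform script not found: $1" >&2
        return 1
    fi
    # The .mod file is written next to the input file, so that folder must be writable
    if [ ! -w "$(dirname "$2")" ]; then
        echo "Input folder is not writable: $(dirname "$2")" >&2
        return 1
    fi
    return 0
}
```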

This script takes an input file as its only parameter. It applies log2(value + 0.1) to every value in all rows except the header row and all columns except the first (ID) column, and writes the result to standard output.

#!/bin/bash
#Requires bash in default location
#Header row is printed unchanged; other rows keep column 1 and get log2(value+0.1) in every other column
awk 'BEGIN {FS="\t";OFS="\t"} NR==1 {print; next} {for (i=2;i<=NF;i++) $i=log($i+.1)/log(2); print}' "$1"
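
To see what the transformation does, the same formula, log2(value + 0.1), can be tried on a tiny tab-delimited table. The sketch below wraps an equivalent awk command in a function. The function name log2_table and the sample values are made up for illustration.

```shell
# Illustrative only: applies log2(value + 0.1) to every non-header row,
# leaving the first column untouched.
log2_table() {
    awk 'BEGIN {FS="\t"; OFS="\t"}
         NR == 1 {print; next}    # header row is passed through unchanged
         {for (i = 2; i <= NF; i++) $i = log($i + 0.1) / log(2); print}' "$1"
}

# Made-up example input: two genes, two samples, linear scale
demo=$(mktemp)
printf 'Gene\tS1\tS2\nG1\t1.9\t3.9\nG2\t7.9\t15.9\n' > "$demo"
log2_table "$demo"
# Output (tab-separated):
# Gene  S1  S2
# G1    1   2
# G2    3   4
rm -f "$demo"
```

The 0.1 offset keeps zero values from producing log(0), at the cost of slightly shifting small values.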