External Script Updates

From Array Suite Wiki


Revision as of 14:36, 10 April 2020


External Tool run reference

The external tool is a form of escript designed to run pipelines/workflows using public bioinformatics tools, which are not included in the Omicsoft distribution.

To let users define their own tools, while also allowing them to use predefined tools without complicated environment configuration, the current update integrates support for both cloud and Docker, making the old syntax more flexible.

EScript can now be run on:

  • client (ArrayStudio) +/-Docker
  • server (ArrayServer > SendToQueue, oshell) +/-Docker
  • cluster (ArrayServer > SendToQueue, oshell) +/-Docker
  • cloud (ArrayServer > SendToQueue, oshell) +/-Docker


To use Docker, the following conditions must be met:

  • AMI/Server: Docker (recommended version: 19.03.8)
  • ECR: EC2s must have permission to ECR
  • InstanceType: AMI must be set depending on the type of the instance that will be run


The External Tool syntax can be found here:

General steps one should follow when building an EScript.

Resources (optional)

A processing step in the script might use the result of a previous analysis. Such a result must be provided in the Resources section. Because the results of the previous analysis might not be ready when the script is first read, these files have to be distinguished from the input files. More details about the syntax of the Resources section can be found in the syntax page.

Resources Section



Most scripts require several input files in order to run. These input files are provided in the Files section and are read according to the Mode given in the options. Supported read modes are single, paired and multiple. Each input file must be enclosed in quotes, and the section always ends with a semicolon.

Files Section



After the files, the user has to provide a name for the EScript under EScriptName. This name is later used to gather the output results and to present any error logs in the solution project. As before, the section ends with a semicolon.

EScriptName Section
EScriptName KallistoQuant;


The user can enter several commands. Each command must be prefixed with the keyword Command. In the background, each command is converted into a docker command and linked with a docker input and output directory, so the user can use the tool without worrying about docker parameters. As before, each command must be terminated with a semicolon.

Commands Section
Command kallisto quant -i %Resource1% -o "%OutputFolder%" -b 100 %FilePath1% %FilePath2%;

Command kallisto version;
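
The conversion described above can be pictured with a short Python sketch. It is not the Omicsoft implementation; it only illustrates, under stated assumptions, how one command could be wrapped in a `docker run` call with an input and an output directory mounted as volumes. The function name, the container paths `/input` and `/output`, the host paths, the file names and the image name `example/kallisto:latest` are all hypothetical.

```python
import shlex

def wrap_in_docker(command, image, host_input, host_output,
                   container_input="/input", container_output="/output"):
    """Build a `docker run` argument list for one command (illustrative only)."""
    return [
        "docker", "run", "--rm",
        # Mount the host input directory read-only inside the container.
        "-v", f"{host_input}:{container_input}:ro",
        # Mount the host output directory so results survive the container.
        "-v", f"{host_output}:{container_output}",
        image,
    ] + shlex.split(command)

# Hypothetical paths, file names and image name, chosen for illustration.
argv = wrap_in_docker(
    "kallisto quant -i /input/index.idx -o /output -b 100 "
    "/input/r1.fastq /input/r2.fastq",
    image="example/kallisto:latest",
    host_input="/data/in",
    host_output="/data/out",
)
print(shlex.join(argv))
```

Building the call as an argument list rather than a single string keeps quoting explicit when it is later passed to a process launcher.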


The Options section contains parameters that work as in a regular EScript (ParallelJobNumber, ThreadNumberPerJob, Mode, ErrorOnStdErr and ErrorOnMissingOutput), plus UseDev2=True, which uses a dev environment to update the EC2 instance. The parameter RunOnDocker=True is mandatory for running the script with external tool support. ImageName determines which tool is deployed on the EC2 instance. The available tools are presented in Usage. OutputFolder is required in the options because it differs from the script's output folder: the script's output folder is a virtual path, whereas the OutputFolder in the options is a folder on an EC2 instance, which cannot access the virtual paths defined by the user. More details about the options section are found here.

Options Section
Options /ParallelJobNumber=1 /ThreadNumberPerJob=8 /Mode=Single /ErrorOnStdErr=False /ErrorOnMissingOutput=True /RunOnDocker=True /ImageName="quay.io/biocontainers/star:2.7.3a--0" /UseCloud=True /UseDev2=True /OutputFolder="/GhindariuCloudFolder/Output/Results/star" /InstanceType=m4.4xlarge /VolumeSize=50;
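
To make the structure of the Options line easier to see, here is a minimal Python sketch that splits such a line into key/value pairs and checks the mandatory RunOnDocker flag. It assumes only the `/Key=Value` form visible in the example above; the function names are hypothetical and this is not how ArrayServer itself parses the script.

```python
import shlex

def parse_options(line):
    """Parse 'Options /Key=Value ... ;' into a dict of strings (sketch)."""
    body = line.strip().rstrip(";")
    tokens = shlex.split(body)  # also removes the double quotes around values
    if not tokens or tokens[0] != "Options":
        raise ValueError("line does not start with 'Options'")
    options = {}
    for token in tokens[1:]:
        # Drop the leading '/' of the key, then split on the first '='.
        key, _, value = token.lstrip("/").partition("=")
        options[key] = value
    return options

def docker_enabled(options):
    """RunOnDocker=True is mandatory for external tool support."""
    return options.get("RunOnDocker", "").lower() == "true"

line = ('Options /ParallelJobNumber=1 /ThreadNumberPerJob=8 /Mode=Single '
        '/ErrorOnStdErr=False /ErrorOnMissingOutput=True /RunOnDocker=True '
        '/ImageName="quay.io/biocontainers/star:2.7.3a--0" /UseCloud=True '
        '/UseDev2=True /OutputFolder="/GhindariuCloudFolder/Output/Results/star" '
        '/InstanceType=m4.4xlarge /VolumeSize=50;')
opts = parse_options(line)
print(opts["ImageName"], docker_enabled(opts))
```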


Finally, in the Output section, External Tool EScripts support transforming the result of an analysis. This is required because running the same command on multiple input files produces output files with the same name. Renaming the output files through this transformation ensures that they are not overwritten.

Output Section
Output "/GhindariuCloudFolder/Output/Abundances/abundance.tsv => /GhindariuCloudFolder/Output/Abundances/%PairName%_abundance.tsv" /Type=tsv;
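
The effect of the `=>` rule can be illustrated with a small Python sketch. It assumes that %PairName% is replaced by the name of each input pair, as the rule above suggests; the function name and the sample pair names are hypothetical, and the real expansion is performed by the EScript engine.

```python
def expand_output_rule(rule, pair_name):
    """Split 'source => target' and fill in %PairName% in the target (sketch)."""
    source, arrow, target = rule.partition("=>")
    if not arrow:
        raise ValueError("rule must contain '=>'")
    return source.strip(), target.strip().replace("%PairName%", pair_name)

rule = ("/GhindariuCloudFolder/Output/Abundances/abundance.tsv => "
        "/GhindariuCloudFolder/Output/Abundances/%PairName%_abundance.tsv")

# Hypothetical pair names: each run writes abundance.tsv, but the renamed
# targets differ, so the results of one pair do not overwrite another's.
for pair in ["sampleA", "sampleB"]:
    print(expand_output_rule(rule, pair))
```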

Frequent errors & Troubleshooting

  • File paths must not contain spaces
  • The Options > /OutputFolder parameter does not accept Global Macros, only Reserved Macros (e.g. placeholders derived from the input/resource files: %PairName%, %FileName%, %ResourceFolder%)
  • Be careful with the Reserved Macros! The available macros depend on the mode (e.g. the FilePath and FileName macros are not supported in multiple mode)
  • Global Macros should work everywhere in the EScript except in the OutputFolder
  • Both the kallisto index and kallisto quant scripts write their output to the error stream. This is a limitation of the bioconda kallisto tool itself.
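
Two of the checks above are mechanical enough to run before submitting a script. The following Python sketch is a hypothetical pre-flight helper, not part of Array Suite: it flags file paths that contain whitespace and FilePath/FileName macros used in multiple mode. The macro pattern is an assumption based on the %FilePath1% and %FileName% forms shown on this page.

```python
import re

# Assumed macro shapes: %FilePath%, %FilePath1%, %FileName%, %FileName2%, ...
FILE_MACROS = re.compile(r"%(FilePath|FileName)\d*%")

def lint(file_paths, mode, commands):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for path in file_paths:
        if re.search(r"\s", path):
            problems.append(f"file path contains whitespace: {path!r}")
    if mode.lower() == "multiple":
        for command in commands:
            for match in FILE_MACROS.finditer(command):
                problems.append(
                    f"{match.group(0)} is not supported in multiple mode")
    return problems

# Hypothetical inputs for illustration.
print(lint(["/data/my reads/r1.fastq"], "Multiple",
           ["kallisto quant %FilePath1%"]))
```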


Kallisto

More info here: Kallisto on EScript.

STAR

More info here: STAR on EScript.


The External Tool script can also be run from the GUI by importing the pscript attached in Jira issue ARRS-1003. It exposes the same parameters as above:

ParallelJobNumber, ThreadNumberPerJob, Mode, UseCloud, ErrorOnStdErr and ErrorOnMissingOutput have standard predefined values. The rest of the fields are fully editable by the user. The script cannot run without input files, an output folder and a solution open in ArrayStudio.

Reference: http://www.arrayserver.com/wiki/index.php?title=Expose_Script_in_GUI