EScript Syntax Updates
From Array Suite Wiki
With the addition of Cloud and Docker support, several new parameters have been introduced to EScripts.
Requirements:
- AMI/Server: Docker must be installed (recommended version: 19.03.8)
- ECR: EC2 instances must have permission to access ECR if you wish to use private Docker images
- If running on Cloud, make sure your AMI has Docker installed, or use the latest OmicSoft Docker-compatible AMI.
Minimal Escript Skeleton for Docker runs
Begin RunEScript /RunOnServer=True;
Resources "
(Any Resource Files you need. They need to be in the same folder, but you can list multiple files)
";
Files "
(Any input files. Depending on whether each file should be processed independently, as read pairs according to OmicSoft's pairing logic, or all files in one analysis, /Mode should be set to Single, Paired, or Multiple)
";
EScriptName AnyNameYouLike;
Command (the exact command you would like to run, with parameters specified as literals or macros);
Options /Mode=(Single|Paired|Multiple) /RunOnDocker=True /ImageName="Repo/Image:Version" /UseCloud=(True|False) /OutputFolder=(OutputFolderPath);
End;
Updates to Escript syntax
To support Docker and Cloud runs, many additional parameters were introduced to the External Script (EScript) syntax.
Files and Folders
Most scripts require several input files to run. These input files are listed in the Files section and are read according to the /Mode given in the Options section. Supported modes are Single, Paired, and Multiple. The input files must be enclosed in quotes, and the section always ends with a semicolon.
- Resources section: specifies one or more files to be used as a "resource" for all samples analyzed
  - All files specified in a Resources section must be in the same folder
  - Files may be referred to with %Resource1%, %Resource2%, etc.
  - Use %ResourceFolder% to refer to the folder (useful for STAR and other commands that need to know where a folder is)
- Files section: all files specified in the Files section must be inside the same folder (does not apply to /Mode=Single)
  - Usage in Command:
    - /Mode=Single: %FilePath% refers to the input file
    - /Mode=Paired: %FilePath1% and %FilePath2% refer to the two files of each input pair
- /OutputFolder=some-path (in the Options section)
  - This option is required. It is available as %OutputFolder% inside the Command and is interpreted in context (for Docker runs it is mapped to the container path /app/_Output_)
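The placeholder rules above can be modelled as a small substitution step. The following Python sketch is a hypothetical model, not Array Server's implementation. It substitutes %Resource1%, %ResourceFolder%, %FilePath% / %FilePath1% / %FilePath2% and %OutputFolder% into the Command for one job. The function name expand_placeholders is illustrative only. Numbering Resources in the order they are listed is an assumption.

```python
import posixpath
import re


def expand_placeholders(command, resources, output_folder, file_paths):
    """Substitute %...% placeholders in a Command template (hypothetical model).

    resources: Resource file paths, numbered in listed order (an assumption).
    file_paths: the input file(s) for one job; one for Single mode,
    two for Paired mode.
    """
    values = {"%OutputFolder%": output_folder}
    if resources:
        # All Resources must live in one folder, exposed as %ResourceFolder%.
        folders = {posixpath.dirname(p) for p in resources}
        if len(folders) != 1:
            raise ValueError("all Resources must be in the same folder")
        values["%ResourceFolder%"] = folders.pop()
        for i, path in enumerate(resources, start=1):
            values[f"%Resource{i}%"] = path
    if len(file_paths) == 1:
        values["%FilePath%"] = file_paths[0]
    else:
        for i, path in enumerate(file_paths, start=1):
            values[f"%FilePath{i}%"] = path
    # Unknown placeholders (e.g. %PairName%) are left untouched.
    return re.sub(r"%\w+%", lambda m: values.get(m.group(0), m.group(0)), command)
```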
Additional EScript Options
These options are specified in the Options section of the EScript.
- /RunOnDocker=True - required - indicates whether the script should be run in a Docker container (on the cloud or locally)
- /ImageName=myDockerImage:v1 - required - the Docker image used to run the command
- /DockerArgs=--rm -i -t - optional - additional docker run arguments (e.g. --rm tells Docker to remove the container after the job finishes)
  - default value, if not specified: --rm
- /UseCloud=True|False - determines whether the analysis runs on an EC2 instance (True) or on the local server (False)
- /InstanceType=c5.xlarge - specifies a custom EC2 instance type
  - default is the OSummaryInstanceType defined in ArrayServer.cfg, or m4.large if that is not specified
- /VolumeRatio - volume size as a factor of the input size (e.g. for 4 x input-size, /VolumeRatio=4)
- /VolumeSize - a specific size in GB (e.g. /VolumeSize=1000)
  - default is 4 x input-size, which will be attached only if (4 x input-size) < 5GB
  - a specific size >= 0 will always be added
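The defaults above can be summarized as a small resolution step. This Python sketch is a hypothetical model. The names resolve_docker_options, options and server_cfg are illustrative and are not part of Array Server. The sketch checks the two required options, then fills in --rm for /DockerArgs. For /InstanceType it falls back to OSummaryInstanceType and then to m4.large.

```python
def resolve_docker_options(options, server_cfg):
    """Return a copy of the Options with Docker/Cloud defaults filled in.

    Hypothetical model: options maps option names (without the leading "/")
    to string values; server_cfg holds settings read from ArrayServer.cfg.
    """
    for required in ("RunOnDocker", "ImageName"):
        if required not in options:
            raise ValueError(f"/{required} is required for Docker runs")
    resolved = dict(options)
    # /DockerArgs defaults to --rm (remove the container when the job ends).
    resolved.setdefault("DockerArgs", "--rm")
    # /InstanceType falls back to OSummaryInstanceType, then to m4.large.
    resolved.setdefault(
        "InstanceType", server_cfg.get("OSummaryInstanceType", "m4.large")
    )
    return resolved
```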
Image Repository Access
- /DockerRegistry=DockerHubPrivate|DockerHubPublic|ECR - specifies the type of registry
  - default value, if not specified: DockerHubPublic
  - ECR: supported only on Cloud
  - DockerHubPrivate: not yet supported (to be added)
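The registry rules above translate into a short validation check. This Python sketch is illustrative only. The name check_registry is hypothetical, and treating a missing /UseCloud as False is an assumption. It applies the DockerHubPublic default, rejects DockerHubPrivate, and allows ECR only when the run is on Cloud.

```python
KNOWN_REGISTRIES = {"DockerHubPublic", "DockerHubPrivate", "ECR"}


def check_registry(options):
    """Validate /DockerRegistry against the rules on this page (hypothetical model)."""
    registry = options.get("DockerRegistry", "DockerHubPublic")
    # Assumption: a missing /UseCloud is treated as False.
    use_cloud = str(options.get("UseCloud", "False")).lower() == "true"
    if registry not in KNOWN_REGISTRIES:
        raise ValueError(f"unknown /DockerRegistry value: {registry}")
    if registry == "DockerHubPrivate":
        raise NotImplementedError("DockerHubPrivate is not yet supported")
    if registry == "ECR" and not use_cloud:
        raise ValueError("ECR is supported only with /UseCloud=True")
    return registry
```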
Image Repository Types
- Public Docker Hub repository: no additional configuration is needed
- AWS ECR: your AWS policy must grant, at minimum, the "GetAuthorizationToken" permission (ecr:GetAuthorizationToken) required by docker login
  - a docker login command is run before the user's command
  - the login is valid for only 12 hours on that instance
  - /AWSRegistryRegion=us-east-1 (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html)
    - specifies the region of the ECR
    - specified only when running with ECR
    - default value, if not specified: the region of the current EC2 instance
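With the AWS CLI, the usual way to log Docker in to ECR is to pipe `aws ecr get-login-password` into `docker login --password-stdin`. It is an assumption that Array Server issues exactly this command. The Python sketch below only builds the command string. The function name ecr_login_command is hypothetical, and the account ID in the test is a placeholder.

```python
import shlex


def ecr_login_command(account_id, region):
    """Build an `aws ecr get-login-password | docker login` pipeline.

    Sketch only: account_id is a placeholder, and region stands in for
    /AWSRegistryRegion (or the current EC2 region when that is omitted).
    """
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    get_password = ["aws", "ecr", "get-login-password", "--region", region]
    login = ["docker", "login", "--username", "AWS", "--password-stdin", registry]
    # ECR authorization tokens expire after 12 hours.
    return f"{shlex.join(get_password)} | {shlex.join(login)}"
```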
Other Options (generally useful)
- /ParallelJobNumber - number of analyses that can run in parallel (on different machines)
- /ThreadNumberPerJob - number of threads used by each analysis
- /Mode - input file mode: Single, Paired, or Multiple. It determines how the input files are read. In Single mode, each file is processed independently. In Paired mode, the files are grouped into pairs and each pair is submitted to the command together. In Multiple mode, all input files are passed to a single command (e.g. when merging many files together).
- /ErrorOnStdErr - raise an error if the command writes to standard error
- /ErrorOnMissingOutput - raise an error if no output files were generated
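The way /Mode turns a file list into jobs can be sketched in Python. This is a hypothetical model, and group_inputs is an illustrative name. In Paired mode the sketch simply sorts the files and pairs neighbours. Array Server instead applies OmicSoft's own pairing logic, which this page does not describe.

```python
def group_inputs(files, mode):
    """Group input files into jobs according to /Mode (hypothetical model)."""
    mode = mode.lower()
    if mode == "single":
        # One job per file.
        return [[f] for f in files]
    if mode == "paired":
        # Assumption: sorted neighbours form a pair (e.g. File1a/File1b).
        ordered = sorted(files)
        if len(ordered) % 2:
            raise ValueError("Paired mode needs an even number of files")
        return [ordered[i:i + 2] for i in range(0, len(ordered), 2)]
    if mode == "multiple":
        # A single job receives every file.
        return [list(files)]
    raise ValueError(f"unknown /Mode: {mode}")
```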
A complicated Escript example for Docker runs
This Escript example demonstrates many powerful aspects of External Scripts with Docker images:
- Macros are specified at the beginning, allowing quick tweaks of parameters within the OScript
- A custom InstanceType is specified using a macro
- Multiple Resources are specified (although only one is used)
- Four input files are specified, as two file pairs.
- Two parallel jobs are specified, so each file pair will be run simultaneously.
- The Command uses %FilePath1% and %FilePath2% to specify the input pairs
- The output generated by Kallisto is renamed from the generic "abundance.tsv" to "%PairName%_abundance.tsv".
Begin Macro;
@ThreadNumberPerJob@ 2;
@Bootstrap@ 100;
@ParallelJobNumber@ 2;
@Mode@ Paired;
@InstanceType@ "m4.xlarge";
@ErrorOnMissingOutput@ True;
@OutputFolderName@ "/path/to/files/OutputFolder";
@UseCloud@ True;
End;

Begin RunEScript /RunOnServer=True;
Resources "
/path/to/resources/MyResource.fastq
/path/to/resources/AnotherResource.idx
/path/to/resources/OneMoreResource.tsv
";
Files "
/path/to/files/File1a.fastq
/path/to/files/File1b.fastq
/path/to/files/File2a.fastq
/path/to/files/File2b.fastq
";
EScriptName MyEscriptName;
Command kallisto quant -i "%Resource1%" -t @ThreadNumberPerJob@ -o "%OutputFolder%" -b @Bootstrap@ %FilePath1% %FilePath2%;
Options /ParallelJobNumber=@ParallelJobNumber@ /ThreadNumberPerJob=@ThreadNumberPerJob@ /Mode=@Mode@ /InstanceType=@InstanceType@ /ErrorOnStdErr=False /ErrorOnMissingOutput=@ErrorOnMissingOutput@ /RunOnDocker=True /ImageName="omicdocker/kallisto:testing" /UseCloud=True /OutputFolder="@OutputFolderName@/%PairName%";
Output "@OutputFolderName@/%PairName%/abundance.tsv => @OutputFolderName@/%PairName%_abundance.tsv" /Type=tsv;
End;
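Macro use in the example above reads as textual substitution: each @Name@ is replaced by the value defined in the Begin Macro block. The Python sketch below models that behaviour under stated assumptions. It strips the surrounding quotes from macro values, which matches how @OutputFolderName@ is used inside quoted paths, and it leaves undefined macros untouched. The name apply_macros is hypothetical and is not an OmicSoft API.

```python
import re

MACRO_BLOCK = re.compile(r"Begin Macro;(.*?)End;", re.DOTALL)
MACRO_DEF = re.compile(r'(@\w+@)\s+("[^"]*"|[^;]*?)\s*;')
MACRO_USE = re.compile(r"@\w+@")


def apply_macros(script):
    """Expand @Name@ macros from a Begin Macro block (hypothetical model)."""
    block = MACRO_BLOCK.search(script)
    if block is None:
        return script
    macros = {}
    for name, value in MACRO_DEF.findall(block.group(1)):
        # Assumption: quotes around a macro value are not part of the value.
        if value.startswith('"'):
            value = value[1:-1]
        macros[name] = value
    # Drop the macro block itself, then substitute in the rest of the script.
    rest = script[:block.start()] + script[block.end():]
    # Undefined macros are left as written.
    return MACRO_USE.sub(lambda m: macros.get(m.group(0), m.group(0)), rest)
```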
Command syntax
Command python user-script.py %FilePath% %OutputFolder%;
- This tells the pre-existing Python (installed in the Docker image) to run the script user-script.py with the arguments %FilePath% and %OutputFolder%.
- NOTE: user-script.py can also be an actual command rather than a script file.
- The command is expanded to:
  docker run /DockerArgs -v /input_file_path:/app/_Input_ -v /OutputFolder:/app/_Output_ /ImageName python user-script.py "/app/_Input_/file" "/app/_Output_"
- Command breakdown:
  - -v /input_file_path:/app/_Input_ maps the local input file path to a path inside the container, so the input files are accessible from within Docker
  - -v /OutputFolder:/app/_Output_ maps the local output folder to a path inside the container, so the output files are accessible from within Docker
  - the additional Docker options (/DockerArgs) are added to the command
  - the user's script is also added to the command, after the image name
  - file paths and other macros are translated into paths inside the container
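The expansion described above can be modelled in Python as building a docker run argument list. This sketch assumes that the folder containing the input file is what gets mounted at /app/_Input_. It handles only %FilePath% and %OutputFolder%. build_docker_run is a hypothetical helper, not Array Server code.

```python
import posixpath
import shlex

CONTAINER_INPUT = "/app/_Input_"
CONTAINER_OUTPUT = "/app/_Output_"


def build_docker_run(image, docker_args, input_file, output_folder, user_command):
    """Model the docker run expansion of a Command (hypothetical helper).

    Assumes the folder holding input_file is mounted at /app/_Input_.
    """
    input_dir, file_name = posixpath.split(input_file)
    container_file = posixpath.join(CONTAINER_INPUT, file_name)
    args = ["docker", "run", *shlex.split(docker_args)]
    # Map host folders to fixed paths inside the container.
    args += ["-v", f"{input_dir}:{CONTAINER_INPUT}"]
    args += ["-v", f"{output_folder}:{CONTAINER_OUTPUT}"]
    args.append(image)
    # Append the user's command, rewriting host macros to container paths.
    for token in shlex.split(user_command):
        token = token.replace("%FilePath%", container_file)
        token = token.replace("%OutputFolder%", CONTAINER_OUTPUT)
        args.append(token)
    return args
```

Passing the returned list to shlex.join gives a printable command line. Keeping the arguments as a list avoids quoting problems when paths contain spaces.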
- Before the user's command runs, the /app/_Input_, /app/_Output_ and /app/_Resource_ folders are automatically created inside the Docker container.
- Docker automatically creates any directory mapped with -v inside the container if it does not already exist.
- Input paths passed from Docker to the user's script always belong to /app/_Input_ or /app/_Output_.
- When running Docker on-premise, the actual input file paths and the OutputFolder path are mapped to /app/_Input_ and /app/_Output_.
- When running Docker in the cloud:
  - temporary paths are created on the virtual machine (EC2), e.g. /opt/temp/_Input_ and /opt/temp/_Output_
  - cloud input files are downloaded into /opt/temp/_Input_
  - the two temporary paths are mapped to the container's /app/_Input_ and /app/_Output_
  - results written by the script to /app/_Output_ are therefore stored in /opt/temp/_Output_
  - all files in /opt/temp/_Output_ are uploaded to the user-specified OutputFolder, which is a cloud path
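The on-premise and cloud flows above differ in two ways. The first is which host folders back /app/_Input_ and /app/_Output_. The second is that the cloud flow adds download and upload steps. The hypothetical plan_run function below lists those steps as (action, source, target) tuples. Its default /opt/temp root is the example path from the text, and the s3:// paths in the test are made up.

```python
import posixpath


def plan_run(use_cloud, input_files, output_folder, temp_root="/opt/temp"):
    """List (action, source, target) steps for one Docker job (hypothetical model)."""
    steps = []
    if use_cloud:
        # Temporary folders on the EC2 instance back the container paths.
        host_in = posixpath.join(temp_root, "_Input_")
        host_out = posixpath.join(temp_root, "_Output_")
        steps += [("download", f, host_in) for f in input_files]
    else:
        # On-premise, the real input folder and OutputFolder are mounted directly.
        host_in = posixpath.dirname(input_files[0])
        host_out = output_folder
    steps.append(("mount", host_in, "/app/_Input_"))
    steps.append(("mount", host_out, "/app/_Output_"))
    if use_cloud:
        # Results written to /app/_Output_ land in host_out and are uploaded.
        steps.append(("upload", host_out, output_folder))
    return steps
```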