External Script Syntax

From Array Suite Wiki

Updates to existing EScript syntax

Minimal EScript Skeleton for Docker runs
Begin RunEScript /RunOnServer=True;

Resources
"
(Any Resource Files you need. They need to be in the same folder, but you can list multiple files)
";
Files
"
(Any input files. Depending on whether each file should be processed independently, as read pairs according to OmicSoft's pairing logic, or all files in one analysis, /Mode should be set to Single, Paired, or Multiple)
";
EScriptName AnyNameYouLike;
Command (the exact command you would like to run, with parameters specified as literals or macros);
Options /Mode=(Single|Paired|Multiple) /RunOnDocker=True /ImageName="Repo/Image:Version" /UseCloud=(True|False) /OutputFolder=(OutputFolderPath);
End;

Updates on existing syntax:

  • Files:
    • Preconditions - all input files must be inside the same folder (does not apply to /Mode=Single)
    • Usage in command: see External Script Integration
  • /OutputFolder=some-path
    • is now required; it is available as %OutputFolder% inside the command and is interpreted according to the run context (cloud/docker)
    • Global Macro values are not supported; reserved macros of the input files (e.g. %PairName%) are supported
  • /Output
    • see Transformation Section
    • %OutputFolder% is not supported

Additional EScript parameters

Complex EScript Skeleton for Docker runs
Begin Macro;

@ThreadNumberPerJob@ 2;
@Bootstrap@ 100;
@ParallelJobNumber@ 2;
@Mode@ Paired;
@InstanceType@ "m4.xlarge";
@ErrorOnMissingOutput@ True;
@OutputFolderName@ "/path/to/files/OutputFolder";
@UseCloud@ True;
End;

Begin RunEScript /RunOnServer=True;
Resources
"
/path/to/resources/MyResource.fastq
/path/to/resources/AnotherResource.idx
/path/to/resources/OneMoreResource.tsv
";
Files
"
/path/to/files/File1a.fastq
/path/to/files/File1b.fastq
/path/to/files/File2a.fastq
/path/to/files/File2b.fastq
";
EScriptName MyEscriptName;
Command kallisto quant -i "%Resource1%" -t @ThreadNumberPerJob@ -o "%OutputFolder%" -b @Bootstrap@ %FilePath1% %FilePath2%;
Options /ParallelJobNumber=@ParallelJobNumber@ /ThreadNumberPerJob=@ThreadNumberPerJob@ /Mode=@Mode@ /InstanceType=@InstanceType@ /ErrorOnStdErr=False /ErrorOnMissingOutput=@ErrorOnMissingOutput@ /RunOnDocker=True /ImageName="omicdocker/kallisto:testing" /UseCloud=@UseCloud@ /OutputFolder="@OutputFolderName@/%PairName%";
Output "@OutputFolderName@/%PairName%/abundance.tsv => @OutputFolderName@/%PairName%_abundance.tsv" /Type=tsv;
End;
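
The macro mechanism used in the skeleton above can be illustrated with a minimal Python sketch. It assumes that @Name@ placeholders are replaced by plain text substitution with the values from the Begin Macro block; parse_macro_block and expand_macros are hypothetical helpers written for this illustration and are not part of Array Suite.

```python
import re

def parse_macro_block(text):
    """Parse lines of the form '@Name@ value;' into a dict (hypothetical helper)."""
    macros = {}
    for match in re.finditer(r'@(\w+)@\s+(.*?);', text):
        name, value = match.group(1), match.group(2).strip()
        macros[name] = value.strip('"')  # drop surrounding quotes, if any
    return macros

def expand_macros(template, macros):
    """Replace every @Name@ with its value; unknown names are left untouched."""
    return re.sub(r'@(\w+)@', lambda m: macros.get(m.group(1), m.group(0)), template)

# A subset of the macro block from the skeleton above.
macro_block = '''
@ThreadNumberPerJob@ 2;
@Bootstrap@ 100;
@OutputFolderName@ "/path/to/files/OutputFolder";
'''
macros = parse_macro_block(macro_block)

command = ('kallisto quant -i "%Resource1%" -t @ThreadNumberPerJob@ '
           '-o "%OutputFolder%" -b @Bootstrap@ %FilePath1% %FilePath2%')
expanded = expand_macros(command, macros)
print(expanded)
```

The %...% placeholders are deliberately left alone here, since they are resolved per input file or pair rather than from the macro block.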

New input parameter

  • Resources
    • optional - additional files needed when running the command
    • preconditions - all resources must be inside the same folder
    • Usage in command:
      • %Resource{No}% - replaced with the resource at position No in the Resources section, e.g. %Resource1% refers to the first resource and %Resource3% to the third.
      • %ResourceName{No}% - replaced with the name of the resource at position No.
      • %ResourceFolder% - replaced with the common folder of all resources.
      • description: A script may need the result of a previous analysis for a certain processing step. Such a result must be listed under the Resources section. Because it might not exist yet when the script is first read, these files have to be distinguished from the input files. More details about the syntax of the Resources section can be found here.
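
As a rough illustration of the resource macros listed above, the following Python sketch derives their values from a list of resource paths. It assumes that %Resource{No}% expands to the full path and %ResourceName{No}% to the file name including its extension; both are assumptions made for this example, and resource_macros and expand are hypothetical helpers, not Array Suite functions.

```python
import posixpath

def resource_macros(resource_paths):
    """Build values for %Resource{No}%, %ResourceName{No}% and %ResourceFolder% (1-based)."""
    folders = {posixpath.dirname(path) for path in resource_paths}
    if len(folders) > 1:
        # Mirrors the stated precondition for the Resources section.
        raise ValueError("all resources must be inside the same folder")
    values = {}
    for number, path in enumerate(resource_paths, start=1):
        values[f"%Resource{number}%"] = path                         # assumption: full path
        values[f"%ResourceName{number}%"] = posixpath.basename(path)  # assumption: file name
    if folders:
        values["%ResourceFolder%"] = folders.pop()
    return values

def expand(command, values):
    """Substitute every known macro in the command text."""
    for key, value in values.items():
        command = command.replace(key, value)
    return command

resources = [
    "/path/to/resources/MyResource.fastq",
    "/path/to/resources/AnotherResource.idx",
    "/path/to/resources/OneMoreResource.tsv",
]
values = resource_macros(resources)
print(expand('kallisto quant -i "%Resource2%" -o "%OutputFolder%"', values))
```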

Additional EScript Options

Docker Support

  • /RunOnDocker=True - required - indicates whether script should be run in a Docker container (cloud or locally)
  • /ImageName=myDockerImage:v1 - required - indicates the docker image to be used by the command
  • /DockerArgs=--rm -i -t - optional - additional docker run arguments (e.g. --rm tells docker to remove the container after the job is finished)
    • default value, if not specified: --rm
    • -u root --privileged=true can be useful if Docker was configured to run under a different user name, which can lead to write permission issues
  • Image Repository Access
    • /DockerRegistry=DockerHubPublic|ECR
      • specifies type of registry
      • default value, if not specified is: DockerHubPublic
      • ECR: private Docker images stored in AWS Elastic Container Registry; supported on cloud
      • DockerHubPrivate: not yet supported; a future improvement to support private Docker Hub repositories
    • /AWSRegistryRegion: specifies the AWS region of the ECR registry; if not specified, the region of the EC2 instance is used.
Example EScript with ECR private registry

Cloud-based Docker analyses can use private images stored in ECR. Your AWS IAM policy must allow the ecr:GetAuthorizationToken action.

Begin RunEScript /RunOnServer=True;
Files "/GhindariuCloudFolder/ArrayServer/Input/Transcripts/transcripts.fasta.gz";
EScriptName KallistoIndex;
Command kallisto version;
Options /ParallelJobNumber=1 /ThreadNumberPerJob=8 /Mode=Single /ErrorOnStdErr=False /ErrorOnMissingOutput=True /RunOnDocker=True /AWSRegistryRegion=us-west-1 /DockerRegistry=ECR /ImageName="[aws-id].dkr.ecr.[aws-region].amazonaws.com/kallisto:latest" /UseCloud=True /OutputFolder="/GhindariuCloudFolder/Output/Transcripts";
End;
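
The /ImageName value in the example above follows the usual ECR image URI layout (account id, region, repository, tag). A small Python sketch of how such a value and the matching Options line could be generated is shown below; the account id 123456789012, the output path, and the helper names are placeholders chosen for illustration only.

```python
def ecr_image_name(aws_account_id, region, repository, tag="latest"):
    """Compose an ECR image URI of the form <id>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return f"{aws_account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"

def ecr_options_line(image_name, region, output_folder, mode="Single"):
    """Assemble an Options line using only options that appear in the example above."""
    return (
        f"Options /Mode={mode} /RunOnDocker=True "
        f"/AWSRegistryRegion={region} /DockerRegistry=ECR "
        f'/ImageName="{image_name}" /UseCloud=True '
        f'/OutputFolder="{output_folder}";'
    )

# Placeholder account id and output path, for illustration only.
image = ecr_image_name("123456789012", "us-west-1", "kallisto")
print(ecr_options_line(image, "us-west-1", "/CloudFolder/Output/Transcripts"))
```

Using the same region for /AWSRegistryRegion and for the image URI keeps the two values from drifting apart.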

Cloud Support

  • /UseCloud=True: determines whether the analysis is performed on an EC2 instance or on the local server. /UseCloud=True runs it on an EC2 instance.

WARNING: When /UseCloud=True, all input files (i.e. Files and Resources) must be located in the cloud, and /OutputFolder must be specified as a cloud path, even if you don't expect any output files and don't require an output directory.

WARNING: When /UseCloud=True, files written to %OutputFolder% are uploaded from the compute node to the specified cloud folder (/OutputFolder) at the end of the EScript. Sub-folders of %OutputFolder% are included in the upload, but at least one file must be present directly in %OutputFolder%; if all files are located in sub-folders, the upload will fail.
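
A script that writes all of its results into sub-folders can guard against the upload failure described above by checking the output folder before it exits. The Python sketch below shows one possible check; the marker file name upload_marker.txt and both helper functions are hypothetical and only illustrate the idea.

```python
import os
import tempfile

def has_top_level_file(output_folder):
    """Return True if at least one regular file sits directly in output_folder."""
    with os.scandir(output_folder) as entries:
        return any(entry.is_file() for entry in entries)

def ensure_top_level_file(output_folder, marker_name="upload_marker.txt"):
    """Create a small marker file if every output file lives in a sub-folder.

    Returns True if a marker was written, False if nothing had to be done.
    """
    if has_top_level_file(output_folder):
        return False
    with open(os.path.join(output_folder, marker_name), "w") as handle:
        handle.write("results are stored in sub-folders\n")
    return True

# Demonstration on a temporary folder that only contains a sub-folder.
with tempfile.TemporaryDirectory() as output_folder:
    os.makedirs(os.path.join(output_folder, "sample1"))
    with open(os.path.join(output_folder, "sample1", "abundance.tsv"), "w") as handle:
        handle.write("target_id\n")
    print(has_top_level_file(output_folder))     # only a sub-folder so far
    print(ensure_top_level_file(output_folder))  # writes the marker file
    print(has_top_level_file(output_folder))
```

Inside a Docker run, the folder to check would be the container-side output path that %OutputFolder% expands to.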

Instance Type
  • supports option /InstanceType=c5.xlarge for specifying custom instance types
  • default is OSummaryInstanceType defined in ArrayServer.cfg, or m4.large if not specified
Volume Size/Ratio
  • Supports the option to specify additional volume size
    • /VolumeRatio: as a factor of input-size (e.g. /VolumeRatio=6 for 6 x input-size)
    • /VolumeSize: specific GB value (e.g. /VolumeSize=1000)
    • default is 4 x input-size, which will be attached only if (4 x input-size) < 5GB
    • a specific size >= 0 will always be added

Other considerations

External Tool explained

  • before running user's command, the /app/_Input_, /app/_Output_ and /app/_Resource_ folders will be automatically created inside the docker container
  • docker will automatically create any directory mapped with the -v option inside the container if it does not exist
  • input paths that are passed down from docker to user's script will always belong to /app/_Input_, /app/_Resource_ or /app/_Output_
  • when running docker on-premise, the actual file-input-paths and OutputFolder path will be mapped to /app/_Input_, /app/_Resource_ or /app/_Output_
  • when running docker in the cloud
    • temporary paths are created on the virtual machine (EC2), e.g. /opt/temp/_Input_ and /opt/temp/_Output_
    • cloud input files are downloaded into /opt/temp/_Input_
    • the temporary paths are mapped to docker's /app/_Input_, /app/_Resource_ and /app/_Output_
    • results from running the script inside docker are stored in /app/_Output_, which corresponds to /opt/temp/_Output_ on the EC2 instance
    • all files in the temporary path /opt/temp/_Output_ are uploaded to the user specified OutputFolder, which is a cloud-path (S3)
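
The path translation described above can be sketched in a few lines of Python. The sketch assumes that a file keeps its base name when it is staged on the EC2 instance and when it is mapped into the container; the helper names and the example cloud path are hypothetical.

```python
import posixpath

# Container-side folders named in the list above.
CONTAINER_INPUT = "/app/_Input_"
CONTAINER_RESOURCE = "/app/_Resource_"
CONTAINER_OUTPUT = "/app/_Output_"

def container_path(host_file, container_dir):
    """Map a host file to the path the user's script sees inside the container."""
    return posixpath.join(container_dir, posixpath.basename(host_file))

def cloud_staging(cloud_file, temp_root="/opt/temp"):
    """Return (path on the EC2 instance, path inside the container) for a cloud input file."""
    staged = posixpath.join(temp_root, "_Input_", posixpath.basename(cloud_file))
    return staged, container_path(staged, CONTAINER_INPUT)

# On-premise: the local file is mapped straight into the container.
print(container_path("/path/to/files/File1a.fastq", CONTAINER_INPUT))

# Cloud: the file is first downloaded to the temporary folder on the EC2 instance.
print(cloud_staging("/CloudFolder/Input/reads.fastq.gz"))
```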

Expanded example

  • Command syntax: Command python user-script.py %FilePath% %Resource1% %OutputFolder%
  • NOTE: user-script.py can be an actual command, and not a script file
  • command tells (pre-existing) python to run script user-script.py with arguments %FilePath% %Resource1% and %OutputFolder%
  • command is expanded to
    • docker run /DockerArgs -v /input_file_path:/app/_Input_ -v /resource_file_path:/app/_Resource_ -v /OutputFolder:/app/_Output_ /ImageName python user-script.py "/app/_Input_/file" "/app/_Resource_/resource" "/app/_Output_"
    • command breakdown
      • -v /input_file_path:/app/_Input_ - maps the local input folder path to a docker internal path, so the input files are accessible from within docker
      • -v /resource_file_path:/app/_Resource_ - maps the local resource folder path to a docker internal path, so the resource files are accessible from within docker
      • -v /OutputFolder:/app/_Output_ - maps the local output folder path to a docker internal path, so files written to it from within docker end up in the local output folder
  • the additional options are added to the command
  • user's script is also added to the command
  • file-paths and other macros are parsed into docker internal paths
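
The expansion steps above can be approximated with a short Python sketch that assembles the docker run argument list from its parts. This is an illustration under stated assumptions, not the actual Array Suite implementation: build_docker_command is a hypothetical helper, the host paths are the placeholders from the example, and the user command is assumed to have already been rewritten to docker internal paths.

```python
import shlex

def build_docker_command(image_name, user_command, input_dir, resource_dir,
                         output_dir, docker_args="--rm"):
    """Assemble 'docker run' arguments in the order used in the example above."""
    argv = ["docker", "run"] + shlex.split(docker_args)   # /DockerArgs (default --rm)
    argv += ["-v", f"{input_dir}:/app/_Input_",           # input folder mapping
             "-v", f"{resource_dir}:/app/_Resource_",     # resource folder mapping
             "-v", f"{output_dir}:/app/_Output_"]         # output folder mapping
    argv.append(image_name)                               # /ImageName
    argv += shlex.split(user_command)                     # the user's command
    return argv

argv = build_docker_command(
    image_name="omicdocker/kallisto:testing",
    user_command='python user-script.py "/app/_Input_/file" '
                 '"/app/_Resource_/resource" "/app/_Output_"',
    input_dir="/input_file_path",
    resource_dir="/resource_file_path",
    output_dir="/OutputFolder",
)
print(shlex.join(argv))
```

Building a list of arguments rather than one long string avoids quoting problems when paths contain spaces.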