External Script Syntax

From Array Suite Wiki

Revision as of 01:19, 29 October 2021 by Andrew


Updates to existing EScript syntax

Minimal EScript Skeleton for Docker runs
Begin RunEScript /RunOnServer=True;

Resources (any resource files you need; multiple files can be listed, but all must be in the same folder);
Files (any input files; set /Mode to Single, Paired, or Multiple depending on whether each file should be processed independently, files should be processed as read pairs according to OmicSoft's pairing logic, or all files should go into one analysis);
EScriptName AnyNameYouLike;
Command (the exact command you would like to run, with parameters specified as literals or macros);
Options /Mode=(Single|Paired|Multiple) /RunOnDocker=True /ImageName="Repo/Image:Version" /UseCloud=(True|False) /OutputFolder=(OutputFolderPath);

Updates on existing syntax:

  • Files:
    • Precondition: all resources must be inside the same folder (does not apply when /Mode=Single)
    • Usage in command: see External Script Integration
  • /OutputFolder=some-path
    • is now required; its value is available as %OutputFolder% inside the command and is interpreted according to context (cloud or Docker)
    • Global Macro values are not supported; reserved macros of the input files (e.g. %PairName%) are supported
  • /Output
    • Transformation Section
    • %OutputFolder% is not supported

Additional EScript parameters

Complicated EScript Skeleton for Docker runs
Begin Macro;

@ThreadNumberPerJob@ 2;
@Bootstrap@ 100;
@ParallelJobNumber@ 2;
@Mode@ Paired;
@InstanceType@ "m4.xlarge";
@ErrorOnMissingOutput@ True;
@OutputFolderName@ "/path/to/files/OutputFolder";
@UseCloud@ True;

Begin RunEScript /RunOnServer=True;
EScriptName MyEscriptName;
Command kallisto quant -i "%Resource1%" -t @ThreadNumberPerJob@ -o "%OutputFolder%" -b @Bootstrap@ %FilePath1% %FilePath2%;
Options /ParallelJobNumber=@ParallelJobNumber@ /ThreadNumberPerJob=@ThreadNumberPerJob@ /Mode=@Mode@ /InstanceType=@InstanceType@ /ErrorOnStdErr=False /ErrorOnMissingOutput=@ErrorOnMissingOutput@ /RunOnDocker=True /ImageName="omicdocker/kallisto:testing" /UseCloud=True /OutputFolder="@OutputFolderName@/%PairName%";
Output "@OutputFolderName@/%PairName%/abundance.tsv => @OutputFolderName@/%PairName%_abundance.tsv" /Type=tsv;
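The skeleton above defines values once in the Macro block and reuses them as @Name@ tokens. As a rough illustration of that idea only, the following Python sketch performs a plain textual substitution. It is a simplified model and not OmicSoft's parser; the function name expand_macros is hypothetical, and details such as how quoted macro values are handled are not covered.

```python
import re


def expand_macros(text: str, macros: dict[str, str]) -> str:
    """Replace each @Name@ token with its value; unknown names are left untouched."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return macros.get(name, match.group(0))

    return re.sub(r"@(\w+)@", substitute, text)


# Values taken from the Macro block of the skeleton above.
macros = {
    "ThreadNumberPerJob": "2",
    "Bootstrap": "100",
}

command = (
    'kallisto quant -i "%Resource1%" -t @ThreadNumberPerJob@ '
    '-o "%OutputFolder%" -b @Bootstrap@ %FilePath1% %FilePath2%'
)

expanded = expand_macros(command, macros)
print(expanded)
# kallisto quant -i "%Resource1%" -t 2 -o "%OutputFolder%" -b 100 %FilePath1% %FilePath2%
```

Reserved macros such as %Resource1% and %FilePath1% are untouched by this step; they are resolved separately for each job.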

New input parameter

  • Resources
    • optional: additional files needed when running the command
    • precondition: all resources must be inside the same folder
    • usage in command:
      • %Resource{No}% - refers to the resource at position No in the Resources section: %Resource1% refers to the first resource, %Resource3% to the third
      • %ResourceName{No}% - replaced with the name of the resource at position No
      • %ResourceFolder% - replaced with the common folder of all resources
    • description: A script may need the result of a previous analysis for a certain processing step. Such a result must be listed in the Resources section. Because the results of the previous analysis might not be ready when the script is first read, these files need to be distinguished from the input files. More details about the syntax of the Resource section can be found here.
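To make the three resource macros concrete, here is a small Python sketch of how they could be resolved for a list of resources. It assumes that %Resource{No}% expands to the full path of the resource and %ResourceName{No}% to its file name; the page does not spell this out, so treat both as assumptions. The function names, the tool name mytool, and the paths are hypothetical.

```python
from pathlib import PurePosixPath


def resource_macros(resources: list[str]) -> dict[str, str]:
    """Build macro -> value pairs for resources listed in order (1-based numbering)."""
    paths = [PurePosixPath(r) for r in resources]
    # Precondition from the text: all resources share one folder.
    if len({p.parent for p in paths}) > 1:
        raise ValueError("all resources must be inside the same folder")
    values: dict[str, str] = {}
    for number, path in enumerate(paths, start=1):
        values[f"%Resource{number}%"] = str(path)       # assumed: full path
        values[f"%ResourceName{number}%"] = path.name   # assumed: file name only
    if paths:
        values["%ResourceFolder%"] = str(paths[0].parent)
    return values


def expand(command: str, values: dict[str, str]) -> str:
    """Substitute every macro token that occurs in the command."""
    for macro, value in values.items():
        command = command.replace(macro, value)
    return command


values = resource_macros(["/data/ref/transcripts.idx", "/data/ref/annotation.gtf"])
result = expand(
    "mytool --index %Resource1% --annotation-name %ResourceName2% "
    "--resource-dir %ResourceFolder%",
    values,
)
print(result)
# mytool --index /data/ref/transcripts.idx --annotation-name annotation.gtf --resource-dir /data/ref
```

When the command runs in Docker, the resolved paths would point inside the container rather than at the original location.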

Additional EScript Options

Docker Support

  • /RunOnDocker=True - required - indicates that the script should be run in a Docker container (in the cloud or locally)
  • /ImageName=myDockerImage:v1 - required - the Docker image to be used by the command
  • /DockerArgs=--rm -i -t - optional - additional docker run arguments (e.g. --rm tells Docker to remove the container after the job has finished)
    • default value, if not specified: --rm
    • -u root --privileged=true can be useful if Docker was configured to run under a different user name, which can lead to write-permission issues
  • Image Repository Access
    • /DockerRegistry=DockerHubPublic|ECR
      • specifies type of registry
      • default value, if not specified is: DockerHubPublic
      • ECR: private Docker images stored in AWS Elastic Container Registry, supported in the cloud
      • DockerHubPrivate: not yet supported; a future improvement to support private Docker Hub repositories
    • /AWSRegistryRegion: specifies the AWS region of the ECR registry; if omitted, the region of the EC2 instance is used
Example EScript with ECR private registry

Cloud-based Docker analyses can use private images stored in ECR. Your AWS IAM policy must allow the ecr:GetAuthorizationToken action.

Begin RunEScript /RunOnServer=True;
Files "/GhindariuCloudFolder/ArrayServer/Input/Transcripts/transcripts.fasta.gz";
EScriptName KallistoIndex;
Command kallisto version;
Options /ParallelJobNumber=1 /ThreadNumberPerJob=8 /Mode=Single /ErrorOnStdErr=False /ErrorOnMissingOutput=True /RunOnDocker=True /AWSRegistryRegion=us-west-1 /DockerRegistry=ECR /ImageName="[aws-id].dkr.ecr.[aws-region].amazonaws.com/kallisto:latest" /UseCloud=True /OutputFolder="/GhindariuCloudFolder/Output/Transcripts";
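The /ImageName value in the example follows the usual ECR image URI pattern, account-id.dkr.ecr.region.amazonaws.com/repository:tag. The Python sketch below assembles that string and the matching option fragment. The helper names are hypothetical, and the account id 123456789012 is a placeholder, not a real registry.

```python
def ecr_image_name(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Return the image URI for a private ECR repository."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def ecr_options(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Return the registry-related part of an EScript Options line."""
    image = ecr_image_name(account_id, region, repository, tag)
    return f'/AWSRegistryRegion={region} /DockerRegistry=ECR /ImageName="{image}"'


# Placeholder account id; substitute your own AWS account and region.
options = ecr_options("123456789012", "us-west-1", "kallisto")
print(options)
# /AWSRegistryRegion=us-west-1 /DockerRegistry=ECR /ImageName="123456789012.dkr.ecr.us-west-1.amazonaws.com/kallisto:latest"
```

Generating both values from the same region variable keeps /AWSRegistryRegion and the region embedded in the image URI consistent.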

Cloud Support

  • /UseCloud=True: determines whether the analysis is performed on an EC2 instance or on the local server. With /UseCloud=True, it is performed on an EC2 instance.

WARNING: When /UseCloud=True, all input files (i.e. Files and Resources) must be located in the cloud, and /OutputFolder must be set to a cloud path. This applies even if you don't expect any output files and don't need an output directory.

WARNING: When /UseCloud=True, files written to %OutputFolder% are uploaded from the compute node to the specified cloud folder (/OutputFolder) at the end of the EScript. Sub-folders of %OutputFolder% are included in the upload, but at least one file must be present directly in %OutputFolder%; if all files are located in sub-folders, the upload will fail.
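Since the upload fails when %OutputFolder% contains only sub-folders, a script that writes its results into sub-folders may want to check for this before it exits. The Python sketch below shows one possible guard that writes a small marker file when no top-level file exists. This workaround is an assumption based on the warning above, not a documented OmicSoft feature, and the marker name _done.txt is hypothetical.

```python
import tempfile
from pathlib import Path


def has_top_level_file(output_folder: Path) -> bool:
    """True if at least one regular file sits directly in the folder."""
    return any(entry.is_file() for entry in output_folder.iterdir())


def ensure_top_level_file(output_folder: Path, marker_name: str = "_done.txt") -> None:
    """Write a marker file if every result lives in a sub-folder."""
    if not has_top_level_file(output_folder):
        (output_folder / marker_name).write_text("completed\n")


# Demonstration in a throwaway directory standing in for %OutputFolder%.
with tempfile.TemporaryDirectory() as tmp:
    output_folder = Path(tmp)
    (output_folder / "sample1").mkdir()
    (output_folder / "sample1" / "abundance.tsv").write_text("target_id\n")
    before = has_top_level_file(output_folder)
    ensure_top_level_file(output_folder)
    after = has_top_level_file(output_folder)

print(before, after)  # False True
```

Inside a container, the folder to check would be the path that %OutputFolder% resolves to when the command runs.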

Instance Type
  • supports option /InstanceType=c5.xlarge for specifying custom instance types
  • default is OSummaryInstanceType defined in ArrayServer.cfg, or m4.large if not specified
Volume Size/Ratio
  • Supports options to specify the size of the additional volume
    • /VolumeRatio: as a factor of input-size (e.g. /VolumeRatio=6 for 6 x input-size)
    • /VolumeSize: specific value in GB (e.g. /VolumeSize=1000)
    • default is 4 x input-size, which will be attached only if (4 x input-size) < 5GB
    • a specific size >= 0 will always be added

Other considerations

External Tool explained

  • before the user's command is run, the /app/_Input_, /app/_Output_ and /app/_Resource_ folders are automatically created inside the Docker container
  • Docker automatically creates any directory mapped with the -v option inside the container if it does not exist
  • input paths that are passed from Docker to the user's script always belong to /app/_Input_, /app/_Resource_ or /app/_Output_
  • when running Docker on-premise, the actual input file paths and the OutputFolder path are mapped to /app/_Input_, /app/_Resource_ or /app/_Output_
  • when running Docker in the cloud
    • temporary paths are created on the virtual machine (EC2), e.g. /opt/temp/_Input_ and /opt/temp/_Output_
    • cloud input files are downloaded into /opt/temp/_Input_
    • the temporary paths are mapped to Docker's /app/_Input_, /app/_Resource_ and /app/_Output_
    • results from running the script inside Docker are stored in /app/_Output_, which is /opt/temp/_Output_ on the virtual machine
    • all files in the temporary path /opt/temp/_Output_ are uploaded to the user-specified OutputFolder, which is a cloud path (S3)
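The folder mapping described above amounts to swapping a host-side prefix for a container-side prefix. The following Python sketch illustrates that translation for the cloud case. It is only a model of the described behaviour; the function name and the sample file name are hypothetical.

```python
from pathlib import PurePosixPath


def to_container_path(host_path: str, host_folder: str, container_folder: str) -> str:
    """Translate a path under host_folder into the matching path under container_folder."""
    # relative_to raises ValueError if host_path is not located under host_folder.
    relative = PurePosixPath(host_path).relative_to(host_folder)
    return str(PurePosixPath(container_folder) / relative)


# An input file downloaded to the EC2 temporary folder, as seen from inside Docker.
inside = to_container_path(
    "/opt/temp/_Input_/sample_R1.fastq.gz", "/opt/temp/_Input_", "/app/_Input_"
)
print(inside)  # /app/_Input_/sample_R1.fastq.gz

# The reverse direction: a result written in the container, as seen on the EC2 host.
on_host = to_container_path(
    "/app/_Output_/abundance.tsv", "/app/_Output_", "/opt/temp/_Output_"
)
print(on_host)  # /opt/temp/_Output_/abundance.tsv
```

The user's script therefore only ever needs to handle paths under /app, regardless of where the files physically reside.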

Expanded example

  • Command syntax: Command python user-script.py %FilePath% %Resource1% %OutputFolder%
  • NOTE: user-script.py can be an actual command rather than a script file
  • the command tells the (pre-existing) python to run the script user-script.py with the arguments %FilePath%, %Resource1% and %OutputFolder%
  • the command is expanded to
    • docker run /DockerArgs -v /input_file_path:/app/_Input_ -v /resource_file_path:/app/_Resource_ -v /OutputFolder:/app/_Output_ /ImageName python user-script.py "/app/_Input_/file" "/app/_Resource_/resource" "/app/_Output_"
    • command breakdown
      • -v /input_file_path:/app/_Input_ - maps the local input folder path to a Docker-internal path, so the input files are accessible from within Docker
      • -v /resource_file_path:/app/_Resource_ - maps the local resource folder path to a Docker-internal path, so the resource files are accessible from within Docker
      • -v /OutputFolder:/app/_Output_ - maps the local output folder path to a Docker-internal path, so the output folder is accessible from within Docker
  • the additional options are added to the command
  • the user's script is also added to the command
  • file paths and other macros are translated into Docker-internal paths
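The expansion above can be sketched in Python as the assembly of an argument list. This is a simplified model of the described behaviour, not the actual Array Server implementation. The function name build_docker_command is hypothetical, the host folders are the placeholder paths used in the example, and the user's command is assumed to have had its macros translated into container paths already.

```python
import shlex


def build_docker_command(
    image_name: str,
    command: list[str],
    input_folder: str,
    resource_folder: str,
    output_folder: str,
    docker_args: str = "--rm",  # documented default for /DockerArgs
) -> list[str]:
    """Assemble a docker run invocation with the three volume mappings."""
    argv = ["docker", "run", *shlex.split(docker_args)]
    mappings = (
        (input_folder, "/app/_Input_"),
        (resource_folder, "/app/_Resource_"),
        (output_folder, "/app/_Output_"),
    )
    for host, container in mappings:
        argv += ["-v", f"{host}:{container}"]
    argv.append(image_name)
    argv += command  # the user's command, with container-internal paths
    return argv


argv = build_docker_command(
    "myDockerImage:v1",
    ["python", "user-script.py", "/app/_Input_/file", "/app/_Resource_/resource", "/app/_Output_"],
    input_folder="/input_file_path",
    resource_folder="/resource_file_path",
    output_folder="/OutputFolder",
)
print(shlex.join(argv))
# docker run --rm -v /input_file_path:/app/_Input_ -v /resource_file_path:/app/_Resource_ -v /OutputFolder:/app/_Output_ myDockerImage:v1 python user-script.py /app/_Input_/file /app/_Resource_/resource /app/_Output_
```

Building the command as a list rather than a single string avoids quoting problems when folder names contain spaces.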