ArrayServer Configuration with Cloud

From Array Suite Wiki



This page describes the basic configuration for enabling cloud-based NGS analysis with the Array Server Cloud add-on.

To quickly deploy the OmicSoft Server application in an AWS virtual machine, see the list of OmicSoft Server AMIs for your region.

To learn more about the compute instances used for cloud-based NGS analysis, see OmicSoft Cloud Analysis AMIs.

To enable cloud integration, the administrator must add two sections, [Cloud] and [CloudFolder], to the ArrayServer.cfg file. This extends ArrayServer with on-demand cloud storage and computing.

If a Master-Analytic server setup is used, the [Cloud] section must be specified separately in the AnalyticServer.cfg of each analytic server. The [Cloud] options can differ between the master and the analytic servers. However, for all master and analytic servers to access the same cloud folders, the same [CloudFolder] definitions must be added to every AnalyticServer.cfg and to ArrayServer.cfg, including in the scenario with multiple S3 accounts.

In short: add a [Cloud] section to each AnalyticServer.cfg, and add the same [CloudFolder] sections to each AnalyticServer.cfg and to ArrayServer.cfg.
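Because every server must carry identical [CloudFolder] definitions, it can help to compare the files mechanically. The Python sketch below is a hypothetical helper, not part of OmicSoft Server: it assumes the INI-like layout shown on this page ([Section] headers, Key=Value lines, // comments) and treats two files as consistent when they contain the same set of [CloudFolder] sections, ignoring key and section order.

```python
# Hypothetical consistency check, not an OmicSoft tool. Assumes the INI-like
# layout used on this page: [Section] headers and Key=Value lines.

def cloud_folder_blocks(cfg_text: str) -> list[dict[str, str]]:
    """Collect every [CloudFolder] section (repeated headers stay separate)."""
    blocks: list[dict[str, str]] = []
    current = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # Start collecting only when the new section is [CloudFolder].
            current = {} if line[1:-1].strip() == "CloudFolder" else None
            if current is not None:
                blocks.append(current)
        elif current is not None and "=" in line and not line.startswith("//"):
            key, value = (part.strip() for part in line.split("=", 1))
            current[key] = value
    return blocks


def canonical(cfg_text: str) -> list[tuple[tuple[str, str], ...]]:
    """Order-insensitive form of all [CloudFolder] sections in one file."""
    return sorted(tuple(sorted(b.items())) for b in cloud_folder_blocks(cfg_text))


def mismatched_files(configs: dict[str, str]) -> list[str]:
    """Return the files whose [CloudFolder] sections differ from the first file."""
    names = list(configs)
    if not names:
        return []
    reference = canonical(configs[names[0]])
    return [n for n in names[1:] if canonical(configs[n]) != reference]
```

For example, pass the text of ArrayServer.cfg first and every AnalyticServer.cfg after it; any file name returned still needs its [CloudFolder] sections aligned.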


Example using Amazon Cloud

The [Cloud] section defines the cloud preferences.

[Cloud]
Provider=Amazon
Region=us-east-1
AccessKey=xxxxxxxxxxxxxxxxxxxxxxxxxxx
SecretKey=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
UseHttp=False
OmicsoftCloudDirectory=s3://east.dev.omicsoft/ArrayServerOmicsoftHome
MaxInstanceCount=5
MaxInstanceCountPerJob=3
UseReducedRedundancy=False
EnableDataEncryption=False
DefaultCloudJobNumber=50
InstanceProfileArn=arn:aws:iam::123xxxxxxxx4:instance-profile/project/omicsoft-xxxxxxxxxxxx
SubnetID=subnet-348b0743
EnableAWSSpot=False
//We recommend specifying one of the latest AMIs in the [Cloud] section of your ArrayServer.cfg, for example:
Ami=ami-0bde320b171e25978
AmiSnapshot=snap-01721b011efb794bd

Note: The old UseHttpForSQS setting was removed starting with OmicSoft Server v12.1. Please remove it from the config files, if present.
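A quick way to find leftover UseHttpForSQS entries is to scan each config file. This is a minimal Python sketch under stated assumptions: the file uses Key=Value lines, // marks a comment, and key names are matched case-insensitively (the server's own matching rules are not documented here).

```python
# Minimal sketch: report settings that were removed in OmicSoft Server v12.1.
# Case-insensitive key matching is an assumption.
REMOVED_IN_V12_1 = {"usehttpforsqs"}


def find_removed_settings(cfg_text: str) -> list[tuple[int, str]]:
    """Return (line_number, key) for every removed setting still present."""
    hits = []
    for lineno, raw in enumerate(cfg_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blanks, section headers, // comments and lines without '='.
        if not line or line.startswith(("[", "//")) or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.lower() in REMOVED_IN_V12_1:
            hits.append((lineno, key))
    return hits


if __name__ == "__main__":
    sample = "[Cloud]\nProvider=Amazon\nUseHttpForSQS=True\nUseHttp=False\n"
    print(find_removed_settings(sample))  # [(3, 'UseHttpForSQS')]
```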

CloudFolder

The [CloudFolder] section maps cloud (S3) folders into the ArrayServer file system.

[CloudFolder]
GaryCloudFolder=/omicsoft.test.gary/gary
SGECloudFolder=/east.dev.omicsoft/SGECloudFolder

For example, GaryCloudFolder and SGECloudFolder will appear as two folders in the ArrayServer root folder, alongside the other folders mapped in the [Folder] section (see ArrayServer folder mapping and management):
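Each entry in [CloudFolder] pairs a folder name, shown in the ArrayServer root, with a /bucket/prefix path; the multiple-account example on this page identifies east.dev.omicsoft and omicsoft.test.gary as the buckets. The sketch below assumes the first path component is always the bucket and the remainder is the key prefix, and converts each entry to an s3:// URI. The function names are illustrative only.

```python
# Illustrative only: interpret [CloudFolder] entries as /<bucket>/<prefix>.
CREDENTIAL_KEYS = {"AccessKey", "SecretKey"}


def parse_cloud_folder_lines(lines: list[str]) -> dict[str, tuple[str, str]]:
    """Map each folder name to (bucket, prefix), skipping credential keys."""
    folders = {}
    for raw in lines:
        line = raw.strip()
        if not line or "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name in CREDENTIAL_KEYS:
            continue  # credentials, not folder mappings
        bucket, _, prefix = value.strip("/").partition("/")
        if not bucket:
            raise ValueError(f"{name}: no bucket in {value!r}")
        folders[name] = (bucket, prefix)
    return folders


def to_s3_uri(bucket: str, prefix: str) -> str:
    """Build an s3:// URI from a bucket and an optional key prefix."""
    return f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}"


if __name__ == "__main__":
    section = [
        "GaryCloudFolder=/omicsoft.test.gary/gary",
        "SGECloudFolder=/east.dev.omicsoft/SGECloudFolder",
    ]
    for name, (bucket, prefix) in parse_cloud_folder_lines(section).items():
        print(name, "->", to_s3_uri(bucket, prefix))
```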

[Figure: cloud folders shown in the ArrayServer root folder (CloudFoldersinArrayServer.png)]

Multiple S3 Accounts

Administrators can configure multiple S3 accounts, each with its own credentials. Each separate S3 account should have its own [CloudFolder] section, containing the AccessKey and SecretKey for that S3 account.

Considerations:

  • This feature only works when UseCli is set to False in the [Cloud] section.
  • Any [CloudFolder] section that does not have an AccessKey and SecretKey is assumed to belong to the default AWS account, whose credentials are defined in the [Cloud] section (see SGECloudFolder in the example below).
  • Any number of [CloudFolder] sections can be defined, one for each separate S3 account.
  • Any number of folders can be defined within a single [CloudFolder] section.
  • All folders defined in a [CloudFolder] section must belong to the same S3 account.
  • For each S3 bucket not accessible by the default AWS account, an AWS user account (with an Access Key and Secret Key) should be defined, with an attached policy that grants permission to list buckets and to manipulate objects in those buckets. Specifically, the user account's policy should include the permissions outlined in the following statements of the Example AWS policy:
      SID: AllowGroupToSeeBucketListInTheConsole
      SID: AllowRootAndHomeListingOfOmicsoftBucket


Example: SGECloudFolder (bucket east.dev.omicsoft) belongs to the AWS root account, while GaryCloudFolder (bucket omicsoft.test.gary) belongs to a different account, with different credentials.

[Cloud]
Provider=Amazon
Region=us-east-1
AccessKey=access_key_root_account
SecretKey=secret_key_root_account
UseHttp=False
OmicsoftCloudDirectory=s3://east.dev.omicsoft/ArrayServerOmicsoftHome
UseCli=False

[CloudFolder]
SGECloudFolder=/east.dev.omicsoft/SGECloudFolder

[CloudFolder]
AccessKey=access_key_additional_account
SecretKey=secret_key_additional_account
GaryCloudFolder=/omicsoft.test.gary/gary
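The considerations above amount to a simple resolution rule: a [CloudFolder] section with its own AccessKey and SecretKey uses those credentials, any other section falls back to the [Cloud] credentials, and per-section credentials require UseCli=False. The Python sketch below applies that rule to a configuration like the example. It is a hypothetical checker, not the server's parser; treating a missing UseCli as an error and rejecting a section that has only one of the two keys are conservative assumptions.

```python
from dataclasses import dataclass

CREDENTIAL_KEYS = ("AccessKey", "SecretKey")


@dataclass(frozen=True)
class Credentials:
    access_key: str
    secret_key: str


def parse_sections(cfg_text: str) -> list[tuple[str, dict[str, str]]]:
    """Return (section_name, settings) pairs in file order; repeated
    [CloudFolder] headers stay separate instead of being merged."""
    sections: list[tuple[str, dict[str, str]]] = []
    current = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = {}
            sections.append((line[1:-1].strip(), current))
        elif "=" in line and current is not None:
            key, value = (part.strip() for part in line.split("=", 1))
            current[key] = value
    return sections


def resolve_folder_credentials(cfg_text: str) -> dict[str, tuple[str, Credentials]]:
    """Map each cloud folder name to (mapping path, credentials used)."""
    sections = parse_sections(cfg_text)
    cloud = next((body for name, body in sections if name == "Cloud"), None)
    if cloud is None:
        raise ValueError("missing [Cloud] section")
    default = Credentials(cloud.get("AccessKey", ""), cloud.get("SecretKey", ""))

    folders = {}
    extra_accounts = False
    for name, body in sections:
        if name != "CloudFolder":
            continue
        present = [k for k in CREDENTIAL_KEYS if k in body]
        if len(present) == 1:
            raise ValueError(f"[CloudFolder] has {present[0]} without its pair")
        if present:
            extra_accounts = True
            creds = Credentials(body["AccessKey"], body["SecretKey"])
        else:
            creds = default  # falls back to the [Cloud] account
        for key, value in body.items():
            if key not in CREDENTIAL_KEYS:
                folders[key] = (value, creds)

    if extra_accounts and cloud.get("UseCli", "").lower() != "false":
        raise ValueError("per-folder credentials require UseCli=False in [Cloud]")
    return folders
```

Run against the example above, SGECloudFolder resolves to the root account keys and GaryCloudFolder to the additional account keys.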

Starting with v12.1, the AccessKey and SecretKey values can be encrypted, as described under the EncryptedAWSKeys setting.

VPC

For an OmicSoft Server configuration with a VPC, the administrator needs two additional [Cloud] options: the instance profile (InstanceProfileArn) and the VPC subnet (SubnetID).

Example:

[Cloud]
Provider=Amazon
Region=us-east-1
AccessKey=xxxxxxxxxxxxxxxxxxxxxxxxxxx
SecretKey=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
UseHttp=False
OmicsoftCloudDirectory=s3://east.dev.omicsoft/ArrayServerOmicsoftHome
MaxInstanceCount=5
MaxInstanceCountPerJob=3
UseReducedRedundancy=False
EnableDataEncryption=True
DefaultCloudJobNumber=50
InstanceProfileArn=arn:aws:iam::123xxxxxxxx4:instance-profile/project/omicsoft-xxxxxxxxxxxx
SubnetID=subnet-348b0743
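Before starting cloud jobs in a VPC, it may be worth checking that both options are present and plausibly formatted. The patterns below are assumptions based on general AWS identifier formats (an IAM instance-profile ARN and an ID beginning with subnet-), not rules enforced by OmicSoft Server; the account ID is matched loosely because it is masked in the example.

```python
import re

# Assumed formats, based on general AWS identifier conventions.
INSTANCE_PROFILE_ARN = re.compile(r"arn:aws[\w-]*:iam::[^:]+:instance-profile/.+")
SUBNET_ID = re.compile(r"subnet-[0-9a-f]+")


def check_vpc_cloud_options(cloud: dict[str, str]) -> list[str]:
    """Return human-readable problems with the VPC-related [Cloud] options."""
    problems = []
    arn = cloud.get("InstanceProfileArn", "")
    if not INSTANCE_PROFILE_ARN.fullmatch(arn):
        problems.append(f"InstanceProfileArn looks invalid or missing: {arn!r}")
    subnet = cloud.get("SubnetID", "")
    if not SUBNET_ID.fullmatch(subnet):
        problems.append(f"SubnetID looks invalid or missing: {subnet!r}")
    return problems
```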


AWS - bypass proxy for S3 requests

This setup is needed only when all instances must pass through a proxy server, there are jobs making many parallel requests to Amazon S3 (more than ~50), and the proxy server cannot handle the load. The errors might look like: "Failed to download a chunk of Amazon S3 file ... from bucket: ...: A WebException with status Timeout was thrown.. Download will be retried". In that case, it is necessary to allow direct access from the Amazon EC2 instances to the AWS S3 endpoints, without going through the proxy.

After this setup has been completed, the HttpProxyBypassList must also be set in the default.proxy file (this setting is supported only starting with oshell and Array Server v11.3.1.1).

Note: All of the following actions require elevated privileges in the AWS console (for example, admin). All instances must use a role that has the AmazonS3FullAccess permissions.

Create a VPC endpoint that points to the S3 service as a Gateway (NOT Interface)

  • Go to VPC → Endpoints → Create endpoint
  • Select AWS services → Type Gateway
  • Select the VPC your VMs are (or will be) in, and the route tables of their subnets. These subnets will be able to access S3 through the endpoint
  • Select the Full Access policy, unless you have specific access permissions you want to enforce
  • Click Create Endpoint
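The console steps above can also be scripted. The Python sketch below only builds the keyword arguments for the EC2 CreateVpcEndpoint call (as exposed by boto3's `create_vpc_endpoint`), so it runs without AWS access; the VPC and route-table IDs in the example are hypothetical placeholders. A Gateway endpoint is attached to route tables, and omitting a policy leaves the default full-access endpoint policy in place, which corresponds to the Full Access choice above.

```python
import json
from typing import Optional


def s3_gateway_endpoint_params(region: str, vpc_id: str,
                               route_table_ids: list[str],
                               policy: Optional[dict] = None) -> dict:
    """Keyword arguments for boto3's ec2.create_vpc_endpoint()."""
    params = {
        "VpcEndpointType": "Gateway",  # Gateway, NOT Interface
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Gateway endpoints take effect through the subnets' route tables.
        "RouteTableIds": list(route_table_ids),
    }
    if policy is not None:
        # Only needed when enforcing specific permissions instead of Full Access.
        params["PolicyDocument"] = json.dumps(policy)
    return params


if __name__ == "__main__":
    # Hypothetical IDs; replace with the VPC and route tables of your instances.
    params = s3_gateway_endpoint_params("us-east-1", "vpc-0example", ["rtb-0example"])
    print(params)
    # With credentials and boto3 installed, the call would be:
    # import boto3
    # boto3.client("ec2", region_name="us-east-1").create_vpc_endpoint(**params)
```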

Create an outbound HTTPS → prefix list rule in the security groups you will use for the EC2 instances

  • Find the prefix list created for the endpoint in the previous step (in the VPC console, under Managed prefix lists)

You should see the S3 service name in the prefix list name (for example, com.amazonaws.us-east-1.s3). Take note of the prefix list ID; it is used in the outbound rules next.

  • Go to Security Groups and select the security group your VMs reside on
  • Open the security group → Outbound Rules → Edit outbound rules
  • Create the HTTPS rule that points to the prefix list ID you noted earlier
  • Click Save rules
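The same two steps, finding the S3 prefix list ID and adding the HTTPS rule, map onto the EC2 DescribePrefixLists and AuthorizeSecurityGroupEgress APIs. The sketch below keeps to plain data so it can run offline: it picks the S3 prefix list out of a DescribePrefixLists-style response and builds the arguments for boto3's `authorize_security_group_egress`. The security group and prefix list IDs are hypothetical.

```python
def find_s3_prefix_list_id(describe_response: dict, region: str) -> str:
    """Pick the S3 prefix list from a describe_prefix_lists() response."""
    wanted = f"com.amazonaws.{region}.s3"
    for entry in describe_response.get("PrefixLists", []):
        if entry.get("PrefixListName") == wanted:
            return entry["PrefixListId"]
    raise LookupError(f"no prefix list named {wanted}")


def https_to_prefix_list_egress(group_id: str, prefix_list_id: str) -> dict:
    """Keyword arguments for ec2.authorize_security_group_egress()."""
    return {
        "GroupId": group_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": 443,  # HTTPS
            "ToPort": 443,
            "PrefixListIds": [{
                "PrefixListId": prefix_list_id,
                "Description": "HTTPS to S3 through the gateway endpoint",
            }],
        }],
    }


if __name__ == "__main__":
    # Shape of a DescribePrefixLists response, with a hypothetical ID.
    response = {"PrefixLists": [
        {"PrefixListName": "com.amazonaws.us-east-1.s3", "PrefixListId": "pl-0example"},
    ]}
    pl_id = find_s3_prefix_list_id(response, "us-east-1")
    print(https_to_prefix_list_egress("sg-0example", pl_id))
```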

Create an outbound rule on the master server to its own public IP (for example, All traffic → master public IP)

  • Go to Security Groups and select the security group your VMs reside on
  • Open the security group → Outbound Rules → Edit outbound rules
  • Create a rule that allows all traffic going out to the master server's public IP
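For the master server rule, "all traffic" corresponds to IP protocol -1 in the EC2 API, and a single address is written as a /32 CIDR. The sketch below builds the corresponding arguments for boto3's `authorize_security_group_egress`; it assumes an IPv4 public address, and the group ID and address (taken from the 203.0.113.0/24 documentation range) are hypothetical.

```python
import ipaddress


def all_traffic_to_master_egress(group_id: str, master_public_ip: str) -> dict:
    """Keyword arguments for ec2.authorize_security_group_egress() that allow
    all outbound traffic to the master server's own public IPv4 address."""
    ip = ipaddress.ip_address(master_public_ip)  # raises ValueError if malformed
    if ip.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    return {
        "GroupId": group_id,
        "IpPermissions": [{
            "IpProtocol": "-1",  # -1 means all protocols and ports
            "IpRanges": [{
                "CidrIp": f"{ip}/32",  # exactly one address
                "Description": "Master server public IP",
            }],
        }],
    }


if __name__ == "__main__":
    print(all_traffic_to_master_egress("sg-0example", "203.0.113.10"))
```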