Setup ArrayServer in Cluster

From Array Suite Wiki

Revision as of 21:53, 27 August 2019 by Joseph

Overview

ArrayServer has a built-in scheduling system that supports SGE, PBS/Torque, and Platform LSF, which speeds up the analysis of large volumes of NGS data.

Setup cluster

Below is an example for SGE. For PBS and LSF, please contact omicsoft.support@qiagen.com for more details.

Parallel environment

Omicsoft does not use MPI, but a parallel environment must be set up so that a job can request multiple threads (multiple slots).

Tip: ArrayServer does not limit the number of threads requested for a cluster job; it is the user's responsibility not to exceed the cluster limit on Thread # Per Job.


Example of setting up the parallel environment:

qconf -ap peomics
pe_name            peomics
slots              99
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE

Add this PE to the all.q queue:

qconf -aattr queue pe_list peomics all.q
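
To confirm that the parallel environment is attached to the queue, the pe_list line of the queue configuration can be inspected. The sketch below is not part of ArrayServer. queue_has_pe is a hypothetical helper that reads the output of `qconf -sq` on standard input, and it assumes that pe_list appears on a single line with space-separated names, which may not hold for long or host-group-specific lists.

```shell
#!/bin/sh
# queue_has_pe PE_NAME  (hypothetical helper, not part of SGE or ArrayServer)
# Reads a queue configuration, as printed by `qconf -sq QUEUE`, on stdin.
# Exits 0 if PE_NAME is listed on the pe_list line, 1 otherwise.
# Assumption: pe_list is on one line and its entries are space-separated.
queue_has_pe() {
    awk -v pe="$1" '
        $1 == "pe_list" {
            for (i = 2; i <= NF; i++) {
                if ($i == pe) found = 1
            }
        }
        END { exit !found }
    '
}

# Intended use on a submit host where qconf is available:
#   qconf -sq all.q | queue_has_pe peomics && echo "peomics is attached to all.q"
```

The definition of the parallel environment itself can be displayed with `qconf -sp peomics`.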

ArrayServer requirement

It is recommended to use shared storage for the required software applications and ArrayServer files so that every node on the cluster can access them using the same file path. Mono 4.8.1 (Mono 2.10.9 or Mono 4.0.4 for Array Server versions earlier than v10.1), libgdiplus, and sqlite3 must be accessible by every node of the cluster. These have to be built and set up in the regular way, as described in Array Server Requirements.

ArrayServer executables

In a cluster, each job uses oalign.exe or osummary.exe to do most of the work. These executables must be accessible by every node of the cluster. They can stay in the directory that you used to set up the Linux Array Server instance or, if that doesn't work for your setup, be copied to a different directory. The administrator has to create two shell script files that wrap the two executables.
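
As a rough sketch of what such a wrapper might look like, the example below assumes that the wrapper only needs to launch the executable under Mono and pass its arguments through unchanged; the scripts recommended by Omicsoft may differ. write_wrapper is a hypothetical helper that generates one wrapper file. The script names ClusterAlignment.sh and ClusterSummary.sh come from the example CFG file, while the location of oalign.exe and osummary.exe shown in the comments is an assumption.

```shell
#!/bin/sh
# write_wrapper MONO EXE OUT  (hypothetical helper, not part of ArrayServer)
# Writes an executable shell script OUT that runs EXE under MONO and
# forwards all of its command-line arguments unchanged.
write_wrapper() {
    mono_path=$1
    exe_path=$2
    out_path=$3
    # Unquoted heredoc: $mono_path and $exe_path expand now; \$@ stays literal.
    cat > "$out_path" <<EOF
#!/bin/sh
exec "$mono_path" "$exe_path" "\$@"
EOF
    chmod +x "$out_path"
}

# Example calls (the .exe locations are assumed, adjust to your install):
#   write_wrapper /IData/App/mono/mono-4.8.1/bin/mono \
#       /IData/App/ArrayServerLinuxBeta/oalign.exe \
#       /IData/App/ArrayServerLinuxBeta/ClusterAlignment.sh
#   write_wrapper /IData/App/mono/mono-4.8.1/bin/mono \
#       /IData/App/ArrayServerLinuxBeta/osummary.exe \
#       /IData/App/ArrayServerLinuxBeta/ClusterSummary.sh
```

Writing the wrappers to the shared drive keeps the paths identical on every node.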

ArrayServer files

The folders that users specify in ArrayServer.cfg (or AnalyticServer.cfg) must be accessible by every node in the cluster using the same file path.
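
One way to check this requirement on a given node is to read the folder paths from the CFG file and test that each one exists. The sketch below assumes that the relevant keys are BaseDirectory, TempDirectory, and OmicsoftDirectory, as they appear in the example CFG file, and that the paths contain no spaces. cfg_dirs and check_dirs are hypothetical helpers, not ArrayServer tools.

```shell
#!/bin/sh
# cfg_dirs FILE  (hypothetical helper)
# Prints the values of the directory keys found in an ArrayServer-style CFG file.
# Assumption: the keys of interest are the three named below.
cfg_dirs() {
    awk -F= '
        $1 == "BaseDirectory" || $1 == "TempDirectory" || $1 == "OmicsoftDirectory" {
            print $2
        }
    ' "$1"
}

# check_dirs FILE  (hypothetical helper)
# Reports every listed directory that does not exist on the current node.
# Returns 0 if all directories exist, 1 otherwise.
check_dirs() {
    missing=0
    for dir in $(cfg_dirs "$1"); do
        if [ ! -d "$dir" ]; then
            echo "missing on $(uname -n): $dir"
            missing=1
        fi
    done
    return "$missing"
}

# Intended use, run on each node (for example through a cluster job):
#   check_dirs /path/to/ArrayServer.cfg
```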

Setup ArrayServer

ArrayServer cluster options

Set the following options in the ArrayServer.cfg file on the Linux instance of ArrayServer (use AnalyticServer.cfg if it's just an analytical node).


Example of CFG file

ClusterGridEngine

[License]
CompanyName=Omicsoft
LicenseNumber=xxxx
ExpirationDate=June 9,xxxx
LicenseKey=xxxxxxxxxxx
 
[Option]
MonoPath=/IData/App/mono/mono-4.8.1/bin/mono
MonoJobPath=/IData/App/mono/mono-4.8.1/bin/mono
MonoSummaryPath=/IData/App/mono/mono-4.8.1/bin/mono
 
History=50
SmtpServer=mail.omicsoft.com
Port=8064
Port2=8065
Port3=8066
MultiThreadedFtpConnectionNumber=16
BaseDirectory=/IData/ArrayServerFile
TempDirectory=/IData/temp
OmicsoftDirectory=/IData/Omicsoft
 
MultiThreadedFtp=True
MultiThreadedFtpConnectionNumber=4
DataPortBegin=60066
DataPortEnd=60088
PassiveMode=True
 
UserAuthorization=False
AutoCreateNewUsers=True
AdminUserID=admin
 
HostedBy=Omicsoft Corporation Testing
SampleSetTabs=General,Platform,NewContact
UseDatabaseForIndexing=False
 
EnableCluster=True
ClusterAlignmentPath=/IData/App/ArrayServerLinuxBeta/ClusterAlignment.sh
ClusterSummaryPath=/IData/App/ArrayServerLinuxBeta/ClusterSummary.sh
ClusterGridEngine=SGE
ClusterQueueName=all.q
DefaultClusterJobNumber=10
ClusterParallelEnvironment=peomics
ClusterParallelRatioFactor=1
 
[Folder]
SharedData=/IData/SharedData

Please read ArrayServer.cfg for more details on ArrayServer configuration.