Build Project Land from your own OmicSoft Project.pdf

From Array Suite Wiki

OmicSoft 'Project Lands' are a new, lighter-weight way to manage and access 'Omics data projects using the Land metadata framework, without reprocessing the data.

Users can identify projects of interest in Lands by exploring the curated metadata, and choose to download selected projects to the analysis tab.

Your OmicSoft Server administrator can build your own “Project Lands” to store and organize internal projects, as part of or instead of standard OmicLands.

Building a Project Land requires an onsite OmicSoft Server installation and at least one project to store; cloud-based project storage additionally requires Server Cloud support. These steps should be performed by an OmicSoft Server administrator.

Tip: Building your own "Project Lands" with cloud-based storage of projects on an S3 bucket requires the Server Cloud add-on.


General workflow to create a Project Land

  1. Create a 'Project Land'
  2. Map a folder to ServerLandProjects or CloudLandProjects, depending on the preferred location of the projects
  3. Package up some OmicSoft Studio projects (.osprj files + matching folder) into a .zip file
  4. Upload zipped projects to the mapped Cloud or Server folder
  5. Add metadata that matches your project names

1. Create a 'Project Land'

  1. See Create and publish a Land for details
    • Name it whatever you like, e.g. InternalProjectLand
    • Specify ReferenceLibraryID and GeneModelID parameters. This doesn't need to be consistent with the libraries/models used in the individual projects, but must be compatible with any ALVs/TLVs you add.
    • Specify that this is a "Project Land" (note: a single Land cannot use both Cloud-based and Server-based project storage; only specify one)
      • Cloud-based project storage (Requires Server Cloud add-on): In your land.cfg or land.cfg2 file, include the parameter "EnableCloudProjectDownload=True"
      • Server-based project storage: In your land.cfg or land.cfg2 file, include the parameter "EnableServerProjectDownload=True"

Example Land configuration file:

ReferenceLibraryID=Human.B37.3 
EnableServerProjectDownload=True
GeneModelID=OmicsoftGene20130723
DefaultViewID.Default=ProjectDistribution

Tip: If you want the Project Distribution View to be the default View, set DefaultViewID to ProjectDistribution in the Land.cfg file.
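
Because a single Land can enable only one storage mode, it may help to check the configuration file before publishing. The following is a minimal sketch, assuming the file consists of plain Key=Value lines; the function name check_land_cfg is hypothetical and is not part of OmicSoft Server.

```shell
# Hypothetical helper (not an OmicSoft tool): report which project-storage
# mode a Land configuration file enables. Assumes plain Key=Value lines.
check_land_cfg() {
    local cfg="$1" cloud server
    [ -r "$cfg" ] || { echo "error: cannot read $cfg"; return 1; }
    # Count lines that set each parameter to True (case-insensitive,
    # tolerating trailing whitespace or a carriage return).
    cloud=$(grep -Eic '^EnableCloudProjectDownload=True[[:space:]]*$' "$cfg" || true)
    server=$(grep -Eic '^EnableServerProjectDownload=True[[:space:]]*$' "$cfg" || true)
    if [ "$cloud" -gt 0 ] && [ "$server" -gt 0 ]; then
        echo "error: both Cloud and Server project download are enabled"
        return 1
    elif [ "$cloud" -gt 0 ]; then
        echo "cloud"
    elif [ "$server" -gt 0 ]; then
        echo "server"
    else
        echo "error: no project download parameter is set"
        return 1
    fi
}

# Usage: check_land_cfg /path/to/land.cfg
```

For the example configuration file above, the function would print "server".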


2. Map a Cloud or Server folder in OmicSoft Server

You will map an accessible server or cloud location to a virtual OmicSoft Server location, where each set of zipped projects for a Project Land is contained within a subfolder that is named to match the Land name.

Cloud-based project storage

If you'd like to store zipped projects on an S3 bucket, use Manage Cloud Folder Mapping to map an accessible S3 bucket folder to CloudLandProjects. The mapping must be named CloudLandProjects, but the underlying S3 folder can have any name you like.

[Figure: MapCloudLandProjects.png]

Server-based project storage

If you'd like to store zipped projects on the server itself, use Manage Server Folder Mapping to map an accessible server folder to ServerLandProjects. The mapping must be named ServerLandProjects, but the underlying server folder can have any name you like.

[Figure: MapServerLandProjects.png]

3. Package OmicSoft projects as zip files

OmicSoft projects are named with the following pattern: If a project's name is "MyRNAseqProject_07292020", there is a file "MyRNAseqProject_07292020.osprj" and a subfolder "MyRNAseqProject_07292020/". These files can usually be renamed without problem, as long as both the OSPRJ and subfolder are named in the same way.

In ProjectLands, the naming logic expects a zip file in the mapped storage location, in the pattern ProjectName_Platform.zip, where ProjectName and Platform correspond to Project-level metadata fields (described below). This zip file should contain both the .OSPRJ and subfolder, matching the name of the zip file.

So project "MyRNAseqProject_07292020" should be stored as "MyRNAseqProject_07292020.zip", containing "MyRNAseqProject_07292020.osprj" and a subfolder "MyRNAseqProject_07292020/".

MyRNAseqProject_07292020.zip contains

MyRNAseqProject_07292020.osprj
MyRNAseqProject_07292020/CompareLGMtoDEseq2Norm.osobj
MyRNAseqProject_07292020/ControlTest.DESeq2ContrastTest.osobj
MyRNAseqProject_07292020/ControlTest.DispersionTable.osobj
MyRNAseqProject_07292020/Counts.Rounded.osobj
MyRNAseqProject_07292020/DEseq2.Server.DESeq2ContrastTest.osobj
MyRNAseqProject_07292020/DEseq2.Server.DispersionTable.osobj
...
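
The expected layout can be checked before uploading. The sketch below is a hypothetical helper (check_zip_listing is not an OmicSoft command) that reads a list of archive entries from standard input and confirms that the .osprj file and the matching subfolder are both present. It assumes the entry names are supplied one per line, for example by Info-ZIP's unzip.

```shell
# Hypothetical helper: given a project name and a list of archive entries on
# standard input (one per line), confirm that the archive holds <name>.osprj
# at the top level and at least one entry beginning with <name>/.
check_zip_listing() {
    local name="$1" entry has_osprj=0 has_folder=0
    while IFS= read -r entry; do
        case "$entry" in
            "$name.osprj") has_osprj=1 ;;
            "$name"/*)     has_folder=1 ;;
        esac
    done
    [ "$has_osprj" -eq 1 ] || { echo "missing $name.osprj"; return 1; }
    [ "$has_folder" -eq 1 ] || { echo "missing $name/ subfolder"; return 1; }
    echo "ok"
}

# Example use with Info-ZIP's unzip, whose -Z1 option lists entry names only:
#   unzip -Z1 MyRNAseqProject_07292020.zip | check_zip_listing MyRNAseqProject_07292020
```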

Batch-packaging projects in a folder

Server administrators can navigate to the ServerProjects directory (by default under BaseDirectory) and open the user folder of interest. The following Linux command then packages each project whose file name matches the pattern test*.osprj into its own zip file (as long as the names contain no spaces); adjust the pattern to select your own projects:

for MyProject in test*.osprj; do MyProjectName="${MyProject%%.osprj}"; zip -r "$MyProjectName.zip" "$MyProjectName.osprj" "$MyProjectName/"; done

What if my projects don't fit the ProjectName_Platform pattern?

Future improvements will support specifying a different metadata column for a "Secondary Platform Identifier".

In cases where the projects are named with a single word, or contain spaces or other symbols that aren't compatible with the "ProjectName" field in Lands, you will need to rename the projects and corresponding subfolders to match the pattern.
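
Renaming has to keep the .osprj file and its subfolder in sync. The following sketch shows one way to do this; rename_project is a hypothetical helper, and the rule that the whole new name may contain only letters, digits and '_' is an assumption extended from the ProjectName field (the text does not state which characters Platform may contain).

```shell
# Hypothetical helper: rename a project's .osprj file and its matching
# subfolder together so the two names stay in sync.
rename_project() {
    local old="$1" new="$2"
    # Assumption: accept only letters, digits and '_', with at least one '_'
    # so the new name can be read as ProjectName_Platform.
    case "$new" in
        *[!A-Za-z0-9_]*|"") echo "invalid name: $new"; return 1 ;;
        ?*_?*) ;;
        *) echo "name lacks a ProjectName_Platform separator: $new"; return 1 ;;
    esac
    [ -f "$old.osprj" ] && [ -d "$old" ] || { echo "project not found: $old"; return 1; }
    [ ! -e "$new.osprj" ] && [ ! -e "$new" ] || { echo "target exists: $new"; return 1; }
    mv -- "$old.osprj" "$new.osprj" && mv -- "$old" "$new"
}

# Usage (run inside the folder that holds the project):
#   rename_project "My Project" MyProject_RNAseq
```

The helper refuses to overwrite an existing project, so it can be run repeatedly without losing data.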

4. Upload zipped projects to the mapped folder

Recall the name of the Land you specified in Step 1 (for example, "InternalProjectLand"). Create a folder with this same name inside your mapped "ServerLandProjects" or "CloudLandProjects" folder, and upload your zipped projects to it.

For example, if the mapped directory is ServerLandProjects=/mnt/Scratch/Zipped/Projects and a Land called InternalProjectLand was created, then you should upload the zipped projects into /mnt/Scratch/Zipped/Projects/InternalProjectLand/:

/mnt/Scratch/Zipped/Projects/InternalProjectLand/GSE78220_UGM_Practice080817.zip  
/mnt/Scratch/Zipped/Projects/InternalProjectLand/GSE85534_GPL17021.zip  
/mnt/Scratch/Zipped/Projects/InternalProjectLand/SRP073767_GPL16791.zip
...

When a user wants to download a project from the Land InternalProjectLand, the server will look in this folder for a matching project, and open it for the user.
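
For Server-based storage, the upload amounts to copying the zip files into a subfolder named after the Land. A minimal sketch follows; stage_projects is a hypothetical helper, and it assumes the mapped directory is an ordinary file system path that the administrator can write to. Cloud-based storage would need an S3 upload tool instead and is not covered here.

```shell
# Hypothetical helper for Server-based storage: copy every .zip in a source
# directory into <mapped folder>/<Land name>/, creating the subfolder if needed.
stage_projects() {
    local src="$1" mapped="$2" land="$3" archive count=0
    mkdir -p -- "$mapped/$land" || return 1
    for archive in "$src"/*.zip; do
        [ -e "$archive" ] || continue   # the glob matched nothing
        cp -- "$archive" "$mapped/$land/" || return 1
        count=$((count + 1))
    done
    echo "copied $count project(s) to $mapped/$land"
}

# Usage:
#   stage_projects /path/to/zipped/projects /path/mapped/to/ServerLandProjects InternalProjectLand
```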

5. Add Project and Sample Metadata to your Project Land

Project Metadata

Three fields are required in the Project metadata of a ProjectLand:

  • ProjectName: The unique identifier for a given project, using only alphanumeric characters and '_'
  • Platform: In OmicSoft Lands, this field identifies the GEO GPL. In ProjectLands, it serves only as a secondary identifier that, together with ProjectName, makes each project unique.
  • TherapeuticArea: A field that should describe the overall theme of the project. In OmicSoft Lands, this can include "Oncology", "Dermatology", "Neurology", etc.

Additional columns can be specified to provide additional context of the projects.

When a user selects a project from a ProjectLand, the ProjectName and Platform fields are joined with '_', and the server searches for a file named ProjectName_Platform.zip in the mapped CloudLandProjects or ServerLandProjects folder, in the subfolder matching the Land's name (see step 4). It is therefore essential that these fields correspond to a project that has been zipped and stored in the appropriate location. The ProjectName and Platform fields don't have to follow the same pattern across projects, but it is best practice to use a consistent scheme where possible.

ProjectName   Platform        TherapeuticArea
GSE85534      GPL17021        BodyMap
SRP073767     GPL16791        Development
GSE78220_UGM  Practice080817  Oncology
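
Since the lookup relies on ProjectName_Platform.zip existing in the Land's subfolder, the metadata can be compared against the stored files ahead of time. The following is a sketch under stated assumptions: the Project metadata is a tab-delimited text file with a header line, ProjectName and Platform are its first two columns, and the Land's subfolder is reachable as a local path. The function name find_missing_zips is hypothetical.

```shell
# Hypothetical helper: for each row of a tab-delimited Project metadata file,
# print the expected ProjectName_Platform.zip if it is absent from the Land's
# subfolder. Returns 1 when at least one zip file is missing.
find_missing_zips() {
    local meta="$1" land_dir="$2" tab
    tab=$(printf '\t')
    # Skip the header line, then read ProjectName and Platform from each row.
    tail -n +2 "$meta" | (
        missing=0
        while IFS="$tab" read -r name platform rest || [ -n "$name" ]; do
            [ -n "$name" ] || continue   # ignore blank lines
            if [ ! -f "$land_dir/${name}_${platform}.zip" ]; then
                echo "${name}_${platform}.zip"
                missing=1
            fi
        done
        exit "$missing"
    )
}

# Usage:
#   find_missing_zips ProjectMetadata.txt /path/to/mapped/folder/InternalProjectLand
```

An empty result means every project listed in the metadata has a matching zip file.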

Sample Metadata

Although sample-level data is not essential for a Project Land, it is strongly recommended that you include sample-level information to help users discover projects that contain data relevant to their research.

  • SampleName: Unique identifiers for samples.
  • ProjectName: Should match the ProjectName in the Project metadata.

Additional columns are just helpful for finding data of interest. As long as ALVs will not be built into the Lands, a variety of projects, across genomes, gene models, and even species can be managed in a single Land:

SampleID                     Tissue        DiseaseState    CellType     ProjectName  Organism
SampleA                      fetal kidney  normal control  kidney cell  SRP073767    Homo sapiens
SampleB                      fetal kidney  normal control  kidney cell  SRP073767    Homo sapiens
SampleC                      fetal kidney  normal control  kidney cell  SRP073767    Homo sapiens
SRX2379968_prepReads.AGCAAT  colon         normal control  NA           GSE85534     Mus musculus
SRX2379969_prepReads.AGTTGC  colon         normal control  NA           GSE85534     Mus musculus
SRX2379970_prepReads.CCAGTT  colon         normal control  NA           GSE85534     Mus musculus
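
Because each sample's ProjectName should match an entry in the Project metadata, a cross-check between the two files can catch mismatches early. The sketch below assumes both files are tab-delimited text with a header line containing a ProjectName column; find_orphan_samples is a hypothetical name, and the header is matched case-insensitively as a convenience.

```shell
# Hypothetical helper: list ProjectName values that appear in a tab-delimited
# Sample metadata file ($2) but not in the Project metadata file ($1).
# Columns are located by header name, so column order does not matter.
find_orphan_samples() {
    awk -F '\t' '
        # Return the index of the column whose header equals name, or 0.
        function col(name,    i) {
            for (i = 1; i <= NF; i++)
                if (tolower($i) == tolower(name)) return i
            return 0
        }
        FNR == 1  { c = col("ProjectName"); next }    # header row of each file
        c == 0    { next }                            # header lacked the column
        NR == FNR { known[$c] = 1; next }             # first file: project metadata
        !($c in known) && !seen[$c]++ { print $c }    # second file: sample metadata
    ' "$1" "$2"
}

# Usage:
#   find_orphan_samples ProjectMetadata.txt SampleMetadata.txt
```

Each unmatched ProjectName is printed once, in order of first appearance.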


Testing your Project Land

If properly configured, your Project Land should open up and display either the Sample Distribution View or Project Distribution View, depending on whether you set DefaultViewID=ProjectDistribution in the Land.cfg file.

If you select one of the bars from the distribution View and explore the Project metadata, you should see a list of projects.

Click on the Project ID, then select "Download Project" to download the full project to your Server Projects collection. See How_to_open_a_project_from_a_ProjectLand.pdf for more details.

[Figure: DownloadLandProject.png]