Land R API

From Array Suite Wiki


Introduction

The Land R API functions are provided for users who want to query Land data from R. The current version uses Oshell API functions to:

  • Connect to ArrayServer
  • Run Land Text Dump on a list of genes and/or a list of samples

Users can run additional analyses on the dumped Land data in R and create further data views. The Land R API is not designed to dump all data in a land (all genes from all samples); doing so can potentially crash the ArrayServer.

Load Land R API

Load R API from Omicsoft website

load(url("http://omicsoft.com/downloads/land/Rapi/Land_R_API.Rda"))
Land.Help() 
####################################################################################################
#################################### Omicsoft Land R API ###########################################
                        Welcome to use Land R API of Omicsoft!
To begin with, you should initiate Oshell environment and Land environment.

Initiate Oshell environment:
Land.InitiateOshell(
        MonoPath = MonoPath,
 OshellDirectory = OshellDirectory,
   BaseDirectory = BaseDirectory,
   TempDirectory = TempDirectory
   );
eg.
#For Windows:
Land.InitiateOshell(
    OshellDirectory = "E:/Oshell/",
     BaseDirectory  = "C:/Users/Leon/Documents/Omicsoft",
      TempDirectory = "C:/Users/Leon/Documents/Omicsoft/Temp"
     );

#For Linux:
Land.InitiateOshell(
          MonoPath  = "/IData/App/mono/mono-2.10.9/bin/mono",
    OshellDirectory = "/IData/Users/leon/landRApi/R/oshell/",
      BaseDirectory = "/IData/ArrayServerFile/ServerTest05WithLSF/",
      TempDirectory = "/IData/temp/Test05LSFServerTmp"
        );


Initiate Land environment:
Land.InitiateLand(Server = Server, UserID = UserID, Password = Password, LandName = LandName)
eg.
Land.InitiateLand(Server = "192.168.1.106:8065", UserID = "userName",Password = "password", LandName = "TCGA2015")
####################################################################################################

or from a local file (suppose you downloaded Land_R_API.Rda to Z:/Users/leon/landRApi/R/):

load("Z:/Users/leon/landRApi/R/Land_R_API.Rda")
Land.Help()

DEV Land R API

Occasionally, we introduce new features into the R API or update its output format. These changes are incorporated into the DEV R API. To retrieve the DEV version:

load(url("http://omicsoft.com/downloads/land/Rapi/Land_R_API_DEV.Rda"))

License Usage

Be aware that connecting to ArrayServer/Land via the R API requires an Array Studio license to be available.

If your queries result in the following message:

Error in value[[3L]](cond) : unused argument (cond)

All licenses may be in use. You can check the details of the error by opening the log.txt file in the relevant sub-directory of TempDirectory.

e.g.

[00:00:00] Error occurred in module::api
Error=Currently the server has connected 3 users. Please wait for some users to log out.
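To find that message without browsing folders by hand, a small helper can scan TempDirectory. This is a sketch, not part of the Land R API: the name find_latest_log_errors is hypothetical, and it assumes the relevant log is the most recently modified log.txt under TempDirectory and that error lines contain the word "Error", as in the example above.

```r
# Hypothetical helper, not part of the Land R API: return the lines that
# mention "Error" in the most recently modified log.txt under temp_dir.
find_latest_log_errors <- function(temp_dir) {
  logs <- list.files(temp_dir, pattern = "^log\\.txt$", recursive = TRUE,
                     full.names = TRUE)
  if (length(logs) == 0) return(character(0))  # no log files found
  latest <- logs[which.max(file.mtime(logs))]
  lines <- readLines(latest, warn = FALSE)
  grep("Error", lines, value = TRUE, fixed = TRUE)
}

# Usage, with the TempDirectory passed to Land.InitiateOshell():
# find_latest_log_errors("C:/Users/Leon/Documents/Omicsoft/Temp")
```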

Initiate / Install / Update

Initiate Oshell environment

#Windows: define oshell directory
OshellDirectory = "E:/oshell"
#Initiate Oshell environment on Windows
Land.InitiateOshell(
          OshellDirectory = OshellDirectory,
            BaseDirectory = "C:/Users/Leon/Documents/Omicsoft",
            TempDirectory = "C:/Users/Leon/Documents/Omicsoft/Temp"
          );
Notes: BaseDirectory is the same as OmicsoftDirectory in ArrayServer.cfg.
OshellDirectory (e.g. E:/oshell) must exist in advance; otherwise, create it with dir.create("E:/oshell").
If ArrayStudio has been installed, its configuration files, including GenomeBrowser and GenomeIndex, are normally placed in C:/Users/UserName/Documents/Omicsoft (e.g. C:/Users/Leon/Documents/Omicsoft).

or

#Linux or Mac: define oshell directory
OshellDirectory="/IData/Users/leon/landRApi/R/oshell/"
#Initiate Oshell environment on Linux
Land.InitiateOshell(
                 MonoPath = "/IData/App/mono/mono-2.10.9/bin/mono",
          OshellDirectory = "/IData/Users/leon/landRApi/R/oshell/",
            BaseDirectory = "/IData/ArrayServerFile/ServerTest05WithLSF/",
            TempDirectory = "/IData/temp/Test05LSFServerTmp"
          );
Notes: BaseDirectory, OshellDirectory, and TempDirectory should all be folders on the machine running R, not locations on a remote server.
OshellDirectory (e.g. /IData/Users/leon/landRApi/R/oshell/) must exist in advance; otherwise, create it with dir.create("/IData/Users/leon/landRApi/R/oshell/").
If an ArrayStudio server has been installed, check its configuration file (ArrayServer.cfg). Users are recommended to use the same BaseDirectory and TempDirectory to avoid redundant downloads of genome and gene model-related files.
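Because OshellDirectory must exist before Land.InitiateOshell() is called, the folders can also be created from R first. The helper below is a hypothetical convenience (ensure_dirs is not a Land R API function); it only uses base R's dir.exists() and dir.create().

```r
# Hypothetical helper: create any of the given folders that do not exist yet
# and return (invisibly) whether each folder now exists.
ensure_dirs <- function(...) {
  dirs <- c(...)
  for (d in dirs) {
    if (!dir.exists(d)) dir.create(d, recursive = TRUE)
  }
  invisible(dir.exists(dirs))
}

# Usage before Land.InitiateOshell() on Windows:
# ensure_dirs("E:/oshell", "C:/Users/Leon/Documents/Omicsoft/Temp")
```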

After initiating the Oshell environment, users can install or update Oshell and check the installation and its version. If Oshell is already installed, skip to initiating the Land environment.

Install oshell

#Install oshell to OshellDirectory
Land.InstallOShell()
# start to download OmicsoftUpdater.exe ..... 
 
# trying URL http://omicsoft.com/software_update/OmicsoftUpdater.exe
# Content type 'application/octet-stream' length 20480 bytes (20 Kb)
# opened URL
# downloaded 20 Kb
...
# OmicsoftUpdater.exe has been downloaded! 
...
# start to install oshell to E:/Oshell, please wait a couple of minutes .....
# OmicsoftUpdater sucessfully finished.
# Congratuation, ohsell has been successfully installed to E:/Oshell 

# check Installation of Oshell. Return: logical 
Land.CheckInstall()
#[1] TRUE

# Check oshell version
Land.CheckVersion()
#OShell version=7.2.4.16
#Program started at Thursday, November 06, 2014 3:07:33 PM
#Perform initialization...
#
#[00:00:00] Windows PDF library is used
#[00:00:00] Windows OS detected...Initialization done...
#version=7.2.4.16
#Program finished at Thursday, November 06, 2014 3:07:38 PM

Update oshell

Land.OshellUpdate() 
#Or update oshell to a particular development version
Land.OshellUpdate("dev")
#Or
Land.OshellUpdate("dev3")
Note: you will be prompted to confirm the update to a development version.

Initiate Land environment

#Initiate Land environment
Land.InitiateLand(Server = "192.168.1.106:8065", UserID = "userName", Password = "password", LandName = "TCGA2015")

The variable Land.CurrentLand stores the current land environment parameters. All Land R API functions read the land parameters from this variable.

Land.CurrentLand
#Current Land environment is:  
#
#              Server               UserID             Password             LandName 
#"192.168.1.106:8065"               "userName"           "password"     "TCGA2015" 

Users can therefore edit this variable to set the land environment parameters, e.g.:

Land.CurrentLand["LandName"] = "TCGA2015"


Alternatively, users can pre-define their own land environment objects and assign them to the current land environment when necessary:

land1<-c(Server = "192.168.1.106:8065", UserID = "userName", Password = "password", LandName = "TCGA2015")
land2<-c(Server = "192.168.1.106:8065", UserID = "userName", Password = "password", LandName = "TCGA2014")
Land.CurrentLand = land1
or
Land.CurrentLand = land2
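Hard-coding the password in scripts, as in land1 and land2, can be avoided by reading the connection details from environment variables. This is a sketch under assumptions: land_from_env and the variable names LAND_SERVER, LAND_USER and LAND_PASSWORD are hypothetical, and the Land R API does not read them itself; the function only builds a vector with the same names as Land.CurrentLand.

```r
# Hypothetical helper: build a land environment vector with the same element
# names as Land.CurrentLand from environment variables. The variable names
# LAND_SERVER, LAND_USER and LAND_PASSWORD are assumptions for this sketch.
land_from_env <- function(land_name) {
  vals <- Sys.getenv(c("LAND_SERVER", "LAND_USER", "LAND_PASSWORD"), unset = NA)
  if (anyNA(vals)) {
    stop("Set LAND_SERVER, LAND_USER and LAND_PASSWORD before calling land_from_env()")
  }
  c(Server = unname(vals[1]), UserID = unname(vals[2]),
    Password = unname(vals[3]), LandName = land_name)
}

# Usage: Land.CurrentLand = land_from_env("TCGA2015")
```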

Test connecting to server

#Test connecting to server. Return: logical
Land.ConnectServer()
# [1] TRUE

View change log of Land R API

#View change log of Land R API
Land.ChangeLog()

Listing

List Lands

#List all lands (except GeneticsLands). Return: character   
ListLands = Land.ListLands()
eg.
head(ListLands)
#[1] "CCLE2014"   "TCGA2014"
#List Genetics lands. Return: character
ListLands = Land.ListLands(ListGeneticsLand = TRUE)
eg.
head(ListLands)
#[1] "GeneticLand_Tutorial"

List data availability

#List land data availability. Return: data.frame. Note: this function currently does not work for GeneticsLands
ListDataAvailability = Land.ListDataAvailability()
head(ListDataAvailability)

Download

Download meta data

#Download meta data to a variable. Return: data.frame
DownloadMetaData = Land.DownloadMetaData()
head(DownloadMetaData)
# ID	Age At Initial Pathologic Diagnosis	BamFileName	Bcr Patient Uuid	Bcr Sample Uuid	Disease	Gender	Group	Histological Type	History Of Neoadjuvant Treatment	Icd 10	Icd O 3 Histology	Icd O 3 Site	Land Sample Type	Land Tissue	Neoplasm Histologic Grade	Pathologic M	Pathologic N	Pathologic Stage	Pathologic T	Race	Sample Type	Study Source	SubjectID	Survival Days	Survival Status	Tumor Necrosis Percent	Tumor Nuclei Percent	Tumor Type
# TCGA-OR-A5J1-01A	.	UNCID_2203647.37e88158-0743-45b8-87cf-1d7fe878527f.bam		E4038EBB-6E6D-44B1-84AD-E35AAFCA7B70	Adrenocortical carcinoma		ACC.Primary Tumor						Primary Tumor	Adrenal Gland							Primary Tumor	TCGA	TCGA-OR-A5J1	.		0	90	ACC
#Download the sample IDs in a sample set (e.g. accRNAseq). Return: data.frame
sampleSetID=Land.DownloadSampleSet('accRNAseq')
#Subset the metadata to the samples in the sample set (e.g. accRNAseq). Return: data.frame
MetaData.sampleSet=DownloadMetaData[DownloadMetaData$ID %in% sampleSetID[,1],]
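A sample set and the metadata may not line up exactly, so it can be worth checking which sample set members have no metadata row. The sketch below uses small made-up data frames in place of the Land.DownloadMetaData() and Land.DownloadSampleSet() results, assuming (as above) that the sample IDs are in the first column of the sample set.

```r
# Made-up stand-ins for Land.DownloadMetaData() and Land.DownloadSampleSet()
DownloadMetaData <- data.frame(ID = c("S1", "S2", "S3", "S4"),
                               Disease = c("ACC", "ACC", "BRCA", "BRCA"),
                               stringsAsFactors = FALSE)
sampleSetID <- data.frame(ID = c("S2", "S1", "S9"), stringsAsFactors = FALSE)

set_ids <- sampleSetID[, 1]
# Metadata rows for the samples in the set (same filter as above)
MetaData.sampleSet <- DownloadMetaData[DownloadMetaData$ID %in% set_ids, ]
# Sample set members that have no metadata row
missing_ids <- setdiff(set_ids, DownloadMetaData$ID)
if (length(missing_ids) > 0) {
  message("No metadata for: ", paste(missing_ids, collapse = ", "))
}
```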

Text-dump land data by genes and samples

Users can text-dump different Land data types (DataModes) by genes and samples, including Expression_Ratio, CNV, RPPA, DnaSeq_Mutation, DnaSeq_SomaticMutation, RnaSeq_Transcript, RnaSeq_GeneBas, RnaSeq_Fusion, RnaSeq_Mutation, RnaSeq_SomaticMutation, and Methylation450_B37.

First, specify two vectors containing the genes and samples of interest:

eg.
#Create a gene vector
genes = c("MET","egfr","braf","KRas")
#Create a sample vector
samples = c("TCGA-AR-A1AR-01A","TCGA-BH-A1EO-01A","TCGA-BH-A1EO-11A","TCGA-BH-A1ES-01A","TCGA-BH-A1ET-01A","TCGA-BH-A1ET-11B","TCGA-BH-A1EU-01A","TCGA-BH-A1EU-11A","TCGA-BH-A1EV-01A","TCGA-BH-A1EW-01A","TCGA-BH-A1EW-11B","TCGA-BH-A1F0-01A","TCGA-BH-A1F0-11B","TCGA-C8-A1HF-01A","TCGA-C8-A1HG-01A","TCGA-C8-A1HI-01A","TCGA-C8-A1HL-01A","TCGA-C8-A1HM-01A","TCGA-C8-A1HN-01A","TCGA-E2-A14N-01A","TCGA-E2-A15I-01A","TCGA-E2-A15I-11A","TCGA-A2-A0CX-01A","TCGA-A2-A0D0-01A","TCGA-A2-A0D4-01A","TCGA-A7-A0CD-01A","TCGA-A7-A0CE-01A","TCGA-A7-A0CE-11A","TCGA-A7-A0CG-01A","TCGA-A7-A0CH-01A","TCGA-A7-A0CH-11A","TCGA-A7-A0CJ-01A","TCGA-A7-A0DB-01A","TCGA-A7-A0DB-11A","TCGA-A8-A06N-01A","TCGA-A8-A06O-01A","TCGA-A8-A06P-01A","TCGA-A8-A06R-01A","TCGA-A8-A06T-01A","TCGA-A8-A06U-01A")

Then submit the query, specifying the DataMode to query.
Note: the Land.TextDumpArrayLandData function is for Array Lands (OncoLand, DiseaseLand, etc.) and does not work for GeneticsLands.

#Text-dump by genes and samples
TextDumpArrayLandData = Land.TextDumpArrayLandData(Genes = genes, Samples = samples, DataMode = "Expression_Ratio")
#Text-dump all genes for the selected samples
TextDumpArrayLandData = Land.TextDumpArrayLandData(Genes = "(all)", Samples = samples, DataMode = "Expression_Ratio")
or
TextDumpArrayLandData = Land.TextDumpArrayLandData(Samples = samples, DataMode = "Expression_Ratio")
#Text-dump the selected genes for all samples
TextDumpArrayLandData = Land.TextDumpArrayLandData(Genes = genes, Samples = "(all)", DataMode = "Expression_Ratio")
#Text-dump the selected genes for all samples and return gene-level data
TextDumpArrayLandData = Land.TextDumpArrayLandData(Genes = genes, Samples = "(all)", DataMode = "Expression_Ratio", DownloadGeneLevelData = TRUE)
# Note: DownloadGeneLevelData defaults to FALSE
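The text-dump tables appear to be in long format (the Quick Start section uses their SampleID, GeneID and Value columns). For matrix-style analyses they can be reshaped with base R. This sketch assumes those three column names; the data are made up.

```r
# Made-up long-format table standing in for TextDumpArrayLandData$Expression_Ratio
expr_long <- data.frame(
  SampleID = c("S1", "S1", "S2", "S2"),
  GeneID   = c("MET", "EGFR", "MET", "EGFR"),
  Value    = c(0.5, -1.2, 1.1, 0.3),
  stringsAsFactors = FALSE
)
# Gene x sample matrix: mean() collapses any repeated gene/sample rows,
# and gene/sample combinations without data become NA
expr_mat <- tapply(expr_long$Value,
                   list(expr_long$GeneID, expr_long$SampleID),
                   mean)
expr_mat
```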

Land.TextDumpArrayLandGeneData return type by different DataModes

DataMode | Return type | Elements (DownloadGeneLevelData = FALSE, default) | Elements (DownloadGeneLevelData = TRUE)
CNV | list | $CNV | $CNV, $CNV_ByGeneLevel
CNVCall | list | $CNVCall | NA
DnaSeq_Mutation | list | $DnaSeq_Mutation, $DnaSeq_Mutation.Annotation | NA
DnaSeq_Mutation_Exome | list | $DnaSeq_Mutation_Exome, $DnaSeq_Mutation_Exome.Annotation | $DnaSeq_Mutation_Exome, $DnaSeq_Mutation_Exome.Annotation, $DnaSeq_Mutation_Exome_ByGeneLevel
DnaSeq_SomaticMutation | list | $DnaSeq_SomaticMutation, $DnaSeq_SomaticMutation.Annotation | $DnaSeq_SomaticMutation, $DnaSeq_SomaticMutation.Annotation, $DnaSeq_SomaticMutation_ByGeneLevel
Expression_Ratio | list | $Expression_Ratio | $Expression_Ratio, $Expression_Ratio_ByGeneLevel
Expression_Intensity_Probes | list | $Expression_Intensity_Probes | $Expression_Intensity_Probes, $Expression_Intensity_Probes_ByGeneLevel
General_Expression | list | $General_Expression | $General_Expression, $General_Expression_ByGeneLevel
Methylation450_B37 | list | $Methylation450_B37 | NA
RnaSeq_Exon | list | $RnaSeq_Exon | NA
RnaSeq_ExonJunction | list | $RnaSeq_ExonJunction | NA
RnaSeq_GeneBas | list | $RnaSeq_GeneBas | NA
RnaSeq_Fusion | list | $RnaSeq_Fusion, $RnaSeq_Fusion.Annotation | NA
RnaSeq_Mutation | list | $RnaSeq_Mutation, $RnaSeq_Mutation.Annotation | $RnaSeq_Mutation, $RnaSeq_Mutation.Annotation, $RnaSeq_Mutation_ByGeneLevel
RnaSeq_SomaticMutation | list | $RnaSeq_Mutation, $RnaSeq_Mutation.Annotation | $RnaSeq_Mutation, $RnaSeq_Mutation.Annotation, $RnaSeq_Mutation_ByGeneLevel
RnaSeq_PairedEndFusion | list | $RnaSeq_PairedEndFusion, $RnaSeq_PairedEndFusion.Annotation | NA
RnaSeq_Transcript | list | $RnaSeq_Transcript | $RnaSeq_Transcript, $RnaSeq_Transcript_ByGeneLevel
RPPA_RBN | list | $RPPA_RBN | $RPPA_RBN, $RPPA_RBN_ByGeneLevel
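Because the element names depend on the DataMode and on DownloadGeneLevelData, a small accessor can make scripts stop with a clear message when an expected element is missing. get_dump_table is a hypothetical helper that assumes the "<DataMode>" / "<DataMode>_ByGeneLevel" naming shown in the table; for RnaSeq_SomaticMutation, whose elements carry RnaSeq_Mutation names, the error message lists the names actually returned.

```r
# Hypothetical helper: return one data.frame from a text-dump result list,
# choosing "<DataMode>_ByGeneLevel" when gene_level = TRUE.
get_dump_table <- function(result, data_mode, gene_level = FALSE) {
  key <- if (gene_level) paste0(data_mode, "_ByGeneLevel") else data_mode
  if (!key %in% names(result)) {
    stop("Element '", key, "' not found; available: ",
         paste(names(result), collapse = ", "))
  }
  result[[key]]
}

# Usage: get_dump_table(TextDumpArrayLandData, "Expression_Ratio", gene_level = TRUE)
```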

WARNING: Depending on the platform used by each study, microarray data may be stored in "Expression_Intensity_Probes" or in "Expression_Ratio". For example, most ImmunoLand studies use "Expression_Intensity_Probes", while TCGALand samples use "Expression_Ratio". Using the wrong DataMode usually leads to an error such as "unused argument".


Tip: generally, the DataMode underlying a Land View can be identified simply by hovering the mouse over the View of interest in "Select View".

[Screenshot: Land FindDataMode menu]


Please note that R returns a data frame with automatically detected column types. For example, if a sample ID consists of digits plus a single letter 'E', R treats it as a number in scientific notation rather than as a character value. To avoid this, our developers added the parameter ColClasses, as shown in the screenshot below: passing ColClasses = "character" to the text-dump function returns every column as character (tmp1$DnaSeq_Mutation). By default the parameter is not set, and the text-dump function returns a table with automatically detected character/integer/numeric columns.

[Screenshot: ColClasses parameter in the Land R API (DEV)]
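The type detection described above is base R behaviour and can be reproduced with read.table() alone. The sketch below uses made-up sample IDs; it assumes, without confirming the Land R API internals, that ColClasses is forwarded to read.table()'s colClasses argument (the GeneticsLand section states this for Land.SearchGeneticLand).

```r
# Made-up tab-separated table whose sample IDs are digits plus one "E"
txt <- "SampleID\tValue\n12E3\t0.5\n45E1\t1.2"

# Default type detection reads the IDs as numbers in scientific notation
auto <- read.table(text = txt, header = TRUE, sep = "\t")
str(auto$SampleID)    # numeric: 12000 and 450

# colClasses = "character" keeps every column as text
as_chr <- read.table(text = txt, header = TRUE, sep = "\t",
                     colClasses = "character")
str(as_chr$SampleID)  # character: "12E3" and "45E1"
```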

Download Comparison Land Data

The following functions only work for Comparison Lands, such as ImmunoLand2015Q2. Here we use ImmunoLand2015Q2 as an example of how to retrieve data from a Comparison Land.

Land.CurrentLand["LandName"] = "ImmunoLand2015Q2"

Download ArrayLand Gene Comparison

Download ArrayLand gene comparison data by Genes/GeneSet and Comparisons/ComparisonSet, including Comparison.Genes, Comparison, and FullComparisonMetaData.

eg.
#Create a gene vector
genes = c("MET","egfr","braf","KRas")
#Create a Comparisons vector
comparisons = c('GSE13887.GPL570.test1','GSE13849.GPL570.test1','GSE13849.GPL570.test2')
#Download ArrayLand Gene Comparison by Genes and Comparisons
DownloadArrayLandGeneComparison = Land.DownloadArrayLandGeneComparison(Genes = genes, Comparisons = comparisons)
#Download ArrayLand Gene Comparison by Genes and ComparisonSet
comparisonSet = "Leon_Sub_ComparisonSet2" 
DownloadArrayLandGeneComparison = Land.DownloadArrayLandGeneComparison(Genes = genes, ComparisonSet = comparisonSet)


#Download ArrayLand Gene Comparison by GeneSet and Comparisons
geneSet = "GeneSet01" 
DownloadArrayLandGeneComparison = Land.DownloadArrayLandGeneComparison(GeneSet = geneSet, Comparisons = comparisons)

#Download ArrayLand Gene Comparison by GeneSet and ComparisonSet
geneSet = "GeneSet01" 
DownloadArrayLandGeneComparison = Land.DownloadArrayLandGeneComparison(GeneSet = geneSet, ComparisonSet = comparisonSet)

Download ArrayLand Comparison Data

Download ArrayLand comparison data by Comparisons/ComparisonSet, including Design, Comparison.Genes, Comparison, and FullMetaData.

eg.
#Download ArrayLand Comparison Data by Comparisons
DownloadArrayLandComparisonData = Land.DownloadArrayLandComparisonData(Comparisons = comparisons)
#Download ArrayLand Comparison Data by ComparisonSet
DownloadArrayLandComparisonData = Land.DownloadArrayLandComparisonData(ComparisonSet = comparisonSet)

Text-Dump ArrayLand Gene Comparison

Text-Dump ArrayLand Gene Comparison by Genes/GeneSet and Comparisons/ComparisonSet, including Comparison.Matrix, Comparison, and FullMetaData.

eg.
#Text-Dump ArrayLand Gene Comparison by Genes and Comparisons
TextDumpArrayLandGeneComparison = Land.TextDumpArrayLandGeneComparison(Genes = genes, Comparisons = comparisons)
#Text-Dump ArrayLand Gene Comparison by Genes and ComparisonSet
TextDumpArrayLandGeneComparison = Land.TextDumpArrayLandGeneComparison(Genes = genes, ComparisonSet = comparisonSet)


#Text-Dump ArrayLand Gene Comparison by GeneSet and Comparisons
TextDumpArrayLandGeneComparison = Land.TextDumpArrayLandGeneComparison(GeneSet = geneSet, Comparisons = comparisons)

#Text-Dump ArrayLand Gene Comparison by GeneSet and ComparisonSet
TextDumpArrayLandGeneComparison = Land.TextDumpArrayLandGeneComparison(GeneSet = geneSet, ComparisonSet = comparisonSet)

Text-Dump ArrayLand Comparison Data

Text-Dump ArrayLand Comparison Data by Comparisons or ComparisonSet, including Comparison.Matrix, Comparison, and FullMetaData.

eg.
#Text-Dump ArrayLand Comparison Data by Comparisons
TextDumpArrayLandComparisonData = Land.TextDumpArrayLandComparisonData(Comparisons = comparisons)
#Text-Dump ArrayLand Comparison Data by ComparisonSet
TextDumpArrayLandComparisonData = Land.TextDumpArrayLandComparisonData(ComparisonSet = comparisonSet)

Download ComparisonSet

Retrieve the ComparisonIDs in a ComparisonSet:

eg.
#Download ComparisonSet
DownloadComparisonSet = Land.DownloadComparisonSet(ComparisonSet=comparisonSet)


Arguments and Returned Value of ComparisonLand Download/Text-dump

Method | Arguments | Return type | Included data.frames
Land.DownloadArrayLandGeneComparison | Genes/GeneSet, Comparisons/ComparisonSet | list | $Comparison.Genes, $Comparison, $FullComparisonMetaData
Land.DownloadArrayLandComparisonData | Comparisons/ComparisonSet | list | $Comparison.Genes, $Comparison, $Design, $FullMetaData
Land.TextDumpArrayLandGeneComparison | Genes/GeneSet, Comparisons/ComparisonSet | list | $Comparison, $Comparison.Matrix, $FullMetaData
Land.TextDumpArrayLandComparisonData | Comparisons/ComparisonSet | list | $Comparison, $Comparison.Matrix, $FullComparisonMetaData
Land.DownloadComparisonSet | ComparisonSet | vector | -

IfDeleteResult (optional parameter)

By default, most Land R commands automatically delete the temporary folders they generate. If you would like to keep the folders (e.g. to see the underlying Oscripts), set the parameter IfDeleteResult=FALSE, e.g.:

Land.TextDumpArrayLandGeneData(Genes = geneIDs, Samples = sampleIDs, DataMode = "DnaSeq_Mutation",IfDeleteResult=FALSE)


GeneticsLand

Searching

Search for gene, phenotype, association report, region, or RS_ID

#Set the search term
search <- "egfr"
#Set the view
view <- "Gene.CodingSnpListing"
#Perform the search
result <- Land.SearchGeneticLand(Search=search, View=view)

Search by Samples

#Sets the samples to be searched
samples <- c("HG01887","HG01895","HG02258","HG01959","HG02478","HG01888","NA19702","NA20297","NA20345","HG02436","NA20319","NA20300","NA20313","NA19120","NA20295","NA20333","NA19396","NA19985","NA19249","NA20302","NA20128","NA18911")
#Performs the search 
result <- Land.SearchGeneticLand(Search=search, View=view, Samples=samples)

Grouping

#Some views tabulate values based on a grouping factor
#Sets the group from the meta data to be searched
group <- "Country of Origin"
samples <- c("HG01887","HG01895","HG02258","HG01959","HG02478","HG01888","NA19702","NA20297","NA20345","HG02436","NA20319","NA20300","NA20313","NA19120","NA20295","NA20333","NA19396","NA19985","NA19249","NA20302","NA20128","NA18911")
result <- Land.SearchGeneticLand(Search=search, View="Gene.GroupedArraySnpSummary", Group=group, Samples=samples)

Append Annotations

#By default, some table views will have fewer columns than the corresponding search result in ArrayStudio since annotations are not included
#Set AppendAnnotation to True to have these joined to match the results from ArrayStudio
result <- Land.SearchGeneticLand(Search="FADS1", View="Gene.AssociationTable", OtherOptions="/AppendAnnotation=True")

Set Column Types

#Data tables will be read into R using the read.table function. You may set the ColClasses option for this function to specify what types each column should be read as
#Read all columns as strings
result <- Land.SearchGeneticLand(Search="FADS1", View="Gene.AssociationTable", ColClasses="character", OtherOptions="/AppendAnnotation=True")
#Read PValue columns as strings and let R infer types for all other columns
result <- Land.SearchGeneticLand(Search="FADS1", View="Gene.AssociationTable", ColClasses=c(PValue="character", PValueHeterogeneity="character"), OtherOptions="/AppendAnnotation=True")

Search by SampleFilter

search <- "egr1"
view   <- "Gene.GroupedArraySnpSummary"

# Single quotes should be used for multiple filter conditions; otherwise, the filter will not be parsed correctly.
sampleFilter <- "'BMI > 20' & 'Cohort = ACB'"
result <- Land.SearchGeneticLand(Search=search, View=view, SampleFilter=sampleFilter)
# For one filter condition, single quotes are not required:
sampleFilter <- "'BMI > 20'"
sampleFilter <- "BMI > 20"  #This works too

Note: the columns used in SampleFilter must be columns of the sample metadata.

SampleFilter Syntax

Best practice is to qualify the operator with the OP: prefix to ensure proper parsing. Without this qualification, if your column name or value contains an operator, you may get an error. For example:

Exclusion Criteria = BMI < 20

may be parsed with < as the operator, so the parser looks for a column named "Exclusion Criteria = BMI" and fails when that column is not found. Writing Exclusion Criteria OP:= BMI < 20 avoids this ambiguity.

Comparison operators (operator, description, example):

  <        Numeric column values are less than the specified threshold. Example: BMI OP:< 30
  >        Numeric column values are greater than the specified threshold. Example: BMI OP:> 30
  <=       Numeric column values are less than or equal to the specified threshold. Example: BMI OP:<= 30
  >=       Numeric column values are greater than or equal to the specified threshold. Example: BMI OP:>= 30
  =        Numeric or character column values match the specification (exact match). Examples: BMI OP:= 30, Ancestry OP:= Asian
  !=       Numeric or character column values do not match the specification (exact match). Examples: BMI OP:!= 30, Ancestry OP:!= Asian
  ~        Character column values contain the specified string (case-insensitive). Example: Ancestry OP:~ Asian
  MATCH    Same as ~. Example: Ancestry OP:MATCH Asian
  CSMATCH  Case-sensitive version of MATCH. Example: Country OP:CSMATCH US
  IN       Character column values exactly match any of the strings in parentheses. Example: Ancestry OP:IN (Asian, South Asian, East Asian)

Sub-expressions should be enclosed in single quotes to ensure proper parsing.

Logical operators (operator, description, example):

  &        Both conditions must be met. Example: 'BMI OP:> 30' & 'Ancestry OP:~ Asian'
  |        Either condition must be met. Example: 'BMI OP:> 30' | 'Ancestry OP:~ Asian'
  !()      The condition must not be met. Example: !('Ancestry OP:~ Asian')

The OP: qualifier does not work on logical operators. Parentheses can be used to create complex nested queries like:

('condition 1' & 'condition 2') | ('condition 3' & 'condition 4')
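Filter strings with several conditions are easy to mistype by hand. The helpers below are a hypothetical sketch (build_filter and filter_condition are not Land R API functions) that wrap each condition in single quotes and join them with & or |, following the syntax above. They do not escape single quotes inside values and do not build nested expressions.

```r
# Hypothetical helpers, not part of the Land R API
# filter_condition(): one condition using the OP: qualifier, e.g. "BMI OP:> 30"
filter_condition <- function(column, operator, value) {
  paste0(column, " OP:", operator, " ", value)
}

# build_filter(): wrap each condition in single quotes and join them with
# one logical operator (& or |)
build_filter <- function(conditions, op = c("&", "|")) {
  op <- match.arg(op)
  paste0("'", conditions, "'", collapse = paste0(" ", op, " "))
}

sampleFilter <- build_filter(c(filter_condition("BMI", ">", 20),
                               filter_condition("Cohort", "=", "ACB")))
sampleFilter
# result <- Land.SearchGeneticLand(Search = search, View = view, SampleFilter = sampleFilter)
```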

Search by AssociationFilter

#Complementary to filtering on sample attributes, you may also filter genetic association results based on analysis metadata (same query syntax as SampleFilter)
view <- "AssociationTableView"
associationFilter <- "'EffectType OP:= Beta' & 'MeSH Unique ID OP:= D000428'"
result <- Land.SearchGeneticLand(Search="", View=view, AssociationFilter=associationFilter, OtherOptions="/AppendAnnotation=True")

Search by VariableFilter

#Some views allow you to filter which variables (columns) are returned
view <- "ClinicalTableView"
sampleFilter  <- "Clinical M Stage=M0"
variableFilter <- "Clinical M Stage" 
result <- Land.SearchGeneticLand(Search="", View=view, SampleFilter=sampleFilter, VariableFilter=variableFilter, OtherOptions="/AppendAnnotation=True")


View Types

Generally, available views can be found in Array Studio by hovering over the view name like this:

[Screenshot: hovering over a view name in Array Studio]

Samples, Projects, etc

  • TableView - sample details
  • ProjectTableView
  • ClinicalTableView
  • AssociationTableView

RS_ID or Region (single base)

Region searches take the form 7:1000-2000 or chr7:1000-2000.

  • Variant.ArraySnpGenotypes
  • Variant.AssociationTable
  • Variant.CovariateVariableView
  • Variant.EqtlTable
  • Variant.Grasp2Table
  • Variant.ImputedSnpDoseGenotypes
  • Variant.SnpAnnotation
  • Variant.SnpGenotypes
  • Variant.VcfSnpGenotypes
  • Variant.AlleleFrequency
  • Variant.GenotypeFrequency

Gene

  • Gene.AllSnpListing
  • Gene.CodingSnpListing
  • Gene.ArraySnpSummary
  • Gene.VcfSnpSummary
  • Gene.ImputedSnpDoseSummary
  • Gene.GroupedArraySnpSummary
  • Gene.GroupedVcfSnpSummary
  • Gene.GroupedImputedSnpDoseSummary
  • Gene.Grasp2Table
  • Gene.AssociationTable
  • Gene.EqtlTable

Region

  • Region.ArraySnpSummary
  • Region.AssociationTable
  • Region.CodingSnpListing
  • Region.EqtlTable
  • Region.GeneListing
  • Region.Grasp2Table
  • Region.GroupedArraySnpSummary
  • Region.GroupedImputedSnpDoseSummary
  • Region.GroupedVcfSnpSummary
  • Region.ImputedSnpDoseSummary
  • Region.VcfSnpSummary
  • Region.AllSnpListing

Phenotype or Association Report

  • Grasp2Association.TopHits
  • Association.TopHits
  • Association.RegionPlot
  • Grasp2Association.GenomePlot
  • Association.GenomePlot

Special Views

There are two special views for querying association results.

CountAssociationTopHits takes the additional MaxPValue parameter, which can contain a list of p-value thresholds, and returns the number of results below each threshold, e.g.:

Counts <- Land.SearchGeneticLand(Search="Dermatology Hair Male Pattern Baldness ChrX PMID28196072", View="CountAssociationTopHits", OtherOptions="/MaxPValue=0.5,0.1,0.01,1e-3,1e-4")
Counts
                                            AssociationID Cutoff Count
1 Dermatology Hair Male Pattern Baldness ChrX PMID28196072  5e-01  7585
2 Dermatology Hair Male Pattern Baldness ChrX PMID28196072  1e-01  1907
3 Dermatology Hair Male Pattern Baldness ChrX PMID28196072  1e-02   476
4 Dermatology Hair Male Pattern Baldness ChrX PMID28196072  1e-03   248
5 Dermatology Hair Male Pattern Baldness ChrX PMID28196072  1e-04   183

SearchAssociationTopHits takes three additional parameters:

  1. MaxPValue to only return results with p-values below this threshold
  2. MaxRows to truncate the results to this maximum (set to 0 or -1 to get all results pursuant to MaxTopHitsN)
  3. NoAnnotation to indicate whether results should be annotated
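These parameters are passed through OtherOptions. Judging from the examples in this section, each option is written as /Name=Value, several options are separated by "; ", and a list of thresholds is comma-separated. build_other_options is a hypothetical helper that assembles such a string from an R list; the separators are inferred from the examples, not from a specification, and values can also be given as ready-made strings.

```r
# Hypothetical helper: assemble an OtherOptions string such as
# "/MaxPValue=0.01; /MaxRows=1500; /NoAnnotation=True" from a named list.
# Separators ("; " between options, "," within a value) are inferred from the
# examples in this section.
build_other_options <- function(opts) {
  vals <- vapply(opts, function(v) {
    if (is.logical(v)) {
      if (isTRUE(v)) "True" else "False"   # the examples write True/False
    } else {
      paste(v, collapse = ",")             # vectors become comma-separated lists
    }
  }, character(1))
  paste0("/", names(opts), "=", vals, collapse = "; ")
}

opts <- build_other_options(list(MaxPValue = 0.01, MaxRows = 1500, NoAnnotation = TRUE))
opts
# NoAnnotResults <- Land.SearchGeneticLand(
#   Search = "Dermatology Hair Male Pattern Baldness ChrX PMID28196072",
#   View = "SearchAssociationTopHits", OtherOptions = opts)
```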

Note that R may convert extremely low p-values (e.g. 1E-325) to 0. When this is a concern, use the ColClasses option to force these values to be read as strings.

NoAnnotResults <- Land.SearchGeneticLand(Search="Dermatology Hair Male Pattern Baldness ChrX PMID28196072", View="SearchAssociationTopHits",OtherOptions="/MaxPValue=0.01; /MaxRows=1500; /NoAnnotation=True")
head(NoAnnotResults)
  ID      SnpID Chromosome Position Reference Alternative PValue PValueHeterogeneity AlleleFrequency EffectSize EffectSize_LB
1  1 rs12558842          X 66481800         C           A 1E-325                  NA              NA     0.5427            NA
2  2  rs4827528          X 66335096         A           G 1E-325                  NA              NA     0.5823            NA
3  3  rs2497938          X 66563018         T           C 1E-325                  NA              NA    -0.5284            NA
4  4  rs6625163          X 66510984         G           A 1E-325                  NA              NA     0.5319            NA
5  5   rs775366          X 65998455         A           G 1E-325                  NA              NA     0.5360            NA
6  6 rs73221556          X 65933285         C           A 1E-325                  NA              NA    -0.5266            NA
 EffectSize_UB EffectSize_SE ImputationQuality PooledSampleSize Direction_Up Direction_Down Uncertain
1            NA            NA                NA            52874            0              0        NA
2            NA            NA                NA            52874            0              0        NA
3            NA            NA                NA            52874            0              0        NA
4            NA            NA                NA            52874            0              0        NA
5            NA            NA                NA            52874            0              0        NA
6            NA            NA                NA            52874            0              0        NA
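If p-values are read as strings (ColClasses = "character"), converting them back with as.numeric() reintroduces the underflow. A common workaround is to work on the -log10 scale, computed from the mantissa and exponent of each string. neg_log10_p is a hypothetical helper sketching this; it assumes p-values are written either as plain decimals or in E notation (such as 1E-325).

```r
# Hypothetical helper: -log10(p) computed from the text of each p-value, so
# values below the smallest double (about 4.9e-324) are not lost
neg_log10_p <- function(p) {
  p <- toupper(trimws(as.character(p)))
  parts <- strsplit(p, "E", fixed = TRUE)
  vapply(parts, function(x) {
    mantissa <- as.numeric(x[1])
    exponent <- if (length(x) > 1) as.numeric(x[2]) else 0
    -(log10(mantissa) + exponent)
  }, numeric(1))
}

as.numeric("1E-325")                        # underflows to 0
neg_log10_p(c("1E-325", "0.01", "1e-04"))   # 325, 2, 4
```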

Quick Start

Scatter plot of expression value vs. CN log2ratios within specific genes and samples

This example draws a scatter plot of gene expression vs. CN log2 ratios for the genes MDM2, BRAF, EGFR, and FGF12.
#initiate oshell environment
Land.InitiateOshell(
OshellDirectory = "E:/Oshell/",
  BaseDirectory = "C:/Users/Leon/Documents/Omicsoft",
  TempDirectory = "C:/Users/Leon/Documents/Omicsoft/Temp"
   );

#initiate Land environment
Land.InitiateLand(Server = "192.168.1.106:8065", UserID = "userName",Password = "password", LandName = "TCGA2015")

#Create a gene vector  
genes = c("MDM2", "BRAF", "EGFR", "FGF12")

#create a sample vector (or read it from sampleID.txt, as shown after the vector)
#eg. sampleID.txt
# sampleid
# TCGA-BH-A1EV-01A
# TCGA-C8-A1HL-01A
# TCGA-C8-A1HN-01A
# TCGA-A7-A0CE-01A
# TCGA-A7-A0CG-01A
samples = c(
		"TCGA-BH-A1EV-01A",
		"TCGA-C8-A1HL-01A",
		"TCGA-C8-A1HN-01A",
		"TCGA-A7-A0CE-01A",
		"TCGA-A7-A0CG-01A",
		"TCGA-A8-A06R-01A",
		"TCGA-A8-A06Y-01A",
		"TCGA-A8-A07F-01A",
		"TCGA-A8-A07I-01A",
		"TCGA-A8-A07L-01A",
		"TCGA-A8-A08B-01A",
		"TCGA-A8-A08F-01A",
		"TCGA-A8-A08G-01A",
		"TCGA-A8-A08J-01A",
		"TCGA-A8-A08L-01A",
		"TCGA-A8-A099-01A",
		"TCGA-A8-A09B-01A",
		"TCGA-A8-A09C-01A",
		"TCGA-A8-A09E-01A"
		);
# or   samples = read.table("Z:/Users/leon/landRApi/R/sampleID.txt", header = TRUE, stringsAsFactors = FALSE, sep = "\t",quote = "")[,1]
## Test a small subset of samples
RnaSeqTxnExprData = Land.TextDumpArrayLandData(Genes = genes, Samples = samples, DataMode = "RnaSeq_Transcript")
RnaSeqTxnExprData$RnaSeq_Transcript[1:5,]
#retrieve expression ratio data from land for MDM2, BRAF, EGFR, and FGF12
ArrayExprData0 = Land.TextDumpArrayLandData(Genes = genes, Samples = samples, DataMode = "Expression_Ratio")
ArrayExprData=ArrayExprData0$Expression_Ratio
ArrayExprData[1:5, ]
#retrieve CNV data from land for the genes
CNVData0 = Land.TextDumpArrayLandData(Genes = genes, Samples = samples, DataMode = "CNV")
CNVData=CNVData0$CNV
CNVData[1:5, ]

#extract SampleID, GeneID, and expression ratio from Expression_Ratio_Data
ex = data.frame(SampleID = ArrayExprData$SampleID, GeneID = ArrayExprData$GeneID, ExpressionValue = ArrayExprData$Value)

#extract SampleID, GeneID, and CNVLog2Ratio from CNV
cn = data.frame(SampleID = CNVData$SampleID, GeneID = CNVData$GeneID, CNVLog2Ratio = CNVData$Value)

# merge expression ratio and CNVLog2Ratio
ArrayExprAndCNV = merge(ex, cn, by = c("SampleID","GeneID"))
ArrayExprAndCNV[1:5, ];

#scatter plot of expression value and CNVLog2Ratio
library(lattice) 
xyplot(CNVLog2Ratio ~ ExpressionValue | GeneID, ArrayExprAndCNV, grid = TRUE, groups = GeneID, pch = 19,
       main = "Integration view of Expression Ratio => CNV",
       xlab = "Values from Expression Array", ylab = "CNV Log2Ratio");
[Figure: Land R API image4, output of the xyplot call above]
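Beyond the plot, the relationship can be summarised numerically, for example with one Pearson correlation per gene. cor_by_gene is a hypothetical helper; it assumes a merged table with the GeneID, ExpressionValue and CNVLog2Ratio columns built above, and the demonstration uses made-up values instead of Land data.

```r
# Hypothetical helper: Pearson correlation of CNVLog2Ratio vs ExpressionValue
# for each gene, using only rows where both values are present
cor_by_gene <- function(df) {
  per_gene <- split(df, df$GeneID)
  vapply(per_gene, function(d) {
    cor(d$ExpressionValue, d$CNVLog2Ratio, use = "complete.obs")
  }, numeric(1))
}

# Made-up data in the layout of ArrayExprAndCNV
demo <- data.frame(
  SampleID = rep(c("S1", "S2", "S3"), times = 2),
  GeneID = rep(c("EGFR", "MDM2"), each = 3),
  ExpressionValue = c(1, 2, 3, 1, 2, 3),
  CNVLog2Ratio = c(0.2, 0.4, 0.6, 0.3, 0.2, 0.1),
  stringsAsFactors = FALSE
)
cor_by_gene(demo)

# With the real merged table: cor_by_gene(ArrayExprAndCNV)
```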

Scatter plot of expression value vs. CN log2ratios with full sample meta data

## Test on all samples
#Retrieve CNV data for all samples
CNVData0=Land.TextDumpArrayLandData(Genes = genes, Samples = "(all)", DataMode = "CNV")
CNVData=CNVData0$CNV
#Retrieve ArrayExpression data for all samples
ArrayExprData0 = Land.TextDumpArrayLandData(Genes = genes, Samples = "(all)", DataMode = "Expression_Ratio")
ArrayExprData=ArrayExprData0$Expression_Ratio
CNVData[1:5, ]
ArrayExprData[1:5, ]
ex = data.frame(SampleID = ArrayExprData$SampleID, GeneID = ArrayExprData$GeneID, ExpressionValue = ArrayExprData$Value)
cn = data.frame(SampleID = CNVData$SampleID, GeneID = CNVData$GeneID, CNVLog2Ratio = CNVData$Value)
ArrayExprAndCNV = merge(ex, cn, by = c("SampleID","GeneID"))
ArrayExprAndCNV[1:5, ];
#Obtain full sample metadata
SampleFullMetaData = Land.DownloadMetaData();
head(SampleFullMetaData);
MetaDatasubset = data.frame(SampleID = SampleFullMetaData$ID, Tumor.Type = SampleFullMetaData$Tumor.Type, Sample.Type = SampleFullMetaData$Sample.Type);
MetaDatasubset[1:5, ];
#Merge ArrayExprAndCNV and sample metadata
ArrayExprAndCNV2 = merge(ArrayExprAndCNV, MetaDatasubset, by = c("SampleID"))
ArrayExprAndCNV2[1:5, ];
##Scatter plot of expression value and CNVLog2Ratio with full metadata
library(lattice) 
xyplot(CNVLog2Ratio ~ ExpressionValue | GeneID, ArrayExprAndCNV2, grid = TRUE, groups = Tumor.Type, pch = 19,
       main = "Integration view of Expression Ratio => CNV",
       xlab = "Values from Expression Array", ylab = "CNV Log2Ratio", ylim = c(-5, 5),
       auto.key = list(pch = 19, columns = 8)
       );
[Figure: Land R API image5, output of the xyplot call above]