Land R API with Omicsoft v12
From Array Suite Wiki

Introduction
The Land R API functions are provided for users who want to query Land data using R. The current version uses Oshell API functions to do the following:
- Connect to ArrayServer
- Run Land Text Dump on a list of genes and/or a list of samples
Users can then run additional analyses on the dumped data in R and create further data views. The Land R API is not designed to dump all data (all genes from all samples) from a Land, which can potentially crash the ArrayServer.
Load Land R API
Load R API from Omicsoft website
load(url("https://resources.omicsoft.com/downloads/land/Rapi/Land_R_API_v12.RData"))
Land.Help()
####################################################################################################
#################################### Omicsoft Land R API ###########################################
Welcome to use Land R API of Omicsoft!
To begin with, you should initiate Oshell environment and Land environment.
Initiate Oshell environment:
Land.InitiateOshell(
    MonoPath = MonoPath,
    OshellDirectory = OshellDirectory,
    BaseDirectory = BaseDirectory,
    TempDirectory = TempDirectory
);
eg.
#For Windows:
Land.InitiateOshell(
    OshellDirectory = "E:/Oshell/",
    BaseDirectory = "C:/Users/UserName/Documents/Omicsoft",
    TempDirectory = "C:/Users/UserName/Documents/Omicsoft/Temp"
);
#For Linux:
Land.InitiateOshell(
    MonoPath = "/IData/App/mono/mono-6.12.0/bin/mono",
    OshellDirectory = "/IData/Users/username/landRApi/R/oshell/",
    BaseDirectory = "/IData/ArrayServerFile/ServerTest/",
    TempDirectory = "/IData/temp/ServerTestTemp"
);
Initiate Land environment:
Land.InitiateLand(Server = Server, UserID = UserID, Password = Password, LandName = LandName)
eg.
Land.InitiateLand(Server = "test.omicsoft.com:8065", UserID = "userName",Password = "password", LandName = "TCGA2015")
####################################################################################################
or from a local file (suppose you downloaded Land_R_API_v12.RData to Z:/Users/username/landRApi/R/):
load("Z:/Users/username/landRApi/R/Land_R_API_v12.RData")
Land.Help()
License Usage
Be aware that connections to ArrayServer/Land via the R API require an available Array Studio license.
If your queries result in the following message:
Error in value[[3L]](cond) : unused argument (cond)
then all licenses may be occupied. You can check the details of the error by going to the TempDirectory and opening the log.txt file in the relevant sub-directory, e.g.:
[00:00:00] Error occurred in module::api Error=Currently the server has connected 3 users. Please wait for some users to log out.
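Looking through log.txt files by hand can be tedious when the TempDirectory holds many sub-directories. The following is a minimal sketch in base R that collects the text after "Error=" from every log.txt under a directory. The helper name find_log_errors is hypothetical (it is not part of the Land R API), and the demonstration writes the sample log line shown above into a throwaway folder instead of reading a real TempDirectory.

```r
# Hypothetical helper (not part of the Land R API): collect "Error=" messages
# from every log.txt found under a directory.
find_log_errors <- function(temp_dir) {
  log_files <- list.files(temp_dir, pattern = "^log\\.txt$",
                          recursive = TRUE, full.names = TRUE)
  errors <- character(0)
  for (f in log_files) {
    lines <- readLines(f, warn = FALSE)
    hits <- grep("Error=", lines, value = TRUE, fixed = TRUE)
    # Keep only the text after "Error="
    errors <- c(errors, sub("^.*Error=", "", hits))
  }
  errors
}

# Self-contained demonstration using a throwaway directory and the sample log line
demo_dir <- file.path(tempdir(), "land_log_demo", "job1")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(
  "[00:00:00] Error occurred in module::api Error=Currently the server has connected 3 users. Please wait for some users to log out.",
  file.path(demo_dir, "log.txt")
)
log_errors <- find_log_errors(file.path(tempdir(), "land_log_demo"))
print(log_errors)
```

In practice, temp_dir would be the TempDirectory passed to Land.InitiateOshell.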
Initiate / Install / Update
Initiate Oshell environment
Notes: Make sure that directories such as OshellDirectory (e.g. E:/oshell) exist in advance. If they do not exist, run dir.create("E:/oshell") (or whatever path you prefer) to create the folder.
If OmicSoft Studio has been installed, you can point BaseDirectory at OmicSoft Studio's "Omicsoft" folder, e.g. C:/Users/UserName/Documents/Omicsoft.
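The directory check described in the notes above can be scripted so that it is safe to rerun. This is a sketch in base R only; ensure_dirs is a hypothetical helper name, and the demonstration creates folders under tempdir() rather than touching real Oshell paths.

```r
# Hypothetical helper: create each directory if it is missing and report
# whether it exists afterwards.
ensure_dirs <- function(paths) {
  for (p in paths) {
    if (!dir.exists(p)) {
      dir.create(p, recursive = TRUE, showWarnings = FALSE)
    }
  }
  vapply(paths, dir.exists, logical(1))
}

# Demonstration under tempdir(); substitute your real OshellDirectory and TempDirectory
demo_root <- file.path(tempdir(), "land_dirs_demo")
dirs <- c(OshellDirectory = file.path(demo_root, "oshell"),
          TempDirectory = file.path(demo_root, "Temp"))
dir_status <- ensure_dirs(dirs)
print(dir_status)
```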
#Windows: define oshell directory
OshellDirectory = "E:/oshell"
#Initiate Oshell environment on Windows
Land.InitiateOshell(
    OshellDirectory = OshellDirectory,
    BaseDirectory = "C:/Users/UserName/Documents/Omicsoft",
    TempDirectory = "C:/Users/UserName/Documents/Omicsoft/Temp"
);
or
#Linux or Mac: define oshell directory
OshellDirectory="/IData/Users/username/landRApi/R/oshell/"
#Initiate Oshell environment on Linux
Land.InitiateOshell(
    MonoPath = "/IData/App/mono/mono-6.12.0/bin/mono",
    OshellDirectory = "/IData/Users/username/landRApi/R/oshell/",
    BaseDirectory = "/IData/ArrayServerFile/ServerTest/",
    TempDirectory = "/IData/temp/ServerTestTemp"
);
Notes: BaseDirectory is the same as OmicsoftDirectory in ArrayServer.cfg. OshellDirectory (e.g. /IData/Users/username/landRApi/R/oshell/) should exist in advance; otherwise, run dir.create("/IData/Users/username/landRApi/R/oshell/") to create the folder. If ArrayStudio server has been installed, check the ArrayServer configuration file (ArrayServer.cfg): users are advised to use the same BaseDirectory and TempDirectory as the server, to avoid redundant downloads of genome and gene model-related files.
After initiating the oshell environment, users can install or update oshell, and check the oshell installation and its version. If oshell has already been installed, users can skip to initiating the Land environment.
Install oshell
#Install oshell to OshellDirectory
Land.InstallOShell()
# start to download OmicSoftServiceUpdater.exe .....
# trying URL https://resources.omicsoft.com/software_update/OmicSoftServiceUpdater.exe
# Content type 'application/octet-stream' length 20480 bytes (20 Kb)
# opened URL
# downloaded 20 Kb
# ...
# OmicSoftServiceUpdater.exe has been downloaded!
# ...
# start to install oshell to E:/Oshell, please wait a couple of minutes .....
# OmicSoftServiceUpdater sucessfully finished.
# Congratuation, ohsell has been successfully installed to E:/Oshell
# check Installation of Oshell. Return: logical
Land.CheckInstall()
#[1] TRUE
# Check oshell version
Land.CheckVersion()
#OShell version=12.1...
#Program started at Thursday, May 06, 2022 3:07:33 PM
#Perform initialization...
#
#[00:00:00] Windows PDF library is used
#[00:00:00] Windows OS detected...Initialization done...
#version=12.1...
#Program finished at Thursday, May 06, 2022 3:07:38 PM
Update oshell
Land.OshellUpdate()
#Or update oshell to a particular development version
Land.OshellUpdate("dev")
#Or
Land.OshellUpdate("dev3")
Note: you will be prompted to confirm the decision to update to a development instance.
Initiate Land environment
#Initiate Land environment
Land.InitiateLand(Server = "test.omicsoft.com:8065", UserID = "userName", Password = "password", LandName = "TCGA2015")
The variable Land.CurrentLand stores the current Land environment parameters. All Land R API functions read the Land parameters from this variable.
Land.CurrentLand
#Current Land environment is:
#
#                  Server                   UserID                 Password                 LandName
#"test.omicsoft.com:8065"               "userName"               "password"               "TCGA2015"
Users can therefore edit the variable to change the Land environment parameters, e.g.
Land.CurrentLand["LandName"] = "TCGA2015"
Alternatively, users can pre-define their own Land environment objects and assign one of them to the current Land environment when necessary.
land1<-c(Server = "test.omicsoft.com:8065", UserID = "userName", Password = "password", LandName = "TCGA2015")
land2<-c(Server = "test.omicsoft.com:8065", UserID = "userName", Password = "password", LandName = "TCGA2014")
Land.CurrentLand = land1
#or
Land.CurrentLand = land2
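The printed output of Land.CurrentLand suggests it is a plain named character vector, so ordinary R indexing by name applies. The sketch below works on a stand-in variable (current_land, a name chosen here for illustration) so that it runs without the API loaded. It shows editing a single parameter and keeping several pre-defined environments in a list.

```r
# Stand-in for Land.CurrentLand so the example runs without the API loaded
current_land <- c(Server = "test.omicsoft.com:8065", UserID = "userName",
                  Password = "password", LandName = "TCGA2015")

# Change a single parameter by name; the other elements are untouched
current_land["LandName"] <- "TCGA2014"

# Keep several pre-defined environments in a list and pick one by name
lands <- list(
  tcga2015 = c(Server = "test.omicsoft.com:8065", UserID = "userName",
               Password = "password", LandName = "TCGA2015"),
  tcga2014 = current_land
)
selected <- lands[["tcga2015"]]
print(selected[["LandName"]])
```

With the API loaded, the selected vector would be assigned to Land.CurrentLand as shown above.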
Test connecting to server

#Test connecting to server. Return: logical
Land.ConnectServer()
# [1] TRUE
View change log of Land R API
#View change log of Land R API
Land.ChangeLog()
Listing
List Lands
#List all lands (except GeneticsLands). Return: character
ListLands = Land.ListLands()
#eg.
head(ListLands)
#[1] "CCLE2014" "TCGA2014"
#To list GeneticsLands. Return: character
ListLands = Land.ListLands(ListGeneticsLand=TRUE)
#eg.
head(ListLands)
#[1] "GeneticLand_Tutorial"
List data availability
#List land data availability. Return: data.frame
#Note, this function currently does not work for GeneticsLands
ListDataAvailability = Land.ListDataAvailability()
head(ListDataAvailability)
Download
Download meta data
#Download meta data to a variable. Return: data.frame
DownloadMetaData = Land.DownloadMetaData()
head(DownloadMetaData)
# ID Age At Initial Pathologic Diagnosis BamFileName Bcr Patient Uuid Bcr Sample Uuid Disease Gender Group Histological Type History Of Neoadjuvant Treatment Icd 10 Icd O 3 Histology Icd O 3 Site Land Sample Type Land Tissue Neoplasm Histologic Grade Pathologic M Pathologic N Pathologic Stage Pathologic T Race Sample Type Study Source SubjectID Survival Days Survival Status Tumor Necrosis Percent Tumor Nuclei Percent Tumor Type
# TCGA-OR-A5J1-01A . UNCID_2203647.37e88158-0743-45b8-87cf-1d7fe878527f.bam E4038EBB-6E6D-44B1-84AD-E35AAFCA7B70 Adrenocortical carcinoma ACC.Primary Tumor Primary Tumor Adrenal Gland Primary Tumor TCGA TCGA-OR-A5J1 . 0 90 ACC
#Download sampleID in a sample set (e.g. accRNAseq). Return: data.frame
sampleSetID = Land.DownloadSampleSet('accRNAseq')
#subset metadata of a sampleset (e.g. accRNAseq). Return: data.frame
MetaData.sampleSet = DownloadMetaData[DownloadMetaData$ID %in% sampleSetID[,1],]
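The subsetting step above relies on %in% against the first column of the sample set. The sketch below repeats that pattern on small mock data frames so it can be run without a server. The sample IDs other than TCGA-OR-A5J1-01A are invented, and the assumption that the sample set is a one-column data.frame follows from the sampleSetID[,1] indexing used above. It also lists sample-set IDs that have no metadata row, which is worth checking before further analysis.

```r
# Mock metadata standing in for the data.frame returned by Land.DownloadMetaData()
meta <- data.frame(
  ID = c("TCGA-OR-A5J1-01A", "MOCK-SAMPLE-02", "MOCK-SAMPLE-03"),
  Tumor.Type = c("ACC", "ACC", "ACC"),
  stringsAsFactors = FALSE
)
# Mock sample set: assumed to be a one-column data.frame of sample IDs
sample_set <- data.frame(
  SampleID = c("TCGA-OR-A5J1-01A", "MOCK-SAMPLE-03", "MOCK-SAMPLE-99"),
  stringsAsFactors = FALSE
)

# Keep only metadata rows whose ID is in the sample set
meta_subset <- meta[meta$ID %in% sample_set[, 1], ]

# IDs in the sample set that have no metadata row
missing_ids <- setdiff(sample_set[, 1], meta$ID)
print(meta_subset)
print(missing_ids)
```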
Text-dump land data by genes and samples
Users can text-dump different Land data types (DataModes) by genes and samples, including Expression_Ratio, CNV, RPPA, DnaSeq_Mutation, DnaSeq_SomaticMutation, RnaSeq_Transcript, RnaSeq_GeneBas, RnaSeq_Fusion, RnaSeq_Mutation, RnaSeq_SomaticMutation, and Methylation450_B37.
First, specify two vectors containing the genes and samples of interest,
eg.
#Create a gene vector
genes = c("MET","egfr","braf","KRas")
#Create a sample vector
samples = c("TCGA-AR-A1AR-01A","TCGA-BH-A1EO-01A","TCGA-BH-A1EO-11A","TCGA-BH-A1ES-01A","TCGA-BH-A1ET-01A","TCGA-BH-A1ET-11B","TCGA-BH-A1EU-01A","TCGA-BH-A1EU-11A","TCGA-BH-A1EV-01A","TCGA-BH-A1EW-01A","TCGA-BH-A1EW-11B","TCGA-BH-A1F0-01A","TCGA-BH-A1F0-11B","TCGA-C8-A1HF-01A","TCGA-C8-A1HG-01A","TCGA-C8-A1HI-01A","TCGA-C8-A1HL-01A","TCGA-C8-A1HM-01A","TCGA-C8-A1HN-01A","TCGA-E2-A14N-01A","TCGA-E2-A15I-01A","TCGA-E2-A15I-11A","TCGA-A2-A0CX-01A","TCGA-A2-A0D0-01A","TCGA-A2-A0D4-01A","TCGA-A7-A0CD-01A","TCGA-A7-A0CE-01A","TCGA-A7-A0CE-11A","TCGA-A7-A0CG-01A","TCGA-A7-A0CH-01A","TCGA-A7-A0CH-11A","TCGA-A7-A0CJ-01A","TCGA-A7-A0DB-01A","TCGA-A7-A0DB-11A","TCGA-A8-A06N-01A","TCGA-A8-A06O-01A","TCGA-A8-A06P-01A","TCGA-A8-A06R-01A","TCGA-A8-A06T-01A","TCGA-A8-A06U-01A")
Then, submit the query, specifying the DataMode to query:
Note, the Land.TextDumpArrayLandData function is for Array Lands (OncoLand, DiseaseLand, etc) and does not work for GeneticsLands.
#Text-dump by genes and samples
TextDumpArrayLandData = Land.TextDumpArrayLandData(Genes = genes, Samples = samples, DataMode = "Expression_Ratio")
#Text-dump by all genes and samples
TextDumpArrayLandData = Land.TextDumpArrayLandData(Genes = "(all)", Samples = samples, DataMode = "Expression_Ratio")
#or
TextDumpArrayLandData = Land.TextDumpArrayLandData(Samples = samples, DataMode = "Expression_Ratio")
#Text-dump by genes and all samples
TextDumpArrayLandData = Land.TextDumpArrayLandData(Genes = genes, Samples = "(all)", DataMode = "Expression_Ratio")
#Text-dump by genes and all samples and return data by gene level
TextDumpArrayLandData = Land.TextDumpArrayLandData(Genes = genes, Samples = "(all)", DataMode = "Expression_Ratio", DownloadGeneLevelData = TRUE)
# Notes: default DownloadGeneLevelData is FALSE
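Since the introduction warns against dumping all genes for all samples, one option for long gene lists is to query in batches and bind the results. The sketch below shows only the batching logic in base R. split_batches and dump_in_batches are hypothetical helpers, and mock_fetch stands in for a real query. With the API loaded, the fetch function could wrap Land.TextDumpArrayLandData and extract the element for the chosen DataMode, but that wiring is an assumption and is not shown.

```r
# Hypothetical helper: split a vector into consecutive batches of at most `size` items
split_batches <- function(x, size) {
  split(x, ceiling(seq_along(x) / size))
}

# Hypothetical wrapper: call `fetch` once per gene batch and row-bind the results.
# `fetch` is any function(genes) that returns a data.frame.
dump_in_batches <- function(genes, size, fetch) {
  pieces <- lapply(split_batches(genes, size), fetch)
  do.call(rbind, unname(pieces))
}

# Mock fetch so the sketch runs without a server
mock_fetch <- function(genes) {
  data.frame(GeneID = genes, Value = seq_along(genes), stringsAsFactors = FALSE)
}

genes_demo <- c("MET", "EGFR", "BRAF", "KRAS", "MDM2")
batches <- split_batches(genes_demo, 2)
combined <- dump_in_batches(genes_demo, 2, mock_fetch)
print(combined)
```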
Land.TextDumpArrayLandGeneData return type by different DataModes
DataMode | Return type | Returned elements (DownloadGeneLevelData=FALSE, default) | Returned elements (DownloadGeneLevelData=TRUE) |
---|---|---|---|
CNV | data.list | $CNV | $CNV,$CNV_ByGeneLevel |
CNVCall | data.list | $CNVCall | NA |
DnaSeq_Mutation | data.list | $DnaSeq_Mutation,$DnaSeq_Mutation.Annotation | NA |
DnaSeq_Mutation_Exome | data.list | $DnaSeq_Mutation_Exome,$DnaSeq_Mutation_Exome.Annotation | $DnaSeq_Mutation_Exome,$DnaSeq_Mutation_Exome.Annotation,$DnaSeq_Mutation_Exome_ByGeneLevel |
DnaSeq_SomaticMutation | data.list | $DnaSeq_SomaticMutation,$DnaSeq_SomaticMutation.Annotation | $DnaSeq_SomaticMutation,$DnaSeq_SomaticMutation.Annotation,$DnaSeq_SomaticMutation_ByGeneLevel |
Expression_Ratio | data.list | $Expression_Ratio | $Expression_Ratio,$Expression_Ratio_ByGeneLevel |
Expression_Intensity_Probes | data.list | $Expression_Intensity_Probes | $Expression_Intensity_Probes,$Expression_Intensity_Probes_ByGeneLevel |
General_Expression | data.list | $General_Expression | $General_Expression,$General_Expression_ByGeneLevel |
Methylation450_B37 | data.list | $Methylation450_B37 | NA |
RnaSeq_Exon | data.list | $RnaSeq_Exon | NA |
RnaSeq_ExonJunction | data.list | $RnaSeq_ExonJunction | NA |
RnaSeq_GeneBas | data.list | $RnaSeq_GeneBas | NA |
RnaSeq_Fusion | data.list | $RnaSeq_Fusion,$RnaSeq_Fusion.Annotation | NA |
RnaSeq_Mutation | data.list | $RnaSeq_Mutation,$RnaSeq_Mutation.Annotation | $RnaSeq_Mutation,$RnaSeq_Mutation.Annotation,$RnaSeq_Mutation_ByGeneLevel |
RnaSeq_SomaticMutation | data.list | $RnaSeq_Mutation,$RnaSeq_Mutation.Annotation | $RnaSeq_Mutation,$RnaSeq_Mutation.Annotation,$RnaSeq_Mutation_ByGeneLevel |
RnaSeq_PairedEndFusion | data.list | $RnaSeq_PairedEndFusion,$RnaSeq_PairedEndFusion.Annotation | NA |
RnaSeq_Transcript | data.list | $RnaSeq_Transcript | $RnaSeq_Transcript,$RnaSeq_Transcript_ByGeneLevel |
RPPA_RBN | data.list | $RPPA_RBN | $RPPA_RBN,$RPPA_RBN_ByGeneLevel |
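Because the result is a list whose element names depend on the DataMode, it can help to inspect names() and pick the gene-level table only when it is present. The sketch below uses a mock list shaped like the CNV row of the table above. The element names follow the table, while the column names and values are assumptions made for illustration, and pick_table is a hypothetical helper.

```r
# Mock of a text-dump result for DataMode = "CNV" with DownloadGeneLevelData = TRUE
dump <- list(
  CNV = data.frame(SampleID = c("S1", "S1"), GeneID = c("EGFR", "EGFR"),
                   Value = c(0.2, 0.4), stringsAsFactors = FALSE),
  CNV_ByGeneLevel = data.frame(SampleID = "S1", GeneID = "EGFR",
                               Value = 0.3, stringsAsFactors = FALSE)
)

# Hypothetical helper: prefer the gene-level table when present, else the base table
pick_table <- function(result, data_mode) {
  gene_level <- paste0(data_mode, "_ByGeneLevel")
  if (gene_level %in% names(result)) result[[gene_level]] else result[[data_mode]]
}

chosen <- pick_table(dump, "CNV")           # gene-level table is available
fallback <- pick_table(dump["CNV"], "CNV")  # only the base table is available
print(names(dump))
```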
WARNING: Depending on the platform used by each study, microarray data may be stored in "Expression_Intensity_Probes", or in "Expression_Ratio". For example, most ImmunoLand studies have "Expression_Intensity_Probes", while TCGALand samples have "Expression_Ratio". Using the wrong DataMode will usually lead to an error such as "unused argument".

Please note that R returns a data frame with automatically recognized column types. For example, if a Sample ID consists of digits plus a single letter 'E', R treats it as a numeric value instead of a character value. To avoid this issue, the text-dump function accepts the parameter ColClasses: passing ColClasses = "character" returns every column as character (e.g. in tmp1$DnaSeq_Mutation). By default the parameter is disabled, and the text-dump function returns a table with automatically recognized character/int/num types.
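The type-recognition issue can be reproduced with base R alone. The sketch below reads a small tab-separated text with read.table, once with default type inference and once with colClasses = "character". It assumes that ColClasses is handed on to read.table's colClasses argument, as the "Set Column Types" section describes for Land.SearchGeneticLand. The sample IDs are invented.

```r
# Tab-separated text with sample IDs made of digits and a single 'E'
txt <- "SampleID\tValue\n12E3\t1.5\n45E2\t2.5"

# Default type inference: "12E3" is read as the number 12000
inferred <- read.table(text = txt, header = TRUE, sep = "\t")

# Forcing every column to character keeps the IDs as written
as_text <- read.table(text = txt, header = TRUE, sep = "\t",
                      colClasses = "character")

print(class(inferred$SampleID))
print(as_text$SampleID)
```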
Download Land Clinical MetaData
The following function returns the clinical metadata for all samples, or for a subset of samples, within the initiated Land.
#to download all the clinical metadata for a land
ClinicalData = Land.DownloadClinicalData()
#or you can restrict the result to individual samples
#Create a sample set, then download your sample set of interest
sampleSetID = Land.DownloadSampleSet('sampleset_of_interest')
SampleSet_of_Subset = ClinicalData[ClinicalData$ID %in% sampleSetID[,1],]
Download Comparison Land Data
The following functions only work for Comparison Lands, such as ImmunoLand2015Q2. Here we use ImmunoLand2015Q2 as an example to show how to retrieve data from a Comparison Land.
Land.CurrentLand["LandName"] = "ImmunoLand2015Q2"
Download ArrayLand Gene Comparison
Download ArrayLand gene comparison data by Genes/GeneSet and Comparisons/ComparisonSet, including Comparison.Genes, Comparison, and FullComparisonMetaData.
eg.
#Create a gene vector
genes = c("MET","egfr","braf","KRas")
#Create a Comparisons vector
comparisons = c('GSE13887.GPL570.test1','GSE13849.GPL570.test1','GSE13849.GPL570.test2')
#Download ArrayLand Gene Comparison by Genes and Comparisons
DownloadArrayLandGeneComparison = Land.DownloadArrayLandGeneComparison(Genes = genes, Comparisons = comparisons)
#Download ArrayLand Gene Comparison by Genes and ComparisonSet
comparisonSet = "Sub_ComparisonSet2"
DownloadArrayLandGeneComparison = Land.DownloadArrayLandGeneComparison(Genes = genes, ComparisonSet = comparisonSet)
#Download ArrayLand Gene Comparison by GeneSet and Comparisons
geneSet = "GeneSet01"
DownloadArrayLandGeneComparison = Land.DownloadArrayLandGeneComparison(GeneSet = geneSet, Comparisons = comparisons)
#Download ArrayLand Gene Comparison by GeneSet and ComparisonSet
geneSet = "GeneSet01"
DownloadArrayLandGeneComparison = Land.DownloadArrayLandGeneComparison(GeneSet = geneSet, ComparisonSet = comparisonSet)
Download ArrayLand Comparison Data
Download ArrayLand comparison data by Comparisons/ComparisonSet, including Design, Comparison.Genes, Comparison, and FullMetaData.
eg.
#Download ArrayLand Comparison Data by Comparisons
DownloadArrayLandComparisonData = Land.DownloadArrayLandComparisonData(Comparisons = comparisons)
#Download ArrayLand Comparison Data by ComparisonSet
DownloadArrayLandComparisonData = Land.DownloadArrayLandComparisonData(ComparisonSet = comparisonSet)
Text-Dump ArrayLand Gene Comparison
Text-Dump ArrayLand Gene Comparison by Genes/GeneSet and Comparisons/ComparisonSet, including Comparison.Matrix, Comparison, and FullMetaData.
eg.
#Text-Dump ArrayLand Gene Comparison by Genes and Comparisons
TextDumpArrayLandGeneComparison = Land.TextDumpArrayLandGeneComparison(Genes = genes, Comparisons = comparisons)
#Text-Dump ArrayLand Gene Comparison by Genes and ComparisonSet
TextDumpArrayLandGeneComparison = Land.TextDumpArrayLandGeneComparison(Genes = genes, ComparisonSet = comparisonSet)
#Text-Dump ArrayLand Gene Comparison by GeneSet and Comparisons
TextDumpArrayLandGeneComparison = Land.TextDumpArrayLandGeneComparison(GeneSet = geneSet, Comparisons = comparisons)
#Text-Dump ArrayLand Gene Comparison by GeneSet and ComparisonSet
TextDumpArrayLandGeneComparison = Land.TextDumpArrayLandGeneComparison(GeneSet = geneSet, ComparisonSet = comparisonSet)
Text-Dump ArrayLand Comparison Data
Text-Dump ArrayLand Comparison Data by Comparisons or ComparisonSet, including Comparison.Matrix, Comparison, and FullMetaData.
eg.
#Text-Dump ArrayLand Comparison Data by Comparisons
TextDumpArrayLandComparisonData = Land.TextDumpArrayLandComparisonData(Comparisons = comparisons)
#Text-Dump ArrayLand Comparison Data by ComparisonSet
TextDumpArrayLandComparisonData = Land.TextDumpArrayLandComparisonData(ComparisonSet = comparisonSet)
Download ComparisonSet
Retrieve ComparisonIDs from a ComparisonSet
eg.
#Download ComparisonSet
DownloadComparisonSet = Land.DownloadComparisonSet(ComparisonSet=comparisonSet)
Arguments and Returned Value of ComparisonLand Download/Text-dump
Method | Arguments | Return type | Included data.frame |
---|---|---|---|
Land.DownloadArrayLandGeneComparison | Genes/GeneSet, Comparisons/ComparisonSet | List | $Comparison.Genes, $Comparison, $FullComparisonMetaData |
Land.DownloadArrayLandComparisonData | Comparisons/ComparisonSet | List | $Comparison.Genes, $Comparison, $Design, $FullMetaData |
Land.TextDumpArrayLandGeneComparison | Genes/GeneSet, Comparisons/ComparisonSet | List | $Comparison, $Comparison.Matrix, $FullMetaData |
Land.TextDumpArrayLandComparisonData | Comparisons/ComparisonSet | List | $Comparison, $Comparison.Matrix, $FullComparisonMetaData |
Land.DownloadComparisonSet | ComparisonSet | Vector | NA |
IfDeleteResult (optional parameter)
By default, most Land R commands will automatically delete the temporary folders that were generated. If you would like to keep the folders (e.g. to see the underlying Oscripts), you can set the parameter "IfDeleteResult=FALSE"; e.g.
Land.TextDumpArrayLandGeneData(Genes = geneIDs, Samples = sampleIDs, DataMode = "DnaSeq_Mutation",IfDeleteResult=FALSE)
GeneticsLand
Searching
Search for gene, phenotype, association report, region, or RS_ID
#Initializes Search
search <- "egfr"
#Sets view
view <- "Gene.CodingSnpListing"
#Performs the search
result <- Land.SearchGeneticLand(Search=search, View=view)
Search by Samples
#Sets the samples to be searched
samples <- c("HG01887","HG01895","HG02258","HG01959","HG02478","HG01888","NA19702","NA20297","NA20345","HG02436","NA20319","NA20300","NA20313","NA19120","NA20295","NA20333","NA19396","NA19985","NA19249","NA20302","NA20128","NA18911")
#Performs the search
result <- Land.SearchGeneticLand(Search=search, View=view, Samples=samples)
Grouping
#Some views tabulate values based on a grouping factor
#Sets the group from the meta data to be searched
group <- "Country of Origin"
samples <- c("HG01887","HG01895","HG02258","HG01959","HG02478","HG01888","NA19702","NA20297","NA20345","HG02436","NA20319","NA20300","NA20313","NA19120","NA20295","NA20333","NA19396","NA19985","NA19249","NA20302","NA20128","NA18911")
result <- Land.SearchGeneticLand(Search=search, View="Gene.GroupedArraySnpSummary", Group=group, Samples=samples)
Append Annotations
#By default, some table views will have fewer columns than the corresponding search result in ArrayStudio since annotations are not included
#Set AppendAnnotation to True to have these joined to match the results from ArrayStudio
result <- Land.SearchGeneticLand(Search="FADS1", View="Gene.AssociationTable", OtherOptions="/AppendAnnotation=True")
Set Column Types
#Data tables will be read into R using the read.table function. You may set the ColClasses option for this function to specify what types each column should be read as
#Read all columns as strings
result <- Land.SearchGeneticLand(Search="FADS1", View="Gene.AssociationTable", ColClasses="character", OtherOptions="/AppendAnnotation=True")
#Read PValue columns as strings and let R infer types for all other columns
result <- Land.SearchGeneticLand(Search="FADS1", View="Gene.AssociationTable", ColClasses=c(PValue="character", PValueHeterogeneity="character"), OtherOptions="/AppendAnnotation=True")
Search by SampleFilter
search <- "egr1"
view <- "Gene.GroupedArraySnpSummary"
# When combining multiple filter conditions, enclose each condition in single quotes; otherwise the filter will not be parsed correctly.
sampleFilter <- "'BMI > 20' & 'Cohort = ACB'"
result <- Land.SearchGeneticLand(Search=search, View=view, SampleFilter=sampleFilter)
# For one filter condition, single quotes are not required:
sampleFilter <- "'BMI > 20'"
sampleFilter <- "BMI > 20" #This works too
Notes: Columns in SampleFilter should come from the columns of sample metadata.
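A filter that names a column absent from the sample metadata will fail, so it can be worth checking column names before submitting a query. The following is a small base R sketch on mock metadata. missing_filter_columns is a hypothetical helper, the metadata values are invented, and in practice the metadata would come from a download such as the TableView described later on this page.

```r
# Hypothetical helper: return the filter columns that are absent from the metadata
missing_filter_columns <- function(columns, metadata) {
  setdiff(columns, colnames(metadata))
}

# Mock sample metadata; check.names = FALSE keeps column names with spaces intact
sample_meta <- data.frame(
  ID = c("S1", "S2"),
  BMI = c(22.5, 31.0),
  Cohort = c("ACB", "ACB"),
  `Country of Origin` = c("CountryA", "CountryB"),
  check.names = FALSE, stringsAsFactors = FALSE
)

ok <- missing_filter_columns(c("BMI", "Cohort", "Country of Origin"), sample_meta)
bad <- missing_filter_columns(c("BMI", "Height"), sample_meta)
print(bad)
```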
SampleFilter Syntax
Best practice is to qualify the operator with the OP: prefix to ensure proper parsing. Without this qualification, if your column name or value contains an operator, you may get an error. For example:
Exclusion Criteria = BMI < 20
may parse on < as the operator and look for a column named Exclusion Criteria = BMI and error when that column is not found.
Comparison Operator | Description | Example(s) |
---|---|---|
< | Numeric column values are less than the specified threshold | BMI OP:< 30 |
> | Numeric column values are greater than the specified threshold | BMI OP:> 30 |
<= | Numeric column values are less than or equal to the specified threshold | BMI OP:<= 30 |
>= | Numeric column values are greater than or equal to the specified threshold | BMI OP:>= 30 |
= | Numeric or character column values match the specification (exact match) | BMI OP:= 30; Ancestry OP:= Asian |
!= | Numeric or character column values do not match the specification (exact match) | BMI OP:!= 30; Ancestry OP:!= Asian |
~ | Character column values contain the specified string (case insensitive) | Ancestry OP:~ Asian |
MATCH | Same as ~ | Ancestry OP:MATCH Asian |
CSMATCH | Case-sensitive version of MATCH | Country OP:CSMATCH US |
IN | Character column values exact match any of the strings in parentheses | Ancestry OP:IN (Asian, South Asian, East Asian) |
Sub-expressions should be enclosed in single quotes to ensure proper parsing.
Logical Operator | Description | Example |
---|---|---|
& | Both conditions must be met | 'BMI OP:> 30' & 'Ancestry OP:~ Asian' |
\| | Either condition must be met | 'BMI OP:> 30' \| 'Ancestry OP:~ Asian' |
!() | The condition must not be met | !('Ancestry OP:~ Asian') |
The OP: qualifier does not work on logical operators. Parentheses can be used to create complex nested queries like:
('condition 1' & 'condition 2') | ('condition 3' & 'condition 4')
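The quoting and OP: rules above are easy to get wrong when filters are typed by hand, so one option is to assemble the filter string programmatically. The sketch below is plain string manipulation in base R. make_condition and combine_conditions are hypothetical helpers, not API functions, and the resulting string would be passed as SampleFilter (or AssociationFilter).

```r
# Hypothetical helper: one condition, with the operator qualified by OP:
# and the whole condition wrapped in single quotes
make_condition <- function(column, op, value) {
  if (length(value) > 1) {
    # Several values are written as a parenthesised list, as in the IN example
    value <- paste0("(", paste(value, collapse = ", "), ")")
  }
  paste0("'", column, " OP:", op, " ", value, "'")
}

# Hypothetical helper: join conditions with "&" or "|";
# wrap = TRUE adds parentheses so the group can be nested
combine_conditions <- function(..., logic = "&", wrap = FALSE) {
  out <- paste(c(...), collapse = paste0(" ", logic, " "))
  if (wrap) paste0("(", out, ")") else out
}

cond_bmi <- make_condition("BMI", ">", 30)
cond_anc <- make_condition("Ancestry", "IN", c("Asian", "South Asian", "East Asian"))
sample_filter <- combine_conditions(cond_bmi, cond_anc, logic = "&")

# Nested query in the shape ('c1' & 'c2') | ('c3' & 'c4')
nested <- combine_conditions(
  combine_conditions(cond_bmi, cond_anc, wrap = TRUE),
  combine_conditions(make_condition("BMI", "<=", 20),
                     make_condition("Ancestry", "~", "Asian"), wrap = TRUE),
  logic = "|"
)
print(sample_filter)
print(nested)
```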
Search by AssociationFilter
#Complementary to filtering on sample attributes, you may also filter genetic association results based on analysis metadata (same query syntax as SampleFilter)
view <- "AssociationTableView"
associationFilter <- "'EffectType OP:= Beta' & 'MeSH Unique ID OP:= D000428'"
result <- Land.SearchGeneticLand(Search="", View=view, AssociationFilter=associationFilter, OtherOptions="/AppendAnnotation=True")
Search by VariableFilter
#Some views allow you to filter which variables (columns) are returned
view <- "ClinicalTableView"
sampleFilter <- "Clinical M Stage=M0"
variableFilter <- "Clinical M Stage"
result <- Land.SearchGeneticLand(Search="", View=view, SampleFilter=sampleFilter, VariableFilter=variableFilter, OtherOptions="/AppendAnnotation=True")
View Types
Generally, the available views can be found in Array Studio by hovering over the view name.
Samples, Projects, etc
- TableView - sample details
- ProjectTableView
- ClinicalTableView
- AssociationTableView
RS_ID or Region (single base)
Region searches in the form of 7:1000-2000 or chr7:1000-2000
- Variant.ArraySnpGenotypes
- Variant.AssociationTable
- Variant.CovariateVariableView
- Variant.EqtlTable
- Variant.Grasp2Table
- Variant.ImputedSnpDoseGenotypes
- Variant.SnpAnnotation
- Variant.SnpGenotypes
- Variant.VcfSnpGenotypes
- Variant.AlleleFrequency
- Variant.GenotypeFrequency
Gene
- Gene.AllSnpListing
- Gene.CodingSnpListing
- Gene.ArraySnpSummary
- Gene.VcfSnpSummary
- Gene.ImputedSnpDoseSummary
- Gene.GroupedArraySnpSummary
- Gene.GroupedVcfSnpSummary
- Gene.GroupedImputedSnpDoseSummary
- Gene.Grasp2Table
- Gene.AssociationTable
- Gene.EqtlTable
Region
- Region.ArraySnpSummary
- Region.AssociationTable
- Region.CodingSnpListing
- Region.EqtlTable
- Region.GeneListing
- Region.Grasp2Table
- Region.GroupedArraySnpSummary
- Region.GroupedImputedSnpDoseSummary
- Region.GroupedVcfSnpSummary
- Region.ImputedSnpDoseSummary
- Region.VcfSnpSummary
- Region.AllSnpListing
Phenotype or Association Report
- Grasp2Association.TopHits
- Association.TopHits
- Association.RegionPlot
- Grasp2Association.GenomePlot
- Association.GenomePlot
Special Views
There are two special views for querying association results.
CountAssociationTopHits takes the additional MaxPValue parameter, which can contain a list of p-value thresholds, and returns a count of the number of results below each threshold, e.g.:
Counts <- Land.SearchGeneticLand(Search="Dermatology Hair Male Pattern Baldness ChrX PMID28196072", View="CountAssociationTopHits", OtherOptions="/MaxPValue=0.5,0.1,0.01,1e-3,1e-4")
Counts
                                             AssociationID Cutoff Count
1 Dermatology Hair Male Pattern Baldness ChrX PMID28196072  5e-01  7585
2 Dermatology Hair Male Pattern Baldness ChrX PMID28196072  1e-01  1907
3 Dermatology Hair Male Pattern Baldness ChrX PMID28196072  1e-02   476
4 Dermatology Hair Male Pattern Baldness ChrX PMID28196072  1e-03   248
5 Dermatology Hair Male Pattern Baldness ChrX PMID28196072  1e-04   183
SearchAssociationTopHits takes three additional parameters:
- MaxPValue to only return results with p-values below this threshold
- MaxRows to truncate the results to this maximum (set to 0 or -1 to get all results pursuant to MaxTopHitsN)
- NoAnnotation to indicate whether results should be annotated
Note that R may convert extremely low PValues, such as 1E-325 or smaller, to 0. When this is a concern, use the ColClasses option to force these columns to be read as strings.
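The conversion to 0 happens because 1E-325 is below the smallest positive value a double can represent. The sketch below reproduces the effect with read.table on a two-row mock table (the second SNP ID is a placeholder) and shows that a named colClasses entry keeps the PValue text intact. The helper neg_log10_from_string is hypothetical. It recovers -log10(p) from the string form, assuming the value is written as a plain number or in mantissa-E-exponent notation.

```r
txt <- "SnpID\tPValue\nrs12558842\t1E-325\nexample_snp\t1E-10"

# Default inference: the first p-value underflows to 0
as_number <- read.table(text = txt, header = TRUE, sep = "\t")

# Reading only the PValue column as character preserves the original text
as_string <- read.table(text = txt, header = TRUE, sep = "\t",
                        colClasses = c(PValue = "character"))

# Hypothetical helper: -log10(p) computed from the mantissa and exponent of the string
neg_log10_from_string <- function(p) {
  vapply(strsplit(toupper(p), "E", fixed = TRUE), function(parts) {
    mantissa <- as.numeric(parts[1])
    exponent <- if (length(parts) > 1) as.numeric(parts[2]) else 0
    -(log10(mantissa) + exponent)
  }, numeric(1))
}

neg_log10 <- neg_log10_from_string(as_string$PValue)
print(as_number$PValue)
print(neg_log10)
```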
NoAnnotResults <- Land.SearchGeneticLand(Search="Dermatology Hair Male Pattern Baldness ChrX PMID28196072", View="SearchAssociationTopHits",OtherOptions="/MaxPValue=0.01; /MaxRows=1500; /NoAnnotation=True")
head(NoAnnotResults)
  ID      SnpID Chromosome Position Reference Alternative PValue PValueHeterogeneity AlleleFrequency EffectSize EffectSize_LB
1  1 rs12558842          X 66481800         C           A 1E-325                  NA              NA     0.5427            NA
2  2  rs4827528          X 66335096         A           G 1E-325                  NA              NA     0.5823            NA
3  3  rs2497938          X 66563018         T           C 1E-325                  NA              NA    -0.5284            NA
4  4  rs6625163          X 66510984         G           A 1E-325                  NA              NA     0.5319            NA
5  5   rs775366          X 65998455         A           G 1E-325                  NA              NA     0.5360            NA
6  6 rs73221556          X 65933285         C           A 1E-325                  NA              NA    -0.5266            NA
  EffectSize_UB EffectSize_SE ImputationQuality PooledSampleSize Direction_Up Direction_Down Uncertain
1            NA            NA                NA            52874            0              0        NA
2            NA            NA                NA            52874            0              0        NA
3            NA            NA                NA            52874            0              0        NA
4            NA            NA                NA            52874            0              0        NA
5            NA            NA                NA            52874            0              0        NA
6            NA            NA                NA            52874            0              0        NA
Quick Start
Scatter plot of expression value vs. CN log2ratios within specific genes and samples
This is an example of drawing a scatter plot of gene expression vs. CN log2ratios for the genes MDM2, BRAF, EGFR, and FGF12.
#initiate oshell environment
Land.InitiateOshell(
    OshellDirectory = "E:/Oshell/",
    BaseDirectory = "C:/Users/UserName/Documents/Omicsoft",
    TempDirectory = "C:/Users/UserName/Documents/Omicsoft/Temp"
);
#initiate Land environment
Land.InitiateLand(Server = "test.omicsoft.com:8065", UserID = "userName",Password = "password", LandName = "TCGA2015")
#Create a gene vector
genes = c("MDM2", "BRAF", "EGFR", "FGF12")
#create a sample vector from sampleID.txt
#eg. sampleID.txt
# sampleid
# TCGA-BH-A1EV-01A
# TCGA-C8-A1HL-01A
# TCGA-C8-A1HN-01A
# TCGA-A7-A0CE-01A
# TCGA-A7-A0CG-01A
samples = c(
    "TCGA-BH-A1EV-01A", "TCGA-C8-A1HL-01A", "TCGA-C8-A1HN-01A", "TCGA-A7-A0CE-01A",
    "TCGA-A7-A0CG-01A", "TCGA-A8-A06R-01A", "TCGA-A8-A06Y-01A", "TCGA-A8-A07F-01A",
    "TCGA-A8-A07I-01A", "TCGA-A8-A07L-01A", "TCGA-A8-A08B-01A", "TCGA-A8-A08F-01A",
    "TCGA-A8-A08G-01A", "TCGA-A8-A08J-01A", "TCGA-A8-A08L-01A", "TCGA-A8-A099-01A",
    "TCGA-A8-A09B-01A", "TCGA-A8-A09C-01A", "TCGA-A8-A09E-01A"
);
# or
samples = read.table("Z:/Users/UserName/landRApi/R/sampleID.txt", header = TRUE, stringsAsFactors = FALSE, sep = "\t",quote = "")[,1]
## Test a small subset of samples
RnaSeqTxnExprData = Land.TextDumpArrayLandData(Genes = genes, Samples = samples, DataMode = "RnaSeq_Transcript")
RnaSeqTxnExprData$RnaSeq_Transcript[1:5,]
#retrieve expression ratio data from land for MDM2, BRAF, EGFR, and FGF12
ArrayExprData0 = Land.TextDumpArrayLandData(Genes = genes, Samples = samples, DataMode = "Expression_Ratio")
ArrayExprData=ArrayExprData0$Expression_Ratio
ArrayExprData[1:5, ]
#retrieve CNV data from land for the genes
CNVData0 = Land.TextDumpArrayLandData(Genes = genes, Samples = samples, DataMode = "CNV")
CNVData=CNVData0$CNV
CNVData[1:5, ]
#extract SampleID, GeneID, and expression ratio from Expression_Ratio_Data
ex = data.frame(SampleID = ArrayExprData$SampleID, GeneID = ArrayExprData$GeneID, ExpressionValue = ArrayExprData$Value)
#extract SampleID, GeneID, and CNVLog2Ratio from CNV
cn = data.frame(SampleID = CNVData$SampleID, GeneID = CNVData$GeneID, CNVLog2Ratio = CNVData$Value)
# merge expression ratio and CNVLog2Ratio
ArrayExprAndCNV = merge(ex, cn, by = c("SampleID","GeneID"))
ArrayExprAndCNV[1:5, ];
#scatter plot of expression value (x axis) and CNVLog2Ratio (y axis)
library(lattice)
xyplot(CNVLog2Ratio~ExpressionValue | GeneID, ArrayExprAndCNV,
    grid = TRUE, groups = GeneID, pch = 19,
    main = "Integration view of Expression Ratio => CNV",
    xlab = "Values from Expression Array",
    ylab = "CNV Log2Ratio");
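The merge step above is an inner join, so any SampleID/GeneID pair missing from either table silently drops out of the plot. The sketch below repeats the merge on small mock tables (the SampleID, GeneID and value columns mirror those used above, and the values are invented) and lists the pairs that would be lost.

```r
# Mock long-format tables standing in for the expression and CNV extracts
ex <- data.frame(SampleID = c("S1", "S2", "S3", "S1", "S2"),
                 GeneID = c("EGFR", "EGFR", "EGFR", "BRAF", "BRAF"),
                 ExpressionValue = c(1.2, 0.4, -0.3, 0.1, 0.9),
                 stringsAsFactors = FALSE)
cn <- data.frame(SampleID = c("S1", "S2", "S1", "S2", "S4"),
                 GeneID = c("EGFR", "EGFR", "BRAF", "BRAF", "BRAF"),
                 CNVLog2Ratio = c(0.8, 0.1, 0.0, 0.5, 0.2),
                 stringsAsFactors = FALSE)

# merge() keeps only SampleID/GeneID pairs present in both tables (inner join)
both <- merge(ex, cn, by = c("SampleID", "GeneID"))

# A left join exposes expression rows that have no CNV value
ex_only <- merge(ex, cn, by = c("SampleID", "GeneID"), all.x = TRUE)
dropped <- ex_only[is.na(ex_only$CNVLog2Ratio), c("SampleID", "GeneID")]

# Number of matched samples per gene
matched_per_gene <- table(both$GeneID)
print(both)
print(dropped)
```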
Scatter plot of expression value vs. CN log2ratios with full sample meta data
## Test on all samples
#Retrieve CNV data for all samples
CNVData0=Land.TextDumpArrayLandData(Genes = genes, Samples = "(all)", DataMode = "CNV")
CNVData=CNVData0$CNV
#Retrieve ArrayExpression data for all samples
ArrayExprData0 = Land.TextDumpArrayLandData(Genes = genes, Samples = "(all)", DataMode = "Expression_Ratio")
ArrayExprData=ArrayExprData0$Expression_Ratio
CNVData[1:5, ]
ArrayExprData[1:5, ]
ex = data.frame(SampleID = ArrayExprData$SampleID, GeneID = ArrayExprData$GeneID, ExpressionValue = ArrayExprData$Value)
cn = data.frame(SampleID = CNVData$SampleID, GeneID = CNVData$GeneID, CNVLog2Ratio = CNVData$Value)
ArrayExprAndCNV = merge(ex, cn, by = c("SampleID","GeneID"))
ArrayExprAndCNV[1:5, ];
#Obtain full sample metadata
SampleFullMetaData = Land.DownloadMetaData();
head(SampleFullMetaData);
MetaDatasubset = data.frame(SampleID = SampleFullMetaData$ID, Tumor.Type = SampleFullMetaData$Tumor.Type, Sample.Type = SampleFullMetaData$Sample.Type);
MetaDatasubset[1:5, ];
#Merge ArrayExprAndCNV and sample metadata
ArrayExprAndCNV2 = merge(ArrayExprAndCNV, MetaDatasubset, by = c("SampleID"))
ArrayExprAndCNV2[1:5, ];
##Scatter plot of expression value (x axis) and CNVLog2Ratio (y axis) with full metadata
library(lattice)
xyplot(CNVLog2Ratio~ExpressionValue | GeneID, ArrayExprAndCNV2,
    grid = TRUE, groups = Tumor.Type, pch = 19,
    main = "Integration view of Expression Ratio => CNV",
    xlab = "Values from Expression Array",
    ylab = "CNV Log2Ratio",
    ylim = c(-5, 5),
    auto.key = list(pch = 19, columns = 8)
);