Archive For The “R” Category
BioMart is an Ensembl service for querying and retrieving data. Here I’ll use R/Bioconductor to get gene positions for a list of genes. The main thing we need is a mart object, which encapsulates the species, host and database information. Here I’m using the US West Coast mirror. This defaults to the current build (GRCh38) of the human…
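Here is a minimal sketch of that kind of lookup. It assumes the biomaRt package is installed and that Ensembl is reachable when the function is called. The helper name gene_positions() is hypothetical, and it uses useEnsembl() with the default host rather than the mirror mentioned above. The attribute and filter names are standard fields of the human gene mart, but it is worth confirming them with listAttributes(mart) and listFilters(mart) for your release.

```r
# Hypothetical helper: look up genomic coordinates for HGNC gene symbols.
# Uses biomaRt:: so the function can be defined without loading the package;
# calling it requires biomaRt and network access to Ensembl.
gene_positions <- function(symbols, mart = NULL) {
  if (is.null(mart)) {
    # Default Ensembl host (assumption: not the US West mirror from the post)
    mart <- biomaRt::useEnsembl(biomart = "genes",
                                dataset = "hsapiens_gene_ensembl")
  }
  biomaRt::getBM(
    attributes = c("hgnc_symbol", "chromosome_name",
                   "start_position", "end_position", "strand"),
    filters    = "hgnc_symbol",  # match the input on HGNC symbols
    values     = symbols,
    mart       = mart
  )
}

# Usage (network required):
# pos <- gene_positions(c("BRCA1", "TP53"))
```

Passing a mart object in lets you reuse one connection across many queries instead of reconnecting on every call.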
Dockerizing your Bioconductor code is one way to ensure a reproducible workflow. It also makes deployment to cloud environments (e.g. AWS Batch) easier in case you want to scale your analysis. The official Bioconductor Docker containers are hosted on Docker Hub. They are built with RStudio Server installed so that you can run an interactive analysis…
RStudio is a great tool, but suppose you need to access your work computer to do some work from home? Use RStudio Server, which lets you connect to a remote machine from a web browser and do your development there. I can’t find a definitive answer on whether the free version of RStudio Server encrypts the login information. Their website…
Spaces in column names are changed to the standard R-friendly format, with a “.” in place of each space (data.table’s setnames() renames the columns of DT in place):
library(data.table)
setnames(DT, make.names(colnames(DT)))
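To see what make.names() itself does to the names, here is a small base-R sketch on made-up column names, so it runs without data.table. Besides replacing spaces, make.names() prefixes names that start with a digit with "X", and unique = TRUE disambiguates names that collide after cleaning.

```r
# Made-up column names containing spaces, as fread() would keep them
cols <- c("gene id", "log2 fold change", "p value", "2nd sample")

clean <- make.names(cols)
print(clean)
# "gene.id" "log2.fold.change" "p.value" "X2nd.sample"

# unique = TRUE adds a suffix when two names end up identical
dedup <- make.names(c("a b", "a.b"), unique = TRUE)
print(dedup)
# "a.b" "a.b.1"

# The same idea on a plain data.frame, without data.table
df <- data.frame(1, 2, 3, 4)
names(df) <- cols
names(df) <- make.names(names(df))
```

Note that data.frame() and read.table() already do this by default (check.names = TRUE), whereas fread() keeps the original names unless you ask it to check them, which is why the setnames() step is useful with data.table.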
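A good use for Ensembl’s BioMart is pulling a list of gene names linked to Ensembl transcript IDs, as you need when using Sleuth. We will need Bioconductor and the biomaRt library if you don’t have them already. The old biocLite.R installer has since been retired; current Bioconductor releases install packages through BiocManager:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("biomaRt")
library(biomaRt)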
The usual bioC way of doing things like this is to create an object that…
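A minimal sketch of building a transcript-to-gene table for Sleuth, assuming biomaRt is installed and Ensembl is reachable when transcript_to_gene() is called. Both helper names are hypothetical. The column names target_id, ens_gene and ext_gene follow the Sleuth walkthrough; target_id is the column Sleuth matches against your transcript IDs. If your kallisto index uses versioned transcript IDs, you may need the ensembl_transcript_id_version attribute instead.

```r
# Hypothetical helper: rename BioMart columns to the names used in the
# Sleuth walkthrough; sleuth matches transcripts on target_id.
as_sleuth_mapping <- function(bm) {
  t2g <- bm[, c("ensembl_transcript_id", "ensembl_gene_id", "external_gene_name")]
  names(t2g) <- c("target_id", "ens_gene", "ext_gene")
  t2g
}

# Hypothetical helper: fetch the table from Ensembl.
# Needs biomaRt installed and network access when called.
transcript_to_gene <- function(mart = NULL) {
  if (is.null(mart)) {
    mart <- biomaRt::useEnsembl(biomart = "genes",
                                dataset = "hsapiens_gene_ensembl")
  }
  bm <- biomaRt::getBM(
    attributes = c("ensembl_transcript_id", "ensembl_gene_id",
                   "external_gene_name"),
    mart = mart
  )
  as_sleuth_mapping(bm)
}

# Offline check of the renaming step on made-up rows
fake <- data.frame(ensembl_transcript_id = c("ENST_A", "ENST_B"),
                   ensembl_gene_id       = c("ENSG_1", "ENSG_1"),
                   external_gene_name    = c("GENE1", "GENE1"))
mapped <- as_sleuth_mapping(fake)
print(mapped)

# Usage (network required; s2c is your sample-to-covariates table):
# t2g <- transcript_to_gene()
# so <- sleuth_prep(s2c, target_mapping = t2g)
```

Keeping the renaming step separate from the network call makes it easy to check offline.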
Typically, when you type the name of a function in R, it will show you the source code of that function. For example, to see the definition of the ls() function, simply type ls at the R console.
Viewing S3 methods
Some functions work by dispatch. An example is print(). When you call print() with some…
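A small sketch of why typing print does not show much: its body is just a call to UseMethod(), which picks a method based on the class of the first argument. The describe() generic and the "gene" class below are hypothetical names used only to illustrate dispatch.

```r
# print() is only a dispatcher; its body is a single UseMethod() call
print(body(print))   # UseMethod("print")

# A toy S3 generic (hypothetical names) to show how dispatch picks a method
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default method"
describe.gene <- function(x, ...) paste("gene method for", x$symbol)

g <- structure(list(symbol = "TP53"), class = "gene")
describe(g)    # "gene method for TP53"
describe(42)   # "default method"
```

R looks for a function named generic.class (here describe.gene) and falls back to generic.default when no class-specific method exists.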
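It’s not always easy to see the source code for functions in R/BioC. For S4 method dispatch, you need to figure out the function signature, then call selectMethod() or getMethod(). The two are similar, except that getMethod() only returns a method defined for exactly that signature, while selectMethod() also follows inheritance to find the method that would actually be dispatched. For example: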
> pileup
standardGeneric for "pileup" defined from package "Rsamtools"

function (file, index = file, ..., scanBamParam = ScanBamParam(),
    pileupParam = PileupParam())
standardGeneric("pileup")
<environment: 0x79bf7d8>
Methods may be defined for arguments: file
Use showMethods("pileup") for currently available ones.
> showMethods("pileup")
Function: pileup (package Rsamtools)
file="BamFile"
file="character"

> selectMethod("pileup", "BamFile")  # shows code for the method that takes a BamFile as input
(see the source code)
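If you don’t have Rsamtools handy, the same workflow can be tried with a self-defined generic. Everything below (countReads, ReadFile, IndexedReadFile) is a hypothetical stand-in for pileup and BamFile. It shows the difference between the two lookups: getMethod() finds nothing for a subclass that has no method of its own, while selectMethod() returns the inherited one.

```r
library(methods)

# Hypothetical generic and classes, mirroring the pileup/BamFile example
setGeneric("countReads", function(file, ...) standardGeneric("countReads"))

setClass("ReadFile", slots = c(path = "character"))
setClass("IndexedReadFile", contains = "ReadFile")

setMethod("countReads", "ReadFile", function(file, ...) {
  paste("counting reads in", file@path)
})

showMethods("countReads")   # lists file="ReadFile"

# getMethod() needs an exact signature; optional = TRUE gives NULL, not an error
exact <- getMethod("countReads", "IndexedReadFile", optional = TRUE)
print(exact)   # NULL

# selectMethod() follows inheritance and returns the ReadFile method
m <- selectMethod("countReads", "IndexedReadFile")
m              # prints the method definition (its source code)

countReads(new("IndexedReadFile", path = "x.bam"))
```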
For S3 dispatch, you can call
methods("function-name")
to get a list of dispatch methods and then…
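A short sketch using base R only: methods() lists the methods for a generic (or, with class =, the methods available for a class), and getS3method() returns a specific method so you can read its code. When printed, methods() marks non-exported methods with an asterisk; getS3method() (or getAnywhere()) still finds those.

```r
# All registered S3 methods for the print() generic
pm <- methods("print")
head(pm)

# Methods available for a particular class
methods(class = "data.frame")

# Fetch one method and print its source
f <- getS3method("print", "data.frame")
f
```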
Just learned this and will forget it immediately. In R, the ‘L’ suffix explicitly makes the preceding number an integer. So 1L is the integer 1, whereas 1 is a double (numeric) whose value equals 1. This helps when you want to save some memory, such as in a loop over, say, 1:100L (although the : operator already returns integers when both ends are whole numbers). Compare…
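A quick sketch of the difference in type and storage. Integers take 4 bytes per element and doubles take 8, so a long vector of integers is roughly half the size. The vector length below is arbitrary.

```r
x_int <- 1L
x_num <- 1

class(x_int)     # "integer"
class(x_num)     # "numeric" (typeof() reports "double")
x_int == x_num   # TRUE: same value, different storage type

# The : operator already returns integers for whole-number endpoints
class(1:100)     # "integer"

# Compare memory use of 10,000 integers vs 10,000 doubles
size_int <- object.size(rep(1L, 1e4))
size_num <- object.size(rep(1, 1e4))
print(size_int)
print(size_num)
```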
This is a nice tip I picked up from a Bioconductor course:
tab5rows = read.table("datatable.txt", header=TRUE, nrows=5)
classes = sapply(tab5rows, class)
tabAll = read.table("datatable.txt", header=TRUE, colClasses=classes)
Basically, read in a few lines and determine the class type of each column. Then pass that information to read.table through the parameter colClasses. The important thing here is to…
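A self-contained version of the tip, assuming a small made-up tab-separated table written to a temporary file so it runs anywhere. The column names and values are illustrative only.

```r
# Write a small made-up table so the example does not depend on an existing file
path <- tempfile(fileext = ".txt")
write.table(data.frame(gene  = c("TP53", "BRCA1", "EGFR"),
                       count = c(10L, 25L, 7L),
                       score = c(0.5, 1.2, 3.3)),
            path, row.names = FALSE, quote = FALSE, sep = "\t")

# 1. Read only the first few rows and record each column's class
tab5rows <- read.table(path, header = TRUE, sep = "\t", nrows = 5)
classes  <- sapply(tab5rows, class)

# 2. Reuse those classes for the full read, so read.table() skips type guessing
tabAll <- read.table(path, header = TRUE, sep = "\t", colClasses = classes)
str(tabAll)
```

One caveat worth checking on real data: if a column contains only whole numbers in the first rows but decimals later, the recorded "integer" class will make the full read fail.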