Archive For The “bioconductor” Category

Getting gene attributes from biomart

Biomart is an Ensembl service for querying and retrieving data. Here I’ll use R/Bioconductor to get gene positions for a list of genes. We need a mart object, which encapsulates the species, host and database information. Here I’m using the US west coast mirror. This defaults to the current build (GRCh38) of the human…
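As a rough sketch of the mart-object idea, something like the following could fetch positions for a handful of genes. The function name get_gene_positions, the choice of attributes, and the example symbols are my own illustrative assumptions, not the post's code; it needs the biomaRt package and network access, and mirror availability changes over time.

```r
# Sketch only: requires the biomaRt package and a reachable Ensembl mirror.
# get_gene_positions() is a hypothetical helper, not part of biomaRt.
get_gene_positions <- function(symbols, mirror = "uswest") {
  # The mart object bundles database, dataset (species) and host/mirror.
  mart <- biomaRt::useEnsembl(
    biomart = "ensembl",
    dataset = "hsapiens_gene_ensembl",
    mirror  = mirror
  )
  # Ask for coordinates of the genes whose HGNC symbol is in `symbols`.
  biomaRt::getBM(
    attributes = c("hgnc_symbol", "chromosome_name",
                   "start_position", "end_position", "strand"),
    filters    = "hgnc_symbol",
    values     = symbols,
    mart       = mart
  )
}

# Example call (not run here because it hits the network):
# positions <- get_gene_positions(c("TP53", "BRCA1"))
```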

Bioconductor and Docker

Dockerizing your Bioconductor code is one way to ensure a reproducible workflow. It also makes deployment to cloud environments (e.g. AWS Batch) easier if you want to scale your analysis. The official Bioconductor Docker images are hosted on Docker Hub. They are built with RStudio Server installed so that you can run an interactive analysis…

Accessing biomart from R

A good use for Ensembl’s biomart is pulling a list of gene names linked to Ensembl transcript IDs, as you need when using Sleuth. You will need Bioconductor and the biomaRt package, so install them if you don’t have them already.
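A minimal sketch of that transcript-to-gene lookup, assuming the human dataset and a helper name (get_tx2gene) that I made up for illustration. The column names target_id, ens_gene and ext_gene follow the convention commonly used with Sleuth. It needs network access, so the call itself is left commented out.

```r
# One-time setup, if biomaRt is missing (assumes BiocManager is available):
# install.packages("BiocManager"); BiocManager::install("biomaRt")

# get_tx2gene() is a hypothetical helper, not a biomaRt function.
get_tx2gene <- function(dataset = "hsapiens_gene_ensembl") {
  # Connect to the main Ensembl gene mart for the chosen species.
  mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = dataset)
  # No filter: return every transcript with its gene ID and gene name.
  t2g <- biomaRt::getBM(
    attributes = c("ensembl_transcript_id", "ensembl_gene_id",
                   "external_gene_name"),
    mart = mart
  )
  # Rename to the column names typically passed to Sleuth as target mapping.
  names(t2g) <- c("target_id", "ens_gene", "ext_gene")
  t2g
}

# t2g <- get_tx2gene()   # not run: queries Ensembl over the network
```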

The usual bioC way of doing things like this is to create an object that…

Read more »

How to view function definitions in R (S3/S4)

Typically, when you type the name of a function in R without parentheses, it will show you the source code of that function. For example, to see the definition of the ls() function, simply type ls at the R prompt. Viewing S3 methods: some functions work by dispatch. An example is print(). When you call print() with some…
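To make the dispatch idea concrete, here is a small sketch using only base R. The "gene" class and its print method are toy examples I invented for illustration; methods() and getS3method() are the standard tools for listing and retrieving S3 methods.

```r
# Typing a function name without parentheses prints its definition.
ls

# print() is a generic: it dispatches on the class of its argument.
# methods() lists the class-specific implementations that are visible.
head(methods(print))

# Fetch the source of one specific S3 method, even if it is not exported.
getS3method("print", "data.frame")

# A toy S3 class (hypothetical) to show dispatch in action.
print.gene <- function(x, ...) {
  cat("<gene ", x$symbol, ">\n", sep = "")
  invisible(x)
}
g <- structure(list(symbol = "TP53"), class = "gene")
print(g)  # dispatches to print.gene()
```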

Viewing source code for S3/S4 objects

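It’s not always easy to see the source code for functions in R/BioC. For S4 method dispatch, you need to figure out the method signature, then call selectMethod() or getMethod(). The two are closely related: getMethod() only finds a method defined for exactly that signature, while selectMethod() also follows class inheritance.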
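A self-contained sketch of the S4 lookup, using toy classes (Gene, CodingGene) and a toy generic (describe) that I made up for illustration rather than anything from a Bioconductor package:

```r
library(methods)

# Toy S4 class, generic and method (all hypothetical names).
setClass("Gene", representation(symbol = "character"))
setGeneric("describe", function(x, ...) standardGeneric("describe"))
setMethod("describe", "Gene", function(x, ...) paste("Gene:", x@symbol))

# A subclass with no describe() method of its own.
setClass("CodingGene", contains = "Gene")

# List the signatures that have methods, then pull out the source.
showMethods("describe")
getMethod("describe", "Gene")            # exact signature match

# selectMethod() follows inheritance, so it finds the Gene method here;
# getMethod("describe", "CodingGene") would raise an error instead.
selectMethod("describe", "CodingGene")

existsMethod("describe", "CodingGene")   # defined for exactly this class?
hasMethod("describe", "CodingGene")      # defined or inherited?
```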

For S3 dispatch, you can call methods() on the generic function to get a list of dispatch methods and then…

1L

Just learned this and will forget it immediately. In R, the ‘L’ suffix explicitly makes the preceding number an integer. So 1L is the integer 1, whereas 1 is a double (R’s default numeric type) whose value is equal to 1. This helps when you want to save some memory, such as in a loop over, say, 1:100L. Compare…
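A quick way to see the difference for yourself; the vector length and variable names below are arbitrary choices for illustration:

```r
typeof(1L)         # "integer"
typeof(1)          # "double"
identical(1L, 1)   # FALSE: same value, different storage type
1L == 1            # TRUE: the values compare equal

# Integers take 4 bytes per element, doubles 8, so long vectors differ in size.
n    <- 1e6
ints <- rep(1L, n)
dbls <- rep(1, n)
object.size(ints)
object.size(dbls)

# Note that the colon operator already yields integers for whole-number bounds.
typeof(1:100)      # "integer"
```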

Using colClasses to speed up file reading

This is a nice tip I picked up from a Bioconductor course:

tab5rows = read.table("datatable.txt", header=TRUE, nrows=5)
classes = sapply(tab5rows, class)
tabAll = read.table("datatable.txt", header=TRUE, colClasses=classes)

Basically, read in a few lines and determine the class of each column. Then pass that information to read.table() through the colClasses parameter. The important thing here is to…
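Here is a runnable version of the same idea against a throwaway file, so it can be tried without any real data. The file contents and column names are invented for illustration.

```r
# Write a small example table to a temporary file (made-up data).
path <- tempfile(fileext = ".txt")
df <- data.frame(
  id    = 1:100,
  score = seq(0.5, by = 0.5, length.out = 100),
  name  = paste0("g", 1:100)
)
write.table(df, path, row.names = FALSE, quote = FALSE)

# Step 1: peek at the first few rows and record each column's class.
tab5rows <- read.table(path, header = TRUE, nrows = 5)
classes  <- sapply(tab5rows, class)

# Step 2: read the whole file, telling read.table() the classes up front
# so it does not have to guess them from the full data.
tabAll <- read.table(path, header = TRUE, colClasses = classes)
str(tabAll)
```

One thing to watch for: if a column looks like integers in the first few rows but contains decimals further down, the full read will fail, so it can be worth sampling more rows or overriding that column's class by hand.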

reading several files into their own data frames

Occasionally you need to read a few files into separate data frames. Each file is a matrix of numbers, 79×79.

The parse/eval steps allow you to convert a string into a dynamically executed statement. parse() creates an expression from a string and eval() evaluates the expression.
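A sketch of how the parse/eval approach might look, using temporary files and variable names (mat1, mat2, mat3) that I made up. The results are evaluated into a dedicated environment rather than the global workspace, which is my own choice to keep the example tidy.

```r
# Create three example files, each holding a 79 x 79 matrix of numbers.
paths <- file.path(tempdir(), sprintf("mat%d.txt", 1:3))
for (p in paths) {
  write.table(matrix(runif(79 * 79), nrow = 79), p,
              row.names = FALSE, col.names = FALSE)
}

frames <- new.env()
for (i in seq_along(paths)) {
  # Build the statement as a string, e.g. 'mat1 <- read.table("...")'.
  # deparse() quotes the path safely, including any backslashes.
  stmt <- sprintf("mat%d <- read.table(%s)", i, deparse(paths[i]))
  # parse() turns the string into an expression; eval() runs it.
  eval(parse(text = stmt), envir = frames)
}
ls(frames)
dim(frames$mat1)

# A common alternative that avoids parse/eval: keep the data frames in a list.
frame_list <- lapply(paths, read.table)
names(frame_list) <- sprintf("mat%d", seq_along(paths))
```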

Useful bioconductor methods

I always forget these sorts of commands.

A useful overview of all the “parts” of Bioconductor. http://bioconductor.org/help/course-materials/2013/GenentechMay2013/Genentech2013.pdf

merging data frames

Great resource here on merging data frames: http://rwiki.sciviews.org/doku.php?id=tips%3adata-frames%3amerge

Suppose you had two data frames, df1 and df2, each with an “id” column. To merge them we just call

merge(df1, df2, by.x="id", by.y="id", all=TRUE)

The all=TRUE keeps all rows of both data frames. You can also do a cross join using data frames. Another way to think of…
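A small worked example of these joins; the data frames and their contents are invented for illustration. In base R, passing by = NULL to merge() produces the cross join (Cartesian product).

```r
# Two toy data frames sharing an "id" column (made-up values).
df1 <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
df2 <- data.frame(id = c(2, 3, 4), y = c("B", "C", "D"))

# Full outer join: all = TRUE keeps rows from both sides, filling with NA.
full <- merge(df1, df2, by.x = "id", by.y = "id", all = TRUE)

# Inner join (the default): only ids present in both data frames.
inner <- merge(df1, df2, by = "id")

# Left join: keep every row of df1.
left <- merge(df1, df2, by = "id", all.x = TRUE)

# Cross join: every row of df1 paired with every row of df2.
cross <- merge(df1, df2, by = NULL)

full
nrow(cross)
```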
