Bioconductor and Docker

Dockerizing your Bioconductor code is one way to ensure a reproducible workflow. It also makes deployment to cloud environments (e.g. AWS Batch) easier if you want to scale your analysis.

The official Bioconductor Docker images are hosted on Docker Hub. They are built with RStudio Server installed so that you can run an interactive analysis. Official details are on the Bioconductor Docker site.

In my case, I want to dockerize Bioconductor so that the summarizeOverlaps() function from GenomicAlignments is available. You could start the container, make changes to it and then commit the result as a new image. But the recommended route is to write a Dockerfile that builds an image with the libraries you want already installed. Here I will install GenomicAlignments and the hg19 human transcriptome annotation. Create a file called Dockerfile that starts from the Bioconductor image and installs these packages.
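A minimal sketch of such a Dockerfile, written from the shell with a heredoc. The base image tag (RELEASE_3_18) and the use of TxDb.Hsapiens.UCSC.hg19.knownGene as the hg19 annotation package are assumptions for illustration; substitute the release and annotation package you actually need.

```shell
# Write a Dockerfile into the current directory.
# Assumptions: the RELEASE_3_18 tag and the TxDb package are illustrative choices.
cat > Dockerfile <<'EOF'
FROM bioconductor/bioconductor_docker:RELEASE_3_18

# Install GenomicAlignments (provides summarizeOverlaps) and an hg19 transcript annotation
RUN R -e 'BiocManager::install(c("GenomicAlignments", "TxDb.Hsapiens.UCSC.hg19.knownGene"), ask = FALSE, update = FALSE)'
EOF
```

Pinning a release tag instead of a moving tag keeps the Bioconductor version fixed, which is the point of dockerizing for reproducibility.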

Now you can build the Docker image and tag it. The build looks for a file named Dockerfile in the build directory by default.
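For example, run from the directory that contains the Dockerfile. The tag bioc-genomicalignments is a hypothetical name chosen for illustration; use whatever name you like.

```shell
# Build an image from ./Dockerfile and tag it.
# "bioc-genomicalignments" is a made-up tag; the trailing "." is the build context.
docker build -t bioc-genomicalignments .
```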

When you run docker images, your new image should show up in the list.

You can then run it locally with docker run, passing a password and publishing port 8787.
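A sketch of the run command, assuming the image was tagged with the hypothetical name bioc-genomicalignments:

```shell
# Start the container, set the RStudio Server password to "bioc",
# and publish the container's port 8787 on localhost:8787.
# --rm removes the container when it stops; the image tag is a made-up name.
docker run --rm -e PASSWORD=bioc -p 8787:8787 bioc-genomicalignments
```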

The password bit is for RStudio Server: when you point your browser at http://localhost:8787, log in with the username rstudio (hardcoded into the Bioconductor Docker image) and the password you set (bioc). You can change the password to anything you want, but the user is always rstudio.

Some useful things to do:

You can create your own launch script and use it as the Docker entrypoint for even more customization and usability.
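As one possible shape for such a launch script, here is a sketch that runs an R script when arguments are given and otherwise falls back to starting RStudio Server. The file name launch.sh is hypothetical, and the fallback assumes that /init is the image's default command for starting RStudio Server; check the base image before relying on it.

```shell
# Write a hypothetical entrypoint script and make it executable.
cat > launch.sh <<'EOF'
#!/bin/sh
set -e

# With arguments: treat them as an R script plus its options and run it in batch mode.
if [ "$#" -gt 0 ]; then
  exec Rscript "$@"
fi

# Without arguments: start RStudio Server (assumes /init is the image's default command).
exec /init
EOF
chmod +x launch.sh
```

To use it, the Dockerfile would COPY launch.sh into the image and name it in an ENTRYPOINT instruction. Arguments given after the image name on the docker run command line are then passed to the script.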