Dockerizing your Bioconductor code is one way to ensure a reproducible workflow. It also enables easier deployment to cloud environments (e.g. AWS Batch) in case you want to scale your analysis.
The official Bioconductor Docker images are hosted on Docker Hub. They are built with RStudio Server installed so that you can run an interactive analysis. Official details are on the Bioconductor Docker site.
In my case, I want to dockerize Bioconductor so that the summarizeOverlaps() function from GenomicAlignments is available. You could start the container, make changes to it, and then commit, but the recommended route is to write a Dockerfile that builds an image with the libraries you want installed. Here I will install GenomicAlignments and the hg19 human transcriptome annotation. Create a file called Dockerfile and add this:
```dockerfile
# Docker inheritance
FROM bioconductor/bioconductor_docker:latest

# Install required Bioconductor packages
RUN R -e 'BiocManager::install("TxDb.Hsapiens.UCSC.hg19.knownGene")'
RUN R -e 'BiocManager::install("GenomicAlignments")'
```
Now you can build the Docker image and tag it. The build looks for a file named Dockerfile in the current directory by default.
```shell
docker build -t bioconductor_docker_hg19:latest .
```
When you type docker images, your new image should show up.
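If you want to confirm the packages actually installed, a quick non-interactive check (using the image tag from above) is to load them in a throwaway container:

```shell
# Load both installed packages in a throwaway container;
# R exits with a non-zero status if either library() call fails
docker run --rm bioconductor_docker_hg19:latest \
  R -e 'library(GenomicAlignments); library(TxDb.Hsapiens.UCSC.hg19.knownGene)'
```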
You can then run it locally with something like
```shell
docker run -p 8787:8787 -e PASSWORD=bioc \
  bioconductor_docker_hg19:latest
```
The PASSWORD environment variable is for RStudio Server: when you point your browser at http://localhost:8787, log in with the username rstudio (hardcoded into the Bioconductor image) and the password you set (here, bioc). You can change the password to anything you want, but the user is always rstudio.
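Note that any files you create in RStudio live inside the container and vanish when the container is removed. A common pattern is to also mount a local directory into the rstudio user's home (the host path here is just an example):

```shell
# Mount a host directory into the container so analysis files survive
# container removal; /home/rstudio is the RStudio user's home directory
docker run -p 8787:8787 -e PASSWORD=bioc \
  -v $HOME/bioc_project:/home/rstudio/project \
  bioconductor_docker_hg19:latest
```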
Some other useful commands:
```shell
# List images and containers
docker images
docker ps -a

# Run a container with a host directory mounted, removing it on exit
docker run --rm -it -v $HOME/tmp:/tmp bioconductor_docker_hg19:latest

# Stop a running container
docker stop <container id>

# Get a bash shell inside the devel image as the rstudio user
docker run -it --user rstudio bioconductor/bioconductor_docker:devel bash
```
You can create your own launch script as a Docker ENTRYPOINT for even more customization and usability.
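As a sketch, an entrypoint script might look like the following. The file name, the customization step, and the Dockerfile lines in the comments are my own illustration, not part of the official image; the only assumption taken from the Bioconductor/Rocker image is that /init is its stock startup process.

```shell
#!/bin/bash
# entrypoint.sh -- hypothetical launch script. Add it to your Dockerfile with:
#   COPY entrypoint.sh /usr/local/bin/entrypoint.sh
#   ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
set -e

# Example customization: seed the rstudio user's home with a project directory
mkdir -p /home/rstudio/analysis
chown rstudio:rstudio /home/rstudio/analysis

# Hand off to the image's stock init process so RStudio Server still starts
exec /init "$@"
```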