Serving BAM files from S3

It's handy to have a web server where you can put up BAM files for sharing with collaborators. But if you have tons of files or no available web server, you can use Amazon S3 to serve your files instead. While S3 is pretty simple, setting it up to serve BAM (and index) files has a few tricky aspects.

Setting up an S3 bucket

You can do this pretty simply from the AWS dashboard. Pick a region (close to you) and create a folder or two. You can specify an encryption type so that data in S3 is encrypted at rest (I used server-side encryption with Amazon S3-Managed Keys).

Remember that S3 bucket names are globally unique! Your bucket name cannot be the same as that of any other existing bucket, in any AWS account. Also, it’s a good idea to make the bucket private so that the BAM files are not open to the public.
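The same setup can also be done from the command line. The following is a minimal sketch, not the only way to do it. The function name create_private_bucket and the AWS_CMD variable are made up for this example (AWS_CMD lets you preview the commands with AWS_CMD=echo), and mybucket and us-west-2 are placeholder values.

```shell
#!/usr/bin/env bash
# Sketch: create a private, encrypted bucket from the command line.
# create_private_bucket and AWS_CMD are hypothetical names for this example;
# AWS_CMD defaults to the real aws CLI and can be set to "echo" for a dry run.

create_private_bucket() {
    local bucket="$1" region="$2"
    local aws="${AWS_CMD:-aws}"

    # Create the bucket in the chosen region.
    "$aws" s3 mb "s3://${bucket}" --region "$region" || return 1

    # Default server-side encryption with Amazon S3-managed keys (SSE-S3).
    "$aws" s3api put-bucket-encryption --bucket "$bucket" \
        --server-side-encryption-configuration \
        '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}' || return 1

    # Block every form of public access so the BAM files stay private.
    "$aws" s3api put-public-access-block --bucket "$bucket" \
        --public-access-block-configuration \
        "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
}
```

To use it, call create_private_bucket mybucket us-west-2. Running it with AWS_CMD=echo first prints the three commands without touching your account.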

Copy files to S3

I’ll assume you have the AWS CLI installed for copying files. To list your buckets:

aws s3 ls
# list the contents of a specific bucket
aws s3 ls s3://mybucket

We want to copy our files into S3. We need both the BAM file and the BAM index file.

# the trailing slash matters: without it, the file is stored as an object named "bamfiles"
aws s3 cp mybamfile.bam s3://mybucket/bamfiles/
aws s3 cp mybamfile.bam.bai s3://mybucket/bamfiles/
# alternatively
aws s3 sync myfolder s3://mybucket/bamfiles
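Since the BAM is of little use without its index, it can help to check for the index before uploading. Here is a small sketch of that idea. The helper names find_bam_index and upload_bam and the AWS_CMD variable are made up for this example, and it assumes the index is named either mybamfile.bam.bai or mybamfile.bai.

```shell
#!/usr/bin/env bash
# Sketch: upload a BAM only if its index is present.
# find_bam_index, upload_bam and AWS_CMD are hypothetical names.

# Print the index path for a BAM file, trying two common naming
# conventions (sample.bam.bai, then sample.bai). Returns 1 if neither exists.
find_bam_index() {
    local bam="$1"
    if [ -f "${bam}.bai" ]; then
        printf '%s\n' "${bam}.bai"
    elif [ -f "${bam%.bam}.bai" ]; then
        printf '%s\n' "${bam%.bam}.bai"
    else
        return 1
    fi
}

# Copy a BAM and its index under the given S3 prefix.
# A trailing slash is forced on the destination so the CLI keeps the file names.
upload_bam() {
    local bam="$1" dest="${2%/}/"
    local aws="${AWS_CMD:-aws}" index
    index="$(find_bam_index "$bam")" || {
        echo "no index found for ${bam}" >&2
        return 1
    }
    "$aws" s3 cp "$bam" "$dest" && "$aws" s3 cp "$index" "$dest"
}
```

For example, upload_bam mybamfile.bam s3://mybucket/bamfiles copies both files, or stops with an error message if no index is found.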

More details about using the CLI are in the AWS CLI documentation.

S3 does not offer a simple web-style login (such as a username and password prompt) for its objects. So if your bucket is private, you will need some other method to give collaborators access to your data. There are a few options.

AWS CLI to presign a URL

Using the AWS CLI, you can generate a signed URL. Basically, this generates a URL that will work for anyone who has the link until it expires (the default is 3600 seconds). We actually need to make two signed URLs: one for the BAM file and one for the BAM index file.

# this will make one signed URL for the BAM file
aws s3 presign s3://mybucket/path/to/mybam.bam
# this makes a signed URL for the BAM index file
aws s3 presign s3://mybucket/path/to/mybam.bam.bai
# to specify an expiry time in seconds (86400, i.e. 24 hours, is just an example value)
aws s3 presign s3://mybucket/path/to/mybam.bam --expires-in 86400
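Because the two URLs always go together, the two presign calls can be wrapped in one small function. This is only a sketch. The name presign_bam_pair and the AWS_CMD variable are made up for this example, and it assumes the index sits next to the BAM under the same key with .bai appended.

```shell
#!/usr/bin/env bash
# Sketch: presign a BAM and its index with the same expiry.
# presign_bam_pair and AWS_CMD are hypothetical names; the index is
# assumed to be stored as <bam key>.bai.

presign_bam_pair() {
    local bam_uri="$1" expires="${2:-3600}"   # 3600 s matches the CLI default
    local aws="${AWS_CMD:-aws}"

    # First line of output: signed URL for the BAM file.
    "$aws" s3 presign "$bam_uri" --expires-in "$expires" || return 1

    # Second line of output: signed URL for the BAM index file.
    "$aws" s3 presign "${bam_uri}.bai" --expires-in "$expires"
}
```

For example, presign_bam_pair s3://mybucket/path/to/mybam.bam 86400 prints the BAM URL followed by the index URL. Keep in mind that presigned URLs cannot be made to last longer than seven days (604800 seconds).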

View in IGV

Then open up IGV, use File, Load from URL and input the signed URLs for both the BAM and the BAM index. Note that my standalone binary distribution of IGV did not have two places to input the BAM and BAM index signed URLs (there was only one input box). Instead, I had to go to the Broad website and launch IGV via Java Web Start, which downloads IGV from the Broad and then starts it. This version of IGV (2.4.10 03/20/2018) did have two input boxes, one for the BAM and one for the BAM index.

Generate presigned URL using Java Eclipse

There is a code example here. A few steps need to be completed:

  1. Download the AWS Java SDK
  2. Add the SDK jar files to your build path
  3. Specify these values:
    clientRegion (such as us-west-2)
    bucketName (mybucket)
    objectKey (the name of the object stored in S3). Note that if you have a subfolder, it goes into the objectKey name (examples/bamfiles/mybam.bam). This is because S3 doesn’t really have folders, so the object name actually includes the whole path.
  4. Change HttpMethod.GET to HttpsMethod.GET
  5. Build and run the program; it will output the signed URL
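To illustrate the objectKey note in step 3, here is a small shell sketch that splits an S3 URI into the bucketName and objectKey values the Java code expects. The function name split_s3_uri is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: split an S3 URI into bucketName and objectKey.
# split_s3_uri is a hypothetical helper name.

split_s3_uri() {
    local uri="$1" rest
    case "$uri" in
        s3://*/*) ;;
        *) echo "expected s3://bucket/key, got: ${uri}" >&2; return 1 ;;
    esac
    rest="${uri#s3://}"                      # drop the scheme
    printf 'bucketName=%s\n' "${rest%%/*}"   # text before the first slash
    printf 'objectKey=%s\n' "${rest#*/}"     # everything after it, "folders" included
}
```

Running split_s3_uri s3://mybucket/examples/bamfiles/mybam.bam prints bucketName=mybucket and objectKey=examples/bamfiles/mybam.bam, which shows that the subfolder path is simply part of the object key.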