BioMart is an Ensembl service for querying and retrieving data. Here I’ll use the biomaRt package from R/Bioconductor to get the genomic positions for a list of genes.
The first thing we need is a mart object, which encapsulates the species, host and database information. Here I’m using the US west coast mirror. The dataset defaults to the current build (GRCh38) of the human genome.
    library(biomaRt)  # from Bioconductor (the package name is case-sensitive)
    library(DT)       # for easier viewing

    # Connect to the Ensembl genes mart for human via the US west coast mirror
    mart = biomaRt::useMart(biomart='ENSEMBL_MART_ENSEMBL',
                            dataset='hsapiens_gene_ensembl',
                            host='uswest.ensembl.org')
In BioMart parlance, filters are search parameters (like the WHERE clause in SQL), while attributes are the fields you want returned (like the columns named in a SELECT).
    all_filters = listFilters(mart = mart)
    all_attributes = listAttributes(mart = mart)

    # DT just makes it easier to look at a large table
    datatable(all_filters)
    datatable(all_attributes)
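Both tables are long, so it can help to narrow them down by keyword before deciding which filter or attribute to use. Here is a minimal sketch of that idea, assuming the listing is a data.frame with `name` and `description` columns (which is what `listFilters` returns). The three rows are a hand-written stand-in so the snippet runs offline, and `find_rows` is a hypothetical helper, not part of biomaRt.

```r
# Stand-in for listFilters(mart = mart); rows are hand-written for illustration
all_filters_demo <- data.frame(
  name = c("chromosome_name", "external_gene_name", "hgnc_symbol"),
  description = c("Chromosome/scaffold name", "Gene Name(s)", "HGNC symbol(s)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose name or description matches a pattern
find_rows <- function(tbl, pattern) {
  hit <- grepl(pattern, tbl$name, ignore.case = TRUE) |
    grepl(pattern, tbl$description, ignore.case = TRUE)
  tbl[hit, , drop = FALSE]
}

# Which filters mention "gene"?
gene_hits <- find_rows(all_filters_demo, "gene")
print(gene_hits)
```

With a live connection, the same helper should work on `all_filters` and `all_attributes` directly. biomaRt also ships `searchFilters()` and `searchAttributes()`, which do a similar pattern search against the mart itself.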
Here, I have a list of gene symbols (genes) that I want to look up in BioMart. The query uses the field external_gene_name as the filter and matches it against the values in my list of genes. It returns the Ensembl gene ID, gene symbol, chromosome, and start and end positions as a data.frame.
    # query: ensembl_id, symbol, chrom, start, stop
    # `genes` is the character vector of gene symbols to look up
    pos4gene = biomaRt::getBM(attributes=c('ensembl_gene_id',
                                           "hgnc_symbol",
                                           'chromosome_name',
                                           'start_position',
                                           'end_position'),
                              filters=c("external_gene_name"),
                              values = genes,
                              mart=mart)
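The returned data.frame usually needs a little tidying. A gene can come back more than once when it is also annotated on an alternate scaffold, and symbols that match nothing are silently dropped. The sketch below shows one way to handle both, using a hand-made stand-in for the query result so it runs offline. The gene names, IDs and coordinates are placeholders, not real annotations, and `genes_demo`, `pos_demo` and the other variable names are mine.

```r
# Hypothetical input list and a hand-made stand-in for the getBM() result;
# identifiers and coordinates are placeholders, not real annotations
genes_demo <- c("GENE_A", "GENE_B", "GENE_C")
pos_demo <- data.frame(
  ensembl_gene_id = c("ENSG_PLACEHOLDER_1", "ENSG_PLACEHOLDER_2",
                      "ENSG_PLACEHOLDER_3"),
  hgnc_symbol     = c("GENE_A", "GENE_A", "GENE_B"),
  chromosome_name = c("6", "CHR_HSCHR6_MHC_COX_CTG1", "X"),
  start_position  = c(1000L, 1200L, 5000L),
  end_position    = c(2000L, 2200L, 9000L),
  stringsAsFactors = FALSE
)

# Keep only the primary assembly chromosomes (1-22, X, Y, MT)
standard_chroms <- c(as.character(1:22), "X", "Y", "MT")
pos_primary <- pos_demo[pos_demo$chromosome_name %in% standard_chroms, ]

# Symbols from the input list that produced no row at all
missing_genes <- setdiff(genes_demo, pos_demo$hgnc_symbol)

# Gene length in bases (Ensembl coordinates are 1-based and inclusive)
pos_primary$length_bp <- pos_primary$end_position -
  pos_primary$start_position + 1L

print(pos_primary)
print(missing_genes)
```

On real output, the same steps would apply to `pos4gene` and `genes`. Checking `missing_genes` is worth doing because a symbol that is outdated or misspelled simply returns no row rather than raising an error.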