Why and how to use biomaRt?

Gene annotation is a routine part of bioinformatics work. In recent years more and more biological data have become available, so knowing how to access these valuable data resources and analyse them is important for comprehensive bioinformatics analysis. biomaRt is a very useful tool for this. That leaves two questions: (1) Why use biomaRt? (2) How do we use biomaRt?

Let us first introduce the concept of BioMart. The BioMart project (http://www.biomart.org) provides free software and data services to the international scientific community in order to foster scientific collaboration and facilitate the scientific discovery process. Examples of BioMart databases are Ensembl, UniProt and HapMap. However, when a dataset is large, converting identifiers between databases by hand is troublesome, so we need a bioinformatics tool that can do it automatically. biomaRt is an R package that provides an interface to a growing collection of databases implementing the BioMart software suite. The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. The major databases (e.g. Ensembl, UniProt) give biomaRt users direct access to a diverse set of data and enable a wide range of powerful online queries from R and Bioconductor.

The first way to use BioMart is online ID conversion through the web interface. Go to http://useast.ensembl.org/biomart/martview/ and select the dataset, filters and attributes you need. Clicking the 'Results' button then shows the final output.

The second way is to use biomaRt, which is an R and Bioconductor package. There are two steps: (1) select the Mart database and dataset, and (2) use getBM to retrieve the gene annotation. However, how many Mart databases does the package offer? And how do we find the correct filter and attribute names for a given dataset? The functions 'listMarts' and 'listDatasets' show the available databases and datasets, while 'listFilters' and 'listAttributes' show the valid filter and attribute names. Let's check the corresponding results from R.

Available Marts, listed by the command listMarts()

Available datasets, listed by the command listDatasets(ensembl)

Available filters, listed by the command listFilters(ensembl)

Available attributes, listed by the command listAttributes(ensembl)
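The steps described above can be sketched in R roughly as follows. This is a minimal sketch, not a definitive recipe: it assumes the biomaRt package is installed from Bioconductor and that the Ensembl server is reachable. The helper names explore_ensembl and convert_symbols are made up for this post, and the dataset name "hsapiens_gene_ensembl", the filter "hgnc_symbol" and the attribute names are assumptions that you should confirm with listDatasets, listFilters and listAttributes for your own Mart.

```r
# Sketch only: requires the Bioconductor package biomaRt and internet access.
# Dataset, filter and attribute names are assumptions; check them with the
# list* functions before relying on them.

# Look up what is available. Each list* call returns a data frame.
explore_ensembl <- function(dataset = "hsapiens_gene_ensembl") {
  marts    <- biomaRt::listMarts()                 # available Mart databases
  ensembl  <- biomaRt::useMart("ensembl")          # step 1: select the Mart
  datasets <- biomaRt::listDatasets(ensembl)       # datasets in that Mart
  ensembl  <- biomaRt::useDataset(dataset, mart = ensembl)
  list(
    marts      = marts,
    datasets   = datasets,
    filters    = biomaRt::listFilters(ensembl),    # valid 'filters' values
    attributes = biomaRt::listAttributes(ensembl)  # valid 'attributes' values
  )
}

# Convert gene symbols to Ensembl gene IDs plus a short description.
convert_symbols <- function(symbols, dataset = "hsapiens_gene_ensembl") {
  ensembl <- biomaRt::useMart("ensembl", dataset = dataset)
  biomaRt::getBM(                                  # step 2: run the query
    attributes = c("hgnc_symbol", "ensembl_gene_id", "description"),
    filters    = "hgnc_symbol",
    values     = symbols,
    mart       = ensembl
  )
}

# Example calls (uncomment to run; they contact the Ensembl server):
# info <- explore_ensembl()
# head(info$filters)
# convert_symbols(c("TP53", "BRCA1"))
```

The calls are wrapped in functions so that nothing contacts the server until you run them yourself. getBM returns a data frame with one column per requested attribute.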

Besides ID conversion (e.g. ID, symbol, name), biomaRt can also retrieve information on SNPs, alternative splicing, exons, introns, 5' UTRs and 3' UTRs.
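As a rough illustration of going beyond ID conversion, the sketch below asks for exon coordinates with getBM and for 5' UTR sequences with getSequence. The helper names exon_coordinates and utr5_sequences are hypothetical, and the dataset, filter and attribute names ("hsapiens_gene_ensembl", "hgnc_symbol", "ensembl_exon_id", "exon_chrom_start", "exon_chrom_end") are assumptions to verify with listAttributes and listFilters. It again needs biomaRt installed and a reachable Ensembl server.

```r
# Sketch only: names below are assumptions, check them with listAttributes()
# and listFilters() for the dataset you actually use.

# Exon coordinates for every transcript of the given gene symbols.
exon_coordinates <- function(symbols, dataset = "hsapiens_gene_ensembl") {
  ensembl <- biomaRt::useMart("ensembl", dataset = dataset)
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "ensembl_transcript_id",
                   "ensembl_exon_id", "exon_chrom_start", "exon_chrom_end"),
    filters    = "hgnc_symbol",
    values     = symbols,
    mart       = ensembl
  )
}

# 5' UTR sequences for the given gene symbols.
# Using seqType = "3utr" would request the 3' UTR instead.
utr5_sequences <- function(symbols, dataset = "hsapiens_gene_ensembl") {
  ensembl <- biomaRt::useMart("ensembl", dataset = dataset)
  biomaRt::getSequence(
    id      = symbols,
    type    = "hgnc_symbol",  # what kind of identifier 'id' contains
    seqType = "5utr",         # which sequence to return
    mart    = ensembl
  )
}

# Example calls (uncomment to run; they contact the Ensembl server):
# exon_coordinates("TP53")
# utr5_sequences("TP53")
```

SNP data live in a separate Mart, so a SNP query would start from a different useMart call than the gene queries shown here.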

The third way is to use the BioMart Perl API, which is also one of the most convenient ways to access BioMart programmatically. We will not cover it in detail in this post.

Generally speaking, biomaRt is an amazing bioinformatics tool and, moreover, it is free!

