Setting up automatic BLAST database updates on Linux servers

The Basic Local Alignment Search Tool (BLAST) is one of the most commonly used programs for classifying sequences by similarity search.

Standalone BLAST can easily be set up on a local server. More information about setting it up on a local Linux server can be found here:

http://www.ncbi.nlm.nih.gov/books/NBK52640/

In our lab, all our servers run the BioLinux operating system, which comes with BLAST pre-installed. With local BLAST, it is important to update the local BLAST databases regularly so that they include new sequences submitted to NCBI. However, installing these databases and keeping them up to date can be a bit tricky.

Here is a short tutorial on how to set up local BLAST databases and keep them updated.

In BioLinux, the BLASTDB environment variable usually points to /var/lib/blastdb and is set in the file /etc/profile.d/blast_environment.sh.

The standard blast_environment.sh file looks like this:

#Added by package bio-linux-blast
# Ages ago we had a directory called /home/db/blastdb but new users don't want that.
# /var makes most sense, as it is more likely to be a local disk and suitable for "variable" data.

if [ -e /home/db/blastdb ] ; then

    #customised BLASTDB location
    export BLASTDB=/home/db/blastdb
# elif [ -e /var/lib/blastdb ] ; then
else
    #default BLASTDB location
    export BLASTDB=/db/blast
fi
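To confirm what BLASTDB is actually set to in your current shell, a small check like the one below can help. It is a minimal sketch: check_blastdb is a hypothetical helper name (not part of BioLinux or BLAST+), and it only verifies that the variable is set and points to a readable directory, not that any databases are inside it.

```shell
# check_blastdb (hypothetical helper, bash): report whether BLASTDB is set and
# points to a directory the current user can read; return non-zero otherwise.
check_blastdb() {
    if [ -z "${BLASTDB:-}" ]; then
        echo "BLASTDB is not set" >&2
        return 1
    fi
    if [ ! -d "$BLASTDB" ] || [ ! -r "$BLASTDB" ]; then
        echo "BLASTDB=$BLASTDB is not a readable directory" >&2
        return 1
    fi
    echo "BLASTDB=$BLASTDB"
}
```

Running check_blastdb right after logging in should print the path that blast_environment.sh selected.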

The BLASTDB path can be changed to /your/blastdb/location by editing the directories in the if statement of this file.

The following example shows how I changed the location to my customized BLAST database directory, /home/sejalmodha/blastdb, in my home directory:

# Use my own BLAST database directory if it exists, otherwise fall back to /home/db/blastdb
if [ -e /home/sejalmodha/blastdb ] ; then
    export BLASTDB=/home/sejalmodha/blastdb
else
    export BLASTDB=/home/db/blastdb
fi
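Scripts in /etc/profile.d are normally read only when a login shell starts, so an edit does not take effect until you log in again. As a rough sketch, the hypothetical helper below sources a profile script in a clean child shell and prints the BLASTDB value it would set, so the edited file can be tested straight away. It assumes bash and an env command that supports -u (GNU coreutils and BSD env both do).

```shell
# blastdb_from_profile (hypothetical helper, bash): print the BLASTDB value that
# sourcing the given profile script would set, without changing the current shell.
blastdb_from_profile() {
    local profile=${1:-/etc/profile.d/blast_environment.sh}
    if [ ! -r "$profile" ]; then
        echo "cannot read $profile" >&2
        return 1
    fi
    # Source the file in a fresh child bash with BLASTDB removed from the
    # environment, so the printed value comes only from the profile script.
    env -u BLASTDB bash -c '. "$1" && printf "%s\n" "${BLASTDB-}"' _ "$profile"
}
```

With no argument it reads /etc/profile.d/blast_environment.sh; pass another path to try out a modified copy before installing it.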

On a standard Linux server, you can set the BLASTDB variable for all users in /etc/bash.bashrc, or just for yourself in ~/.bashrc:

BLASTDB=/home/sejalmodha/blastdb
export BLASTDB
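If you prefer to script this step, for instance when preparing several accounts, the hypothetical helper below appends the export line to a startup file only when that file does not already assign BLASTDB, so running it twice does not create duplicate lines. It is a sketch that assumes bash (printf %q is used to quote the path) and only recognises simple "BLASTDB=..." or "export BLASTDB=..." lines.

```shell
# add_blastdb_export (hypothetical helper, bash): append "export BLASTDB=DIR" to a
# shell startup file unless the file already assigns BLASTDB.
add_blastdb_export() {
    local rcfile=$1 dir=$2
    # Match lines like "BLASTDB=..." or "export BLASTDB=..." (leading spaces allowed).
    if grep -Eqs '^[[:space:]]*(export[[:space:]]+)?BLASTDB=' "$rcfile"; then
        echo "$rcfile already sets BLASTDB; leaving it unchanged" >&2
        return 0
    fi
    # %q quotes the path so directories containing spaces still work.
    printf 'export BLASTDB=%q\n' "$dir" >> "$rcfile"
}
```

For example, add_blastdb_export ~/.bashrc /home/sejalmodha/blastdb followed by source ~/.bashrc applies the setting to the current shell.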

To update these databases regularly, use NCBI's update_blastdb script (distributed as update_blastdb.pl in NCBI's own BLAST+ downloads) and run it from a cron job.
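Because the script name depends on where BLAST+ came from (update_blastdb on this BioLinux setup, update_blastdb.pl in NCBI's BLAST+ downloads), it can help to look up whichever one is on your PATH. A minimal sketch, where find_update_blastdb is a hypothetical helper name:

```shell
# find_update_blastdb (hypothetical helper, bash): print the full path of NCBI's
# database update script under either of its common names; fail if neither exists.
find_update_blastdb() {
    local name
    for name in update_blastdb update_blastdb.pl; do
        if command -v "$name" >/dev/null 2>&1; then
            command -v "$name"
            return 0
        fi
    done
    echo "update_blastdb not found on PATH" >&2
    return 1
}
```

Running "$(find_update_blastdb)" --showall should then list the databases NCBI currently offers, which is a quick way to check exact names such as refseq_protein before putting them in a script.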

I use an update_db.sh script that downloads the nr, nt and refseq_protein databases from NCBI and then changes the ownership and permissions of the files so that all users can use them.

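#!/bin/bash
# update_db.sh: download/update BLAST databases and make them usable by all users.
# chown needs root privileges, so this script should run as root (e.g. from root's crontab).

cd /home/sejalmodha/blastdb/ || exit 1
update_blastdb --passive --decompress nr nt refseq_protein
chown root *
chgrp users *
chmod 755 *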
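When the script runs unattended, a failed or partial download is easy to miss. One possible post-update check, sketched below, assumes that each database leaves files named <name>.* in the directory (the downloaded volumes are named like nr.00.*); check_blast_dbs is a hypothetical helper, not an NCBI tool.

```shell
# check_blast_dbs (hypothetical helper, bash): after an update, make sure every
# named database has at least one file called <name>.* in DIR.
# Usage: check_blast_dbs DIR DB [DB...]; prints missing names, returns 1 if any.
check_blast_dbs() {
    local dir=$1 db missing=0
    local -a files
    shift
    for db in "$@"; do
        files=( "$dir/$db".* )        # an unmatched glob stays as the literal pattern
        if [ ! -e "${files[0]}" ]; then
            echo "missing: $db" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Appending a line such as check_blast_dbs /home/sejalmodha/blastdb nr nt refseq_protein to the end of update_db.sh would put any "missing:" messages into the cron log. For a closer look at a single database, blastdbcmd -db nr -info prints a summary of what was installed.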

To download these databases every month, make update_db.sh executable (chmod +x update_db.sh), then put the following line in a crontab file called blast_cronrun; it sends the script's output to download.log:

@monthly       /home/sejalmodha/blastdb/update_db.sh > /home/sejalmodha/blastdb/download.log 2>&1

The last step is to install the cron job with the crontab command. Note that this replaces the current user's existing crontab with the contents of blast_cronrun:

crontab blast_cronrun
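Because crontab FILE overwrites whatever was scheduled before, a safer pattern is to merge the new entry into the existing crontab. The sketch below (merge_cron_entry is a hypothetical helper) reads the current crontab on standard input, drops any identical copy of the entry and appends it once. It is meant to be combined with crontab -l and crontab -, which list and install a crontab from standard input on common cron implementations such as Vixie cron and cronie.

```shell
# merge_cron_entry (hypothetical helper, bash): read an existing crontab on stdin
# and print it with ENTRY appended once, dropping any identical earlier copy.
merge_cron_entry() {
    local entry=$1
    # -v: drop matches, -x: whole-line match, -F: treat entry as a fixed string.
    # grep exits 1 when it prints nothing (e.g. an empty crontab), which is fine here.
    grep -vxF -- "$entry" || true
    printf '%s\n' "$entry"
}
```

With the @monthly line stored in a variable called line, running crontab -l 2>/dev/null | merge_cron_entry "$line" | crontab - adds it without removing any other scheduled jobs.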