1st Viral Bioinformatics and Genomics Training Course

The first Viral Bioinformatics and Genomics training course held at the University of Glasgow was completed successfully by 14 delegates (nine external and five internal) on 10-14 August 2015. The course took place in the McCall Building computer cluster, and the adjacent Lomond and Dumgoyne Rooms were used for refreshments and lunch.

Instructors:
Joseph Hughes (Course Organiser)
Andrew Davison
Robert Gifford
Sejal Modha
Richard Orton
Sreenu Vattipally
Gavin Wilkie

DAY 1 – SEQUENCING TECHNOLOGIES AND A UNIX PRIMER

Day one introduced participants to the different sequencing technologies available and to the ways in which samples are prepared, with an emphasis on how this affects the bioinformatic analyses. The rest of the day aimed to familiarize participants with the command line and useful UNIX commands.

9:00-10:00 Tea & Coffee in Dumgoyne Room – Arrival of participants

10:00-10:45 Next-generation sequencing technologies – Gavin Wilkie
10:45-11:30 Examples of HTS data being used in virology – Richard Orton
11:30-11:45 Brief introduction and history of Unix/Linux – Sreenu Vattipally and Richard Orton
11:45-12:30 The command line anatomy and basic commands – Sreenu Vattipally and Richard Orton

12:30-13:30 Lunch break in Dumgoyne Room

13:30-14:30 Essential UNIX commands – Sreenu Vattipally and Sejal Modha
14:30-15:30 Text processing with grep – Sreenu Vattipally and Sejal Modha
15:30-16:00 Tea & Coffee in Dumgoyne Room
16:00-17:30 sed and awk – Sreenu Vattipally and Sejal Modha

DAY 2 – UNIX CONTINUED AND HOW TO DO REFERENCE ASSEMBLIES

Participants continued to practise UNIX commands and learned how to run basic bioinformatic tools. By the end of the day, they were able to analyse HTS data with different reference assemblers and automate the processing of multiple files.

9:30-10:15 Introduction to file formats (fasta, fastq, SAM, BAM, vcf) – Sreenu Vattipally and Richard Orton
10:15-11:00 Quality scores, quality control and trimming (Prinseq and FastQC and trim_galore) – Sreenu Vattipally and Richard Orton
11:00-11:30 Tea & Coffee in Dumgoyne Room
11:30-13:00 Introduction to alignment and assembly programs (Blast, BWA, Bowtie) – Sreenu Vattipally and Richard Orton

13:00-14:00 Lunch break in Dumgoyne Room

14:00-14:45 Continuation of alignment and assembly programs (Novoaligner, Tanoti) – Sreenu Vattipally and Richard Orton
14:45-15:30 Post-assembly processing and alignment visualization (Tablet and UGENE) – Sreenu Vattipally and Richard Orton
15:30-16:00 Tea & Coffee in Dumgoyne Room
16:00-16:45 Workflow design and automation – Sreenu Vattipally and Richard Orton
16:45-17:30 Loops and command line arguments – Sreenu Vattipally and Richard Orton

DAY 3 – HOW TO DO DE NOVO ASSEMBLY AND USE METAGENOMICS PIPELINES

The third day covered the different approaches used for de-novo assembly and provided hands-on experience. A popular metagenomic pipeline was presented and participants learned how to use it as well as create their own pipeline. (Slides and practical)

9:30-10:15 De-novo assemblers – Sejal Modha and Gavin Wilkie
10:15-11:00 Using different de-novo assemblers (e.g. Meta-idba, Edena, Abyss, Spades) – Sejal Modha and Gavin Wilkie

11:00-11:30 Tea & Coffee in Lomond Room
11:30-13:00 Scaffolding contigs, filling gaps in assemblies and correcting errors (e.g. phrap, gap_filler, scaffold builder, ICORN2, spades) – Gavin Wilkie and Sejal Modha

13:00-14:00 Lunch break in Lomond Room

14:00-14:45 Practice building a custom de-novo pipeline (BLAST, KronaTools)
14:45-15:30 The MetAMOS metagenomic pipeline – Sejal Modha and Sreenu Vattipally
15:30-16:00 Tea & Coffee in Lomond Room
16:00-17:30 Analysis using the pipeline – Sejal Modha and Sreenu Vattipally

DAY 4 – SEQUENCING TECHNOLOGIES AND HOW TO DO GENOME ANNOTATION

Day four gave participants the opportunity to look at their assembly in more detail. They learned how to create a finished, curated full genome (with an emphasis on finishing) with gene annotations, and analysed the variation within their sample.

9:30-11:00 Finishing and Annotating genomes – Andrew Davison, Gavin Wilkie and Sreenu Vattipally
11:00-11:30 Tea & Coffee in Dumgoyne Room
11:30-13:00 Annotation transfer from related species – Gavin Wilkie, Andrew Davison and Sreenu Vattipally

13:00-14:00 Lunch break in Dumgoyne Room

14:00-15:30 Error detection and variant calling – Richard Orton
15:30-16:00 Tea & Coffee in Dumgoyne Room
16:00-17:30 Quasi-species characterisation – Richard Orton

17:30 Social evening in the pub/restaurant at Curlers on Byres Road organised by Richard Orton (http://www.thecurlersrestglasgow.co.uk).

DAY 5 – PRIMER IN INVESTIGATING PATHWAYS OF VIRAL EVOLUTION

Researchers worked through several practical examples of using sequences in virology and spent the remaining time analysing their own data with the instructors' help.

9:30-10:15 Identifying mutations of interest for individual sequences and within a set of sequences – Robert Gifford
10:15-11:00 Combining phylogenies with traits – Robert Gifford
11:00-11:30 Tea & Coffee in Dumgoyne Room
11:30-12:15 Investigating epidemiology (IDU example) – Robert Gifford
12:15-13:00 Investigating transmission of drug resistance – Robert Gifford

13:00-14:00 Lunch break in Dumgoyne Room

14:00-15:30 Analysing your own data or developing your own pipeline
15:30-16:00 Tea & Coffee in Dumgoyne Room
16:00-17:00 Analysing your own data or developing your own pipeline
17:00 Goodbyes

If you would like to be informed of similar training courses run in the future, please fill in your details here.

 

 

Setting up automatic BLAST database updates on Linux servers

Basic Local Alignment Search Tool (BLAST) is one of the most commonly used programs for sequence classification using similarity search.

Standalone BLAST can be set up easily on a local server. More information about how to set it up on a local Linux server can be found here:

http://www.ncbi.nlm.nih.gov/books/NBK52640/

In our lab, all our servers run the BioLinux operating system, which comes with BLAST pre-installed. With local BLAST, it is important to update the local BLAST databases regularly so that they include new sequences submitted to NCBI. However, installing these databases and keeping them up to date can be a bit tricky.

Here is a small tutorial on how to set up local BLAST databases and update them regularly.

In BioLinux, the BLASTDB path variable is usually set to /var/lib/blastdb and is specified in the file /etc/profile.d/blast_environment.sh

The standard blast_environment.sh file looks like this.

BLASTDB path can be updated to /your/blastdb/location by changing details in the “if” statement of the file.

The following example shows how I will change the location to my customized blastdb in my home directory /home/sejalmodha/blastdb
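The following is only a hypothetical sketch of how a profile script of this kind could choose the database directory with an "if" statement; it is not the actual BioLinux blast_environment.sh. The variable names CUSTOM_BLASTDB and DEFAULT_BLASTDB are assumptions made for illustration.

```shell
# Hypothetical sketch of a profile script such as /etc/profile.d/blast_environment.sh.
# It is NOT the stock BioLinux file; the variable names below are illustrative.
CUSTOM_BLASTDB="${CUSTOM_BLASTDB:-/home/sejalmodha/blastdb}"
DEFAULT_BLASTDB="/var/lib/blastdb"

# Use the customized location when it exists, otherwise fall back to the default.
if [ -d "$CUSTOM_BLASTDB" ]; then
    BLASTDB="$CUSTOM_BLASTDB"
else
    BLASTDB="$DEFAULT_BLASTDB"
fi
export BLASTDB
```

Files in /etc/profile.d are read by login shells, so a change of this kind normally takes effect at the next login.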

On a standard Linux server, you can specify the BLASTDB path variable in /etc/bash.bashrc (for all users) or in your own ~/.bashrc
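As a minimal sketch, the line to add would look something like this, using the example directory from this post (substitute your own location):

```shell
# Add to ~/.bashrc (one user) or /etc/bash.bashrc (all users).
# The directory is the example location used in this post; substitute your own.
export BLASTDB="/home/sejalmodha/blastdb"
```

After opening a new shell, or running `source ~/.bashrc`, `echo "$BLASTDB"` should print the new location.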

To update these databases regularly on the server, use NCBI’s update_blastdb.pl script and wrap it in a cron job.

I have an update_db.sh script that downloads nr, nt and refseq_protein databases from the NCBI website and changes the permissions of those files so that all users can use the files.
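The script itself is not reproduced here, so the following is a sketch of what such a wrapper might look like rather than the original. It assumes update_blastdb.pl (shipped with BLAST+) is on the PATH; the function name update_databases and the variables BLASTDB_DIR and UPDATE_CMD are made up for this example.

```shell
#!/bin/sh
# Sketch of an update_db.sh wrapper (hypothetical; not the script from the post).

# Assumed locations and names; adjust them for your server.
BLASTDB_DIR="${BLASTDB:-$HOME/blastdb}"
UPDATE_CMD="${UPDATE_CMD:-update_blastdb.pl}"   # NCBI script shipped with BLAST+
DATABASES="nr nt refseq_protein"

update_databases() {
    mkdir -p "$BLASTDB_DIR" || return 1
    # Download into the database directory; --decompress unpacks the archives.
    # The exit status is logged rather than treated as fatal, because its
    # meaning has differed between update_blastdb.pl releases.
    status=0
    # $DATABASES is left unquoted on purpose so it splits into separate names.
    ( cd "$BLASTDB_DIR" && "$UPDATE_CMD" --decompress $DATABASES ) || status=$?
    echo "$(date '+%F %T') $UPDATE_CMD exited with status $status"
    # Let every user read the files and enter the directory.
    chmod -R a+rX "$BLASTDB_DIR"
}

# Run the update only when this file is executed as update_db.sh,
# so that it can also be sourced without side effects.
case "$0" in
    *update_db.sh) update_databases ;;
esac
```

Saved as update_db.sh and made executable with `chmod +x update_db.sh`, it can be run by hand once to check that the downloads work before scheduling it.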

To schedule the download of these databases monthly, put the script in a cron file called blast_cronrun and write the log to a download.log file.
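A cron file of this kind could be created as sketched below. The paths under /home/sejalmodha and the choice of 02:00 on the first day of each month are assumptions for illustration, not the schedule used in the original post.

```shell
# Write a cron file named blast_cronrun in the current directory.
# Fields: minute hour day-of-month month day-of-week command.
# Paths and the 02:00 start time are illustrative assumptions.
cat > blast_cronrun <<'EOF'
# Run update_db.sh at 02:00 on the 1st of every month and append all output to download.log
0 2 1 * * /home/sejalmodha/update_db.sh >> /home/sejalmodha/download.log 2>&1
EOF
```

Cron jobs run with a minimal environment, so it is safest to use absolute paths and to make sure the update script can find update_blastdb.pl on its own.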

The last step is to submit the cronjob using the crontab command.
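A sketch of this step is shown below. Note that `crontab FILE` replaces the user's whole cron table rather than appending to it, so the hypothetical helper install_crontab (a name invented for this example) saves a copy of the existing table first.

```shell
# Install a cron file for the current user, keeping a backup of the old table.
# install_crontab is an illustrative helper, not part of cron itself.
install_crontab() {
    cron_file="$1"
    backup="${2:-$HOME/crontab.backup}"
    # Save the current table; having no table yet is not an error.
    crontab -l > "$backup" 2>/dev/null || true
    # Replace the table with the new file, then print it for confirmation.
    crontab "$cron_file" && crontab -l
}
```

Calling `install_crontab blast_cronrun` installs the schedule and prints the active table; `crontab -l` can be used at any time to check it again.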

 

Bioinformatician at CVR.
http://bioinformatics.cvr.ac.uk/sejal.php