2nd Viral Bioinformatics and Genomics Training Course (1st – 5th August 2016)

We have shared our knowledge of viral bioinformatics and genomics with yet another clever and friendly bunch of researchers. Sixteen delegates from across the world joined us for a week of intensive training. The line-up of instructors changed slightly due to the departure of Gavin Wilkie earlier in the year.

Instructors:
Joseph Hughes (Course Organiser)
Andrew Davison
Sejal Modha
Richard Orton (co-organiser)
Sreenu Vattipally
Ana Da Silva Filipe

The timetable changed a bit, with more focus on advanced bash scripting (loops and conditions). We asked the participants to have basic Linux command experience (ls, mkdir, cp) beforehand, which saved us a lot of time. Rik Smith-Unna’s Linux bootcamp was really useful for the students to check their expertise before the course: http://rik.smith-unna.com/command_line_bootcamp.

This year's timetable follows. As in the previous year, we had plenty of time for discussion at lunchtime and tea breaks, and the traditional celebratory cake at the end of the week.

9:00-9:45                 Tea & Coffee in the Barn – Arrival of participants

DAY 1 – SEQUENCING TECHNOLOGIES AND A UNIX PRIMER

The first day will start with an introduction to the various high-throughput sequencing (HTS) technologies available and the ways in which samples are prepared, with an emphasis on how this impacts the bioinformatic analyses. The rest of the first day and the second day will aim to familiarize participants with the command line and useful UNIX commands, in order to empower them to automate various analyses.

9:45-10:00           Welcome and introductions – Massimo Palmarini and Joseph Hughes
10:00-10:45        Next-generation sequencing technologies – Ana Da Silva Filipe
10:45-11:15        Examples of HTS data being used in virology – Richard Orton
11:15-11:30            Short break
11:30-11:45        Introduction to Linux and getting started – Sreenu Vattipally
11:45-12:30        Basic commands – Sreenu Vattipally
12:30-13:30            Lunch break in the Barn followed by a guided tour of the sequencing facility with Ana Da Silva Filipe
13:30-14:30        File editing in Linux – Sreenu Vattipally & Richard Orton
14:30-15:30        Text processing – Sreenu Vattipally & Richard Orton
15:30-16:00            Tea & Coffee in the Barn Room
16:00-17:30        Advanced Linux commands – Sreenu Vattipally
DAY 2 – UNIX CONTINUED AND HOW TO DO REFERENCE ASSEMBLIES

The second day will continue with practicing UNIX commands and learning how to run basic bioinformatic tools. By the end, participants will be able to analyse HTS data using various reference assemblers and will be able to automate the processing of multiple files.

9:30-11:00           BASH scripting (conditions and loops) – Sreenu Vattipally
11:00-11:30            Tea & Coffee in the Barn Room
11:30-12:15        Introduction to file formats (fasta, fastq, SAM, BAM, vcf) – Sreenu Vattipally & Richard Orton
12:15-13:00        Sequence quality checks – Sreenu Vattipally & Richard Orton
13:00-14:00            Lunch break in the Barn followed by a guided tour of the sequencing facility with Ana Da Silva Filipe
14:00-14:45        Introduction to assembly (BWA and Bowtie2) – Sreenu Vattipally & Richard Orton
14:45-15:30        More reference assembly (Novoalign, Tanoti and comparison of mapping methods) – Sreenu Vattipally & Sejal Modha
15:30-16:00            Tea & Coffee in the Barn Room
16:00-17:30        Post-processing of assemblies and visualization (working with Tablet and Ugene and consensus sequence generation) – Sreenu Vattipally & Sejal Modha
DAY 3 – HOW TO DO VARIANT CALLING AND DE NOVO ASSEMBLY

The third day will start with participants looking at variant calling and quasi-species characterisation. In the afternoon, we will use different approaches for de novo assembly and also provide hands-on experience.

9:30-11:00           Error detection and variant calling – Richard Orton
11:00-11:30            Tea & Coffee in Barn Room
11:30-13:00        Quasi-species characterisation – Richard Orton
13:00-14:00            Lunch break in the Lomond Room with an informal presentation of Pablo Murcia’s research program.
14:00-14:45        De novo assemblers – Sejal Modha
14:45-15:30        Using different de novo assemblers (e.g. idba-ud, MIRA, Abyss, Spades) – Sejal Modha
15:30-16:00            Tea & Coffee in the Barn
16:00-17:30        Assembly quality assessment, merging contigs, filling gaps in assemblies and correcting errors (e.g. QUAST, GARM, scaffold builder, ICORN2, spades) – Sejal Modha
DAY 4 – METAGENOMICS AND HOW TO DO GENOME ANNOTATION

On the fourth day, participants will look at their own assemblies in more detail, and will learn how to create a finished genome with gene annotations. A popular metagenomic pipeline will be presented, and participants will learn how to use it. In the afternoon, the participants will build their own metagenomic pipeline, putting into practice the bash scripting learnt during the first two days.

9:30-10:15           Finishing and annotating genomes – Andrew Davison & Sejal Modha
10:15-11:00        Annotation transfer from related species – Joseph Hughes
11:00-11:30            Tea & Coffee in the Barn
11:30-12:15        The MetAMOS metagenomic pipeline – Sejal Modha & Sreenu Vattipally
13:00-14:00            Lunch break in Lomond Room with informal presentation of Roman Biek’s research program.
14:00-15:30        Practice in building a custom de novo pipeline – Sejal Modha & Sreenu Vattipally
15:30-16:00            Tea & Coffee in the Barn
16:00-17:30        Practice in building a custom de novo pipeline – Sejal Modha
17:30                         Group photo followed by social evening and Dinner at the Curler’s Rest (http://www.thecurlersrestglasgow.co.uk). 
DAY 5 – PRIMER IN APPLIED PHYLOGENETICS

On the final day, participants will combine the consensus sequences generated during day two with data from GenBank to produce phylogenies. The practical aspects of automating phylogenetic analyses will be emphasised to reinforce the bash scripting learnt over the previous days.

9:30-10:15           Downloading data from GenBank using the command line – Joseph Hughes & Sejal Modha
10:15-11:00        Introduction to multiple sequence alignments – Joseph Hughes
11:00-11:30            Tea & Coffee in the Barn
11:30-13:00        Introduction to phylogenetic analysis – Joseph Hughes
13:00-14:00            Lunch break in the Lomond Room with a celebratory cake
14:00-15:30        Analysing your own data or developing your own pipeline – Whole team available
15:30-16:00            Tea & Coffee in the Barn
16:00-17:00        Analysing your own data or developing your own pipeline – Whole team available
17:00                       Goodbyes
We wish all the participants lots of fun with their new bioinformatic skills.

If you are interested in finding out about future courses that we will be running, please fill in the form with your details.

1st Viral Bioinformatics and Genomics Training Course

The first Viral Bioinformatics and Genomics training course held at the University of Glasgow was completed successfully by 14 delegates (nine external and five internal) on 10-14 August 2015. The course took place in the McCall Building computer cluster, and the adjacent Lomond and Dumgoyne Rooms were used for refreshments and lunch.

Instructors:
Joseph Hughes (Course Organiser)
Andrew Davison
Robert Gifford
Sejal Modha
Richard Orton
Sreenu Vattipally
Gavin Wilkie

9:00-10:00 Tea & Coffee in Dumgoyne Room – Arrival of participants
DAY 1 – SEQUENCING TECHNOLOGIES AND A UNIX PRIMER

Day one introduced the participants to the different sequencing technologies available and the ways in which samples are prepared, with an emphasis on how this impacts the bioinformatic analyses. The rest of the day aimed to familiarize the researchers with the command line and useful UNIX commands.

10:00-10:45 Next-generation sequencing technologies – Gavin Wilkie
10:45-11:30 Examples of HTS data being used in virology – Richard Orton
11:30-11:45 Brief introduction and history of Unix/Linux – Sreenu Vattipally and Richard Orton
11:45-12:30 The command line anatomy and basic commands – Sreenu Vattipally and Richard Orton

12:30-13:30 Lunch break in Dumgoyne Room

13:30-14:30 Essential UNIX commands – Sreenu Vattipally and Sejal Modha
14:30-15:30 Text processing with grep – Sreenu Vattipally and Sejal Modha
15:30-16:00 Tea & Coffee in Dumgoyne Room
16:00-17:30 sed and awk – Sreenu Vattipally and Sejal Modha

DAY 2 – UNIX CONTINUED AND HOW TO DO REFERENCE ASSEMBLIES

The participants continued to practice UNIX commands and learned how to run basic bioinformatic tools. By the end of the day, the participants were able to analyse HTS data with different reference assemblers and automate the processing of multiple files.

9:30-10:15 Introduction to file formats (fasta, fastq, SAM, BAM, vcf) – Sreenu Vattipally and Richard Orton
10:15-11:00 Quality scores, quality control and trimming (Prinseq, FastQC and trim_galore) – Sreenu Vattipally and Richard Orton
11:00-11:30 Tea & Coffee in Dumgoyne Room
11:30-13:00 Introduction to alignment and assembly programs (Blast, BWA, Bowtie) – Sreenu Vattipally and Richard Orton

13:00-14:00 Lunch break in Dumgoyne Room

14:00-14:45 Continuation of alignment and assembly programs (Novoalign, Tanoti) – Sreenu Vattipally and Richard Orton
14:45-15:30 Post-assembly processing and alignment visualization (Tablet and UGENE) – Sreenu Vattipally and Richard Orton
15:30-16:00 Tea & Coffee in Dumgoyne Room
16:00-16:45 Workflow design and automation – Sreenu Vattipally and Richard Orton
16:45-17:30 Loops and command line arguments – Sreenu Vattipally and Richard Orton

DAY 3 – HOW TO DO DE NOVO ASSEMBLY AND USE METAGENOMICS PIPELINES

The third day covered the different approaches used for de-novo assembly and provided hands-on experience. A popular metagenomic pipeline was presented and participants learned how to use it as well as create their own pipeline. (Slides and practical)

9:30-10:15 De-novo assemblers – Sejal Modha and Gavin Wilkie
10:15-11:00 Using different de-novo assemblers (e.g. Meta-idba, Edena, Abyss, Spades) – Sejal Modha and Gavin Wilkie

11:00-11:30 Tea & Coffee in Lomond Room
11:30-13:00 Scaffolding contigs, filling gaps in assemblies and correcting errors (e.g. phrap, gap_filler, scaffold builder, ICORN2, spades) – Gavin Wilkie and Sejal Modha

13:00-14:00 Lunch break in Lomond Room

14:00-14:45 Practice building a custom de-novo pipeline (BLAST, KronaTools)
14:45-15:30 The MetAMOS metagenomic pipeline – Sejal Modha and Sreenu Vattipally
15:30-16:00 Tea & Coffee in Lomond Room
16:00-17:30 Analysis using the pipeline – Sejal Modha and Sreenu Vattipally

DAY 4 – SEQUENCING TECHNOLOGIES AND HOW TO DO GENOME ANNOTATION

Day four gave the participants the opportunity to look at their assemblies in more detail. They learned how to create a finished, curated full genome (with an emphasis on finishing) with gene annotations, and analysed the variation within their samples.

9:30-11:00 Finishing and Annotating genomes – Andrew Davison, Gavin Wilkie and Sreenu Vattipally
11:00-11:30 Tea & Coffee in Dumgoyne Room
11:30-13:00 Annotation transfer from related species – Gavin Wilkie, Andrew Davison and Sreenu Vattipally

13:00-14:00 Lunch break in Dumgoyne Room

14:00-15:30 Error detection and variant calling – Richard Orton
15:30-16:00 Tea & Coffee in Dumgoyne Room
16:00-17:30 Quasi-species characterisation – Richard Orton

17:30 Social evening in the pub/restaurant at Curlers on Byres Road organised by Richard Orton (http://www.thecurlersrestglasgow.co.uk).

DAY 5 – PRIMER IN INVESTIGATING PATHWAYS OF VIRAL EVOLUTION

Researchers worked through several practical examples of using sequences in virology and spent the remaining time analysing their own data with the teachers' help.

9:30-10:15 Identifying mutations of interest for individual sequences and within a set of sequences – Robert Gifford
10:15-11:00 Combining phylogenies with traits – Robert Gifford
11:00-11:30 Tea & Coffee in Dumgoyne Room
11:30-12:15 Investigating epidemiology (IDU example) – Robert Gifford
12:15-13:00 Investigating transmission of drug resistance – Robert Gifford

13:00-14:00 Lunch break in Dumgoyne Room

14:00-15:30 Analysing your own data or developing your own pipeline
15:30-16:00 Tea & Coffee in Dumgoyne Room
16:00-17:00 Analysing your own data or developing your own pipeline
17:00 Goodbyes

If you would like to be informed of similar training courses run in the future, please fill in your details here.

Adding a new assembler in MetAMOS

MetAMOS is a modular metagenomic analysis pipeline that can be used to automate metagenomic assembly, annotation and scaffolding analysis. Further information and the published paper about MetAMOS can be found here, and documentation about the pipeline can be found here.

One of the biggest advantages of the MetAMOS framework is its modularity: it allows users to add new programs at various steps of the pipeline, such as annotation and assembly.

Here, I discuss how to add a new assembler within the framework. I am going to use metacortex as an example assembler that we wanted to incorporate in the CVR’s MetAMOS pipeline.

The chart below shows the steps to add a new assembler into the MetAMOS pipeline.

Figure 1: Steps to add a new assembler to the MetAMOS pipeline

To add a new assembler, add its name to the ASSEMBLE.generic file, which can be found at metAMOS/Utilities/config/ASSEMBLE.generic.

It is also possible to add several versions of the same program to the list, as long as their names are different, e.g. soap_v1, soap_v2 etc.
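As a rough illustration of this step, here is a minimal Python sketch that adds a name to a list file only if it is not already present. It assumes the file holds one assembler name per line, which is a guess about the layout of ASSEMBLE.generic and not something checked against the MetAMOS source. The function name register_assembler is made up for this example.

```python
from pathlib import Path


def register_assembler(list_file: Path, name: str) -> bool:
    """Append `name` to `list_file` unless it is already listed.

    Returns True if the file was changed, False otherwise.
    ASSUMPTION: the file contains one assembler name per line.
    """
    lines = list_file.read_text().splitlines() if list_file.exists() else []
    if name in (line.strip() for line in lines):
        return False  # names must be unique, so never add a duplicate
    lines.append(name)
    list_file.write_text("\n".join(lines) + "\n")
    return True
```

Calling it twice with the same name leaves the file unchanged the second time, which mirrors the requirement that every entry in the list has a distinct name.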

Writing a configuration file is the most important part of adding a new assembler to the pipeline. The configuration file is where we specify the parameters for the assembly software, including its name, location, input, output and the commands to run. The configuration file has several reserved keywords; the currently available ones are described here.

The configuration file for metacortex looks like this:

The program configuration is specified in the [CONFIG] section. The “commands” keyword specifies the commands needed to run the actual program; multiple commands can be separated with the && sign. One can also use reserved keywords such as [FIRST], [SECOND], [KMER] and [PREFIX] to specify the input and output details. These keywords are recognised by MetAMOS and are consistent throughout the framework.

input: Input type (fasta, fastq etc.)
name: Program name that you want to report
output: Output file name
location: Location of the executable
paired: Type of paired-end read and how to pass it. You can use reserved keywords such as [FIRST] and [SECOND] here
commands: Actual commands to run, using the executable found at the path given in location
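To make the idea of reserved keywords concrete, here is a small Python sketch of how a command template containing placeholders such as [FIRST], [SECOND], [KMER] and [PREFIX] could be filled in and split on &&. This is only an illustration of the concept, not how MetAMOS implements it. The function name expand_commands, the program name "assembler", its flags and the file names are all hypothetical.

```python
from typing import Dict, List


def expand_commands(template: str, values: Dict[str, str]) -> List[str]:
    """Fill in [KEYWORD] placeholders, then split the result on '&&'."""
    expanded = template
    for keyword, value in values.items():
        expanded = expanded.replace(f"[{keyword}]", value)
    # Each '&&'-separated piece is treated as one command to run in order.
    return [part.strip() for part in expanded.split("&&") if part.strip()]


# Hypothetical template and values, for illustration only.
template = "assembler -k [KMER] -1 [FIRST] -2 [SECOND] -o [PREFIX].fa && echo done"
values = {
    "KMER": "31",
    "FIRST": "reads_1.fastq",
    "SECOND": "reads_2.fastq",
    "PREFIX": "out",
}
commands = expand_commands(template, values)
```

Here commands holds two entries: the assembler call with every placeholder replaced, followed by the echo command.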

After preparing this file and following the steps in figure 1, MetAMOS needs to be rebuilt. To rebuild the MetAMOS executables:

1) Delete .pyc and .c files from the metAMOS-master/src/ folder
2) Delete the executables initPipeline and runPipeline.
Do not delete the .py scripts.
To rebuild MetAMOS, run the Python script setup.py. This should regenerate the .pyc and .c files in the src/ directory and recreate the initPipeline and runPipeline executables.
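The clean-up before the rebuild can be scripted. The Python sketch below removes the .pyc and .c files from a src/ folder and the two executables, while leaving the .py scripts alone. It assumes that initPipeline and runPipeline sit in the top-level MetAMOS folder next to src/, which may differ in your installation. The function name clean_build_products is made up for this example.

```python
from pathlib import Path
from typing import List


def clean_build_products(root: Path) -> List[Path]:
    """Delete generated files so that the pipeline can be rebuilt.

    ASSUMPTION: `root` is the top-level folder holding src/ and the
    initPipeline and runPipeline executables.
    """
    removed = []
    for pattern in ("*.pyc", "*.c"):  # the .py scripts never match these
        for path in sorted((root / "src").glob(pattern)):
            path.unlink()
            removed.append(path)
    for name in ("initPipeline", "runPipeline"):
        executable = root / name
        if executable.is_file():
            executable.unlink()
            removed.append(executable)
    return removed
```

The function returns the list of deleted paths so they can be checked before running setup.py.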

When I tried running this newly added metacortex assembler within the pipeline, it gave me the following error.

I posted this on the GitHub forum for help, and the MetAMOS developers (Todd J Treangen and Chris Hill) asked me to run the following command and provide my config file:

I got the following output for the grep command:

Sergey Koren, one of the developers of MetAMOS, pointed out that the error was not due to the configuration file itself but to the ‘echo’ command used in it. MetAMOS keeps a list of allowed system commands; because echo was not in that list, MetAMOS could not recognize the command. System commands are specified in metAMOS-master/src/generic.py

Line #13 SYSTEM_COMMANDS

The ‘echo’ command was added to this list, and the MetAMOS executables were rebuilt for the changes to take effect.
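The kind of check that tripped me up can be sketched in a few lines of Python. This is a simplified illustration of an allowed-commands list, not the actual code in generic.py. The function name unrecognised_commands, the contents of the allowed list and the --in flag are hypothetical.

```python
import shlex
from typing import Iterable, List


def unrecognised_commands(command_line: str, allowed: Iterable[str]) -> List[str]:
    """Return the first word of each '&&'-separated command that is not allowed."""
    allowed_set = set(allowed)
    missing = []
    for part in command_line.split("&&"):
        tokens = shlex.split(part)
        if tokens and tokens[0] not in allowed_set:
            missing.append(tokens[0])
    return missing


# Hypothetical allowed list and command line, for illustration only.
allowed = ["metacortex", "mv", "cp"]
line = "metacortex --in reads.fastq && echo finished"

before = unrecognised_commands(line, allowed)
after = unrecognised_commands(line, allowed + ["echo"])
```

With the hypothetical list above, before contains only echo, and after is empty once echo has been added to the allowed commands.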

Job done, metacortex is now part of our in-house customized MetAMOS.

Bioinformatician at CVR.
http://bioinformatics.cvr.ac.uk/sejal.php