Blog

Presentation at the 11th OIE Seminar at WAVLD in Saskatoon, 17th June 2015

June 17, 2015

Here are the slides of my talk in Saskatoon, if anybody is interested. The people of Saskatchewan, like Glaswegians, are friendly but sometimes hard to understand. They certainly have trouble understanding me. [slideshare id=49505335&doc=oiepresentation-150617124835-lva1-app6891]

Setting up automatic BLAST database update on Linux servers

June 12, 2015

Basic Local Alignment Search Tool (BLAST) is one of the most commonly used programs for classifying sequences by similarity search. Standalone BLAST can easily be set up on a local server; more information on setting it up on a local Linux server can be found at http://www.ncbi.nlm.nih.gov/books/NBK52640/. In our lab, all our servers run […]

Recombination detection programs

June 5, 2015

A critical step before phylogenetic and molecular selection analyses is to detect recombination and then either remove recombinant sequences or partition the alignment into recombination-free spans. The problem is that there are many different recombination detection programs available. These have been nicely reviewed by Posada (2002), and some of the programs […]

Essential AWK Commands for Next Generation Sequence Analysis

April 30, 2015

Here are a few essential awk one-liners for next-generation sequence analysis. Users need the latest version of gawk to run commands that use bitwise operations. Most Linux distributions come with gawk; however, OS X users have to install it from here: http://rudix.org/packages/gawk.html. Count the number of reads in a FASTQ file: awk 'END{print NR/4}' […]
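For readers who prefer Python, the read count above can be sketched as follows. This is a minimal illustration, not code from the post, and it assumes the standard four-line FASTQ record (header, sequence, '+' separator, qualities) with no wrapped sequence lines; the function name `count_fastq_reads` is hypothetical.

```python
from typing import Iterable


def count_fastq_reads(lines: Iterable[str]) -> int:
    """Count reads in a FASTQ stream, like awk 'END{print NR/4}'.

    Assumes every record spans exactly four lines (header, sequence,
    '+' separator, qualities); wrapped FASTQ is not handled.
    """
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        # awk would silently print a fractional count in this case.
        raise ValueError(
            f"{n_lines} lines is not a multiple of 4; is the file wrapped or truncated?"
        )
    return n_lines // 4
```

Unlike the awk version, which prints a fractional number for a malformed file, this sketch raises an error. A file handle can be passed directly, e.g. `count_fastq_reads(open("reads.fastq"))` (file name illustrative), or `gzip.open(path, "rt")` for compressed input.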

Why and how to use biomaRt?

March 23, 2015

Bioinformatics work includes gene annotation. In recent years more and more biological data has become available, so knowing how to access these valuable data resources and analyse the data is important for comprehensive bioinformatics analysis. biomaRt is a very useful tool for this. Now there are two questions: […]

A simple method to distinguish low frequency variants from Illumina sequence errors

March 23, 2015

RNA viruses have high mutation rates and exist within their hosts as large, complex and heterogeneous populations, comprising a spectrum of related but non-identical genome sequences. Next-generation sequencing has revolutionised the study of viral populations by enabling the ultra-deep sequencing of their genomes, and the subsequent identification of the full spectrum of variants […]

Short command lines for manipulating FASTQ and FASTA sequence files

February 23, 2015

I thought it was time for me to compile all the short commands that I use on a more or less regular basis to manipulate sequence files. Convert a multi-line FASTA to a single-line FASTA: awk '!/^>/ { printf "%s", $0; n = "\n" } /^>/ { print n $0; n = "" } END […]
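The same single-line conversion can be sketched in Python for use inside larger scripts. This is a minimal illustration that follows the logic of the awk one-liner (header lines are emitted as-is, sequence lines are concatenated until the next header); it is not code from the post, and the function name `fasta_to_single_line` is hypothetical.

```python
from typing import Iterable, Iterator


def fasta_to_single_line(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with each record's sequence joined onto one line."""
    seq_parts: list[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header: flush the sequence collected for the previous record.
            if seq_parts:
                yield "".join(seq_parts)
                seq_parts = []
            yield line
        elif line:
            # Sequence line (blank lines are skipped).
            seq_parts.append(line)
    # Flush the final record's sequence.
    if seq_parts:
        yield "".join(seq_parts)
```

Because it is a generator, large files are processed line by line, e.g. `for line in fasta_to_single_line(open("in.fasta")): print(line)` (file name illustrative).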

Illumina adapter and primer sequences

January 26, 2015

Illumina Adapter and Primer Sequences

Illumina libraries are normally constructed by ligating adapters to short fragments (100–1000 bp) of DNA. The exceptions are when Nextera is used (see the end of this post) or when PCR amplicons have been constructed that already incorporate the P5/P7 ends that bind to the flowcell. Illumina Paired […]

Trie Data Structure

November 17, 2014

In computer science, a trie is a data structure that is also known as a digital search tree or a prefix tree. It can be used for fast retrieval in large data sets, such as looking up words in a dictionary. The term trie was derived by Fredkin (1960) from the word 'retrieval', as in 'information retrieval'. As a […]
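The prefix-tree idea above can be sketched in a few lines of Python. This is a minimal illustration, not code from the post: each node keeps a dictionary mapping the next character to a child node, plus a flag marking the end of a stored word; the class and method names are hypothetical.

```python
from __future__ import annotations


class TrieNode:
    """One trie node: children keyed by character, plus an end-of-word flag."""

    __slots__ = ("children", "is_word")

    def __init__(self) -> None:
        self.children: dict[str, TrieNode] = {}
        self.is_word = False


class Trie:
    """Minimal prefix tree supporting insert, exact lookup and prefix search."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            # Create the child for this character on first use.
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, s: str) -> TrieNode | None:
        """Follow s from the root; return the final node or None if absent."""
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word: str) -> bool:
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix: str) -> list[str]:
        """Return all stored words beginning with prefix, in sorted order."""
        node = self._walk(prefix)
        if node is None:
            return []
        found: list[str] = []
        stack = [(node, prefix)]
        while stack:  # depth-first traversal of the subtree under prefix
            cur, acc = stack.pop()
            if cur.is_word:
                found.append(acc)
            for ch, child in cur.children.items():
                stack.append((child, acc + ch))
        return sorted(found)
```

With DNA k-mers as keys, `starts_with("ACG")` returns every stored k-mer beginning with ACG. Each lookup walks one node per character, so its cost depends on the key length rather than on how many keys are stored.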

Calculating dNdS for NGS datasets

November 16, 2014

vNvS

Our upcoming tool, vNvS, calculates the dN/dS ratio at each site and codon, as well as for the sample as a whole. Here is an explanation of the theory behind it. vNvS is currently in development; for more information, email Richard.Orton@glasgow.ac.uk.

dN/dS

dN/dS is the ratio of the number of nonsynonymous substitutions per nonsynonymous site (pN) […]
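To make the ratio concrete, here is a minimal Python sketch of one textbook way to compute a single dN/dS value: a Nei-Gojobori-style counting approach with a Jukes-Cantor correction. It is not vNvS's implementation, which the post describes as still in development. It assumes the numbers of nonsynonymous (N) and synonymous (S) sites and the observed nonsynonymous (Nd) and synonymous (Sd) differences have already been counted; the function names are hypothetical. The proportions are pN = Nd/N and pS = Sd/S, and each is corrected as d = -(3/4) ln(1 - 4p/3) before taking dN/dS.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for multiple hits: d = -3/4 * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        # The correction is undefined once p reaches 3/4 (saturation).
        raise ValueError(f"p = {p} is outside the range where the correction is defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Return dN/dS from observed differences and (precomputed) site counts."""
    p_n = nonsyn_diffs / nonsyn_sites  # pN = Nd / N
    p_s = syn_diffs / syn_sites        # pS = Sd / S
    d_n = jukes_cantor(p_n)
    d_s = jukes_cantor(p_s)
    if d_s == 0.0:
        raise ZeroDivisionError("no synonymous differences: dN/dS is undefined")
    return d_n / d_s
```

For example, `dn_ds(10, 100, 5, 50)` has pN = pS = 0.1 and returns 1.0. Values near 1 are conventionally read as neutral evolution, values below 1 as purifying selection and values above 1 as positive selection.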

Presentation at the 11th OIE Seminar at WAVLD in Saskatoon, 17th June 2015

Setting up automatic BLAST database update on Linux servers

Recombination detection programs

Essential AWK Commands for Next Generation Sequence Analysis

Why and how to use biomaRt?

A simple method to distinguish low frequency variants from Illumina sequence errors

Short command lines for manipulating FASTQ and FASTA sequence files

Illumina adapter and primer sequences

Trie Data Structure

Calculating dNdS for NGS datasets