Recombination detection programs

A critical step before phylogenetic analysis and molecular selection analysis is to detect recombination and either remove recombinant sequences or partition the alignment into spans that are recombination-free. The problem is that so many different recombination detection programs are available. These have been nicely reviewed by Posada (2002) and some of the programs […]

Essential AWK Commands for Next Generation Sequence Analysis

Here are a few essential awk command-line scripts for next generation sequence analysis. Users need a recent version of gawk to run the commands that use bitwise operations. Most Linux distributions come with gawk. However, OSX users have to install it from here: http://rudix.org/packages/gawk.html

Count number of reads in a FastQ file

Convert FastQ […]
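The read-counting task named above can be sketched in Python rather than awk. This is a minimal illustration, not the post's own command: it assumes strict four-line FASTQ records (no wrapped sequence lines), and the function name `count_fastq_reads` is hypothetical.

```python
import io


def count_fastq_reads(handle):
    """Count reads in a FASTQ stream, assuming exactly four lines per record."""
    # Ignore blank lines so a trailing newline does not affect the count.
    n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; file may be truncated")
    return n_lines // 4


if __name__ == "__main__":
    # Tiny in-memory example; a real file would be opened with open(path).
    example = "@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n"
    print(count_fastq_reads(io.StringIO(example)))
```

Counting lines and dividing by four is safer than counting lines that start with `@`, because `@` is also a valid quality character and can begin a quality line.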

Why and how to use biomaRt?

Bioinformatics work includes gene annotation. In recent years more and more biological data has become available. Knowing how to access these valuable data resources and analyse the data is therefore important for comprehensive bioinformatics analysis. biomaRt is a very useful tool for this. Now there are two questions: […]

A simple method to distinguish low frequency variants from Illumina sequence errors

RNA viruses have high mutation rates and exist within their hosts as large, complex and heterogeneous populations, comprising a spectrum of related but non-identical genome sequences. Next generation sequencing has revolutionised the study of viral populations by enabling the ultra deep sequencing of their genomes, and the subsequent identification of the full spectrum of variants […]

Short command lines for manipulating FASTQ and FASTA sequence files

I thought it was time for me to compile all the short commands that I use on a more or less regular basis to manipulate sequence files.

Convert a multi-line fasta to a single-line fasta

To convert a fastq file to fasta in a single line using sed

Dirty way to […]
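As a rough Python counterpart to the first two conversions listed above (the post itself uses shell one-liners, which are not reproduced here), the sketch below unwraps a multi-line FASTA and converts FASTQ to FASTA. The function names `unwrap_fasta` and `fastq_to_fasta` are hypothetical, and the FASTQ input is assumed to use four lines per record.

```python
def unwrap_fasta(lines):
    """Yield (header, sequence) pairs with each sequence joined onto one line."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fastq_to_fasta(lines):
    """Yield (header, sequence) pairs from four-line FASTQ records."""
    records = [line.strip() for line in lines if line.strip()]
    for i in range(0, len(records) - 3, 4):
        # Swap the leading '@' of the FASTQ header for the FASTA '>'.
        yield ">" + records[i][1:], records[i + 1]


if __name__ == "__main__":
    fasta = ">seq1\nACGT\nTTGA\n>seq2\nGGCC\n".splitlines()
    for header, seq in unwrap_fasta(fasta):
        print(header)
        print(seq)
```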

Illumina adapter and primer sequences

Illumina libraries are normally constructed by ligating adapters to short fragments (100 – 1000bp) of DNA. The exceptions are when Nextera is used (see end of this post) or when PCR amplicons have been constructed that already incorporate the P5/P7 ends that bind to the flowcell.

Illumina Paired […]

Trie Data Structure

In computer science, a trie is a data structure that is also known as a digital search tree or a prefix tree. It can be used for fast retrieval on large data sets, such as looking up words in a dictionary. The term trie was coined from the phrase ‘Information Retrieval’ by Fredkin (1960). As a […]
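To make the idea concrete, here is a minimal trie sketch in Python. It is an illustrative implementation under simple assumptions (one dictionary of children per node, whole-word and prefix lookup only), not necessarily the one developed in the full post; the class and method names are chosen for this example.

```python
class Trie:
    """A prefix tree: each node maps one character to a child node."""

    def __init__(self):
        self.children = {}
        self.is_word = False  # True if a stored word ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def _find(self, prefix):
        """Return the node reached by following prefix, or None."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._find(prefix) is not None


if __name__ == "__main__":
    trie = Trie()
    for word in ("gene", "genome", "genus"):
        trie.insert(word)
    print(trie.contains("gene"), trie.contains("gen"), trie.starts_with("gen"))
```

Lookup cost depends on the length of the query string rather than on the number of stored words, which is what makes the structure attractive for dictionary-style retrieval.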

Calculating dNdS for NGS datasets

vNvS

Our upcoming tool vNvS calculates the dN/dS ratio at each site and codon, and also for the sample as a whole. Here is an explanation of the theory behind it. vNvS is currently in development. For more information email Richard.Orton@glasgow.ac.uk

dN/dS

dN/dS is the ratio of the number of nonsynonymous substitutions per nonsynonymous site (pN) […]
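For orientation, one widely used way to write the ratio is the Nei-Gojobori counting approach with a Jukes-Cantor correction. Whether vNvS uses exactly this formulation is an assumption; the symbols $N_d$ and $S_d$ (observed nonsynonymous and synonymous differences) and $N$ and $S$ (numbers of nonsynonymous and synonymous sites) are introduced here for illustration only.

```latex
\begin{align}
p_N &= \frac{N_d}{N}, & p_S &= \frac{S_d}{S} \\
d_N &= -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p_N\right), &
d_S &= -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p_S\right) \\
\omega &= \frac{d_N}{d_S}
\end{align}
```

Under this formulation a ratio above 1 is usually read as evidence of positive selection, a ratio near 1 as neutral evolution, and a ratio below 1 as purifying selection.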

Parsing PubMed for email addresses in author affiliations

USE THE FOLLOWING RESPONSIBLY PLEASE! We wanted to send out a survey for the International Committee on Taxonomy of Viruses (ICTV) to a large number of authors who have recently published in a virology journal. Fortunately, PubMed stores author affiliations, and the email address is sometimes also present in the affiliation. We decided to […]
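The core extraction step can be sketched as a regular expression applied to an affiliation string. This is a minimal illustration and not the script from the full post: the pattern is a deliberately simple approximation of an email address, the function name `extract_emails` is hypothetical, and the affiliation text below is made up. Fetching records from PubMed is outside the scope of the sketch.

```python
import re

# Simple approximation of an email address; it will miss unusual but valid forms.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(affiliation):
    """Return unique, lower-cased email addresses found in an affiliation string."""
    matches = (m.lower() for m in EMAIL_PATTERN.findall(affiliation))
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(matches))


if __name__ == "__main__":
    # Invented affiliation text in the style of a PubMed record.
    text = ("Department of Virology, Example University, Exampletown. "
            "Electronic address: jane.doe@example.org.")
    print(extract_emails(text))
```

The sentence-ending full stop that often follows an address in an affiliation is not captured, because the pattern requires the match to end in at least two letters.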

Adding a new assembler in MetAMOS

MetAMOS is a modular metagenomic analysis pipeline that can be used to automate metagenomic assembly, annotation and scaffolding analysis. Further information and the published paper about MetAMOS can be found here and documentation about the pipeline can be found here. One of the biggest advantages of the MetAMOS framework is its modularity. […]
