ViCTree is a bioinformatics framework that automatically selects new candidate virus sequences from GenBank, generates multiple sequence alignments, calculates a maximum likelihood phylogeny, and automatically rebuilds the phylogeny when new data become available in GenBank.

How does it work?

Workflow of the ViCTree pipeline

Initially, all known protein sequences available in GenBank are downloaded, without filtering on GenBank annotations. BLAST is then used to compare these sequences with a seed set: a curated set of sequences spanning the known diversity of the family. Significant matches are extracted based on the e-value and a pre-defined length parameter, followed by multiple sequence alignment and RAxML maximum likelihood tree generation. The dataset can be updated when new sequences become available in GenBank, and all versions of the tree and alignments are retained under version control for future reference. The latest version of the tree is submitted to an interactive online tree visualisation tool, which combines the tree with pairwise distance data and lets the user filter on defined distance cut-off values. The pipeline is currently set up for the Herpesviridae and Parvoviridae families but is flexible and can be adapted for any virus family.
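The hit-selection step described above can be illustrated with a minimal Python sketch. This is not the ViCTree implementation: it assumes BLAST was run with the default tabular output (-outfmt 6), that the GenBank candidate sequences are the queries and the seed set is the database, and the function name and threshold values are hypothetical placeholders.

```python
# Column order of BLAST tabular output (-outfmt 6) with the default fields.
BLAST_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def select_candidates(blast_lines, max_evalue=1e-5, min_length=100):
    """Return the IDs of query sequences with at least one significant hit.

    max_evalue and min_length are illustrative defaults (assumptions),
    not the values used by ViCTree.
    """
    selected = set()
    for line in blast_lines:
        line = line.strip()
        # Skip blank lines and comment lines (present in -outfmt 7 output).
        if not line or line.startswith("#"):
            continue
        fields = dict(zip(BLAST_COLUMNS, line.split("\t")))
        significant = float(fields["evalue"]) <= max_evalue
        long_enough = int(fields["length"]) >= min_length
        if significant and long_enough:
            selected.add(fields["qseqid"])
    return selected


# Made-up hits for illustration only.
example = [
    "cand1\tseedA\t85.0\t250\t30\t2\t1\t250\t5\t254\t1e-60\t210",
    "cand2\tseedA\t40.0\t60\t30\t3\t1\t60\t10\t69\t1e-8\t45",
    "cand3\tseedB\t35.0\t180\t100\t5\t1\t180\t1\t180\t0.5\t30",
]
print(sorted(select_candidates(example)))
```

With these made-up hits, only cand1 passes both thresholds: cand2 has a significant e-value but too short an alignment, and cand3 is long enough but not significant. Sequences selected this way would then go on to the alignment and tree-building steps.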

Installation and Example

Authors and Contributors

The ViCTree framework is developed by:

Sejal Modha (@sejmodha), Anil Thanki (@anilthanki) and Joseph Hughes (@josephhughes).