DisCVR: Rapid viral diagnosis from HTS data

DisCVR is a computer program that allows diagnosticians to detect known human viruses in clinical samples from High-Throughput Sequencing (HTS) data. It works by creating a database of short nucleotide sequences, called k-mers, which are extracted from viral genomes. K-mers 22 bases long are generated by sliding a window along each sequence, one nucleotide at a time. Only unique k-mers from a set of viruses are included in the database, and each is assigned a taxonomic label. To investigate a patient sample sequenced using HTS, the k-mers from each read in the sample are queried against the database to find exact matches. A list of all viruses found in the sample is shown, and reference-based assembly can then be used to show the depth and coverage of the data relative to a reference genome, in order to assess the significance of the matches.
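
The k-mer approach described above can be illustrated with a minimal Python sketch. This is not DisCVR's actual implementation. The function names (`kmers`, `build_database`, `count_matches`) and the input format (a dictionary of label to genome sequence) are hypothetical. It also assumes that "unique" means a k-mer found in only one virus of the set, which is one possible reading of the description, and it ignores reverse complements.

```python
from collections import Counter

K = 22  # k-mer length stated in the text


def kmers(seq, k=K):
    """Yield every k-mer by sliding a window one base at a time."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_database(genomes, k=K):
    """Map each k-mer to a virus label.

    `genomes` is a hypothetical dict of {virus_label: genome_sequence}.
    Assumption: a k-mer seen in more than one virus is discarded.
    """
    owner = {}      # k-mer -> first label it was seen under
    shared = set()  # k-mers seen under more than one label
    for label, seq in genomes.items():
        for kmer in set(kmers(seq, k)):
            if kmer in owner and owner[kmer] != label:
                shared.add(kmer)
            owner.setdefault(kmer, label)
    return {kmer: label for kmer, label in owner.items() if kmer not in shared}


def count_matches(reads, database, k=K):
    """Count exact k-mer matches per virus label across all reads."""
    hits = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            label = database.get(kmer)
            if label is not None:
                hits[label] += 1
    return hits


# Toy example with k=4 so the sequences stay readable.
toy_genomes = {"virusA": "ACGTACGGA", "virusB": "TTGACGTCC"}
toy_db = build_database(toy_genomes, k=4)
toy_hits = count_matches(["GTACGG", "ACGTCC"], toy_db, k=4)
print(dict(toy_hits))  # {'virusA': 3, 'virusB': 2}
```

In the toy example the k-mer `ACGT` occurs in both genomes, so it is left out of the database and does not contribute to either count. A real database would use the full 22-base k-mers and far larger genome sets.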

DisCVR is a fast and accurate tool designed to analyse HTS data and validate the results interactively on computers with limited resources. At present, DisCVR is a human viral diagnostic tool, but it could be extended to include non-viral human pathogens as well as pathogens of other hosts.

DisCVR Operating Manual

An overview of the GUI menu can be found here.

The manual can be found here.

Test samples

  • Three respiratory samples are provided to test run DisCVR. They can be downloaded here (166Mb).


  • KAnalyze: A Fast Versatile Pipelined K-mer Toolkit, Bioinformatics 30(14) (2014) 2070-2072
  • TANOTI: A rapid BLAST-guided read mapper for small, divergent genomes (manuscript communicated)
  • JFreeChart: Free Java chart library


Please send your comments, suggestions or bug reports to Joseph Hughes.