DisCVR: Rapid viral diagnosis from HTS data
DisCVR is a computer program that allows diagnosticians to detect known human viruses in clinical samples from High-Throughput Sequencing (HTS) data. It works by building a database of short nucleotide sequences, called k-mers, extracted from viral genomes. The k-mers are 22 bases long and are generated by sliding a window along each sequence one nucleotide at a time. Only unique k-mers from a set of viruses are included in the database, and each is assigned a taxonomic label. To investigate a patient sample sequenced using HTS, k-mers are extracted from each read in the sample and the database is queried for exact matches. DisCVR then lists all viruses found in the sample, and reference-based assembly can be used to show the depth and coverage of the data relative to a reference genome, in order to assess the significance of the matches.
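The k-mer workflow described above can be illustrated with a minimal Python sketch. It is not DisCVR's own code: the function names (kmers, build_database, classify_reads) are hypothetical, "unique" is read here as "found in exactly one virus of the set", and skipping windows that contain ambiguous bases is an added assumption. Reverse complements and any further filtering are left out.

```python
from collections import Counter

K = 22  # k-mer length stated in the text


def kmers(seq, k=K):
    """Yield each k-mer, sliding a window one base at a time."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if set(kmer) <= set("ACGT"):
            yield kmer


def build_database(genomes, k=K):
    """Map each k-mer seen in exactly one virus to that virus's label.

    `genomes` maps a taxonomic label to a genome sequence.
    """
    owners = {}
    for label, seq in genomes.items():
        for kmer in set(kmers(seq, k)):
            owners.setdefault(kmer, set()).add(label)
    # Keep only k-mers owned by a single label (one reading of "unique").
    return {
        kmer: next(iter(labels))
        for kmer, labels in owners.items()
        if len(labels) == 1
    }


def classify_reads(reads, database, k=K):
    """Count exact k-mer matches per virus label across all reads."""
    hits = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            label = database.get(kmer)
            if label is not None:
                hits[label] += 1
    return hits
```

Calling classify_reads(reads, build_database(genomes)) returns a Counter of matched k-mers per virus label. A dictionary lookup gives exact matching in constant time per k-mer, which fits the emphasis on modest hardware, although a database built from real viral genomes would need a more compact representation than plain Python strings.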
DisCVR is a fast and accurate tool designed to analyse HTS data and validate the results interactively on computers with limited resources. At present, DisCVR is a human viral diagnostic tool, but it could be extended to include non-viral human pathogens as well as pathogens of other hosts.