Tag Archives: PubMed

Recombination detection programs

A critical step before phylogenetic analysis and molecular selection analysis is to detect recombination and either remove recombinant sequences or partition the alignment into different spans that are recombination-free. The problem is that there are so many different recombination detection programs available. These have been nicely reviewed by Posada (2002) and some of the programs have been benchmarked by Kosakovsky Pond and Frost (2005).

I decided to compile all the programs that are available but there are so many that I am sure I have missed some. If you know of any others, please leave a comment below.

I have split the programs into the same four categories as Posada (2002), but some programs implement multiple methods so it gets a bit tricky. The size of the font relates to the number of citations each program/publication has received in Google Scholar. As you can see, there are some clear favourites.

You can click on the names of the programs and this should either take you to the publication abstract on PubMed or to the website for the software.


Parsing PubMed for email addresses in author affiliations


Recently, we wanted to send out a survey for the International Committee on Taxonomy of Viruses (ICTV) to a large number of authors who have recently published in a virology journal. Fortunately, PubMed stores author affiliations, and the email address is sometimes present in the affiliation. We decided to target the following journals: Journal of Virology, Journal of General Virology, Virology, Virus Research, Antiviral Research, Viruses and Journal of Medical Virology. A lot of the difficult work can be done using E-utilities to generate the URL for the search. As we may be retrieving a large number of emails, we need to retrieve the results from the URL query in batches. We then want to extract the affiliations, and the emails from within those affiliations.
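The steps above can be sketched roughly as follows in Python. This is a minimal illustration and not the original script: the function names, the publication-year filter, the fetch batch size of 500, the pause between requests and the email regular expression are all assumptions made for the example. It relies on the E-utilities `esearch` and `efetch` endpoints with the history server (`usehistory=y`), and on PubMed XML storing affiliations in `Affiliation` elements.

```python
import re
import time
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Assumed pattern: deliberately simple, it will miss unusual addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def esearch_url(journal, year):
    """Build an ESearch URL; the year filter is an assumption for this sketch."""
    term = f'"{journal}"[Journal] AND {year}[PDAT]'
    params = {"db": "pubmed", "term": term, "usehistory": "y", "retmax": 0}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(webenv, query_key, retstart, retmax=500):
    """Build an EFetch URL for one batch of records held on the history server."""
    params = {
        "db": "pubmed",
        "WebEnv": webenv,
        "query_key": query_key,
        "retstart": retstart,
        "retmax": retmax,
        "retmode": "xml",
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def emails_from_xml(xml_text):
    """Return the set of lower-cased emails found in Affiliation elements."""
    root = ET.fromstring(xml_text)
    emails = set()
    for aff in root.iter("Affiliation"):
        text = "".join(aff.itertext())
        for match in EMAIL_RE.findall(text):
            emails.add(match.rstrip(".").lower())
    return emails


def harvest(journal, year, batch_size=500):
    """Search one journal, then fetch the results batch by batch."""
    with urlopen(esearch_url(journal, year)) as resp:
        result = ET.fromstring(resp.read())
    count = int(result.findtext("Count"))
    webenv = result.findtext("WebEnv")
    query_key = result.findtext("QueryKey")

    emails = set()
    for start in range(0, count, batch_size):
        with urlopen(efetch_url(webenv, query_key, start, batch_size)) as resp:
            emails |= emails_from_xml(resp.read())
        time.sleep(0.4)  # assumed pause, to stay polite to the NCBI servers
    return emails
```

Calling `harvest("Journal of Virology", 2015)` would then return a set of addresses for that journal; the year here is only a placeholder. Collecting into a set means an author who appears on several papers is only counted once.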

As we didn’t want to send all the emails off in one go, we split the output into multiple batches of 100 emails. The full code is also available as a Gist on GitHub.
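The batching step might look something like the sketch below. It is not taken from the Gist: the helper names, the output file naming scheme and the choice to sort the addresses first are assumptions for illustration.

```python
from pathlib import Path


def chunk(emails, size=100):
    """Split the emails into sorted lists of at most `size` addresses."""
    ordered = sorted(emails)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def write_batches(emails, out_dir, size=100, prefix="emails_batch"):
    """Write one text file per batch (one address per line); return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, batch in enumerate(chunk(emails, size), start=1):
        path = out / f"{prefix}_{number:03d}.txt"  # hypothetical naming scheme
        path.write_text("\n".join(batch) + "\n")
        paths.append(path)
    return paths
```

With a batch size of 100, a journal that yields 634 addresses would be written out as seven files: six full batches and a final one of 34.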

Here are the email counts: Journal of Virology = 634; Journal of General Virology = 169; Virology = 546; Virus Research = 425; Antiviral Research = 252; Viruses = 892; Journal of Medical Virology = 0.

The Journal of Medical Virology doesn’t release the email addresses of authors, and if the information is not used responsibly, then a number of other journals might go that way too, as discussed in “E-mail Address Harvesting on PubMed—A Call for Responsible Handling of E-mail Addresses”.

If you re-run this script, you might find a few more hits as more papers get published this year.