Convert NCBI Protein GI to Genome Accession

A few days back I posted a question on Biostars about getting genome accession numbers for a list of protein GIs. I had a long list of protein GIs and wanted the genome accession number for each one (where NCBI databases have one), without downloading a GenBank or XML file for every protein GI.

One way to do this is to use db2db. However, db2db only works if you already have the protein accession numbers for the protein GIs of interest. I also wanted to automate this step as part of a pipeline, and db2db is web-based, so it does not lend itself to automation.

I wrote the following script, which first uses NCBI utilities to convert the list of protein GIs to nucleotide GIs and then fetches the genome accession numbers for those nucleotide GIs.
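The first step (protein GI to nucleotide GI) can be sketched in Python as below. This is not the original script: it assumes the E-utilities ELink endpoint with `dbfrom=protein`, `db=nuccore` and the `protein_nuccore` link name, and the function names are hypothetical. Each GI is sent as its own `id` parameter so that ELink returns one `LinkSet` per protein, which keeps the protein-to-nucleotide mapping intact.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def elink_url(protein_gis):
    """Build an ELink URL that links protein UIDs to nuccore UIDs.

    Each GI goes in its own id= parameter, so ELink returns one LinkSet
    per protein and every nucleotide hit can be traced back to its source.
    """
    params = [("dbfrom", "protein"), ("db", "nuccore"),
              ("linkname", "protein_nuccore")]
    params += [("id", gi) for gi in protein_gis]
    return EUTILS + "elink.fcgi?" + urllib.parse.urlencode(params)

def parse_elink(xml_text):
    """Map each input protein GI to the nuccore UIDs linked via protein_nuccore."""
    root = ET.fromstring(xml_text)
    links = {}
    for linkset in root.findall("LinkSet"):
        protein_gi = linkset.findtext("IdList/Id")
        nuccore_ids = []
        for linkset_db in linkset.findall("LinkSetDb"):
            if linkset_db.findtext("LinkName") == "protein_nuccore":
                nuccore_ids.extend(link.text for link in linkset_db.findall("Link/Id"))
        links[protein_gi] = nuccore_ids  # empty list: no nucleotide link found
    return links

def protein_to_nuccore(protein_gis):
    """Run the ELink query over the network and parse the response."""
    with urllib.request.urlopen(elink_url(protein_gis)) as response:
        return parse_elink(response.read())
```

For example, `protein_to_nuccore(["752901017", "675510735"])` would return a dict keyed by protein GI. NCBI asks E-utilities clients to identify themselves with `tool` and `email` parameters and to limit their request rate; this sketch leaves that out for brevity.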

The script takes a file containing the list of protein GIs as input and can be run as

where the file test_gi_list contains the following protein GIs:

752901017
675510735
674269272
360086542
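Reading a list like this and turning nucleotide GIs into accessions could look like the following minimal Python sketch. It is not the author's script: it assumes the E-utilities ESummary endpoint and its default DocSum XML, where each record carries an `Item` named `AccessionVersion`, and the function names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def parse_gi_lines(lines):
    """Return the non-blank lines as stripped GI strings."""
    return [line.strip() for line in lines if line.strip()]

def read_gi_list(path):
    """Read a file such as test_gi_list, one protein GI per line."""
    with open(path) as handle:
        return parse_gi_lines(handle)

def esummary_url(nuccore_ids):
    """Build an ESummary URL for a batch of nuccore UIDs (comma-separated)."""
    params = {"db": "nuccore", "id": ",".join(nuccore_ids)}
    return EUTILS + "esummary.fcgi?" + urllib.parse.urlencode(params)

def parse_esummary(xml_text):
    """Map each nuccore UID to its AccessionVersion (None if the item is absent)."""
    root = ET.fromstring(xml_text)
    accessions = {}
    for docsum in root.findall("DocSum"):
        uid = docsum.findtext("Id")
        accessions[uid] = docsum.findtext("Item[@Name='AccessionVersion']")
    return accessions

def nuccore_to_accession(nuccore_ids):
    """Fetch ESummary records over the network and parse them."""
    with urllib.request.urlopen(esummary_url(nuccore_ids)) as response:
        return parse_esummary(response.read())
```

Keying the result on the `Id` of each `DocSum` rather than on response order means the nucleotide GIs from the first step can be passed to `nuccore_to_accession` in batches without losing track of which accession belongs to which GI.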

This command should provide the following output.

Bioinformatician at CVR.
http://bioinformatics.cvr.ac.uk/sejal.php