NCBI Entrez Direct UNIX E-utilities

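I use the NCBI Entrez Direct (EDirect) UNIX E-utilities regularly for sequence and data retrieval from NCBI. Because each utility reads from standard input and writes to standard output, they can be piped into one another and combined with any other UNIX command.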

They are available to download from the NCBI website: ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/

Below are a few useful examples of the NCBI EDirect utilities.

Download a sequence in fasta format from NCBI using accession number
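A minimal sketch of this retrieval, assuming EDirect is installed and on the PATH. The helper name fetch_fasta and the example accession in the usage comment are illustrative, not taken from the original post.

```shell
# Fetch one nucleotide record in FASTA format by accession.
# fetch_fasta is a hypothetical helper name, not part of EDirect.
fetch_fasta() {
    efetch -db nuccore -id "$1" -format fasta
}

# Usage (needs network access):
#   fetch_fasta NC_045512 > NC_045512.fasta
```

For a protein accession, the same call should work with -db protein instead of -db nuccore.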

Batch retrieval of all proteins for a taxon ID. This example will download all proteins for viruses in fasta format.
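One way this could be written is an esearch piped into efetch. This is a sketch, assuming the taxonomy ID for viruses is 10239; the helper name fetch_taxon_proteins is made up for illustration.

```shell
# Download every protein record under a taxonomy ID as FASTA.
# [Organism:exp] expands the taxon to include all of its descendants.
# fetch_taxon_proteins is a hypothetical helper name.
fetch_taxon_proteins() {
    esearch -db protein -query "txid${1}[Organism:exp]" |
        efetch -format fasta
}

# Usage (needs network access; the result for all viruses can be very large):
#   fetch_taxon_proteins 10239 > virus_proteins.fasta
```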

Download sequences in fasta format from NCBI with edirect using isolate info
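The exact query used for this example is not shown, so the following is only a sketch under assumptions: it restricts the search by organism name and looks for the isolate name in all fields. The helper name fetch_by_isolate and both arguments are hypothetical.

```shell
# Search nucleotide records for an organism and an isolate name,
# then download the hits as FASTA.
# Assumption: the isolate name is matched against [All Fields];
# a more specific search field may suit your data better.
fetch_by_isolate() {
    esearch -db nuccore -query "${1}[Organism] AND ${2}[All Fields]" |
        efetch -format fasta
}

# Usage (needs network access; replace ISOLATE_NAME with a real isolate):
#   fetch_by_isolate "Hepatitis C virus" "ISOLATE_NAME" > isolate.fasta
```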

Download sequences from NCBI with edirect using a bioproject accession or ID
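A possible pipeline for this: search the bioproject database, follow the links to nucleotide records with elink, and fetch them. The helper name fetch_bioproject_sequences is hypothetical, and the accession in the usage comment is only an example.

```shell
# Download all nucleotide sequences linked to a BioProject as FASTA.
# fetch_bioproject_sequences is a hypothetical helper name.
fetch_bioproject_sequences() {
    esearch -db bioproject -query "$1" |
        elink -target nuccore |
        efetch -format fasta
}

# Usage (needs network access):
#   fetch_bioproject_sequences PRJNA257197 > bioproject.fasta
```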

Get all CDS from a genome
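A minimal sketch using the fasta_cds_na format of efetch, which returns the nucleotide sequence of each annotated coding region. The helper name fetch_cds is illustrative.

```shell
# Fetch the nucleotide sequences of all annotated CDS features
# of a genome record. fetch_cds is a hypothetical helper name.
fetch_cds() {
    efetch -db nuccore -id "$1" -format fasta_cds_na
}

# Usage (needs network access):
#   fetch_cds NC_045512 > NC_045512_cds.fasta
```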

Get taxonomy ID from protein accession number
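This could be done by fetching the document summary and pulling out the TaxId element with xtract. A sketch, with a made-up helper name and an example accession:

```shell
# Print the taxonomy ID recorded for a protein accession.
# protein_taxid is a hypothetical helper name.
protein_taxid() {
    efetch -db protein -id "$1" -format docsum |
        xtract -pattern DocumentSummary -element TaxId
}

# Usage (needs network access):
#   protein_taxid YP_009724390.1
```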

Get taxonomy ID from accession number using esummary
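The esummary variant might look like the sketch below. It prints the accession (the Caption element) next to the taxonomy ID. The helper name accession_taxid is hypothetical, and the nuccore database is assumed here.

```shell
# Print accession and taxonomy ID for a nucleotide accession via esummary.
# accession_taxid is a hypothetical helper name.
accession_taxid() {
    esummary -db nuccore -id "$1" |
        xtract -pattern DocumentSummary -element Caption TaxId
}

# Usage (needs network access):
#   accession_taxid NC_045512
```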

Get full lineage from accession number
Tip: xtract can be used to fetch any element from the XML output.
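As a sketch of the lineage lookup: the gpc format of efetch returns the record as INSDSeq XML, and xtract can then pick out the INSDSeq_taxonomy element. The helper name accession_lineage is made up for illustration.

```shell
# Print the full taxonomic lineage stored in a nucleotide record.
# accession_lineage is a hypothetical helper name.
accession_lineage() {
    efetch -db nuccore -id "$1" -format gpc |
        xtract -pattern INSDSeq -element INSDSeq_taxonomy
}

# Usage (needs network access):
#   accession_lineage NC_045512
```

Swapping INSDSeq_taxonomy for another element name from the XML is how the tip above applies in practice.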

Get scientific name from accession number
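The same INSDSeq XML also carries the organism name, so a sketch for this case only changes the element being extracted. The helper name accession_organism is hypothetical.

```shell
# Print the scientific name of the source organism of a nucleotide record.
# accession_organism is a hypothetical helper name.
accession_organism() {
    efetch -db nuccore -id "$1" -format gpc |
        xtract -pattern INSDSeq -element INSDSeq_organism
}

# Usage (needs network access):
#   accession_organism NC_045512
```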

Download all refseq protein sequences for viruses
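A sketch of one way to restrict the search to RefSeq, assuming taxonomy ID 10239 for viruses and the refseq[filter] search term. The helper name fetch_refseq_viral_proteins is illustrative.

```shell
# Download all RefSeq protein records for viruses (txid10239) as FASTA.
# fetch_refseq_viral_proteins is a hypothetical helper name.
fetch_refseq_viral_proteins() {
    esearch -db protein -query "txid10239[Organism:exp] AND refseq[filter]" |
        efetch -format fasta
}

# Usage (needs network access; expect a large download):
#   fetch_refseq_viral_proteins > viral_refseq_proteins.fasta
```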

Download reference genome sequence from taxonomy ID
Note: This example uses the efilter command.
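A sketch of how efilter can narrow an earlier search. It assumes that "reference genome" can be approximated by the refseq[filter] term, which may return more than one record for some taxa. The helper name fetch_reference_genome is hypothetical.

```shell
# Search nucleotide records for a taxonomy ID, keep only RefSeq entries
# with efilter, and download them as FASTA.
# Assumption: refseq[filter] stands in for "reference genome".
fetch_reference_genome() {
    esearch -db nuccore -query "txid${1}[Organism:exp]" |
        efilter -query "refseq[filter]" |
        efetch -format fasta
}

# Usage (needs network access; replace TAXID with a real taxonomy ID):
#   fetch_reference_genome TAXID > reference.fasta
```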

Get all proteins from a genome accession
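One possible approach is to look up the genome record by accession and follow its links to the protein database. This is a sketch with a made-up helper name; the accession in the usage comment is only an example.

```shell
# Download all protein records linked to a genome accession as FASTA.
# fetch_genome_proteins is a hypothetical helper name.
fetch_genome_proteins() {
    esearch -db nuccore -query "${1}[Accession]" |
        elink -target protein |
        efetch -format fasta
}

# Usage (needs network access):
#   fetch_genome_proteins NC_045512 > NC_045512_proteins.fasta
```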

Extract the genome accession from a protein accession. This corresponds to the DBSOURCE attribute in a GenBank file and is an alternative to the script mentioned in one of my earlier blog posts.
Note: The following command works with both protein accessions and GIs passed as the -id parameter of the elink command.
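A sketch of such a command: elink takes the protein identifier through -id, follows the link to the nucleotide database, and efetch prints the accession of each linked record. The helper name protein_to_genome_accession is hypothetical.

```shell
# Print the nucleotide accession(s) linked to a protein accession or GI.
# protein_to_genome_accession is a hypothetical helper name.
protein_to_genome_accession() {
    elink -db protein -id "$1" -target nuccore |
        efetch -format acc
}

# Usage (needs network access):
#   protein_to_genome_accession YP_009724390.1
```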

More information about the NCBI Entrez Direct E-utilities is available on the NCBI website: http://www.ncbi.nlm.nih.gov/books/NBK179288/

Bioinformatician at CVR.
http://bioinformatics.cvr.ac.uk/sejal.php