Monthly Archives: February 2015

Short command lines for manipulating FASTQ and FASTA sequence files

I thought it was time to compile all the short commands that I use on a more or less regular basis to manipulate sequence files.

Convert a multi-line fasta to a single-line fasta
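
One way to do this is sketched below with awk. The file names are placeholders and the sample input is made up for illustration; the sketch assumes every record starts with a ">" header line.

```shell
# Hypothetical sample input: two records with wrapped sequence lines.
printf '>seq1 first record\nACGT\nTTGA\n>seq2\nGGCC\nAA\n' > multiline.fa

# On each header: flush the sequence collected so far, then print the header.
# On any other line: append it to the current sequence.
awk '/^>/ { if (seq != "") print seq; print; seq = ""; next }
     { seq = seq $0 }
     END { if (seq != "") print seq }' multiline.fa > singleline.fa
```

Each record in singleline.fa then takes exactly two lines: the header and the full sequence.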

 

Convert a fastq file to fasta in a single line using sed
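
A possible sed version is sketched here. It relies on the `first~step` address form, which is a GNU sed extension, and it assumes strict four-line fastq records (no wrapped sequences). The file names and the sample reads are placeholders.

```shell
# Hypothetical sample input: two four-line fastq records.
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\n@III\n' > reads.fastq

# GNU sed only: N~4 selects line N and then every 4th line after it.
# Line 1 of each record: replace the leading "@" with ">" and print it.
# Line 2 of each record: print the sequence unchanged.
sed -n '1~4s/^@/>/p;2~4p' reads.fastq > reads.fasta
```

With a sed that lacks this extension, `awk 'NR%4==1 {print ">" substr($0,2)} NR%4==2' reads.fastq` should give the same result under the same four-line assumption.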

 

Dirty way to count the number of sequences in a fastq
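
A minimal sketch of such a count with grep, run on a made-up sample file whose second quality line deliberately starts with "@":

```shell
# Hypothetical sample input: 2 reads, the second quality string begins with "@".
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\n@III\n' > reads_q.fastq

# Count every line that starts with "@".
grep -c '^@' reads_q.fastq   # prints 3 here, although the file holds only 2 reads
```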

It’s dirty because a quality line can also start with “@”, which is a valid quality character, so the number of sequences can be overestimated.

A more precise way is to count the lines and divide by four:
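
As a sketch, assuming every record takes exactly four lines (the sample file below is made up for illustration):

```shell
# Hypothetical sample input: 2 reads, the second quality string begins with "@".
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\n@III\n' > reads_n.fastq

# Total number of lines divided by 4 lines per record.
echo $(( $(wc -l < reads_n.fastq) / 4 ))   # prints 2
```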

One-liner to remove the description information from a fasta file and just keep the identifier
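
A sed sketch of this, assuming the identifier is everything between ">" and the first whitespace character on the header line. File names and sample records are placeholders.

```shell
# Hypothetical sample input: headers with a description after the identifier.
printf '>seq1 Homo sapiens gene\nACGT\n>seq2\tmouse\nTTGA\n' > described.fa

# On header lines, keep ">" plus the first whitespace-free token and drop the rest.
# Sequence lines do not start with ">" and pass through unchanged.
sed 's/^\(>[^[:space:]]*\).*/\1/' described.fa > ids_only.fa
```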

 

Get all the identifier names from a fasta file
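
The same idea can be turned around to print only the identifiers. This is a sketch under the same assumption (identifier = first whitespace-free token after ">"), with placeholder file names:

```shell
# Hypothetical sample input.
printf '>seq1 Homo sapiens gene\nACGT\n>seq2\nTTGA\n' > named.fa

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, i.e. header lines reduced to their identifier.
sed -n 's/^>\([^[:space:]]*\).*/\1/p' named.fa > names.txt
```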

 

Extract sequences by their ID from a fasta file
For example, to get the sequences with id1 and id2 as identifiers:
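
A possible awk sketch: the wanted identifiers are passed in a variable (named `ids` here purely for illustration), and a flag decides whether the lines of the current record are printed. The sample file is made up.

```shell
# Hypothetical sample input: three records, one of them wrapped over two lines.
printf '>id1 first\nACGT\nTT\n>id3\nGGGG\n>id2\nCCAA\n' > pool.fa

awk -v ids='id1 id2' '
  # Build a lookup table from the space-separated list of wanted IDs.
  BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
  # On each header, strip ">" from the first field and check the table.
  /^>/  { keep = (substr($1, 2) in want) }
  # Print the header and its sequence lines while the flag is set.
  keep' pool.fa > picked.fa
```

Because the flag stays set until the next header, this works for both single-line and multi-line fasta files.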

If you have a long list of identifiers in a file called ids.txt, then the following should do the trick:
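
One way to sketch this with awk, assuming ids.txt holds one identifier per line and is not empty. The fasta file name and the sample content are placeholders.

```shell
# Hypothetical sample inputs.
printf 'id1\nid2\n' > ids.txt
printf '>id1 first\nACGT\nTT\n>id3\nGGGG\n>id2\nCCAA\n' > all.fa

# First file (NR == FNR): remember every identifier.
# Second file: set the flag on each header, print lines while it is set.
awk 'NR == FNR { want[$1] = 1; next }
     /^>/      { keep = (substr($1, 2) in want) }
     keep' ids.txt all.fa > subset.fa
```

The order of the arguments matters: the ID list has to come before the fasta file.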

 

Convert a two-column tab-delimited text file (ID and sequence) to a fasta file
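
A short awk sketch, assuming the ID is in column 1 and the sequence in column 2, with no header row. File names are placeholders.

```shell
# Hypothetical sample input: ID <tab> sequence.
printf 'seq1\tACGT\nseq2\tTTGA\n' > table.tsv

# Split on tabs; write the ID as a ">" header, then the sequence on its own line.
awk -F '\t' '{ print ">" $1; print $2 }' table.tsv > from_table.fa
```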

 

Get the length of a fasta sequence (the sequence must be on a single line)
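
As a sketch with awk, printing each identifier next to the length of its sequence. It assumes one sequence line per record; the tab-separated output format and the file names are choices made for this example.

```shell
# Hypothetical sample input: single-line fasta.
printf '>seq1\nACGTACGT\n>seq2\nTTG\n' > flat.fa

# Header lines: remember the identifier (first field without ">").
# Sequence lines: print the identifier and the number of characters.
awk '/^>/ { id = substr($1, 2); next }
     { print id "\t" length($0) }' flat.fa > lengths.tsv
```

For the sample above, lengths.tsv contains `seq1` with 8 and `seq2` with 3.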

 

I’ll update this when I find some more useful single-line commands for manipulating fastq and fasta files.

Please post comments if you have some suggestions.

 
