Monthly Archives: November 2017

A compilation of conversion tools for BED, SAM/BAM, psl, pslx, blast tabular and blast xml

A wide range of formats exist for representing the comparisons of different sequences to each other: blast tabular, blast xml, psl, pslx, SAM/BAM, BED
Most of these formats can be converted from one format to another. Sometimes the format is lossless allowing for the original data to be perfectly converted
without the loss of information. Other times, the format conversion is lossy permitting the conversion of only part of the original data resulting in the loss
of some information.

Here, I have compiled the tools or UNIX commands necessary for converting from one file format to another. As you can see, I am still needing to compete some of the gaps so please let me know of any other tools which are missing.

The command is shown in full below the table.

Conversion From Row/To Col blast-xml blast-tab psl pslx SAM/BAM BED
blast-xml N/A blast2tsv.xsl blastXmlToPsl blastXmlToPsl -pslx blast2bam BLAST_to_BED
blast-tab perl blast2xml.pl N/A ?? ?? ?? blast2bed
psl ?? ?? N/A pslToPslx psl2sam.pl pslToBed
psl2BED
pslx ?? ?? ?? N/A ?? pslToBed
SAM/BAM ?? ?? sam2psl.py sam2psl.py -s samtools view bedtools bamtobed
BED ?? ?? bed2psl ?? bedtools bedtobam N/A

Convert from SAM to psl using sam2psl.py

Available from: https://github.com/ndaniel/fusioncatcher/blob/master/bin/sam2psl.py

Example command:

This is a lossless format conversion with the -s option , however the sequence as a read is no longer supported in the psl format.

Usage:

Convert from psl to SAM

Available from the samtools legacy scripts: https://github.com/lh3/samtools-legacy/blob/master/misc/psl2sam.pl

Example command:

This ends up being a lossy conversion as the read sequence is not in the output.

Usage

The options are used to calculate a blast like scoring see post:
https://www.biostars.org/p/44281/


Convert psl to pslx

Using https://github.com/ENCODE-DCC/kentUtils/tree/master/bin/linux.x86_64

This is a lossless conversion. For usage:

 

Convert SAM to fasta

 

Convert psl to BED

Option 1:

Using pslToBed from https://github.com/ENCODE-DCC/kentUtils/tree/master/bin/linux.x86_64

This is a lossless conversion as the standard psl doesn’t have the sequence and so the bed file doesn’t either.

Usage:

 

Option 2 as suggested by Alex Reynolds:

Using psl2bed from http://bedops.readthedocs.io/en/latest/content/reference/file-management/conversion/psl2bed.html

This is also lossless when used with –keep-header:

Example:

As a bonus, it uses sort-bed to make a sorted BED file, so that it is ready to use with bedops, bedmap, etc.

Usage:

 

Convert pslx to BED

Using https://github.com/ENCODE-DCC/kentUtils/tree/master/bin/linux.x86_64

This is a lossy conversion as the sequence is lost

Usage:

 

Convert BAM to BED

Using bedtools: http://bedtools.readthedocs.io/en/latest/content/tools/bamtobed.html

This is a lossy conversion as the sequence data is lost.

Usage:

 

Convert BED to BAM

Create the genome file for bed

 

Using the genome file and BED file to produce the BAM file.

Usage:

 

The sequence is not present in the BED file so is absent from BAM as well. This is a lossy format conversion.
Additionally, there are differences in the number of read compared to the original file.

Convert BED to psl

Using https://github.com/ENCODE-DCC/kentUtils/tree/master/bin/linux.x86_64

This is a lossless conversion as neither the BED nor psl contain sequence information

 

Usage:

 

Preparing blast-xml format

Preparing blast-tab format

Blast-xml to psl

Using https://github.com/ENCODE-DCC/kentUtils/tree/master/bin/linux.x86_64

This is a lossy conversion

However, if you use the -pslx option, you can get lossless conversion

Usage:

 


Converting from blast-xml to SAM/BAM

Using https://github.com/guyduche/Blast2Bam

This is a lossless conversion with sequence and read quality introduced.

 

Usage

Subsequently converted to BAM using samtools

 

Blast-xml to BED

Using https://github.com/mojaveazure/BLAST_to_BED

Command

This is a lossy conversion as the sequence information is lost.

 

Converting Blast tabular to blast-xml

Not completely possible due to missing information such as the alignment but see post https://www.biostars.org/p/7981/#7983
Or using the script blast2xml.pl from the blast2go google group: https://11302289526311073549.googlegroups.com/attach/ed2c446e1b1852a9/blast2xml.pl?part=0.1&view=1&vt=ANaJVrEJYYa7SZC-uvOtoKb6932qlMJWltc2p_5GrTK5Wi7jo-hw14zFroKEcLhdNcJUcQweoUJOuXk2H7wQB5q6mzDTTn211hC2OvwiWw0b5PZev-HQ7Qg

Command:

Usage

 

Convert blast-xml to blast tabular

Several approaches have been suggested here https://www.biostars.org/p/7290/ but the most straight forward I have found is using the style sheet blast2tsv.xml from here:
https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/ncbi/blast2tsv.xsl. This is a lossless conversion but is not a standardly formatted blast-tabular output as it contains the sequence and the aligned site information in the last two columns.

Command:

 

Blast tabular to BED

Using https://github.com/nterhoeven/blast2bed

The output will be in test.blasttab.bed, this is a lossless conversion neither blast-tabular nor BED have the sequence.

Usage

 


Converting SAM to BAM

Using samtools http://quinlanlab.org/tutorials/samtools/samtools.html
This is a lossless format conversion.

Command


Converting BAM to SAM using samtools

This is a lossless format conversion.

 

PS: I dedicate this tutorial to Sej, a great bioinformatician and friend 😉