Exploring the FAST5 format
- Post by: Joseph Hughes
- July 19, 2017
The FAST5 format from Oxford Nanopore Technologies (ONT) is in fact HDF5, a very flexible data model, library, and file format for storing and managing data. HDF5 can store an unlimited variety of datatypes.
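Because FAST5 is plain HDF5, a FAST5 file can be recognized by the 8-byte HDF5 format signature (\x89HDF\r\n\x1a\n) at the start of its superblock. Here is a quick sanity check in Python using only the standard library. The function name looks_like_hdf5 is hypothetical. It relies on the HDF5 rule that the superblock sits at offset 0 or, when a user block is present, at offset 512, 1024, 2048 and so on.

```python
from pathlib import Path

# The 8-byte HDF5 format signature: 0x89 'H' 'D' 'F' '\r' '\n' 0x1a '\n'
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path) -> bool:
    """Return True if an HDF5 superblock signature is found in the file.

    The signature is checked at offset 0, then at 512, 1024, 2048, ...
    (the positions where HDF5 allows the superblock to start when the
    file begins with a user block).
    """
    size = Path(path).stat().st_size
    with open(path, "rb") as fh:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Next candidate position: 512 after 0, then keep doubling.
            offset = 512 if offset == 0 else offset * 2
    return False
```

Calling looks_like_hdf5 on any of the .fast5 files used below should return True, while a FASTQ file should return False.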
A number of tools for handling HDF5 files are available from The HDF Group. The most useful are:
- hdfview, a Java GUI for viewing HDF5 files, with limited plotting functionality and the option of exporting subsets as HDF5 (extension .h5)
- h5ls, for listing specified entries of an HDF5 file
- h5dump, for examining an HDF5 file and exporting specified groups or datasets as ASCII.
Here’s a run-through exploring the lambda phage control run. First off, let’s look at the FAST5 file produced by the MinION.
At this stage, the FAST5 file has only one dataset, the “Signal” dataset.
Doing the same with a FAST5 file that has been processed by Metrichor reveals a lot more associated information, notably the Fastq, the Events, and various logs for the different analyses, while the raw Signal dataset is still present.
hdfview /home3/ont/lambda_fc1/downloads/pass/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read939_strand.fast5 &
To list all groups recursively using h5ls, use -r:
h5ls -r /home3/ont/lambda_fc1/uploaded/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read984_strand.fast5
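The h5ls -r listing is also easy to post-process. Here is a rough sketch with hypothetical helpers. It assumes each line has the form “<path> <kind> [details]” and does not handle paths containing spaces. parse_h5ls maps each path to its object kind. is_basecalled applies a simple heuristic to tell a raw MinION file (Signal only) from a Metrichor-processed one, assumed here to contain a dataset whose path ends in /Fastq.

```python
def parse_h5ls(listing: str) -> dict:
    """Map each HDF5 path in `h5ls -r` output to its kind ("Group", "Dataset", ...).

    Assumes lines like '/Raw/Reads/Read_939/Signal Dataset {...}'.
    Paths containing spaces are not handled by this simple split.
    """
    objects = {}
    for line in listing.splitlines():
        # Split into path, kind and (optionally) the rest of the line.
        fields = line.split(None, 2)
        if len(fields) >= 2 and fields[0].startswith("/"):
            objects[fields[0]] = fields[1]
    return objects


def is_basecalled(objects: dict) -> bool:
    """Hypothetical heuristic: a file counts as basecalled if any
    dataset path ends in '/Fastq' (as in Metrichor output)."""
    return any(
        kind == "Dataset" and path.endswith("/Fastq")
        for path, kind in objects.items()
    )
```

The listing itself can be captured with subprocess.run(["h5ls", "-r", path], capture_output=True, text=True).stdout and passed to parse_h5ls.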
Similar information can be obtained using h5dump -n:
h5dump -n /home3/ont/lambda_fc1/downloads/pass/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read939_strand.fast5
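The read group name (Read_939 here) changes from file to file, so it helps to pull it out of the h5dump -n listing before building the -g and -d paths used next. This is a sketch that assumes the FILE_CONTENTS lines read “group <path>” or “dataset <path>”. The names contents, read_groups and READ_GROUP are hypothetical.

```python
import re

# Read groups look like /Raw/Reads/Read_939 (pattern assumed from this run).
READ_GROUP = re.compile(r"^/Raw/Reads/Read_\d+$")


def contents(listing: str) -> list:
    """Return (kind, path) pairs from the FILE_CONTENTS block of `h5dump -n`."""
    pairs = []
    for line in listing.splitlines():
        fields = line.split()
        # Keep only lines such as "group /Raw" or "dataset /Raw/.../Signal".
        if len(fields) == 2 and fields[0] in ("group", "dataset") and fields[1].startswith("/"):
            pairs.append((fields[0], fields[1]))
    return pairs


def read_groups(listing: str) -> list:
    """Return the /Raw/Reads/Read_N group paths found in the listing."""
    return [path for kind, path in contents(listing)
            if kind == "group" and READ_GROUP.match(path)]
```

The returned path can then be used for -g, or extended with /Signal for -d.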
To get all data and metadata for a given group, here /Raw/Reads/Read_939, use -g:
h5dump -g /Raw/Reads/Read_939 /home3/ont/lambda_fc1/downloads/pass/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read939_strand.fast5
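The -g dump contains the group's attributes as well as its Signal dataset. Here is a minimal sketch for pulling out the scalar attributes. It assumes h5dump's usual layout: ATTRIBUTE "name" { DATATYPE ... DATASPACE SCALAR DATA { (0): value } }. The function scalar_attributes is hypothetical. Values are returned as strings, and array-valued attributes are skipped.

```python
import re


def scalar_attributes(dump: str) -> dict:
    """Return {name: value} for scalar attributes found in h5dump text output.

    Values are returned as strings with surrounding double quotes removed.
    """
    attrs = {}
    # Each chunk starts just after an 'ATTRIBUTE "' marker; the last chunk
    # also carries whatever follows it (e.g. the Signal dataset dump).
    for chunk in dump.split('ATTRIBUTE "')[1:]:
        name, _, body = chunk.partition('"')
        # Only the first DATASPACE in the chunk belongs to this attribute.
        space = re.search(r"DATASPACE\s+(\w+)", body)
        if space is None or space.group(1) != "SCALAR":
            continue  # skip array-valued attributes
        # Scalar values are printed as "(0): value" on one line.
        data = re.search(r"DATA \{\s*\(0\): ([^\n]*)", body)
        if data:
            attrs[name] = data.group(1).strip().strip('"')
    return attrs
```

Numeric attributes can then be converted with int() or float() as needed.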
Alternatively, the -d option prints only a specified dataset, without the enclosing group information. Adding option -y removes the array indices from the output:
h5dump -y -d /Raw/Reads/Read_939/Signal /home3/ont/lambda_fc1/downloads/pass/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read939_strand.fast5
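The -y output is simple enough to parse directly. The sketch below uses hypothetical helpers, dataset_values and declared_length. It assumes a one-dimensional integer dataset with a single DATA { ... } block, and it uses the size in the DATASPACE line to check that every sample was read.

```python
import re


def dataset_values(dump: str) -> list:
    """Extract integers from an `h5dump -y -d` dump of a 1-D integer dataset.

    Assumes one 'DATA { ... }' block holding comma-separated values
    (no "(0):" array indices, thanks to -y).
    """
    match = re.search(r"DATA \{(.*?)\}", dump, re.DOTALL)
    if match is None:
        raise ValueError("no DATA block found")
    # Split on commas and any whitespace, dropping empty tokens.
    return [int(tok) for tok in re.split(r"[,\s]+", match.group(1)) if tok]


def declared_length(dump: str):
    """Return the current size N from 'DATASPACE SIMPLE { ( N ) / ( ... ) }', or None."""
    match = re.search(r"DATASPACE\s+SIMPLE\s*\{\s*\(\s*(\d+)\s*\)", dump)
    return int(match.group(1)) if match else None
```

If len(dataset_values(text)) differs from declared_length(text), the dump was probably truncated or the format differs from what is assumed here.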
Saving the raw Signal dataset to a file named “test” with option -o:
h5dump -o test -y -d /Raw/Reads/Read_939/Signal /home3/ont/lambda_fc1/downloads/pass/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read939_strand.fast5
The same as above, but with option -w 1, which sets the output width to 1 column so that each value ends up on its own line:
h5dump -w 1 -o test -y -d /Raw/Reads/Read_939/Signal /home3/ont/lambda_fc1/downloads/pass/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read939_strand.fast5
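With one value per line, the “test” file can be read straight back into Python. Here is a sketch that assumes the file contains integers separated by commas and/or whitespace. The names load_signal and summarize are hypothetical.

```python
import re
import statistics


def load_signal(path) -> list:
    """Read raw signal values from a text file written by `h5dump -o`.

    Assumes integers separated by commas and/or whitespace
    (with -w 1 this is one value per line).
    """
    with open(path) as fh:
        text = fh.read()
    return [int(tok) for tok in re.split(r"[,\s]+", text) if tok]


def summarize(signal: list) -> dict:
    """Basic summary statistics of the raw (uncalibrated) signal values."""
    return {
        "n": len(signal),
        "min": min(signal),
        "max": max(signal),
        "mean": statistics.mean(signal),
        "median": statistics.median(signal),
    }
```

For example, summarize(load_signal("test")) gives a quick overview of the number of samples and the range of the raw values for the read.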
Dumping the whole FAST5 into XML format:
h5dump --xml /home3/ont/Toledo_DeltaMerlin/pass/vgb_20170201_FNFAB45374_MN19940_sequencing_run_Toledo_DeltaMerlin_010217_3_98936_ch99_read985_strand.fast5
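The XML output can be walked with Python's xml.etree.ElementTree. This sketch lists the dataset paths. It assumes that Dataset elements carry an H5Path attribute, falling back to Name, and it matches element names without their hdf5: namespace prefix. Check these attribute names against your own h5dump output. The names local_name and xml_datasets are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, e.g. '{http://...}Dataset' -> 'Dataset'."""
    return tag.rsplit("}", 1)[-1]


def xml_datasets(xml_data) -> list:
    """List dataset paths from `h5dump --xml` output (bytes or str).

    Assumes Dataset elements carry an H5Path attribute, else a Name.
    """
    root = ET.fromstring(xml_data)
    paths = []
    for elem in root.iter():
        if local_name(elem.tag) == "Dataset":
            paths.append(elem.get("H5Path") or elem.get("Name"))
    return paths
```

Passing the raw bytes of the saved XML (e.g. read with open(path, "rb")) lets the parser honour the encoding declared in the XML header.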
O.K., that’s it for now.