Exploring the FAST5 format

Post by: Joseph Hughes
July 19, 2017

The FAST5 format from Oxford Nanopore Technologies (ONT) is in fact HDF5, a very flexible data model, library, and file format for storing and managing data. It can store an unlimited variety of datatypes.

A number of tools for handling HDF5 files are available from the HDF Group. The most useful are:

hdfview, a Java visual tool for viewing HDF5 files, with limited support for plotting data and the option of exporting subsets as HDF5 (extension .h5)
h5ls, for listing specified entries of an HDF5 file
h5dump, for examining an HDF5 file and exporting specified groups or datasets as ASCII text.
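Before running the commands in the rest of this post, it can help to confirm that the three tools listed above are on your PATH. The following is only a minimal sketch: check_tools is a hypothetical helper name, not part of the HDF5 distribution.

```shell
# Report which of the given command-line tools are available on PATH.
# check_tools is a hypothetical helper, not an HDF5 utility.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

# Check the three tools used in this post.
check_tools hdfview h5ls h5dump
```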

Here’s a walkthrough using the lambda phage control run. First off, let's look at a FAST5 file produced by the MinION:

hdfview /home3/ont/lambda_fc1/uploaded/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read984_strand.fast5

At this stage, the FAST5 file contains only one dataset, the “Signal” dataset.

A FAST5 file that has been processed by Metrichor contains a lot more associated information, notably Fastq, Events and various Log files for the different analyses, and it still contains the raw Signal dataset:

hdfview /home3/ont/lambda_fc1/downloads/pass/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read939_strand.fast5 &

To list all groups recursively with h5ls, use -r:

h5ls -r /home3/ont/lambda_fc1/uploaded/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read984_strand.fast5

Similar information can be obtained using h5dump -n:

h5dump -n /home3/ont/lambda_fc1/downloads/pass/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read939_strand.fast5
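Since a raw file only has the Signal dataset while a Metrichor-processed file also has a Fastq dataset, the listing can be used to tell the two apart. Here is a rough sketch that reads an h5ls -r listing on standard input. has_fastq is a hypothetical helper name, and it assumes that h5ls prints each dataset as its path followed by whitespace and the word "Dataset".

```shell
# Read an `h5ls -r` listing on stdin and report whether it mentions a
# dataset called Fastq. has_fastq is a hypothetical helper, and the
# "<path> Dataset ..." line layout is an assumption about h5ls output.
has_fastq() {
    if grep -Eq '/Fastq[[:space:]]+Dataset'; then
        echo "basecalled"
    else
        echo "raw only"
    fi
}

# Usage, with a placeholder file name:
#   h5ls -r some_read.fast5 | has_fastq
```

The function only looks at the listing text, so it never opens the FAST5 file itself. That makes it easy to try on a saved listing first.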

To get all data and metadata for a given group, here /Raw/Reads/Read_939, use -g:

h5dump -g /Raw/Reads/Read_939 /home3/ont/lambda_fc1/downloads/pass/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read939_strand.fast5

Alternatively, the -d option prints a specified dataset, which gives similar output without the group tags:

h5dump -d /Raw/Reads/Read_939/Signal /home3/ont/lambda_fc1/downloads/pass/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read939_strand.fast5

Removing the array indices using option -y:

h5dump -y -d /Raw/Reads/Read_939/Signal /home3/ont/lambda_fc1/downloads/pass/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read939_strand.fast5
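In the file used here, the read number in the file name (read939) matches the group name (Read_939). If that holds for your files too, the dataset path can be built from the file name rather than typed by hand. This is only a sketch under that assumption. signal_path is a hypothetical helper name, and the script prints the h5dump command instead of running it.

```shell
# Build the Signal dataset path from a FAST5 file name.
# Assumption: the name contains "_read<N>_" and the group is /Raw/Reads/Read_<N>.
# signal_path is a hypothetical helper, not part of any ONT or HDF5 tool.
signal_path() {
    # Strip the directory, then capture the digits that follow "_read".
    n=$(basename "$1" | sed -n 's/.*_read\([0-9][0-9]*\)_.*/\1/p')
    # Fail if the file name does not follow the expected pattern.
    [ -n "$n" ] || return 1
    echo "/Raw/Reads/Read_$n/Signal"
}

f=/home3/ont/lambda_fc1/downloads/pass/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read939_strand.fast5

# Print the command that would be run, so it can be checked first.
echo h5dump -y -d "$(signal_path "$f")" "$f"
```

It is worth comparing the generated path against the h5ls -r listing for a couple of files before relying on it.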

Saving the raw Signal dataset to file “test”:

h5dump -o test -y -d /Raw/Reads/Read_939/Signal /home3/ont/lambda_fc1/downloads/pass/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read939_strand.fast5

The same as above, but with the option -w 1 to set the output width to a single column, so that each value is written on its own line:

h5dump -w 1 -o test -y -d /Raw/Reads/Read_939/Signal /home3/ont/lambda_fc1/downloads/pass/vgb_20170110_FNFAB46402_MN19940_sequencing_run_lambdacontrol_10012017_23602_ch9_read939_strand.fast5
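With one value per line, the dump is easy to feed to standard UNIX tools. As a rough sketch, the following function summarises such a file with awk. signal_stats is a hypothetical helper name, and it assumes each line holds a single integer, possibly with surrounding spaces and a trailing comma.

```shell
# Summarise a one-value-per-line signal dump: count, min, max and mean.
# signal_stats is a hypothetical helper. It assumes one integer per line,
# possibly padded with spaces and followed by a comma.
signal_stats() {
    awk '
        { gsub(/[^0-9.-]/, "") }   # drop spaces, commas and other non-numeric characters
        $0 == "" { next }          # skip lines with no number left
        {
            v = $0 + 0
            if (n == 0 || v < min) min = v
            if (n == 0 || v > max) max = v
            sum += v
            n++
        }
        END {
            if (n > 0) printf "n=%d min=%d max=%d mean=%.2f\n", n, min, max, sum / n
        }
    ' "$1"
}

# Summarise the file written by the h5dump command above, if it exists.
if [ -f test ]; then
    signal_stats test
fi
```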

Dumping the whole FAST5 into XML format:

h5dump --xml /home3/ont/Toledo_DeltaMerlin/pass/vgb_20170201_FNFAB45374_MN19940_sequencing_run_Toledo_DeltaMerlin_010217_3_98936_ch99_read985_strand.fast5

O.K., that's it for now.

Categories: ONT, Uncategorized, UNIX

Tagged: FAST5, HDF5, MinION, ONT

2 Comments on “Exploring the FAST5 format”

Bigleeu says:

February 26, 2018 at 11:12 am

Hi Joseph,

Your post is very helpful.
I am new to nanopore. Can you tell me what those values associated with ‘Signal’ are?
Thanks.
Huanlee

Gaby says:

April 19, 2018 at 2:39 am

Thank you so much for this! you are my new Best Bioinformatician Friend!
