NCBI Entrez Direct UNIX E-utilities

I regularly use the NCBI Entrez Direct UNIX E-utilities to retrieve sequences and other data from NCBI. These utilities can be combined with any other UNIX commands. They are available to download from the NCBI website: ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/ Here are a few useful examples of the NCBI edirect utilities. Download a sequence in FASTA format from NCBI using an accession number

Batch […]

NGS Data Formats and Analyses

Here are my slides from a session on NGS data formats and analyses that I gave as part of the EPIZONE Workshop on Next Generation Sequencing applications and Bioinformatics in Brussels in April 2016. It covers file formats such as FASTA, FASTQ, SAM, BAM, and VCF, and also goes over IUPAC nucleotide ambiguity codes, read names, quality […]

How to Import data for libraries with index tags into BaseSpace

In this blog we describe how to import lists of sample data with defined index tags into BaseSpace, and provide templates for TruSeqLT and TruSeqHT libraries. We have found this saves a lot of time and eliminates errors associated with manual entry. The Illumina NextSeq500 sequencer requires all users to complete sample data entry on […]

Setting up an Amazon ftp server to receive big files

Sharing large files with collaborators has rarely been a problem: we usually just compress them, put them on our web server, and send the link to our collaborator, who can then download the file. However, we have struggled to find a solution to receive large files. We usually run out of space in […]

featureCounts or htseq-count?

Count-based differential expression analysis of sequencing data is one of the best-known pipelines in bioinformatics. In this pipeline, the vital step is to estimate the read count for each genomic feature. After counting, differential expression (DE) analysis tools are used to obtain the list of differentially expressed genomic features. It has been […]
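
To make the counting step concrete, here is a minimal Python sketch of assigning reads to features by coordinate overlap. It is a toy illustration only and not the algorithm used by featureCounts or htseq-count: the function name `count_reads_per_feature`, the tuple layout, the 1-based inclusive coordinates, and the rule of skipping reads that overlap more than one feature are all assumptions made for this example.

```python
def count_reads_per_feature(features, reads):
    """Count reads that overlap exactly one feature.

    features: dict mapping feature name -> (chrom, start, end)
    reads:    iterable of (chrom, start, end)
    Coordinates are assumed to be 1-based and inclusive.
    """
    counts = {name: 0 for name in features}
    for chrom, r_start, r_end in reads:
        # Collect every feature on the same chromosome that the read overlaps.
        hits = [
            name
            for name, (f_chrom, f_start, f_end) in features.items()
            if f_chrom == chrom and r_start <= f_end and f_start <= r_end
        ]
        # Assumption: reads hitting no feature, or more than one, are not counted.
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


# Hypothetical features and reads, for illustration only.
features = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 180, 300)}
reads = [("chr1", 90, 120), ("chr1", 190, 195), ("chr1", 250, 260), ("chr2", 100, 120)]
counts = count_reads_per_feature(features, reads)
```

How ambiguous reads are handled is a design choice that changes the resulting counts, so it is worth checking when comparing counting tools.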

Java CIGAR Parser for SAM format

Sequence Alignment/Map (SAM) format is a well-known bioinformatics format designed to store information about reads mapped against large reference sequences. A SAM file is split into two sections: a header section and an alignment section. Each line in the header section starts with ‘@’, and the header contains information such as the name and length of the reference sequence. […]
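
As a rough illustration of what parsing a CIGAR string involves, here is a minimal sketch in Python. It is not the Java parser from the post; the function names `parse_cigar`, `reference_length`, and `query_length` are hypothetical, and the operation codes follow the SAM specification (M, I, D, N, S, H, P, =, X).

```python
import re

# A whole CIGAR string: one or more <length><operation> pairs.
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
# A single <length><operation> pair.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance along the reference sequence.
CONSUMES_REFERENCE = set("MDN=X")
# Operations that advance along the read (query) sequence.
CONSUMES_QUERY = set("MIS=X")


def parse_cigar(cigar):
    """Split a CIGAR string such as '8M2I4M' into (length, operation) tuples."""
    if cigar == "*":  # '*' means no CIGAR is available for this record
        return []
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def reference_length(cigar):
    """Number of reference bases spanned by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REFERENCE)


def query_length(cigar):
    """Number of read bases described by the CIGAR (hard clips excluded)."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_QUERY)


ops = parse_cigar("8M2I4M1D3M")
```

For a mapped read, adding `reference_length` to the 1-based leftmost position (the POS field) and subtracting one gives the last reference position covered by the alignment.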
