How to generate a Sample Sheet from sample/index data in BaseSpace

If you are using BaseSpace for sample entry but demultiplexing your data manually, you may have been frustrated that there is no facility to download your sample names and index tag data from BaseSpace as a sample sheet. This means you have to enter the same data twice – with the possibility of errors creeping […]

Read More

NCBI Entrez Direct UNIX E-utilities

I use NCBI Entrez Direct UNIX E-utilities regularly for sequence and data retrieval from NCBI. These UNIX utils can be combined with any UNIX commands. It is available to download from the NCBI website: ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/ A few useful examples for NCBI edirect utilities. Download a sequence in fasta format from NCBI using accession number esearch -db […]

Read More

NGS Data Formats and Analyses

Here are my slides from a session on NGS data formats and analyses that I gave as part of the EPIZONE Workshop on Next Generation Sequencing applications and Bioinformatics in Brussels in April 2016. It covers file formats such as FASTA, FASTQ, SAM, BAM, and VCF, and also goes over IUAPAC nucleotide ambiguity codes, read names, quality […]

Read More

How to Import data for libraries with index tags into BaseSpace

In this blog we describe how to import lists of sample data with defined index tags into BaseSpace, and provide templates for TruSeqLT and TruSeqHT libraries. We have found this saves a lot of time and eliminates errors associated with manual entry. The Illumina NextSeq500 sequencer requires all users to complete sample data entry on […]

Read More

Setting up an Amazon ftp server to receive big files

Sharing large files with collaborators has rarely been a problem, we usually just compress them and put them on our web server and then send the link to our collaborator who can then download the file. However, we have struggled to find a solution to receive large files. We usually run out of space in […]

Read More

featureCounts or htseq-count?

Count-based differential expression analysis of sequencing data is one of the best known pipeline in bioinformatics analysis. In this pipeline, the vital step is to estimate the reads count of each genomic features. After counting the features, the differential expression(DE) analysis tools are used for getting the differential expression list of genomic features.  It has been […]

Read More