Author Archives: Gavin Wilkie

How to generate a Sample Sheet from sample/index data in BaseSpace

If you are using BaseSpace for sample entry but demultiplexing your data manually, you may have been frustrated that there is no facility to download your sample names and index tag data from BaseSpace as a sample sheet. This means you have to enter the same data twice – with the possibility of errors creeping in especially for large projects with many samples and dual index tags.

We have found a way to avoid typing the same information twice and instead fetch the sample names, index IDs and index tag sequences from BaseSpace straight to a sample sheet. This saves a huge amount of time for large projects with many samples.

Log in to BaseSpace, and navigate to the ‘Libraries’ page within the ‘Prep Libraries’ tab. Each line is a set of libraries with complete information on index names and tag sequences. Clicking a set of libraries will bring up the following screen – this example has 24 samples with TruSeqLT tags (only 7 are visible without scrolling down the list).

libraries_for_export

Clicking the ‘EXPORT’ button will download a comma separated file (csv) that can be opened in Excel. This file has all the sample names, index IDs and index sequences (but not in quite the correct format to paste into a sample sheet).

excel_for_export

Open the file in Excel, select the entire Index1 Column and click the ‘Text to Columns’ function (under the ‘Data’ menu in Excel). Choose the ‘Delimited’ option, then tick ‘Other’ and enter a hyphen (-) in the box. This will split the Index1 Column into two, with the name of the Index and the actual Tag sequence in two separate columns, as below.

excel_for_indexes

If using dual indexing (e.g. TruSeqHT or NexteraXT) then do the same for the second column with Index 2 to split the index2 names and sequences into two separate columns.
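
For anyone who would rather script this step than use Excel, the same hyphen split can be sketched in Python. The column names (`Index1`, `Index2`), the `Name-SEQUENCE` cell layout and the example row are assumptions based on the description above, so check them against your own export before relying on this.

```python
import csv
import io

def split_index(cell):
    """Split a 'Name-SEQUENCE' cell at its last hyphen into (name, sequence)."""
    name, _, sequence = cell.rpartition("-")
    return name.strip(), sequence.strip()

def split_index_columns(rows, columns=("Index1", "Index2")):
    """Return copies of the rows with <col>Name and <col>Sequence fields added."""
    out = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            if new_row.get(col):  # skip absent or empty columns (single-index kits)
                name, seq = split_index(new_row.pop(col))
                new_row[col + "Name"] = name
                new_row[col + "Sequence"] = seq
        out.append(new_row)
    return out

# Hypothetical two-line export, for illustration only
example = "SampleID,Index1\napples1,A001-ATCACG\n"
rows = split_index_columns(csv.DictReader(io.StringIO(example)))
```

The split is made at the last hyphen so that an index name that itself contains a hyphen stays in one piece; the tag sequence never contains one.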

Now open a blank or used sample sheet that is set up for the correct library chemistry and sequencing instrument (see previous blog post), then copy and paste the sample IDs, index IDs and index sequences into the sample sheet. Save as a comma separated file (csv) and it is ready to use for demultiplexing and fastq generation, or your next MiSeq run. The above example looks like this…

sample_sheet

How to demultiplex Illumina data and generate fastq files using bcl2fastq

Sequence runs on NGS instruments are typically carried out with multiple samples pooled together. An index tag (also called a barcode) consisting of a unique sequence of between 6 and 12bp is added to each sample so that the sequence reads from different samples can be identified.

On the Illumina MiSeq, the process of demultiplexing (dividing your sequence reads into separate files for each index tag/sample) and generating the fastq data files required for downstream analysis is carried out automatically using the onboard PC. However, on the higher-throughput NextSeq500 and HiSeq models this process is carried out on BaseSpace – Illumina’s cloud-based resource.

Whilst there are many advantages to having your sequence data in the cloud (e.g. monitoring a sequence run from home, ease of sharing data with collaborators, etc) there are also some drawbacks to this system. In particular the process of demultiplexing and fastq file generation in BaseSpace can be very slow. It takes up to 8 hours to demultiplex the data from a high output NextSeq500 run on BaseSpace, and if the fastq files then have to be downloaded to your local computer or server for analysis this requires a further 3 hours.

If your data is urgent you may not want to wait 11 hours or more after your sequence run has finished to begin your analysis! We have found that demultiplexing and fastq file generation from a high output NextSeq500 run can instead be carried out in about 30 minutes on our in-house UNIX server. This also has the advantage of avoiding the rather slow step of downloading your fastq files from BaseSpace.

In order to do this, you need to install a free piece of software from Illumina called bcl2fastq on your UNIX server. Demultiplexing NextSeq500 data (or any Illumina system running RTA version 1.18.54 and later) requires bcl2fastq version 2.16 or newer (the latest version at the time of writing is v2.17 and can be downloaded here).

Importantly, we have checked that the results obtained from bcl2fastq and BaseSpace are equivalent – the fastq files generated are exactly the same. BaseSpace is set to remove adapter sequences by default, meaning that the sequence reads may not all be the same length (any reads from short fragments with adapter read-through will have those sequences removed). In bcl2fastq you have the option to either remove adapter sequences or leave them in so that all reads are the same length.

In order to demultiplex the data, first copy the entire run folder from the sequencer to your UNIX server. On the NextSeq500, the run folder will be inside the following directory on the hard disc –
D:\Illumina\NextSeq Control Software\Temp\
It ought to be the ONLY folder here as the NextSeq only retains data from the most recent run – as soon as you start a new sequence run the data from the previous run is deleted. Copy the entire folder, including all its subdirectories. This folder contains the raw basecall (bcl) files. Do not change the name of the folder, which will be named as per the following convention – YYMMDD_InstrumentID_RunID_FlowcellID
For example, the 10th run carried out on a NextSeq500 with serial number 500999, on 14th April 2016 and using flowcell number AHLFNLBGXX would be named as follows –
160414_NB500999_0010_AHLFNLBGXX
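
As a small illustration of the naming convention, the following Python sketch splits a run folder name into its four fields. The function name and the returned keys are our own choices for this example and are not part of any Illumina tool.

```python
from datetime import datetime

def parse_run_folder(name):
    """Split 'YYMMDD_InstrumentID_RunID_FlowcellID' into its four fields."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"unexpected run folder name: {name!r}")
    date_str, instrument, run_number, flowcell = parts
    return {
        "date": datetime.strptime(date_str, "%y%m%d").date(),
        "instrument": instrument,
        "run_number": int(run_number),
        "flowcell": flowcell,
    }

info = parse_run_folder("160414_NB500999_0010_AHLFNLBGXX")
```

A check like this can be handy in a copy script, to confirm that the folder has not been renamed before bcl2fastq is started.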

The other requirement is a sample sheet – a simple comma separated file (csv) with the library chemistry, sample names and the index tag used for each sample, in addition to some other metrics describing the run. Anyone running a MiSeq will already be familiar with these, but NextSeq and HiSeq users may only have used BaseSpace to enter these values. Unfortunately there is no way to automatically download a sample sheet from BaseSpace (although we have figured out a way round this to avoid double data entry, see the next blog post). Sample sheets can be made and modified using MS Excel or any other software that can read csv files, but the easiest way to make one is to use a free wizard-type program for the PC called Illumina Experiment Manager, which guides you through the process. The latest version at the time of writing is v1.9, which is available here.

Open Illumina Experiment Manager, and click on ‘Create Sample Sheet.’ Then make certain that you choose the correct sequencer (essential, since the NextSeq and MiSeq can require reverse-complemented index sequences relative to one another for the index reads). Select ‘Fastq only’ output. Enter any value (numbers or text) for the Reagent Kit Barcode – this will become the filename. Ensure the correct library chemistry is selected (e.g. TruSeqLT, TruSeqHT, NexteraXT, etc). Any custom or non-standard tags will need to be entered manually in the csv file. Tick adapter trimming for read1 and read2 if required, select either paired or single end reads and enter the read length as appropriate (add one base, so for 150bp reads enter 151). Then either follow the instructions in the next blog post to import sample names and tags from BaseSpace, or enter them manually by adding a blank row for each sample, entering the sample names and selecting the index tag(s) for each sample. It is wise to double check that the sample names and indexes are correct, as mistakes will cause data to be allocated to the wrong file. Change the name of the file to ‘SampleSheet.csv’ and copy it into the top directory inside the sequence run folder on the server. The sample sheet file should resemble the example below – this is for a paired end 2x151bp NextSeq run with four samples, TruSeqLT index tags, and adapter trimming selected.

SampleSheet_example

Now use the command line below on the server to run bcl2fastq. For speed, we use 12 threads for processing the data on our UNIX server (-p 12); however, the optimal number will depend on your system architecture, resources and usage limits. It is important to set a limit to the number of threads, otherwise bcl2fastq will use 100% of the CPUs on the server. We usually invoke the no-lane-splitting option, otherwise each output file from our NextSeq is divided into four (one for each lane on the flowcell). Here we are using the NextSeq run folder mentioned above as an example (160414_NB500999_0010_AHLFNLBGXX) and sending the output to a subdirectory within it called ‘fastq_files.’ For other bcl2fastq options please see Illumina’s manual on the software.
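
A minimal sketch of such an invocation, assembled in Python from the options described above (12 threads, no lane splitting, output to ‘fastq_files’). The long option names `--runfolder-dir`, `--output-dir` and `--no-lane-splitting` are as we understand them from the bcl2fastq v2 user guide and should be confirmed with `bcl2fastq --help` for your installed version; the helper function itself is hypothetical.

```python
import shlex

def bcl2fastq_command(run_folder, threads=12, output_subdir="fastq_files"):
    """Build a bcl2fastq argument list matching the options described in the text."""
    return [
        "bcl2fastq",
        "--runfolder-dir", run_folder,                    # run folder copied from the sequencer
        "--output-dir", f"{run_folder}/{output_subdir}",  # where the fastq files are written
        "-p", str(threads),                               # cap the number of processing threads
        "--no-lane-splitting",                            # one file per sample, not one per lane
    ]

cmd = bcl2fastq_command("160414_NB500999_0010_AHLFNLBGXX")
print(shlex.join(cmd))  # shows the equivalent shell command line
# To launch it for real: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell quoting problems if the run folder path contains spaces, and the printed line can be pasted directly into a terminal.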

In this example, there should be two fastq files generated for each sample (one each for forward R1 and reverse R2 reads, since this is a paired end 2x151bp run) plus a forward and reverse file for ‘Undetermined’ reads where the index tag did not match any of the tags in the sample sheet. The Undetermined file will contain all of the reads from the PhiX spike-in if used (as PhiX does not have a tag) and also any other reads where there was a basecalling error during the index read. Depending on the PhiX spike-in % and the total number of samples on the run, the size of the Undetermined file should normally be smaller than the other files. If a problem with demultiplexing or tagging is suspected, always check the ‘index.html’ file within the ‘Reports/html’ subdirectory. This file will open in a standard web browser, and clicking the ‘unknown barcode’ option will display the top unknown barcodes and allow problems to be diagnosed. Common issues are that one or more samples were omitted from the sample sheet, errors entering the barcodes, incorrect library chemistry (e.g. selecting NexteraXT instead of TruSeqHT) or that the barcodes (especially index 2 on dual-indexed samples) need to be reverse-complemented on the sample sheet.
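
When a barcode does need to be reverse-complemented, it is easy to make a slip doing it by hand. The short Python helper below is one way to do it; the 8bp sequence is an arbitrary example chosen for illustration and the function name is our own.

```python
# Translation table for complementing DNA bases (N is left as N)
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

index2 = "TATAGCCT"  # arbitrary 8bp example, for illustration only
index2_rc = reverse_complement(index2)
```

If the top unknown barcodes in the report match the reverse complement of the sequences entered on the sample sheet, that is a strong hint that the orientation of that index needs to be flipped.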

How to Import data for libraries with index tags into BaseSpace

In this blog we describe how to import lists of sample data with defined index tags into BaseSpace, and provide templates for TruSeqLT and TruSeqHT libraries. We have found this saves a lot of time and eliminates errors associated with manual entry.
The Illumina NextSeq500 sequencer requires all users to complete sample data entry on BaseSpace (Illumina’s cloud-based resource) including sample names, species, project names, index tags and sample pools. Whilst there are many advantages to having this data in the cloud, the BaseSpace interface is not always the most convenient or user-friendly system for data entry and management.
Our experience has been that for large projects with many samples, it is impractical to use the manual method of entering sample names in the ‘Biological Samples’ tab, then individually assigning an index tag in the ‘Libraries’ tab by dragging each sample onto an image of a 96-well plate of barcodes. To make matters worse, BaseSpace always mixes up the order of the samples (even if they are named 1-96), so it becomes all too easy to make an error when faced with a long list of sample names in a random order that each require a tag to be assigned.
It is quite easy to import a csv file created in Excel (or similar) with the sample names, species, project and nucleic acid into the ‘Biological Samples’ tab, and thus avoid a large part of the manual data entry. However this still requires the user to individually assign an index tag to each sample using the cumbersome and error-prone interface pictured below, dragging each sample on the list to the correct well on the index plate.
BaseSpace_indexing
It is possible to avoid this by importing a csv file with the sample names, species, project, nucleic acid, index name and also the index tags into the ‘Libraries’ tab on BaseSpace. However, there is very little guidance on how to do this – and Illumina only provide an example template for libraries made using Nextera XT with none of the sequence tags themselves.
We are mainly using TruSeq indexes, so we have generated our own import templates with all 24 TruSeqLT tags, and all 96 dual-indexed TruSeqHT tags. This took quite a bit of trial and error, plus fetching the sequences of all 216 index tags. We have therefore made our own templates for importing TruSeqLT and TruSeqHT libraries available here for others to use.
Simply open the csv file in Excel (or similar) and insert the names of your own samples in the first two columns. Copy and paste the index tags you have used to the correct sample lines (each sample requires the Well, Index1Name, Index1Sequence, Index2Name and Index2Sequence). Change the name of the ContainerID from ‘Platename’ to your own name and delete any lines you don’t need (e.g. if you have fewer than 24 or 96 samples). Here we are using the template to import 24 samples called apples 1-24 with TruSeqHT dual tags. If using 96 samples, use this.
 template_image
Save the csv file, navigate to the ‘Libraries’ tab in your BaseSpace account and then click the ‘Import’ button on the top-right corner. Choose your csv file, and after a minute you should see your libraries successfully imported with the correct index tags as below, ready to pool for a sequence run.
 imported_libraries
Now, if Illumina would just allow us to import pools of samples we could also avoid having to individually drag each sample into a small dot in the ‘Pools’ tab. This is rather tiresome when there are large numbers of samples in a pool!

Illumina adapter and primer sequences

Illumina libraries are normally constructed by ligating adapters to short fragments (100 – 1000bp) of DNA. The exception to this is if Nextera is used (see end of this post) or where PCR amplicons have been constructed that already incorporate the P5/P7 ends that bind to the flowcell.

Illumina Paired End Adapters (cannot be used for multiplexing)

Top adapter
5′ ACACTCTTTCCCTACACGACGCTCTTCCGATC*T 3’

Bottom adapter
5′ P-GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG 3’

Note that the last 12nt of the adapters are complementary (when the bottom adapter is viewed 3’-5’ as below) hence the name ‘forked adapters’. The adapters are annealed together then ligated to both ends of the library DNA. The bottom adapter is 5’-phosphorylated in order to promote ligation. The top adapter has a phosphorothioate bond (*) before the terminal T, ensuring that exonucleases cannot digest the T overhang that pairs to the A-tail added to library fragments.

Structure of Illumina forked PE adapter

PCR with partially complementary primers then extends the ends and resolves the forks, adding unique termini that bind to the oligos on the surface of the flow cell (P5 blue/P7 red, also see diagram at foot of page).

PE PCR Primer 1.0 (P5) (same as universal adapter)
5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T 3’

PE PCR Primer 2.0 (P7)
5' CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATC*T 3’

Structure of Illumina TruSeq™ indexed forked adapters

The last 12nt of the adapters are complementary, allowing them to anneal and form the forked structure. The adapter is ligated to both ends of the A-tailed DNA library, generating larger floppy overhangs than with the paired-end adapters described above. Note that while the top adapter is identical to the Illumina Universal oligo, the bottom adapter is different to the PE adapter in the purple highlighted section. The adapter already has the index and complete P7/P5 ends.

PCR with the following primers resolves the forked ends to generate products with no floppy overhangs. The sequences that bind to the flow cell (P5 blue/P7 red) finish up at opposite ends of the library fragments.

PCR Primer 1.0 (P5)
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGA 3’

PCR Primer 2.0 (P7)
5’ CAAGCAGAAGACGGCATACGAGAT 3’

The following oligos (provided in the MiSeq reagent cartridge) are used to prime the sequence reads. Note that the index read primer is complementary to the Read 2 sequencing primer (see diagram below). This is used to sequence the hexamer index tag in the forward direction after read 1 is complete, before the reverse strand is synthesised by bridge amplification.

Multiplexing Read 1 Sequencing Primer
5' ACACTCTTTCCCTACACGACGCTCTTCCGATCT

Multiplexing Index Read Sequencing Primer
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCAC

Multiplexing Read 2 Sequencing Primer
5' GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT

When ordering primers for use in Illumina libraries, make certain to include the modifications (e.g. 5’-phosphorylation and phosphorothioate bonds on the 3' terminal nucleotide) and ensure the oligos are PAGE purified. Even small amounts of n-1 primers will lead to messy out-of-phase sequencing and cause clusters to fail filtering. Costs per oligo for 0.2µmole synthesis scale and PAGE purification are in the region of £40.

TruSeq™ DNA Sample Prep Kit v2

There are currently two versions of the kit, each with 12 different adapters that incorporate unique index tags – allowing samples to be multiplexed on the same sequencing run.

Kit A contains indexes: 2, 4, 5, 6, 7, 12, 20, 21, 22, 23, 25, 27.
Kit B contains indexes: 1, 3, 8, 9, 10, 11, 13, 14, 15, 16, 18, 19.

NOTE that all the indexed adapters should be 5’-Phosphorylated. For unknown reasons adapters 13-27 have an extra 2 bases (these are not used for the indexing). Illumina also reserve certain numbers e.g. 17, 24 and 26. The 6-base index tag sequences are in italics below.

TruSeq Universal Adapter
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T

TruSeq Adapter, Index 1
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 2
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 3
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 4
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 5
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 6
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 7
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGATCATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 8
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACACTTGAATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 9
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 10
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACTAGCTTATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 11
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGGCTACATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 12
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTAATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 13
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAACAATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 14
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTTCCGTATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 15
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACATGTCAGAATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 16
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGTCCCGATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 18
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTCCGCACATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 19
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTGAAACGATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 20
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTGGCCTTATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 21
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTTTCGGAATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 22
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGTAATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 23
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGAGTGGATATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 25
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACACTGATATATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 27
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACATTCCTTTATCTCGTATGCCGTCTTCTGCTTG
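
Comparing the adapters listed above, the 6-base index sits immediately after the 33-base sequence common to all of them, so it can be pulled out programmatically. The Python sketch below assumes exactly that layout (it is inferred from the sequences on this page rather than taken from an Illumina specification), and the function name is our own.

```python
# Sequence shared by the start of every indexed TruSeq adapter listed above
COMMON_START = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

def extract_index(adapter, index_length=6):
    """Return the bases immediately following the common start of an indexed adapter."""
    if not adapter.startswith(COMMON_START):
        raise ValueError("adapter does not begin with the expected common sequence")
    start = len(COMMON_START)
    return adapter[start:start + index_length]

index1 = extract_index("GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG")
index13 = extract_index("GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAACAATCTCGTATGCCGTCTTCTGCTTG")
```

Because only the first six bases after the common start are taken, the two extra bases in adapters 13-27 are ignored, in line with the note that they are not used for indexing.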

NEBNext® DNA Library Prep

The NEB kit uses a short adapter which is supplied as a single self-complementary oligo forming a stem-loop. It has a Uracil base that is later cleaved and removed by Uracil Glycosylase and base excision repair enzyme mix (USER).

NEBNext adaptor for Illumina

5’ P-GATCGGAAGAGCACACGTCTGAACTCCAGTC-U-ACACTCTTTCCCTACACGACGCTCTTCCGATC*T 3’

The oligo is designed to self-anneal, forming a stem-loop structure as below. This may help to prevent formation of adapter dimers during ligation.

The Index tags and the P5/P7 ends are added by PCR using universal and tagged primers. The end result is exactly the same as TruSeq.

NEBnext Universal primer
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T

NEBnext Indexed primers 1 – 12 (6-mer indexes)

Index 1 CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T
Index 2 CAAGCAGAAGACGGCATACGAGATACATCGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T
Index 3 CAAGCAGAAGACGGCATACGAGATGCCTAAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T
Index 4 CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T
Index 5 CAAGCAGAAGACGGCATACGAGATCACTGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T
Index 6 CAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T
Index 7 CAAGCAGAAGACGGCATACGAGATGATCTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T
Index 8 CAAGCAGAAGACGGCATACGAGATTCAAGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T
Index 9 CAAGCAGAAGACGGCATACGAGATCTGATCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T
Index 10 CAAGCAGAAGACGGCATACGAGATAAGCTAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T
Index 11 CAAGCAGAAGACGGCATACGAGATGTAGCCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T
Index 12 CAAGCAGAAGACGGCATACGAGATTACAAGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T

SURESELECT (POST-CAPTURE INDEXING)

This begins with a shorter bottom adapter that is extended to add the P5 end in the pre-capture PCR. The post-capture PCR step adds the index and P7 end. Note that the NEB adapter is more efficient than the InPE adapter in my comparative tests.

InPE adapter (indexing paired end adapter)

PRE-CAPTURE PCR

PCR Primer 1.0 [Tm 70deg]
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACG*A

Multiplex PCR Primer 2.0 [Tm 67deg]
5' GTGACTGGAGTTCAGACGTGTGCTCTTCCGAT*C

POST-CAPTURE PCR Indexing Primers
2nd PCR reaction (post-capture amplification) adds indexes and P7 sequence.

Universal Primer [Tm 75deg]
5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T

PCR Primer, Index 1 5’ CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTT*C
PCR Primer, Index 2 5’ CAAGCAGAAGACGGCATACGAGATACATCGGTGACTGGAGTT*C
PCR Primer, Index 3 5’ CAAGCAGAAGACGGCATACGAGATGCCTAAGTGACTGGAGTT*C
PCR Primer, Index 4 5’ CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTT*C
PCR Primer, Index 5 5’ CAAGCAGAAGACGGCATACGAGATCACTGTGTGACTGGAGTT*C
PCR Primer, Index 6 5’ CAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTT*C
PCR Primer, Index 7 5’ CAAGCAGAAGACGGCATACGAGATGATCTGGTGACTGGAGTT*C
PCR Primer, Index 8 5’ CAAGCAGAAGACGGCATACGAGATTCAAGTGTGACTGGAGTT*C
PCR Primer, Index 9 5’ CAAGCAGAAGACGGCATACGAGATCTGATCGTGACTGGAGTT*C
PCR Primer, Index 10 5’ CAAGCAGAAGACGGCATACGAGATAAGCTAGTGACTGGAGTT*C
PCR Primer, Index 11 5’ CAAGCAGAAGACGGCATACGAGATGTAGCCGTGACTGGAGTT*C
PCR Primer, Index 12 5’ CAAGCAGAAGACGGCATACGAGATTACAAGGTGACTGGAGTT*C

Guidelines for Low-Level Pooling
Some sequencing experiments require the use of fewer than 12 index sequences in a lane with a high cluster density. In such cases, a careful selection of indexes is required to ensure optimum cluster discrimination by having different bases at each cycle of the index read. Illumina recommends the following sets of indexes for low-level pooling experiments:

Pool of 2 samples:
Index 6 GCCAAT
Index 12 CTTGTA

Pool of 3 samples:
Index 4 TGACCA
Index 6 GCCAAT
Index 12 CTTGTA

Pool of 6 samples:
Index 2 CGATGT
Index 4 TGACCA
Index 5 ACAGTG
Index 6 GCCAAT
Index 7 CAGATC
Index 12 CTTGTA
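
A quick way to check a proposed pool is to look at the bases present at each cycle of the index read. The Python sketch below assumes the usual rule of thumb for four-channel instruments, namely that each cycle should contain at least one A or C and at least one G or T; that rule and the function names are our assumptions, so treat the result as a guide rather than Illumina's official check.

```python
def cycle_bases(indexes):
    """For each cycle of the index read, return the set of bases across the pool."""
    return [set(column) for column in zip(*indexes)]

def is_balanced(indexes):
    """True if every cycle has at least one A/C base and at least one G/T base."""
    return all(
        bool(bases & set("AC")) and bool(bases & set("GT"))
        for bases in cycle_bases(indexes)
    )

pool_of_2 = ["GCCAAT", "CTTGTA"]    # Index 6 and Index 12, as recommended above
poor_pair = ["ATCACG", "CGATGT"]    # Index 1 and Index 2: cycle 1 is A and C only
```

Here `is_balanced(pool_of_2)` passes, whereas `is_balanced(poor_pair)` fails at the very first cycle, which illustrates why the choice of indexes matters when only a few are pooled.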

Nextera Sample Preparation

The sequences in Nextera libraries are different to all the other workflows.

Nextera® transposase sequences (FC-121-1031, FC-121-1030)

The Tn5 transposase cuts the sample DNA and adds the following sequence at either end of each fragment, with the highlighted sequence next to the library insert.

5’ TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
Read 1 >

5’ GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
Read 2 >

Nextera® Index Kit – PCR primers (FC-121-1012, FC-121-1011)

PCR with the following primers adds the P5 and P7 termini that bind to the flowcell and also the dual 8bp index tags (denoted by the i5 and i7 below).

5’ AATGATACGGCGACCACCGAGATCTACAC[i5]TCGTCGGCAGCGTC

5’ CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG

If trimming adapters from Nextera runs, cut the reads at CTGTCTCTTATACACATCT instead of the usual AGATCGGAAGAGC. Use of cutadapt, trim_galore or a similar program is recommended, with the custom adapter specified.
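
To show what cutting the reads at the adapter means in practice, here is a deliberately simple Python sketch. It only handles exact matches (a full adapter anywhere in the read, or the start of the adapter running off the 3' end) and has no tolerance for sequencing errors, so it is an illustration of the idea and not a substitute for cutadapt or trim_galore. The function name and the `min_overlap` parameter are our own.

```python
NEXTERA_ADAPTER = "CTGTCTCTTATACACATCT"

def trim_adapter(read, adapter=NEXTERA_ADAPTER, min_overlap=5):
    """Cut the read at an exact adapter match, or at a partial match at its 3' end."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]  # full adapter found: keep only the bases before it
    # Otherwise look for the start of the adapter hanging off the end of the read,
    # trying the longest possible overlap first
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read  # no adapter detected

full = trim_adapter("ACGTACGT" + NEXTERA_ADAPTER + "GGG")
partial = trim_adapter("ACGTACGT" + "CTGTCTC")
clean = trim_adapter("ACGTACGT")
```

Reads from short fragments end up shorter after trimming, which is why trimmed fastq files no longer have reads of uniform length.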