Adding a new assembler in MetAMOS

MetAMOS is a modular metagenomic analysis pipeline that can be used to automate metagenomic assembly, annotation and scaffolding analysis. Further information about MetAMOS, including the published paper, can be found here, and documentation for the pipeline can be found here.

One of the biggest advantages of MetAMOS is its modularity: users can add new programs at various steps of the pipeline, such as annotation and assembly.

Here, I discuss how to add a new assembler to the framework. As an example, I use metacortex, an assembler that we wanted to incorporate into CVR’s MetAMOS pipeline.

The chart below shows the steps to add a new assembler to the MetAMOS pipeline.

Figure 1: Steps to add a new assembler to the MetAMOS pipeline

To add a new assembler, first add its name to the ASSEMBLE.generic file, located at metAMOS/Utilities/config/ASSEMBLE.generic.

It is also possible to add several versions of the same program to the list, as long as each version has a different name, e.g. soap_v1, soap_v2, etc.
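Editing the list by hand is a one-line change, but if you maintain several customized entries it can be handy to script it. The sketch below assumes that ASSEMBLE.generic holds one assembler name per line; that file layout is an assumption, not something confirmed here, and register_assembler is a hypothetical helper. It appends a name only if it is not already listed, which is exactly the rule that lets different versions such as soap_v1 and soap_v2 coexist.

```python
from pathlib import Path


def register_assembler(generic_file: Path, name: str) -> bool:
    """Append `name` to ASSEMBLE.generic unless it is already listed.

    Hypothetical helper: assumes the file lists one assembler name per
    line. Returns True if the name was added, False if it was present.
    """
    name = name.strip()
    if not name or any(ch.isspace() for ch in name):
        raise ValueError(f"invalid assembler name: {name!r}")

    text = generic_file.read_text() if generic_file.exists() else ""
    existing = {line.strip() for line in text.splitlines()}

    # Each entry needs a unique name; different versions of the same
    # program are registered under distinct names (e.g. soap_v1, soap_v2).
    if name in existing:
        return False

    # Start on a new line if the file does not already end with one.
    prefix = "" if not text or text.endswith("\n") else "\n"
    with generic_file.open("a") as fh:
        fh.write(prefix + name + "\n")
    return True
```

For example, register_assembler(Path("metAMOS/Utilities/config/ASSEMBLE.generic"), "metacortex") returns True the first time and False on a repeated call, leaving the file unchanged.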

Writing a configuration file is the most important part of adding a new assembler to the pipeline. The configuration file specifies the parameters for the assembly software, including its name, location, input, output and the commands to run. It uses several reserved keywords; a list of the currently available reserved keywords is described here.

The configuration file for metacortex looks like this:

The program configuration is specified in the [CONFIG] section. The “commands” keyword specifies the commands needed to run the actual program; multiple commands can be chained with the && operator. Reserved keywords such as [FIRST], [SECOND], [KMER] and [PREFIX] can also be used to specify input and output details. These keywords are recognized by MetAMOS and have the same meaning throughout the framework.
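To make the role of the reserved keywords concrete, the sketch below expands a command template: each bracketed keyword is replaced with a run-specific value, and the && separator splits the result into individual commands. This is a hypothetical illustration of the idea, not MetAMOS's own implementation, which may differ. The function name expand_commands, the example values and the made-up program my_assembler (with its invented flags) are all assumptions.

```python
from __future__ import annotations

import re


def expand_commands(template: str, values: dict[str, str]) -> list[str]:
    """Substitute [KEYWORD] placeholders, then split the result on '&&'.

    Hypothetical sketch: MetAMOS performs its own substitution internally.
    """
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for reserved keyword [{key}]")
        return values[key]

    # Reserved keywords are upper-case names in square brackets, e.g. [KMER].
    expanded = re.sub(r"\[([A-Z]+)\]", substitute, template)
    # '&&' chains several commands; trim whitespace around each one.
    return [cmd.strip() for cmd in expanded.split("&&") if cmd.strip()]


if __name__ == "__main__":
    # my_assembler and its flags are invented for illustration.
    template = "my_assembler -k [KMER] -1 [FIRST] -2 [SECOND] -o [PREFIX] && echo done"
    values = {"KMER": "31", "FIRST": "reads_1.fastq",
              "SECOND": "reads_2.fastq", "PREFIX": "sample"}
    for cmd in expand_commands(template, values):
        print(cmd)
    # my_assembler -k 31 -1 reads_1.fastq -2 reads_2.fastq -o sample
    # echo done
```

Failing loudly on an unknown keyword, rather than leaving the placeholder in place, makes a typo such as [KMR] visible before any command is run.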

The main configuration keywords are:

input: input type (fasta, fastq, etc.)
name: program name to report
output: output file name
location: location of the executable
paired: paired-end read type and how the reads are passed to the program; reserved keywords such as [FIRST] and [SECOND] can be used here
commands: the commands to run, using the executable given by the location keyword
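A quick way to catch a missing keyword before rebuilding is to parse the file and check that each keyword listed above is present. The sketch below assumes a simple layout in which each keyword starts a line and is followed by its value, with an optional [CONFIG] section header. That layout, the helpers parse_spec and missing_keys, and the example text for a made-up assembler are assumptions for illustration; adapt the parsing to the real MetAMOS configuration files.

```python
from __future__ import annotations

# Keywords from the list above; which ones are truly mandatory is an assumption.
EXPECTED_KEYS = ("input", "name", "output", "location", "paired", "commands")


def parse_spec(text: str) -> dict[str, str]:
    """Parse 'keyword value' lines into a dict (hypothetical file layout)."""
    spec: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and section headers such as [CONFIG].
        if not line or line.startswith("#") or (line.startswith("[") and line.endswith("]")):
            continue
        parts = line.split(None, 1)  # keyword, then the rest of the line
        spec[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return spec


def missing_keys(spec: dict[str, str]) -> list[str]:
    """Return expected keywords that are absent or have an empty value."""
    return [key for key in EXPECTED_KEYS if not spec.get(key)]


if __name__ == "__main__":
    # Invented example for a hypothetical assembler, not the metacortex file.
    example = """
[CONFIG]
input fastq
name my_assembler
output [PREFIX].contigs.fa
location /opt/my_assembler/bin
paired [FIRST] [SECOND]
commands my_assembler -k [KMER] -1 [FIRST] -2 [SECOND] -o [PREFIX]
"""
    spec = parse_spec(example)
    print(spec["paired"])      # [FIRST] [SECOND]
    print(missing_keys(spec))  # []
```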

After preparing this file and following the steps shown in Figure 1, MetAMOS needs to be rebuilt. To rebuild the MetAMOS executables:

1) Delete the .pyc and .c files from the metAMOS-master/src/ folder. Do not delete the .py scripts.
2) Delete the initPipeline and runPipeline executables.
3) Run the Python script setup.py. This should regenerate the .pyc and .c files in the src/ directory and recreate the initPipeline and runPipeline executables.
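The deletion steps above can be scripted so that only generated files are removed and the .py sources are never touched. The sketch below assumes the .pyc and .c files sit directly in metAMOS-master/src/ and that initPipeline and runPipeline are in the metAMOS-master/ top-level directory; the executables' location is an assumption, and clean_build is a hypothetical helper. It only deletes files, so setup.py still has to be run afterwards.

```python
from __future__ import annotations

from pathlib import Path

# Generated files that setup.py recreates; *.py sources are never matched.
GENERATED_SUFFIXES = {".pyc", ".c"}
EXECUTABLES = ("initPipeline", "runPipeline")


def clean_build(metamos_root: Path, dry_run: bool = True) -> list[Path]:
    """Remove generated build files before re-running setup.py.

    Hypothetical helper. Assumes *.pyc and *.c files live directly in
    <metamos_root>/src/ and that the executables are in <metamos_root>.
    With dry_run=True nothing is deleted; the paths are printed and returned.
    """
    src = metamos_root / "src"
    targets = sorted(
        p for p in src.iterdir() if p.is_file() and p.suffix in GENERATED_SUFFIXES
    )
    targets += [
        metamos_root / exe for exe in EXECUTABLES if (metamos_root / exe).is_file()
    ]

    for path in targets:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
    return targets
```

Calling clean_build(Path("metAMOS-master")) first lists what would be removed; clean_build(Path("metAMOS-master"), dry_run=False) then deletes those files, after which setup.py can be run. Defaulting to a dry run keeps an accidental call from deleting anything.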

When I tried running this newly added metacortex assembler within the pipeline, it gave me the following error.

I posted this on GitHub asking for help, and the MetAMOS developers (Todd J Treangen and Chris Hill) asked me to run the following command and provide my config file:

I got the following output for the grep command:

Sergey Koren, one of the developers of MetAMOS, pointed out that the error was caused not by the configuration file itself but by the ‘echo’ command used in it. MetAMOS keeps a list of allowed system commands, and because echo was not in that list, MetAMOS could not recognize the command. The allowed system commands are specified in SYSTEM_COMMANDS, defined on line 13 of metAMOS-master/src/generic.py.

We added ‘echo’ to this list and rebuilt the MetAMOS executables for the change to take effect.
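The failure makes sense if you picture the check as a whitelist: each command in the “commands” value must start with a program MetAMOS recognizes, presumably either the configured tool itself or an entry in SYSTEM_COMMANDS. The sketch below is a hypothetical reconstruction of that idea, not the code in generic.py; the contents of the allowed set, the program name my_assembler and the function unknown_programs are illustrative assumptions.

```python
from __future__ import annotations

import shlex

# Illustrative whitelist only: the real SYSTEM_COMMANDS in
# metAMOS-master/src/generic.py has its own contents.
SYSTEM_COMMANDS = {"cat", "cp", "mv", "ln", "rm"}


def unknown_programs(commands: str, allowed: set[str]) -> list[str]:
    """Return programs in an '&&'-chained command string that are not allowed."""
    unknown = []
    for cmd in commands.split("&&"):
        tokens = shlex.split(cmd)
        if not tokens:
            continue
        # Compare the bare program name, ignoring any directory prefix.
        program = tokens[0].rsplit("/", 1)[-1]
        if program not in allowed:
            unknown.append(program)
    return unknown


if __name__ == "__main__":
    commands = "my_assembler [FIRST] [SECOND] && echo finished"
    # Assumption: the configured assembler itself counts as allowed.
    allowed = SYSTEM_COMMANDS | {"my_assembler"}
    print(unknown_programs(commands, allowed))             # ['echo']
    print(unknown_programs(commands, allowed | {"echo"}))  # []
```

Adding ‘echo’ to the allowed set is the sketch's equivalent of the fix described above; in the real pipeline the change only takes effect after the executables are rebuilt.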

Job done: metacortex is now part of our in-house customized MetAMOS.

Bioinformatician at CVR.
http://bioinformatics.cvr.ac.uk/sejal.php