Adding a new assembler in MetAMOS

MetAMOS is a modular metagenomic analysis pipeline that automates metagenomic assembly, annotation and scaffolding. Further information and the published paper about MetAMOS can be found here, and documentation for the pipeline can be found here.

One of the biggest advantages of the MetAMOS framework is its modularity: users can add new programs at various steps of the pipeline, such as annotation and assembly.

Here, I discuss how to add a new assembler to the framework. As an example I use metacortex, an assembler that we wanted to incorporate into the CVR’s MetAMOS pipeline.

The chart below shows the steps to add a new assembler to the MetAMOS pipeline.

[Figure 1: Steps to add a new assembler to the MetAMOS pipeline]

To add a new assembler, add its name to the ASSEMBLE.generic file, which can be found at metAMOS/Utilities/config/ASSEMBLE.generic

It is also possible to add several versions of the same program to the list, as long as their names differ, e.g. soap_v1, soap_v2, etc.
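The registration step can be sketched in Python. This is only an illustration: it assumes that ASSEMBLE.generic holds one program name per line, and the helper name register_assembler is hypothetical, not part of MetAMOS.

```python
from pathlib import Path


def register_assembler(generic_file, name):
    """Append `name` to the list file unless it is already listed.

    Assumes one program name per line (an assumption about the file format).
    Returns True if the name was added, False if it was already present.
    """
    path = Path(generic_file)
    text = path.read_text() if path.exists() else ""
    if name in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(separator + name + "\n")
    return True
```

Calling register_assembler("metAMOS/Utilities/config/ASSEMBLE.generic", "metacortex") a second time leaves the file unchanged, so the name cannot end up listed twice.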

Writing a configuration file is the most important part of adding a new assembler to the pipeline. The configuration file is where we specify the parameters for the assembly software, including its name, location, input, output and the commands to run. The configuration file has several reserved keywords. A list of the currently available reserved keywords is described here.

The configuration file for metacortex looks like this:

[CONFIG]
input FASTQ
name MetaCortex
output [PREFIX]_metacortex_contig.fa
location cpp/[MACHINE]/metacortex-master/bin
paired [FIRST] [SECOND]
commands echo [FIRST] > input.txt && \
         echo [SECOND] >> input.txt && \
         metacortex_k160 -k [KMER] -n 23 -b 65 -i input.txt -t fastq -o [PREFIX]_metacortex_contig.ctx -f [PREFIX]_metacortex_contig.fa -l metacortex.log

The program configuration is specified in the [CONFIG] section. The “commands” keyword specifies the commands needed to run the actual program. Multiple commands can be chained with the && operator. Reserved keywords such as [FIRST], [SECOND], [KMER] and [PREFIX] can be used to specify the input and output details. These keywords are recognized by MetAMOS and are consistent throughout the framework.
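To make the role of the reserved keywords concrete, here is a minimal Python sketch of placeholder substitution. It is not the code MetAMOS uses internally; the function name fill_placeholders and the example values (k-mer 31, prefix sample1) are assumptions chosen for illustration.

```python
import re


def fill_placeholders(template, values):
    """Replace [KEY] tokens with values[KEY]; unknown tokens are left as-is."""
    def substitute(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))

    return re.sub(r"\[([A-Z]+)\]", substitute, template)


# Hypothetical values; in a real run the pipeline supplies them.
command = "metacortex_k160 -k [KMER] -i input.txt -f [PREFIX]_metacortex_contig.fa"
print(fill_placeholders(command, {"KMER": 31, "PREFIX": "sample1"}))
```

Leaving unknown tokens untouched makes a missing value easy to spot in the expanded command.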

Keyword   Description
input     Input type (fasta, fastq, etc.)
name      Program name that you want to report
output    Output file name
location  Location of the executable
paired    Type of paired-end reads and how to pass them. You can use reserved keywords such as [FIRST] and [SECOND] here
commands  Actual commands to run, using the executable found at the path given by the location keyword
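The layout of the configuration file can be illustrated with a small Python parser. This is a sketch under stated assumptions, not the parser MetAMOS ships: it assumes each entry is a keyword followed by a space and its value, and that a trailing backslash continues an entry on the next line, as in the metacortex example. The name parse_config is hypothetical.

```python
def parse_config(text):
    """Return {keyword: value} for the entries that follow the [CONFIG] header."""
    # First join physical lines that end in a backslash with the next line.
    logical_lines, pending = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            pending += line[:-1].rstrip() + " "
        else:
            logical_lines.append(pending + line)
            pending = ""
    if pending:
        logical_lines.append(pending.strip())

    # Then split each logical line into keyword and value.
    config, in_section = {}, False
    for line in logical_lines:
        if line == "[CONFIG]":
            in_section = True
        elif in_section and line:
            keyword, _, value = line.partition(" ")
            config[keyword] = value.strip()
    return config


sample = """[CONFIG]
input FASTQ
name MetaCortex
paired [FIRST] [SECOND]
commands echo [FIRST] > input.txt && \\
         echo [SECOND] >> input.txt
"""
for keyword, value in parse_config(sample).items():
    print(keyword, "=>", value)
```

Parsing the file this way before rebuilding is a cheap check that every keyword carries the value you intended.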

After preparing this file and following the steps shown in Figure 1, metAMOS needs to be rebuilt. To rebuild the metAMOS executables:

1) Delete the .pyc and .c files from the metAMOS-master/src/ folder. Do not delete the .py scripts.
2) Delete the executables initPipeline and runPipeline.
3) Run the Python script setup.py. This should regenerate the .pyc and .c files in the src/ directory and recreate the initPipeline and runPipeline executables.
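The clean-up and rebuild can be scripted. The sketch below is one way to do it, with assumptions labeled: it assumes initPipeline and runPipeline sit in the metAMOS root directory, and that setup.py is run without arguments using whichever interpreter you pass in. The function names are hypothetical, and deletion is off by default so you can review the list first.

```python
import subprocess
from pathlib import Path


def clean_build_artifacts(root, dry_run=True):
    """List (and, unless dry_run, delete) the files named in steps 1 and 2.

    Only .pyc and .c files in src/ are targeted; .py scripts are never touched.
    """
    root = Path(root)
    targets = sorted(
        p for p in (root / "src").glob("*")
        if p.is_file() and p.suffix in {".pyc", ".c"}
    )
    # Assumption: the two executables live directly in the metAMOS root.
    targets += [root / exe for exe in ("initPipeline", "runPipeline")
                if (root / exe).is_file()]
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets


def rebuild(root, python="python"):
    """Run setup.py from the metAMOS root with the given interpreter."""
    subprocess.run([python, "setup.py"], cwd=str(root), check=True)
```

A cautious sequence is to call clean_build_artifacts("metAMOS-master") and inspect the returned paths, then repeat with dry_run=False, and only then call rebuild("metAMOS-master").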

When I tried running this newly added metacortex assembler within the pipeline, it gave me the following error.

Job = [metacortex.97.run -> metacortex.97.asm.contig] completed
Completed Task = assemble.Assemble
*** MetAMOS Warning: metacortex assembler did not run successfully!
** MetAMOS Error: no selected assembler ran successfully! Please check the logs in /home/test_output/Log/ASSEMBLE.log for details.
ruffus.ruffus_exceptions.RethrownJobError:

Exception #1
'ruffus.ruffus_exceptions.JobSignalledBreak(
)' raised in ...
Task = def assemble.CheckAsmResults(...):
Job  = [[metacortex.97.asm.contig] -> [assemble.ok]]

Traceback (most recent call last):
File "/home/metAMOS-master/Utilities/ruffus/task.py", line 625, in run_pooled_job_without_exceptions
return_value =  job_wrapper(param, user_defined_work_func, register_cleanup, touch_files_only)
File "/home/metAMOS-master/Utilities/ruffus/task.py", line 491, in job_wrapper_io_files
ret_val = user_defined_work_func(*param)
File "/home/metAMOS-master/src/assemble.py", line 596, in CheckAsmResults
raise(JobSignalledBreak)
JobSignalledBreak:

I posted this on the GitHub forum for help, and the MetAMOS developers (Todd J Treangen and Chris Hill) asked me to run the following command and provide my config file:

grep METACORTEX [output_dir]/pipeline.conf

I got the following output for the grep command:

Error: requested to run MetaCortex (echo) but not available in specified location metacortex-master/bin. Please check your specification and try again
Job = [metacortex.97.run -> metacortex.97.asm.contig] completed
Completed Task = assemble.Assemble
*** MetAMOS Warning: metacortex assembler did not run successfully! 

Sergey Koren, one of the developers of metAMOS, pointed out that the error was not due to the configuration file itself but to the ‘echo’ command used in it. MetAMOS keeps a list of allowed system commands; because echo was not in that list, MetAMOS could not recognize the command. System commands are specified in the SYSTEM_COMMANDS list on line 13 of metAMOS-master/src/generic.py:

#!python

import os, sys, string, time, BaseHTTPServer, getopt, re, subprocess, webbrowser
from datetime import date
from datetime import time
from datetime import datetime
from operator import itemgetter
from utils import *
from task import JobSignalledBreak

LIBRARY_TYPES = enum("PAIRED", "MATED")
TECHNOLOGY_TYPES = enum("SOLEXA", "SANGER", "PACBIO", "454")
SYSTEM_COMMANDS = [ "mkdir", "mv", "bash", "ln", "rm", "cp", "ls", "echo" ]

_readlibs = []
_skipsteps = []
......
......

As shown above, the ‘echo’ command was added to this list, and the MetAMOS executables were rebuilt for the change to take effect.
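A check like the following could catch this kind of problem before running the pipeline. It is a hypothetical pre-flight sketch that mimics the behaviour described above, not the logic in generic.py: it treats the first word of each &&-separated command as a program name and reports any that is neither in the allowed list nor a file in the configured location.

```python
import os
import re

# Copied from the list shown above, with "echo" already added.
SYSTEM_COMMANDS = ["mkdir", "mv", "bash", "ln", "rm", "cp", "ls", "echo"]


def unresolved_programs(commands, location, allowed=SYSTEM_COMMANDS):
    """Return programs that are neither allowed system commands nor files in `location`."""
    missing = []
    for step in re.split(r"\s*&&\s*", commands.strip()):
        if not step:
            continue
        program = step.split()[0]
        if program in allowed:
            continue
        if os.path.isfile(os.path.join(location, program)):
            continue
        if program not in missing:
            missing.append(program)
    return missing
```

With the original list (without echo), the metacortex commands line would have been reported as containing an unresolved program named echo, which matches the error message MetAMOS printed.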

Job done, metacortex is now part of our in-house customized MetAMOS.
