Adding a new assembler in MetAMOS
MetAMOS is a modular metagenomic analysis pipeline that automates metagenomic assembly, annotation and scaffolding. Further information and the published paper on MetAMOS can be found here, and documentation for the pipeline can be found here.
One of the biggest advantages of MetAMOS is its modularity: users can add new programs at various steps of the pipeline, such as annotation and assembly.
Here, I discuss how to add a new assembler within the framework. I am going to use metacortex as an example assembler that we wanted to incorporate in the CVR’s MetAMOS pipeline.
The chart below shows the steps to add a new assembler to the MetAMOS pipeline.
To add a new assembler, add its name to the ASSEMBLE.generic file, which is located at metAMOS/Utilities/config/ASSEMBLE.generic
It is also possible to list several versions of the same program, as long as each entry has a different name, e.g. soap_v1, soap_v2 etc.
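The registration step can also be scripted. The following is only a minimal sketch: it assumes that ASSEMBLE.generic holds one program name per line and that the entry for our assembler is the lowercase name metacortex; `register_assembler` is a hypothetical helper, not part of MetAMOS.

```python
from pathlib import Path


def register_assembler(list_file, name):
    """Append `name` to the assembler list unless it is already present.

    Assumption: the list file holds one program name per line.
    Returns True if the file was changed, False otherwise.
    """
    path = Path(list_file)
    lines = path.read_text().splitlines() if path.exists() else []
    existing = {line.strip() for line in lines if line.strip()}
    if name in existing:
        return False  # names must be unique, so never add a duplicate
    lines.append(name)
    path.write_text("\n".join(lines) + "\n")
    return True
```

Calling `register_assembler("metAMOS/Utilities/config/ASSEMBLE.generic", "metacortex")` from the directory that contains the metAMOS checkout would add the entry once and leave the file untouched on later runs.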
Writing a configuration file is the most important part of adding a new assembler to the pipeline. The configuration file specifies the parameters for the assembly software, including its name, location, input, output and the commands to run. It uses several reserved keywords; the currently available ones are listed here.
The configuration file for metacortex looks like this:
    [CONFIG]
    input FASTQ
    name MetaCortex
    output [PREFIX]_metacortex_contig.fa
    location cpp/[MACHINE]/metacortex-master/bin
    paired [FIRST] [SECOND]
    commands echo [FIRST] > input.txt && \
             echo [SECOND] >> input.txt && \
             metacortex_k160 -k [KMER] -n 23 -b 65 -i input.txt -t fastq -o [PREFIX]_metacortex_contig.ctx -f [PREFIX]_metacortex_contig.fa -l metacortex.log
The program configuration is specified in the [CONFIG] section. The “commands” keyword lists the commands that run the actual program; multiple commands can be chained with &&. Reserved keywords such as [FIRST], [SECOND], [KMER] and [PREFIX] can be used to specify the input and output details. MetAMOS recognizes these keywords, and they are consistent throughout the framework.
|Keyword|Description|
|---|---|
|input|Input type (FASTA, FASTQ, etc.)|
|name|Program name that you want to report|
|output|Output file name|
|location|Location of the executable|
|paired|Type of paired-end reads and how to pass them. Reserved keywords such as [FIRST] and [SECOND] can be used here|
|commands|The commands to run, using the executable found at the path given in location|
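To make the keyword mechanism concrete, here is a small sketch that reads a configuration like the one for metacortex and fills in the reserved keywords. It is not MetAMOS's own parser: `parse_spec` and `fill_keywords` are hypothetical helpers, the file layout is assumed to be exactly "keyword value" lines with a trailing backslash for continuation, and the sample values (sample, reads_1.fastq, reads_2.fastq) are made up for illustration.

```python
import re


def parse_spec(text):
    """Parse a [CONFIG] block of 'keyword value' lines into a dict.

    A trailing backslash joins a line with the next one, as in the
    'commands' entry of the metacortex configuration.
    """
    joined = re.sub(r"\s*\\\s*\n\s*", " ", text)  # fold continuation lines
    config = {}
    for line in joined.splitlines():
        line = line.strip()
        if not line or line.startswith("["):  # skip blanks and the [CONFIG] header
            continue
        key, _, value = line.partition(" ")
        config[key] = value.strip()
    return config


def fill_keywords(template, values):
    """Replace reserved keywords such as [PREFIX]; unknown ones are left as-is."""
    return re.sub(r"\[([A-Z]+)\]",
                  lambda m: values.get(m.group(1), m.group(0)),
                  template)


# Shortened copy of the metacortex configuration, for illustration only.
SPEC = """[CONFIG]
input FASTQ
name MetaCortex
output [PREFIX]_metacortex_contig.fa
paired [FIRST] [SECOND]
commands echo [FIRST] > input.txt && \\
         echo [SECOND] >> input.txt
"""

config = parse_spec(SPEC)
# Hypothetical values; MetAMOS supplies the real ones at run time.
values = {"PREFIX": "sample", "FIRST": "reads_1.fastq", "SECOND": "reads_2.fastq"}
print(fill_keywords(config["output"], values))
print(fill_keywords(config["commands"], values))
```

Leaving unknown keywords untouched makes a missing value easy to spot in the printed command instead of silently producing an empty argument.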
After preparing this file and completing the steps shown in Figure 1, MetAMOS needs to be built again. Before rebuilding the MetAMOS executables:
1) Delete .pyc and .c files from the metAMOS-master/src/ folder
2) Delete the executables initPipeline and runPipeline.
Do not delete the .py scripts.
To rebuild MetAMOS, run the Python script setup.py. This should regenerate the .pyc and .c files in the src/ directory and recreate the initPipeline and runPipeline executables.
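The two clean-up steps can be sketched as a short script. This is an illustration under assumptions: `clean_build_artifacts` is a hypothetical helper, and the initPipeline and runPipeline executables are assumed to sit directly in the top-level metAMOS-master folder. By default it only lists what it would delete.

```python
from pathlib import Path


def clean_build_artifacts(root, dry_run=True):
    """List (and optionally delete) the files to remove before a rebuild.

    Targets the .pyc and .c files in <root>/src plus the initPipeline and
    runPipeline executables (assumed to be in <root>). The .py scripts
    are never touched.
    """
    root = Path(root)
    targets = sorted(p for p in (root / "src").glob("*")
                     if p.suffix in {".pyc", ".c"})
    targets += [root / name for name in ("initPipeline", "runPipeline")]
    targets = [p for p in targets if p.is_file()]  # ignore anything missing
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

A cautious way to use it is to call `clean_build_artifacts("metAMOS-master")` first, check the returned list, and only then repeat the call with `dry_run=False` before running setup.py.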
When I tried running this newly added metacortex assembler within the pipeline, it gave me the following error.
    Job = [metacortex.97.run -> metacortex.97.asm.contig] completed
    Completed Task = assemble.Assemble
    *** MetAMOS Warning: metacortex assembler did not run successfully!
    ** MetAMOS Error: no selected assembler ran successfully! Please check the logs in /home/test_output/Log/ASSEMBLE.log for details.
    ruffus.ruffus_exceptions.RethrownJobError:
    Exception #1
    'ruffus.ruffus_exceptions.JobSignalledBreak( )' raised in ...
    Task = def assemble.CheckAsmResults(...):
    Job = [[metacortex.97.asm.contig] -> [assemble.ok]]
    Traceback (most recent call last):
      File "/home/metAMOS-master/Utilities/ruffus/task.py", line 625, in run_pooled_job_without_exceptions
        return_value = job_wrapper(param, user_defined_work_func, register_cleanup, touch_files_only)
      File "/home/metAMOS-master/Utilities/ruffus/task.py", line 491, in job_wrapper_io_files
        ret_val = user_defined_work_func(*param)
      File "/home/metAMOS-master/src/assemble.py", line 596, in CheckAsmResults
        raise(JobSignalledBreak)
    JobSignalledBreak:
grep METACORTEX [output_dir]/pipeline.conf
I got the following output for the grep command:
    Error: requested to run MetaCortex (echo) but not available in specified location metacortex-master/bin. Please check your specification and try again
    Job = [metacortex.97.run -> metacortex.97.asm.contig] completed
    Completed Task = assemble.Assemble
    *** MetAMOS Warning: metacortex assembler did not run successfully!
Sergey Koren, one of the developers of MetAMOS, pointed out that the error was caused not by the configuration file as a whole but by the ‘echo’ command used in it. MetAMOS keeps a list of allowed system commands; because echo was not in that list, MetAMOS treated it as a program to be found at the specified location and could not find it there. The system commands are specified in metAMOS-master/src/generic.py
Line #13 SYSTEM_COMMANDS
    #!python

    import os, sys, string, time, BaseHTTPServer, getopt, re, subprocess, webbrowser
    from datetime import date
    from datetime import time
    from datetime import datetime
    from operator import itemgetter

    from utils import *
    from task import JobSignalledBreak

    LIBRARY_TYPES = enum("PAIRED", "MATED")
    TECHNOLOGY_TYPES = enum("SOLEXA", "SANGER", "PACBIO", "454")
    SYSTEM_COMMANDS = [ "mkdir", "mv", "bash", "ln", "rm", "cp", "ls", "echo" ]

    _readlibs = []
    _skipsteps = []
    ......
As shown above, the ‘echo’ command was added to this list, and the MetAMOS executable was rebuilt for the change to take effect.
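A quick pre-flight check can catch this kind of problem before a full pipeline run. The sketch below is not how MetAMOS itself validates commands; it simply assumes that the first word of every &&-separated command must be either an allowed system command or an executable shipped with the assembler. `command_names` and `unknown_commands` are hypothetical helpers, and the list is the one from generic.py as it stood before ‘echo’ was added.

```python
# The allowed list from generic.py, before 'echo' was added to it.
SYSTEM_COMMANDS = ["mkdir", "mv", "bash", "ln", "rm", "cp", "ls"]


def command_names(commands):
    """Return the first word of each '&&'-separated command."""
    names = []
    for part in commands.split("&&"):
        words = part.split()
        if words:
            names.append(words[0])
    return names


def unknown_commands(commands, allowed, executables):
    """Names that are neither allowed system commands nor known executables."""
    unknown = [name for name in command_names(commands)
               if name not in allowed and name not in executables]
    return list(dict.fromkeys(unknown))  # drop duplicates, keep order


# Shortened version of the metacortex 'commands' entry.
commands = ("echo [FIRST] > input.txt && echo [SECOND] >> input.txt && "
            "metacortex_k160 -k [KMER] -i input.txt")

print(unknown_commands(commands, SYSTEM_COMMANDS, {"metacortex_k160"}))
print(unknown_commands(commands, SYSTEM_COMMANDS + ["echo"], {"metacortex_k160"}))
```

With the original list the check reports echo as unknown; once echo is added to the allowed commands, nothing is flagged.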
Job done, metacortex is now part of our in-house customized MetAMOS.