SeSaMe (Spore associated Symbiotic Microbes) is a metagenome sequence classifier for short sequences obtained by next-generation DNA sequencing.
SeSaMe is designed for taxonomic classification of sequences from microorganisms associated with arbuscular mycorrhizal fungi (AMF).
SeSaMe enables users to estimate not only the taxonomic diversity and abundance but also the gene reservoir of the taxonomic groups associated with AMF.
SeSaMe calculates genus probability scores based on genus-specific sequence properties: amino acid usage and codon usage of DNA 9-mers, each consisting of three consecutive codons that encode an amino acid trimer in a protein secondary structure.
There are two SeSaMe programs for taxonomic classification, and each program is equipped with a taxon probability scoring method and a P value scoring method.
One classifies a query sequence into one of 54 genus references, and the other classifies it into one of 13 taxon groups: Clostridia, Bacilli, Oscillatoriophycideae, Nostocales, Acidobacteriales, Betaproteobacteria, Deltaproteobacteria, Gammaproteobacteria, Alphaproteobacteria, Actinobacteria, AMF (Glomeromycotina), Agaricomycotina, and Pezizomycotina.
SeSaMe can also be applied to soil metagenomes.
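To make the idea of "three consecutive codon DNA 9-mers encoding an amino acid trimer" concrete, the following is a minimal Java sketch, not SeSaMe's actual implementation. It walks one reading frame of a DNA sequence codon by codon, translates each 9-mer into an amino acid trimer, and counts which 9-mers were used to encode each trimer. The class name `TrimerCodonUsage` and its methods are hypothetical, the standard genetic code is assumed, and the sketch does not model protein secondary structure or compute any probability score.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: counts DNA 9-mers per encoded amino acid trimer.
public class TrimerCodonUsage {
    // Standard genetic code (assumption), with codons ordered by bases T, C, A, G.
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // Translates one codon; returns 'X' if it contains a character other than A, C, G, T.
    public static char translateCodon(String codon) {
        int index = 0;
        for (int i = 0; i < 3; i++) {
            int base = BASES.indexOf(codon.charAt(i));
            if (base < 0) {
                return 'X';
            }
            index = index * 4 + base;
        }
        return AMINO_ACIDS.charAt(index);
    }

    // Translates a DNA 9-mer (three consecutive codons) into an amino acid trimer.
    public static String translateNineMer(String nineMer) {
        StringBuilder trimer = new StringBuilder(3);
        for (int i = 0; i < 9; i += 3) {
            trimer.append(translateCodon(nineMer.substring(i, i + 3)));
        }
        return trimer.toString();
    }

    // Returns, for each trimer, how often each 9-mer encodes it in the given frame (0, 1, or 2).
    public static Map<String, Map<String, Integer>> countNineMers(String dna, int frame) {
        String seq = dna.toUpperCase();
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        // Advance one codon at a time so every 9-mer stays in the same reading frame.
        for (int start = frame; start + 9 <= seq.length(); start += 3) {
            String nineMer = seq.substring(start, start + 9);
            String trimer = translateNineMer(nineMer);
            // Skip 9-mers containing ambiguous bases or stop codons.
            if (trimer.indexOf('X') >= 0 || trimer.indexOf('*') >= 0) {
                continue;
            }
            counts.computeIfAbsent(trimer, k -> new TreeMap<>())
                  .merge(nineMer, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Toy sequence: ATG GCT AAA GCT AAA GCC
        System.out.println(countNineMers("ATGGCTAAAGCTAAAGCC", 0));
    }
}
```

In the toy sequence, the trimer AKA is encoded by two different 9-mers (GCTAAAGCT and GCTAAAGCC). Genus-specific preferences among such synonymous 9-mers are the kind of codon usage signal the description above refers to.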
Requirements
Operating system: Linux/Unix. The program was tested on CentOS Linux 7 (www.centos.org).
Programming language: Java (www.java.net, www.oracle.com (Java 8)). There are two sets of programs. One set requires additional libraries: Apache Commons Math3 (3.3) and Apache Commons IO (2.4) (www.apache.org). The program output is very large, so estimate how much disk space you will need before you run the program.
Prediction Accuracy
Mean percentage of correct predictions at the genus level in the CDS and non-CDS test sets:
CDS/non-CDS | Bacteria | Fungi | AMF |
CDS | 71% | 65% | 49% |
non-CDS | 50% | 73% | 72% |
Mean and standard deviation of the percentage of correct predictions at the 13 taxon group level in the CDS test sets:
Clostridia | 64% ± 4.2% | Gammaproteobacteria | 81% ± 7.8% |
Bacilli | 71% ± 6.4% | Alphaproteobacteria | 88% ± 9.2% |
Oscillatoriophycideae | 84% ± 2.5% | Actinobacteria | 85% ± 5.9% |
Nostocales | 70% ± 2.8% | AMF (R. irregularis) | 42% ± 0% |
Acidobacteriales | 73% ± 0% | Agaricomycotina | 65% ± 6.4% |
Betaproteobacteria | 83% ± 8% | Pezizomycotina | 79% ± 6.7% |
Deltaproteobacteria | 74% ± 10% | | |