Genome Classifier

[1]:
import modelseedpy

Pull the genome classifier model

[2]:
from modelseedpy.helpers import get_classifier
[3]:
classifier = get_classifier('knn_filter')
[4]:
type(classifier)
[4]:
modelseedpy.core.msgenomeclassifier.MSGenomeClassifier

Get a Genome and Annotate with RAST

RAST annotation is essential because the classifier was trained on RAST-annotated functions.

[5]:
# Load the E. coli genome from a protein FASTA file
genome = modelseedpy.MSGenome.from_fasta('GCF_000005845.2_ASM584v2_protein.faa', split=' ')
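
The split=' ' argument is easiest to see on a single header line. The sketch below is not modelseedpy code. It assumes the header is cut once, at the first occurrence of the separator, with the first part used as the feature id and the remainder as the description. The function name split_header and the example header are hypothetical.

```python
def split_header(header_line, split=" "):
    """Split a FASTA header line into (id, description).

    Assumption: the text after '>' is cut once, at the first separator.
    """
    text = header_line.lstrip(">").strip()
    parts = text.split(split, 1)
    seq_id = parts[0]
    # Headers without a separator carry no description
    description = parts[1] if len(parts) > 1 else None
    return seq_id, description


# Hypothetical header, for illustration only
print(split_header(">protein_1 hypothetical protein"))
```

Under this assumption, everything before the first space becomes the id, which keeps feature ids short and free of whitespace.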
[6]:
modelseedpy.RastClient().annotate_genome(genome)
[6]:
[{'id': '23AFF380-F4F9-11EB-BBBA-BBE5BBF382BD',
  'parameters': ['-a',
   '-g',
   200,
   '-m',
   5,
   '-d',
   '/opt/patric-common/data/kmer_metadata_v2',
   '-u',
   'http://pear.mcs.anl.gov:6100/query'],
  'hostname': 'pear',
  'tool_name': 'kmer_search',
  'execution_time': 1628063644.76991},
 {'execution_time': 1628063644.90382,
  'tool_name': 'KmerAnnotationByFigfam',
  'hostname': 'pear',
  'id': '23C46324-F4F9-11EB-BBBA-BBE5BBF382BD',
  'parameters': ['annotate_hypothetical_only=1',
   'dataset_name=Release70',
   'kmer_size=8']},
 {'parameters': [],
  'id': '23F64B78-F4F9-11EB-908D-F73BBDF382BD',
  'tool_name': 'annotate_proteins_similarity',
  'hostname': 'pear',
  'execute_time': 1628063645.23091}]

Run classifier

The classifier assigns the genome to one of four classes, each identified by a one-letter label:

  • A: Archaea

  • C: Cyanobacteria

  • N: Gram Negative

  • P: Gram Positive

[ ]:
classifier.classify(genome)
'N'
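
The one-letter result can be translated back into the class names listed above. This is a minimal sketch: the CLASS_LABELS dictionary and the describe_class helper are hypothetical names, not part of modelseedpy, and the mapping is copied from the label list in this notebook.

```python
# Mapping copied from the label list above (hypothetical helper, not modelseedpy API)
CLASS_LABELS = {
    "A": "Archaea",
    "C": "Cyanobacteria",
    "N": "Gram Negative",
    "P": "Gram Positive",
}


def describe_class(label):
    """Return the class name for a one-letter classifier label."""
    try:
        return CLASS_LABELS[label]
    except KeyError:
        raise ValueError(f"Unknown class label: {label!r}") from None


# 'N' is the label returned for the E. coli genome above
print(describe_class("N"))
```

Raising on an unknown label, rather than returning a default, makes it obvious if a different classifier introduces labels that are not in this list.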