mscommphitting

MSCommPhitting()

This class contains the functions that load and parameterize experimental data, define a linear problem (LP) for the defined system with the parameterized data, simulate the LP, and graph the results:

from modelseedpy.community import MSCommPhitting
mscommfit = MSCommPhitting(msdb_path, community_members: dict=None, fluxes_df=None,
                           growth_df=None, carbon_conc=None, media_conc=None,
                           experimental_metadata=None, base_media=None, solver: str = "glpk",
                           all_phenotypes=True, data_paths: dict = None,
                           species_abundances: str = None, carbon_conc_series: dict = None,
                           ignore_trials: Union[dict, list] = None, ignore_timesteps: list = None,
                           species_identities_rows=None, significant_deviation: float = 2,
                           extract_zip_path: str = None)

The member models and experimental data can be parsed and parameterized, or the processed files of experimental data can be passed in the class initialization. The former option is enacted by providing the community_members argument or when either of the fluxes_df and growth_df arguments is missing; the latter option is enacted otherwise.

  • msdb_path str: the path to the ModelSEED Database GitHub repository, which is loaded and referenced by the model. This is the only argument that is always required.

  • community_members dict: a description of the member models and phenotypes in the simulated community. A community of E. coli (Acetate and Maltose phenotypes) and Pseudomonas fluorescens (Acetate and 4-Hydroxybenzoate phenotypes) would be expressed by the following block, where ecoli and pf denote the COBRA model objects and the list keys with “consumed” and “excreted” describe the set of metabolites that are consumed or excreted for the given growth phenotype, respectively.

{
  ecoli: {
    "name": "ecoli",
    "phenotypes": {
      "Maltose": {"consumed": ["cpd00179"], "excreted": ["cpd00029"]},
      "Acetate": {"consumed": ["cpd00029"]}
    }
  },
  pf: {
    "name": "pf",
    "phenotypes": {
      "Acetate": {"consumed": ["cpd00029"]},
      "4-Hydroxybenzoate": {"consumed": ["cpd00136"]}
    }
  }
}
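
The nesting of this dictionary can be walked as in the following minimal sketch, which flattens the members and phenotypes into rows. The helper name list_phenotypes is hypothetical and not part of ModelSEEDpy, and plain object() instances stand in for the COBRA model objects that would serve as keys.

```python
# Stand-ins for the COBRA model objects (assumption: any hashable object works as a key here)
ecoli, pf = object(), object()

community_members = {
    ecoli: {
        "name": "ecoli",
        "phenotypes": {
            "Maltose": {"consumed": ["cpd00179"], "excreted": ["cpd00029"]},
            "Acetate": {"consumed": ["cpd00029"]},
        },
    },
    pf: {
        "name": "pf",
        "phenotypes": {
            "Acetate": {"consumed": ["cpd00029"]},
            "4-Hydroxybenzoate": {"consumed": ["cpd00136"]},
        },
    },
}


def list_phenotypes(members: dict) -> list:
    """Hypothetical helper: flatten to (member name, phenotype, consumed, excreted) rows."""
    rows = []
    for info in members.values():
        for phenotype, mets in info["phenotypes"].items():
            # "excreted" is optional in the example above, so default to an empty list
            rows.append((info["name"], phenotype,
                         mets.get("consumed", []), mets.get("excreted", [])))
    return rows


for row in list_phenotypes(community_members):
    print(row)
```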
  • fluxes_df Pandas DataFrame: a DataFrame that consists of the metabolic flux profile for each phenotype that is described in community_members and will be simulated by CommPhitting. Each column is a separate phenotype, each row is an exchange reaction, and each element is the flux of the exchange reaction for the respective phenotype. This argument offers an opportunity to save compute time by loading a defined DataFrame from a previous simulation.

  • growth_df Pandas DataFrame: a DataFrame that contains the parsed and organized experimental data to which the model will be fit. The DataFrame is indexed by short_codes that concisely describe the experiment, while the trial_IDs field offers more detail about the trial, including the relative abundances of each member and the initial mM concentrations of all pertinent compounds, delimited by hyphens. This argument offers an opportunity to save compute time by loading a defined DataFrame from a previous simulation.

  • carbon_conc dict: the concentrations (values) of carbon sources as ModelSEED IDs (keys) in the media, denoted by either columns or rows for the dimension in the experimental well-plate where the specified concentration varies.

{
"rows": {
   "cpd00136": {"B":0, "C": 0, "D": 1, "E": 1, "F": 4, "G": 4},
   "cpd00179": {"B":5, "C": 5, "D":5, "E": 5, "F": 5, "G": 5},
 },
"columns": {
   "cpd00029": {2:100, 3: 50, 4: 25, 5: 12.5, 6: 6.25, 7: 3}
 }
}
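
One way to read this structure is to resolve the carbon-source concentrations of a single well from its row letter and column number, as in the sketch below. The helper well_carbon_conc is hypothetical and not a ModelSEEDpy function, and it assumes well identifiers of the form "D4" (row letter followed by column number).

```python
def well_carbon_conc(carbon_conc: dict, well: str) -> dict:
    """Hypothetical helper: return {ModelSEED ID: mM} for one well such as "D4"."""
    row, column = well[0], int(well[1:])
    concs = {}
    # Compounds whose concentration varies along the rows of the plate
    for cpd, by_row in carbon_conc.get("rows", {}).items():
        if row in by_row:
            concs[cpd] = by_row[row]
    # Compounds whose concentration varies along the columns of the plate
    for cpd, by_col in carbon_conc.get("columns", {}).items():
        if column in by_col:
            concs[cpd] = by_col[column]
    return concs


carbon_conc = {
    "rows": {
        "cpd00136": {"B": 0, "C": 0, "D": 1, "E": 1, "F": 4, "G": 4},
        "cpd00179": {"B": 5, "C": 5, "D": 5, "E": 5, "F": 5, "G": 5},
    },
    "columns": {
        "cpd00029": {2: 100, 3: 50, 4: 25, 5: 12.5, 6: 6.25, 7: 3},
    },
}

print(well_carbon_conc(carbon_conc, "D4"))
# {'cpd00136': 1, 'cpd00179': 5, 'cpd00029': 25}
```

Wells whose row or column is absent from a compound's mapping simply omit that compound in this sketch; how the class itself treats such wells is not specified here.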
  • media_conc dict: the mM concentration of each media component indexed by its ModelSEED ID.

  • experimental_metadata Pandas DataFrame: a DataFrame that consists of metadata for the experiments, indexed by the short_codes. The trial_IDs column emulates that of the growth_df DataFrame. The additional_compounds column lists the chemicals, and their initial and final mM concentrations, that augment the media defined in the base_media column. The strains column lists the community members and their respective relative abundances (an abbreviated form of this information is provided in the trial_IDs column). The date column provides the date when the experiment occurred.

  • base_media ModelSEEDpy Media: a media object that is parsed to acquire the concentration for each component in the media, and can therefore supplement the omission of the media_conc argument.

  • solver str: the Linear Programming solver that will be used to solve the constructed problem. The open-source GLPK solver is used by default, to accommodate the greatest number of users.

  • all_phenotypes bool: specifies whether all phenotypes for the respective members will be defined and simulated.

  • data_paths dict: the local path to the data spreadsheet and the identification of pertinent content in the worksheets:

{
   "path":"data/Jeffs_data/PF-EC 4-29-22 ratios and 4HB changes.xlsx",
   "Raw OD(590)":"OD",
   "mNeonGreen":"pf",
   "mRuby":"ecoli"
}
  • species_abundances dict: the relative abundances of all members in the community for each column in the experimental well-plates:

{
   1:{"ecoli":0, "pf":1},
   2:{"ecoli":1, "pf":50},
   3:{"ecoli":1, "pf":20},
   4:{"ecoli":1, "pf":10},
   5:{"ecoli":1, "pf":3},
   6:{"ecoli":1, "pf":1},
   7:{"ecoli":3, "pf":1},
   8:{"ecoli":10, "pf":1},
   9:{"ecoli":20, "pf":1},
   10:{"ecoli":1, "pf":0},
   11:{"ecoli":0, "pf":0}
}
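
The ratios above can be read as fractions of the total community, as the following sketch illustrates. The helper abundance_fractions is hypothetical, and whether MSCommPhitting normalizes the ratios in this way internally is an assumption; columns with no members (such as column 11) are given zero fractions here to avoid dividing by zero.

```python
def abundance_fractions(species_abundances: dict) -> dict:
    """Hypothetical helper: convert per-column ratios into fractions that sum to 1."""
    fractions = {}
    for column, ratios in species_abundances.items():
        total = sum(ratios.values())
        if total == 0:
            # Blank column: no member is present (assumed handling)
            fractions[column] = {member: 0.0 for member in ratios}
        else:
            fractions[column] = {member: value / total
                                 for member, value in ratios.items()}
    return fractions


species_abundances = {
    1: {"ecoli": 0, "pf": 1},
    5: {"ecoli": 1, "pf": 3},
    11: {"ecoli": 0, "pf": 0},
}

print(abundance_fractions(species_abundances)[5])
# {'ecoli': 0.25, 'pf': 0.75}
```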
  • ignore_trials list: the trials (identified through the row & column well-plate coordinates) that will be ignored in the simulation.

  • ignore_timesteps list: the timesteps that will be ignored in the simulation.

  • species_identities_rows dict: the specification of strains for each member species, where it differs, per row in the well-plate experiments:

{
   1:{"ecoli":"mRuby"},
   2:{"ecoli":"ACS"},
   3:{"ecoli":"mRuby"},
   4:{"ecoli":"ACS"},
   5:{"ecoli":"mRuby"},
   6:{"ecoli":"ACS"}
}
  • significant_deviation float: the smallest multiple of a trial mean relative to its initial value that permits its inclusion in the simulation.

  • extract_zip_path str: the path of a zipped file that contains some or all of the files that must be loaded in the simulation.
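
The significant_deviation criterion described above can be pictured with the following sketch, which keeps a trial only when its mean signal is at least the given multiple of its initial value. This is one reading of the description and not the confirmed implementation; the helper passes_deviation and the treatment of a zero initial value are assumptions.

```python
def passes_deviation(values: list, significant_deviation: float = 2.0) -> bool:
    """Hypothetical check: is mean(values) / values[0] >= significant_deviation?"""
    initial = values[0]
    if initial == 0:
        # A ratio is undefined for a zero initial value; excluded here by assumption
        return False
    mean = sum(values) / len(values)
    return mean / initial >= significant_deviation


growing_trial = [0.1, 0.2, 0.3, 0.4]   # mean is 2.5 times the initial value
flat_trial = [0.1, 0.11, 0.12]         # mean is 1.1 times the initial value

print(passes_deviation(growing_trial))  # True
print(passes_deviation(flat_trial))     # False
```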

fit()

The parsed experimental data is used to define and constrain a Global Linear Problem of the community system:

mscommfit.fit(parameters:dict=None, mets_to_track: list = None,
              rel_final_conc:dict=None, zero_start:list=None,
              abs_final_conc:dict=None, graphs: list = None,
              data_timesteps: dict = None, export_zip_name: str = None,
              export_parameters: bool = True, requisite_biomass: dict = None,
              export_lp: str = "CommPhitting.lp", figures_zip_name:str=None,
              publishing:bool=False, primals_export_path=None)
  • parameters dict: simulation parameters that will overwrite default and calculated options. The possible key values include

  • mets_to_track list: the ModelSEED IDs of all compounds that will be graphically plotted, unless metabolites are specifically listed in a graph of the graphs argument.

  • rel_final_conc dict: the final concentration of a phenotype compound in the media that is normalized by its initial concentration: e.g.

{
   "cpd00179":0.1
}

denotes that the final concentration of Maltose is 10% of its initial concentration.
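
The arithmetic behind this normalization is sketched below: the relative value is multiplied by the initial concentration to obtain the equivalent absolute concentration. The helper absolute_final_conc is hypothetical, and the 5 mM initial Maltose concentration is taken from the carbon_conc example above purely for illustration.

```python
def absolute_final_conc(rel_final_conc: dict, initial_conc: dict) -> dict:
    """Hypothetical helper: convert relative final concentrations into mM values."""
    return {cpd: fraction * initial_conc[cpd]
            for cpd, fraction in rel_final_conc.items()}


initial_conc = {"cpd00179": 5.0}     # mM, assumed initial Maltose concentration
rel_final_conc = {"cpd00179": 0.1}   # final concentration is 10% of the initial

print(absolute_final_conc(rel_final_conc, initial_conc))
# {'cpd00179': 0.5}
```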

  • zero_start list: the compounds that possess a zero initial concentration, which is often assumed for cross-feeding compounds that are not provided in the media.

  • abs_final_conc dict: the final mM concentration of a phenotype compound in the media, which follows the same syntactic structure as the rel_final_conc parameter.

  • graphs list<dict>: the collection of graphs that will be plotted from the primal values after the simulation executes. Each dictionary in the list describes a figure, with descriptive keys that specify the type of figure, attributes of the figure, and the data that populates the figure. The trial key designates which experimental trial will be simulated. The experimental_data key accepts a boolean for whether the experimental growth data is overlaid as a scatter upon the predicted biomass plots, where the default is true. The content key designates what content of the trial will be plotted, with acceptable string values of

content option   Description
biomass          The g/L biomass of the defined phenotypes
total_biomass    The g/L biomass of the defined phenotypes and the total OD biomass of the complete community
conc             The mM concentration of the metabolites that are defined in either 1) an accompanying mets key that corresponds to a list of metabolites to plot, 2) the mets_to_track parameter of the function, or 3) all carbonaceous metabolites in the simulated phenotypes as a default.

Graphing designations for non-concentration figures can be tailored with the species and phenotype keys, which correspond to lists of the species and phenotypes for which primal values will be graphed; alternatively, the string "*" can be passed as the value to denote that all available species and phenotypes will be plotted. Finally, the parsed key accepts a boolean for whether the biomass plots are segregated for each species, which can alleviate busyness for complex communities. All of these plots are defined with time on the x-axis, and either mM concentration or g/L on the y-axis, depending upon the plotted content.

The following graphs argument samples the range of supported figures:

[
   {
       "trial": "G48",
       "phenotype": "*",
       "content": "biomass",
       "experimental_data": False
   },
   {
       "trial": "G48",
       "content": "conc"
   },
   {
       "trial": "G48",
       "phenotype": "*",
       "content": "biomass",
       "parsed": True
   },
   {
       "trial": "G48",
       "content": "total_biomass",
       "experimental_data": True
   }
]
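
Before passing such a list to fit(), the specifications can be sanity-checked against the keys and content options described above. The sketch below is a hypothetical pre-check and not part of ModelSEEDpy; in particular, treating the trial key as mandatory is an assumption.

```python
ALLOWED_CONTENT = {"biomass", "total_biomass", "conc"}


def check_graph_specs(graphs: list) -> list:
    """Hypothetical pre-check: return a list of problems, empty if none are found."""
    problems = []
    for i, spec in enumerate(graphs):
        if "trial" not in spec:
            problems.append(f"graph {i}: missing 'trial'")
        content = spec.get("content")
        if content not in ALLOWED_CONTENT:
            problems.append(f"graph {i}: unsupported content {content!r}")
        # The experimental_data and parsed keys are described as booleans
        for key in ("experimental_data", "parsed"):
            if key in spec and not isinstance(spec[key], bool):
                problems.append(f"graph {i}: '{key}' must be a bool")
    return problems


graphs = [
    {"trial": "G48", "phenotype": "*", "content": "biomass", "experimental_data": False},
    {"trial": "G48", "content": "conc"},
    {"content": "od"},  # deliberately invalid entry
]

for problem in check_graph_specs(graphs):
    print(problem)
```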
  • data_timesteps dict: a list of timesteps for each short_code trial that will be simulated, which can be a more succinct tool for tailoring a simulation than specifying the timesteps to ignore from the full dataset.

  • export_zip_name str: the name of the zip file to which the simulation contents will be stored, where the omission of this parameter does not export content to a zip file.

  • export_parameters bool: specifies whether the simulation parameters will be exported as CSV to the current working directory.

  • requisite_biomass dict: the requisite amount of biomass that must grow for the prescribed final metabolite concentration to be achieved, according to the phenotype flux profiles. This is calculated in the MSCommPhitting initialization when community_members is defined, but this parameter option allows previous or custom objects to be provided for the simulation.

  • export_lp str: the name of the LP file, including the “.lp” extension, that will be exported to the current working directory. The default is “CommPhitting.lp”.

  • figures_zip_name str: the name of the zip file to which all of the figures will be exported, where omitting this argument exports the figures to the current working directory.

  • publishing bool: specifies whether figure proportions and attributes are tailored to make the figures more desirable for publication or poster formats.

  • primals_export_path str: the path to which simulation primal values will be exported, which defaults to the export_lp name with “json” extension.
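
The default described for primals_export_path can be pictured with the sketch below, which derives a JSON path from the export_lp name. The helper default_primals_path is hypothetical, and it assumes that the ".lp" extension is replaced rather than having ".json" appended; the text above does not settle this detail.

```python
from pathlib import Path


def default_primals_path(export_lp: str = "CommPhitting.lp") -> str:
    """Hypothetical helper: swap the LP file extension for ".json"."""
    return str(Path(export_lp).with_suffix(".json"))


print(default_primals_path())            # CommPhitting.json
print(default_primals_path("trial.lp"))  # trial.json
```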

fit_kcat()

This function simulates the defined community while implementing a range of growth kinetic constants for each phenotype, and refines the estimate of phenotype growth kinetics through a few iterative simulations. The parameters are identical to those of the fit() function:

mscommfit.fit_kcat(parameters:dict=None, mets_to_track: list = None,
                   rel_final_conc:dict=None, zero_start:list=None,
                   abs_final_conc:dict=None, graphs: list = None,
                   data_timesteps: dict = None, export_zip_name: str = None,
                   export_parameters: bool = True, requisite_biomass: dict = None,
                   export_lp: str = "CommPhitting.lp", figures_zip_name:str=None,
                   publishing:bool=False, primals_export_path=None)

Un-updated documentation

compute()

The Linear Problem is simulated, and the primal values are parsed, optionally exported, and visualized as figures.

mscommfit.compute(graphs=[], zip_name=None)
  • zip_name str: the name of the export zip file to which content will be exported.

graph()

Primal values are visualized as figures.

mscommfit.graph(graphs=[], primal_values_filename=None, primal_values_zip_path=None, zip_name=None, data_timestep_hr=0.163)
  • graphs list: the graph specifications that designate which primal values will be graphed, which are elaborated above for the fit() function.

  • primal_values_filename str: the name of the primal value JSON file (“primal_values.json”)

  • primal_values_zip_path str: the path of the zip file that contains the primal values file

  • zip_name str: the name of the export zip file to which content will be exported.

  • data_timestep_hr float: the timestep value in hours of the data that is being graphed. This permits graphing primal values without previously simulating a model. The value is automatically overwritten by previously defined data timesteps in the MSCommPhitting class object.

load_model()

A JSON model file is imported.

mscommfit.load_model(mscomfit_json_path, zip_name=None, class_object=False)
  • mscomfit_json_path str: the path of the JSON model file that will be loaded and simulated.

  • zip_name str: the path of the zip file that contains the JSON model file.

  • class_object bool: specifies whether the loaded model will be defined in the class object.

returns model Optlang.Model: the model that is loaded from the JSON model file.

change_parameters()

Parameter values in an existing LP JSON file are replaced.

mscommfit.change_parameters(cvt=None, cvf=None, diff=None, vmax=None, mscomfit_json_path="mscommfitting.json", zip_name=None, class_object=False)
  • cvt, cvf, diff, & vmax float or dict: the parameter values that will replace existing values in the LP file. The parameters may be defined either as floats, which will be applied globally to all applicable instances in the model, or as dictionaries that define values at specific times and possibly at specific trials for a certain time. The latter follows a dictionary structure of param["time"]["trial"], where the “trial” level can be omitted to apply a parameter value at every trial of a time. A default value can also be specified in the dictionary as param["default"], which applies to times and trials that are not captured by the defined conditions.

  • mscomfit_json_path str: the path of the JSON model file that will be loaded and simulated.

  • zip_name str: the zip file to which the edited LP JSON will be exported.
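
The float-or-dictionary convention for cvt, cvf, diff, and vmax can be illustrated with a small lookup, sketched below. The helper resolve_parameter is hypothetical and is not the library's implementation; the string time keys and the trial name "G48" are assumptions chosen for the example.

```python
def resolve_parameter(param, time, trial, fallback=None):
    """Hypothetical lookup following the param["time"]["trial"] convention."""
    if not isinstance(param, dict):
        return param  # a float applies globally
    if time in param:
        at_time = param[time]
        if not isinstance(at_time, dict):
            return at_time  # "trial" level omitted: applies to every trial at this time
        if trial in at_time:
            return at_time[trial]
    # Times and trials that are not captured fall back to param["default"]
    return param.get("default", fallback)


vmax = {"default": 1.0, "3": 2.5, "5": {"G48": 4.0}}

print(resolve_parameter(vmax, "3", "B2"))   # 2.5
print(resolve_parameter(vmax, "5", "G48"))  # 4.0
print(resolve_parameter(vmax, "5", "B2"))   # 1.0
print(resolve_parameter(0.7, "1", "B2"))    # 0.7
```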

Accessible content

Several objects within the MSCommPhitting class may be useful for subsequent post-processing or troubleshooting:

  • problem Optlang.Model: the LP model of the experimental system that is simulated.

  • carbon_conc dict: the media concentrations per substrate as defined in carbon_conc_series.

  • variables & constraints dict: the complete collection of all variables and constraints that comprise the LP model.