mmtbx.ions.svm package

Module contents

A set of functions to act upon ChemicalEnvironment and ScatteringEnvironment and produce a single class and vector of features for use with a classifier.

This module relies on an SVM classifier generated by the phenix_dev.ion_identification.nader_ml module. See that module’s description for more details.

See also

phenix_dev.ion_identification.nader_ml

mmtbx.ions.svm.ion_anomalous_vector(scatter_env, elements=None, ratios=True, anom_peak=False)

Creates a vector of the anomalous features of a site. These can either include the f’’ / f’’_expected for a variety of ion identities or the exact anomalous peak height.

Parameters:
  • scatter_env (mmtbx.ions.environment.ScatteringEnvironment) – An object containing information about the scattering environment at a site.

  • elements (list of str, optional) – List of elements to include when calculating f’’_expected values. If unset, takes the list from mmtbx.ions.ALLOWED_IONS.

  • ratios (bool, optional) – If False, instead of calculating ratios, just return a vector of the wavelength, f’, and f’’.

  • anom_peak (bool, optional) – Whether to use the actual height of the anomalous map instead of the calculated f’’ values.

Returns:

A vector containing quantitative properties for classification.

Return type:

numpy.array of float
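The ratio feature described above can be illustrated with a minimal sketch. It does not call mmtbx. The helper name anomalous_ratio_vector, the zero-division guard, and the numbers in hypothetical_expected are assumptions made for illustration only; they are not tabulated scattering factors, and plain lists stand in for the numpy array the real function returns.

```python
def anomalous_ratio_vector(fpp_observed, fpp_expected, elements):
    """Return fpp_observed / fpp_expected[element] for each element, in order."""
    vector = []
    for element in elements:
        expected = fpp_expected[element]
        # Assumption: report 0.0 when no anomalous signal is expected,
        # rather than dividing by zero.
        vector.append(fpp_observed / expected if expected else 0.0)
    return vector


# Placeholder values for illustration only (hypothetical, not real f'' data).
hypothetical_expected = {"ZN": 2.0, "CA": 0.5}
ratios = anomalous_ratio_vector(1.0, hypothetical_expected, ["ZN", "CA"])
```

A ratio near 1 would suggest that the refined f’’ is consistent with the candidate element, which is the kind of signal such a feature is meant to give the classifier.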

mmtbx.ions.svm.ion_class(chem_env)

Returns the class name associated with the ion, analogous to the chemical ID.

Parameters:

chem_env (mmtbx.ions.environment.ChemicalEnvironment) – The object to extract the ion class from.

Returns:

The class associated with the ion (e.g. “HOH” or “ZN”).

Return type:

str

mmtbx.ions.svm.ion_electron_density_vector(scatter_env, b_iso=False, occ=False, diff_peak=False)

Creates a vector containing information about the electron density around the site. Currently this only includes the site’s peak in the 2FoFc map. May be expanded in the future to include information about the volume of density around the site.

Parameters:
  • scatter_env (mmtbx.ions.environment.ScatteringEnvironment) – An object containing information about the scattering environment at a site.

  • b_iso (bool, optional) – Include the atom’s refined isotropic b-factor, divided by the mean b-factor of solvent molecules in the structure.

  • occ (bool, optional) – Include the atom’s refined occupancy.

  • diff_peak (bool, optional) – Include the difference map peak height.

Returns:

A vector containing quantitative properties for classification.

Return type:

numpy.array of float

mmtbx.ions.svm.ion_geometry_vector(chem_env, geometry_names=None)

Creates a vector for a site’s geometry. For each geometry in geometry_names the vector contains a 1 if that geometry is present at the site and 0 otherwise.

A single boolean per geometry was chosen after some trial and error with an SVM, as differences in deviations of less than 15 degrees were not found to be significant in helping to differentiate ions.

Parameters:
  • chem_env (mmtbx.ions.environment.ChemicalEnvironment) – An object containing information about the chemical environment at a site.

  • geometry_names (list of str, optional) – A list of geometry names to check for. If unset, take names from mmtbx.ions.SUPPORTED_GEOMETRY_NAMES.

Returns:

A vector containing quantitative properties for classification.

Return type:

numpy.array of float
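The 1/0 encoding described above can be sketched as follows. This is an illustration rather than the mmtbx implementation: the helper geometry_indicator_vector and the names in hypothetical_names are assumptions, and the real list comes from mmtbx.ions.SUPPORTED_GEOMETRY_NAMES.

```python
def geometry_indicator_vector(found_geometries, geometry_names):
    """Return 1.0 for each name in geometry_names found at the site, else 0.0."""
    found = set(found_geometries)
    return [1.0 if name in found else 0.0 for name in geometry_names]


# Hypothetical geometry names, chosen only to make the example readable.
hypothetical_names = ["tetrahedral", "octahedral", "square_planar"]
geometry_vector = geometry_indicator_vector(["octahedral"], hypothetical_names)
```

The position of each entry is fixed by geometry_names, so the same list must be used when training and when predicting.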

mmtbx.ions.svm.ion_model_vector(scatter_env, d_min=True, nearest_res=0.5)

Creates a vector containing information about the general properties of the model in which the site is found. Currently this only includes the minimum resolution of the data set.

Parameters:
  • scatter_env (mmtbx.ions.environment.ScatteringEnvironment) – An object containing information about the scattering environment at a site.

  • d_min (bool, optional) – Include the high resolution associated with the model as a feature.

  • nearest_res (float, optional) – If not None, d_min is rounded to the nearest multiple of this value. This value has no effect if d_min is False.

Returns:

A vector containing quantitative properties for classification.

Return type:

numpy.array of float
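The rounding controlled by nearest_res can be sketched in a few lines. The helper round_to_nearest is hypothetical, and how mmtbx handles exact ties is not specified here, so this shows only the general idea.

```python
def round_to_nearest(value, step):
    """Round value to the nearest multiple of step; return it unchanged if step is None."""
    if step is None:
        return value
    # Python's round() sends exact ties to the nearest even integer.
    return round(value / step) * step


rounded_a = round_to_nearest(1.8, 0.5)   # 1.8 / 0.5 = 3.6, rounds to 4
rounded_b = round_to_nearest(2.3, 0.5)   # 2.3 / 0.5 = 4.6, rounds to 5
unrounded = round_to_nearest(1.8, None)
```

Binning the resolution this way keeps structures with nearly identical d_min values from being treated as distinct by the classifier.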

mmtbx.ions.svm.ion_nearby_atoms_vector(chem_env, environments=None)

Creates a vector for the identities of the atoms surrounding a site. Returns a vector with a count of coordinating nitrogens, oxygens, sulfurs, and chlorines.

Parameters:
  • chem_env (mmtbx.ions.environment.ChemicalEnvironment) – An object containing information about the chemical environment at a site.

  • environments (list of int, optional) – A list of environments to check for. If unset, takes values from mmtbx.ions.environment.N_SUPPORTED_ENVIRONMENTS.

Returns:

A vector containing quantitative properties for classification.

Return type:

numpy.array of float
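The per-element counts described above can be illustrated with a minimal sketch that works on a plain list of element symbols. The helper coordinating_atom_counts and its tracked argument are assumptions. The real function reads its input from a ChemicalEnvironment and selects features through the integer environments list.

```python
from collections import Counter


def coordinating_atom_counts(neighbor_elements, tracked=("N", "O", "S", "CL")):
    """Count how many coordinating atoms belong to each tracked element."""
    counts = Counter(element.strip().upper() for element in neighbor_elements)
    # Counter returns 0 for elements that never appear.
    return [float(counts[element]) for element in tracked]


# Carbon is present but not tracked, so it does not appear in the output.
neighbor_counts = coordinating_atom_counts(["O", "O", "N", "o", "C"])
```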

mmtbx.ions.svm.ion_valence_vector(chem_env, elements=None)

Calculate the BVS and VECSUM values for a variety of ion identities.

Parameters:
  • chem_env (mmtbx.ions.environment.ChemicalEnvironment) – An object containing information about the chemical environment at a site.

  • elements (list of str, optional) – List of elements to include when calculating BVS and VECSUM values. If unset, takes the list from mmtbx.ions.ALLOWED_IONS.

Returns:

A vector containing quantitative properties for classification.

Return type:

numpy.array of float
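For readers unfamiliar with the two quantities, the sketch below uses the standard bond-valence expression s = exp((r0 - r) / b) with b = 0.37 Å. BVS is taken as the sum of the bond valences and VECSUM as the length of their vector sum. Whether mmtbx normalizes or scales these values differently is not stated here, so treat this as an assumption. The helper names and the r0 value are hypothetical.

```python
import math


def bond_valence(distance, r0, b=0.37):
    """Valence of a single bond of the given length (distances in Angstroms)."""
    return math.exp((r0 - distance) / b)


def bvs_and_vecsum(site, ligands, r0, b=0.37):
    """Return (sum of bond valences, length of the summed valence vectors)."""
    total = 0.0
    summed = [0.0, 0.0, 0.0]
    for ligand in ligands:
        delta = [l - s for l, s in zip(ligand, site)]
        distance = math.sqrt(sum(d * d for d in delta))
        valence = bond_valence(distance, r0, b)
        total += valence
        # Each bond contributes a vector of length `valence` pointing at the ligand.
        for axis in range(3):
            summed[axis] += valence * delta[axis] / distance
    return total, math.sqrt(sum(v * v for v in summed))


# Two ligands on opposite sides of the site, each exactly r0 away (hypothetical r0).
bvs, vecsum = bvs_and_vecsum((0.0, 0.0, 0.0), [(2.0, 0.0, 0.0), (-2.0, 0.0, 0.0)], r0=2.0)
```

In this symmetric arrangement each bond contributes a valence of 1 and the two vectors cancel, so a well-coordinated site gives a BVS close to the expected oxidation state and a small VECSUM.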

mmtbx.ions.svm.ion_vector(chem_env, scatter_env, use_scatter=True, use_chem=True, b_iso=True, occ=True, diff_peak=True, geometry=True, elements=None, valence=True, anom=True, ratios=True, anom_peak=False, d_min=True)

Creates a vector containing all of the useful properties of an ion site. Merges together the vectors from the ion_*_vector() functions.

Parameters:
  • chem_env (mmtbx.ions.environment.ChemicalEnvironment) – An object containing information about the chemical environment at a site.

  • scatter_env (mmtbx.ions.environment.ScatteringEnvironment, optional) – An object containing information about the scattering environment at a site.

  • use_scatter (bool, optional) – Include information derived from the scattering environment (b-factor, occupancy, FoFc peak, 2FoFc peak and width, f’’ values).

  • use_chem (bool, optional) – Include information derived from the chemical environment (Geometry, coordinating atom identities, valences).

  • b_iso (bool, optional) – Include the atom’s refined isotropic b-factor, divided by the mean b-factor of solvent molecules in the structure.

  • occ (bool, optional) – Include the atom’s refined occupancy.

  • diff_peak (bool, optional) – Include the difference map peak height.

  • geometry (bool, optional) – Include information about the presence of various geometries.

  • elements (list of str, optional) – List of elements to include when calculating BVS, VECSUM, and f’’_expected values. If unset, takes the list from mmtbx.ions.ALLOWED_IONS.

  • valence (bool, optional) – Include BVS and VECSUM values.

  • anom (bool, optional) – Include anomalous scattering information.

  • ratios (bool, optional) – Use f’’ / f’’_expected, instead of raw f’’ and f’ values.

  • anom_peak (bool, optional) – Whether to use the actual height of the anomalous map instead of the calculated f’’ values.

  • d_min (bool, optional) – Include the high resolution associated with the model as a feature.

Returns:

A vector containing quantitative properties for classification.

Return type:

numpy.array of float
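The merging step can be pictured as a flag-controlled concatenation. The sketch below is an illustration under assumptions: merge_feature_vectors, the part names, and the numbers are hypothetical, and the real function builds each part from the chemical and scattering environments and returns a numpy array.

```python
def merge_feature_vectors(parts, enabled):
    """Concatenate the named sub-vectors whose flag is not switched off."""
    merged = []
    for name, values in parts:
        # Parts missing from `enabled` are included by default.
        if enabled.get(name, True):
            merged.extend(values)
    return merged


# Hypothetical sub-vectors standing in for the ion_*_vector() outputs.
parts = [("density", [1.5]), ("geometry", [0.0, 1.0]), ("valence", [2.1, 0.3])]
merged_vector = merge_feature_vectors(parts, {"geometry": False})
```

Because a classifier expects a fixed feature layout, the flags used at prediction time would need to match those used when the classifier was trained.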

class mmtbx.ions.svm.manager(fmodel, pdb_hierarchy, xray_structure, params=None, wavelength=None, connectivity=None, nproc=1, verbose=False, log=None)

Bases: manager

analyze_water(i_seq, debug=True, candidates=<libtbx.AutoType object>, filter_outputs=True)

Analyzes a single water site using an SVM to decide whether to re-assign it as an ion.

Parameters:
  • i_seq (int)

  • debug (bool, optional)

  • candidates (list of str, optional)

Return type:

svm_prediction or None

analyze_waters(out=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, debug=True, candidates=<libtbx.AutoType object>)

Uses an SVM to analyze all of a model’s water sites and decide whether to re-assign them as ions.

Parameters:
  • out (file, optional)

  • debug (bool, optional)

  • candidates (list of str, optional)

Return type:

list of svm_prediction

mmtbx.ions.svm.predict_ion(chem_env, scatter_env, elements=None, svm_name=None)

Uses the trained classifier to predict the ions that most likely fit a given list of features about the site.

Parameters:
  • chem_env (mmtbx.ions.environment.ChemicalEnvironment) – An object containing information about the chemical environment at a site.

  • scatter_env (mmtbx.ions.environment.ScatteringEnvironment, optional) – An object containing information about the scattering environment at a site.

  • elements (list of str, optional) – A list of elements to include within the prediction. Must be a subset of mmtbx.ions.svm.ALLOWED_IONS. Note: Water is not added to elements by default.

  • svm_name (str, optional) – The SVM to use for prediction. By default, the SVM trained on heavy atoms and calcium in the presence of anomalous data is used.

Returns:

A sorted list of classes and the predicted probabilities associated with each or None if the trained classifier cannot be loaded.

Return type:

list of tuple of str, float or None
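The shape of the return value can be illustrated with a short sketch that pairs class names with probabilities and sorts them. The helper rank_predictions and the probabilities are hypothetical, and the descending order is an assumption, since the description above says only that the list is sorted.

```python
def rank_predictions(classes, probabilities):
    """Pair each class with its probability, most probable first."""
    pairs = zip(classes, probabilities)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up probabilities for illustration; real values come from the classifier.
ranked = rank_predictions(["HOH", "ZN", "CA"], [0.2, 0.7, 0.1])
best_class, best_probability = ranked[0]
```

Callers should also handle a None result, which the function returns when the trained classifier cannot be loaded.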

class mmtbx.ions.svm.svm_prediction(**kwds)

Bases: slots_getstate_setstate_default_initializer

Contains information about an SVM’s prediction of a site’s identity.

i_seq
Type:

int

pdb_id_str
Type:

str

atom_info_str
Type:

str

map_stats
Type:

group_args

atom_types
Type:

list of str

scores

Probabilities associated with each element listed by atom_types.

Type:

list of float

final_choice
Type:

mmtbx.ions.metal_parameters

show(out=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, prefix='')

Shows information about an SVM’s prediction of a site’s identity.

Parameters:
  • out (file, optional)

  • prefix (str, optional)

show_brief(out=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, prefix='')

Shows a brief description of an SVM’s prediction of a site’s identity, for use in output as a table.

Parameters:
  • out (file, optional)

  • prefix (str, optional)