iotbx.file_reader - generic file input

The iotbx.file_reader module is intended to provide a single entry point for reading most common crystallographic file formats. This allows the programmer to use the underlying input functions without needing to know the specific APIs in detail, although the resulting objects will still be format-specific. It is also designed to support automatic file type determination, first by guessing the format based on the file extension, then by trying a succession of input methods until one finishes without an error. This facility is used both for processing command-line arguments (especially via the iotbx.phil extensions), and for handling file input in the Phenix GUI.
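
The detection strategy described above (guess from the extension, then fall back through the remaining readers until one succeeds) can be sketched with the standard library alone. This is not the iotbx implementation: `read_json`, `read_ints`, `READERS`, `EXTENSION_TO_TYPE`, and `read_any` are hypothetical names chosen for illustration, and the two toy formats stand in for real crystallographic parsers.

```python
import json
import os
import tempfile

def read_json(path):
    """Hypothetical reader: parse the file as JSON."""
    with open(path) as f:
        return json.load(f)

def read_ints(path):
    """Hypothetical reader: parse whitespace-separated integers."""
    with open(path) as f:
        return [int(token) for token in f.read().split()]

# Illustrative tables only; these are not the iotbx type names.
READERS = {"json": read_json, "ints": read_ints}
EXTENSION_TO_TYPE = {".json": "json", ".dat": "ints"}

def read_any(path):
    """Try the type implied by the extension first, then every other reader."""
    guess = EXTENSION_TO_TYPE.get(os.path.splitext(path)[1].lower())
    order = [guess] if guess else []
    order += [t for t in READERS if t != guess]
    for file_type in order:
        try:
            return file_type, READERS[file_type](path)
        except ValueError:
            continue  # wrong format; fall through to the next reader
    raise ValueError("could not read %r as any of: %s" % (path, ", ".join(order)))

# A file whose extension is misleading: integers stored as "numbers.json".
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "numbers.json")
    with open(path, "w") as f:
        f.write("1 2 3\n")
    detected_type, data = read_any(path)

print(detected_type, data)
```

The misleading extension costs one failed parse before the correct reader is reached, which is the efficiency trade-off that automatic detection accepts.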

In the simplest case, reading a file requires only a single function call:

>>> import sys
>>> from iotbx.file_reader import any_file
>>> input_file = any_file(sys.argv[1])
>>> file_data = input_file.file_object

Note that if the extension does not imply a particular format to try first, or if parsing with the appropriate module fails because the file data are corrupted, automatic detection is less efficient than explicitly specifying the file type, so it should be used only when the format is not known in advance. Alternatively, you can specify which input module to use:

>>> pdb_in = any_file("model.pdb", force_type="pdb")
>>> pdb_in.assert_file_type("pdb")
>>> hierarchy = pdb_in.file_object.construct_hierarchy()

>>> mtz_in = any_file("data.mtz", force_type="hkl")
>>> miller_arrays = mtz_in.file_server.miller_arrays

This will skip the automatic format detection and only try the specified input method. Several options are available for error handling; the default behavior when force_type is set is to pass through any exceptions encountered when calling the underlying input method:

>>> from iotbx.file_reader import any_file
>>> f = any_file("model.pdb", force_type="hkl")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/nat/src/cctbx_project/iotbx/file_reader.py", line 197, in any_file
    raise_sorry_if_not_expected_format=raise_sorry_if_not_expected_format)
  File "/Users/nat/src/cctbx_project/iotbx/file_reader.py", line 251, in __init__
    read_method()
  File "/Users/nat/src/cctbx_project/iotbx/file_reader.py", line 341, in _try_as_hkl
    assert (hkl_file.file_type() is not None), "Not a valid reflections file."
AssertionError: Not a valid reflections file.

This is fine for internal use, where an unexpected file parsing error is likely to be a bug in the code, but it is less suitable when processing user input. Alternatively, a libtbx.utils.Sorry exception may be raised instead:

>>> f = any_file("model.pdb", force_type="hkl", raise_sorry_if_errors=True)
Sorry: Couldn't read 'model.pdb' as file type 'hkl': Not a valid reflections file.
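
The difference between the two error-handling modes can be sketched without iotbx installed. In this minimal illustration `UserError` is a stand-in for libtbx.utils.Sorry, `read_reflections` is a hypothetical strict reader that always rejects its input, and `read_as` with its `friendly_errors` flag is an assumed wrapper, not part of the real API.

```python
class UserError(Exception):
    """Stand-in for libtbx.utils.Sorry: an error meant to be shown to the user."""

def read_reflections(path):
    """Hypothetical strict reader that rejects every file, for demonstration."""
    raise AssertionError("Not a valid reflections file.")

def read_as(path, file_type, reader, friendly_errors=False):
    """Call reader(path); optionally convert any failure into a UserError."""
    try:
        return reader(path)
    except Exception as e:
        if not friendly_errors:
            raise  # default: pass the original exception through unchanged
        raise UserError(
            "Couldn't read %r as file type %r: %s" % (path, file_type, e)) from e

try:
    read_as("model.pdb", "hkl", read_reflections, friendly_errors=True)
except UserError as err:
    message = str(err)

print(message)
```

A command-line program would typically catch the user-facing exception at the top level and print only its message, while letting any other exception produce a full traceback.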

For PDB, MTZ, and CIF files (the most commonly used formats in macromolecular crystallography), it is also possible to get similar behavior by treating the file extension as an implicit replacement for force_type:

>>> from iotbx.file_reader import any_file
>>> f = any_file("model.sca")
>>> print(f.file_type)
pdb
>>> f = any_file("model.sca", raise_sorry_if_not_expected_format=True)
Sorry: File format error:
Not a valid reflections file.

The allowed file types are specified in the module:

standard_file_descriptions = {
  'pdb'  : "Model",
  'hkl'  : "Reflections",
  'cif'  : "Restraints",
  'seq'  : "Sequence",
  'xplor_map'  : "XPLOR map",
  'ccp4_map' : "CCP4 map",
  'phil' : "Parameters",
  'xml'  : "XML",
  'pkl'  : "Python pickle",
  'txt'  : "Text",
  'mtz'  : "Reflections (MTZ)",
  'aln'  : "Sequence alignment",
  'hhr'  : "HHpred alignment",
  'img'  : "Detector image",
}

However, in most cases only a subset of these will be tried automatically.
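
To see which types fall outside the automatic subset, the table above can be compared with the default valid_types list from the any_file signature documented on this page. The sketch below copies both literals from this page rather than importing them, so it may drift from the installed version; `default_valid_types`, `not_tried_by_default`, and `without_description` are illustrative names.

```python
# Copied from the table on this page (may differ in the installed version).
standard_file_descriptions = {
    'pdb': "Model",
    'hkl': "Reflections",
    'cif': "Restraints",
    'seq': "Sequence",
    'xplor_map': "XPLOR map",
    'ccp4_map': "CCP4 map",
    'phil': "Parameters",
    'xml': "XML",
    'pkl': "Python pickle",
    'txt': "Text",
    'mtz': "Reflections (MTZ)",
    'aln': "Sequence alignment",
    'hhr': "HHpred alignment",
    'img': "Detector image",
}

# Copied from the default value of valid_types in the any_file signature.
default_valid_types = ['pdb', 'hkl', 'cif', 'pkl', 'ncs', 'seq', 'phil',
                       'aln', 'a3m', 'txt', 'xplor_map', 'ccp4_map']

# Described types that are not attempted unless requested explicitly.
not_tried_by_default = [t for t in standard_file_descriptions
                        if t not in default_valid_types]

# Default types that have no entry in the description table.
without_description = [t for t in default_valid_types
                       if t not in standard_file_descriptions]

print(not_tried_by_default)
print(without_description)
```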

API documentation

Generic file input module, used in the Phenix GUI and elsewhere. This trades some efficiency for a simplified API that loads any file type more or less automatically. It will first try to guess the format based on the extension; if this fails, it will then try other formats. This is used on the command line and in the Phenix GUI to process bulk file input. In most other cases a specific file type is desired, and the force_type argument will ensure that only this format is attempted.

Examples

>>> import sys
>>> from iotbx.file_reader import any_file
>>> input_file = any_file(sys.argv[1])
>>> file_data = input_file.file_object

>>> pdb_in = any_file("model.pdb", force_type="pdb")
>>> pdb_in.assert_file_type("pdb")
>>> hierarchy = pdb_in.file_object.hierarchy

>>> mtz_in = any_file("data.mtz", force_type="hkl")
>>> miller_arrays = mtz_in.file_server.miller_arrays

exception iotbx.file_reader.FormatError

Bases: Sorry

iotbx.file_reader.any_file(file_name, get_processed_file=False, valid_types=['pdb', 'hkl', 'cif', 'pkl', 'ncs', 'seq', 'phil', 'aln', 'a3m', 'txt', 'xplor_map', 'ccp4_map'], extensions_absolutely_defining_type=['ccp4', 'mrc'], allow_directories=False, force_type=None, input_class=None, raise_sorry_if_errors=False, raise_sorry_if_not_expected_format=False)

Main input method, wrapper for any_file_input class.

Parameters:
  • file_name – path to file (relative or absolute)

  • get_processed_file – TODO

  • valid_types – file types to consider

  • allow_directories – process directory if given as file_name

  • force_type – read as this format, don’t try any others

  • input_class – optional substitute for any_file_input, with additional parsers

  • raise_sorry_if_errors – raise a Sorry exception if parsing fails (used with force_type)

  • raise_sorry_if_not_expected_format – raise a Sorry exception if the file extension does not match the parsed file type

  • extensions_absolutely_defining_type – if the file has one of these extensions, only try the associated file type

Returns:

any_file_input object, or an instance of the input_class param

iotbx.file_reader.any_file_fast(file_name, get_processed_file=False, valid_types=['pdb', 'hkl', 'cif', 'pkl', 'ncs', 'seq', 'phil', 'aln', 'a3m', 'txt', 'xplor_map', 'ccp4_map'], allow_directories=False, force_type=None, input_class=None)

Mimics any_file, but without parsing: it instead guesses the file type from the extension. For most output files produced by cctbx/phenix this is relatively safe; for files of unknown provenance it is less reliable.
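
One reason extension-only guessing is less dependable is that several extensions map to more than one type. The sketch below inverts the extension table shown in the guess_file_type signature on this page (copied here, so it may differ from the installed version) and lists every candidate type for a file name. `candidate_types` is a hypothetical helper; how iotbx itself chooses among candidates is not shown.

```python
import os

# Copied from the default `extensions` argument of guess_file_type on this page.
EXTENSIONS = {
    'a3m': ['a3m'],
    'aln': ['aln', 'ali', 'clustal'],
    'ccp4_map': ['ccp4', 'map', 'mrc'],
    'cif': ['cif', 'mmcif'],
    'hhr': ['hhr'],
    'hkl': ['mtz', 'hkl', 'sca', 'cns', 'xplor', 'cv', 'ref', 'fobs'],
    'img': ['img', 'osc', 'mccd', 'cbf', 'nxs', 'h5', 'hdf5'],
    'map': ['xplor', 'map', 'ccp4'],
    'mtz': ['mtz'],
    'ncs': ['ncs', 'ncs_spec'],
    'pdb': ['pdb', 'ent'],
    'phil': ['params', 'eff', 'def', 'phil', 'param'],
    'pkl': ['pickle', 'pkl'],
    'rosetta': ['gz'],
    'sdf': ['sdf'],
    'seq': ['fa', 'faa', 'seq', 'pir', 'dat', 'fasta'],
    'smi': ['smi'],
    'txt': ['txt', 'log', 'html', 'geo'],
    'xml': ['xml'],
    'xplor_map': ['xplor', 'map'],
}

def candidate_types(file_name, extensions=EXTENSIONS):
    """Return every type whose extension list contains this file's extension."""
    ext = os.path.splitext(file_name)[1].lstrip(".").lower()
    return [file_type for file_type, exts in extensions.items() if ext in exts]

print(candidate_types("model.pdb"))
print(candidate_types("data.mtz"))
print(candidate_types("density.xplor"))
```

A name such as "density.xplor" matches both reflection and map types, so only parsing the contents can settle which one it is.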

class iotbx.file_reader.any_file_fast_input(file_name, valid_types)

Bases: object

class iotbx.file_reader.any_file_input(file_name, get_processed_file, valid_types, force_type, raise_sorry_if_errors=False, raise_sorry_if_not_expected_format=False)

Bases: object

Container for file data of any supported type. Usually obtained via the any_file() function rather than being instantiated directly.

assert_file_type(expected_type)

Verify that the automatically determined file type is the expected format.

check_file_type(expected_type=None, multiple_formats=())

Verify that the automatically determined file type is the expected format, with the option to consider multiple formats.

crystal_symmetry()

Extract the crystal symmetry (if any). Only valid for model (PDB/mmCIF) and reflection files.

property file_content

Return the underlying format-specific object containing file data.

file_info(show_file_size=True)

Format a string containing the file type and size.

property file_name

property file_object

Synonym for file_content.

property file_server

For reflection files only, returns an iotbx.reflection_file_utils.reflection_file_server object containing the extracted Miller arrays. Note that this will implicitly merge any non-unique observations.

property file_type

Return a string representing the generic data type, for example ‘pdb’ or ‘hkl’. Note that this is not necessarily the same as the underlying format, for example ‘pdb’ can mean either PDB or mmCIF format, and ‘hkl’ could mean MTZ, CIF, XDS, Scalepack, or SHELX format.

set_file_type(file_type)

show_summary(out=sys.stdout)

Print out some basic information about the file.

try_all_types()

class iotbx.file_reader.directory_input(dir_name)

Bases: object

file_info(show_file_size=False)

iotbx.file_reader.find_closest_base_name(file_name, base_name, templates)

iotbx.file_reader.get_wildcard_string(format)

iotbx.file_reader.get_wildcard_strings(formats, include_any=True)

class iotbx.file_reader.group_files(file_names, template_format='pdb', group_by_directory=True)

Bases: object

iotbx.file_reader.guess_file_type(file_name, extensions={'a3m': ['a3m'], 'aln': ['aln', 'ali', 'clustal'], 'ccp4_map': ['ccp4', 'map', 'mrc'], 'cif': ['cif', 'mmcif'], 'hhr': ['hhr'], 'hkl': ['mtz', 'hkl', 'sca', 'cns', 'xplor', 'cv', 'ref', 'fobs'], 'img': ['img', 'osc', 'mccd', 'cbf', 'nxs', 'h5', 'hdf5'], 'map': ['xplor', 'map', 'ccp4'], 'mtz': ['mtz'], 'ncs': ['ncs', 'ncs_spec'], 'pdb': ['pdb', 'ent'], 'phil': ['params', 'eff', 'def', 'phil', 'param'], 'pkl': ['pickle', 'pkl'], 'rosetta': ['gz'], 'sdf': ['sdf'], 'seq': ['fa', 'faa', 'seq', 'pir', 'dat', 'fasta'], 'smi': ['smi'], 'txt': ['txt', 'log', 'html', 'geo'], 'xml': ['xml'], 'xplor_map': ['xplor', 'map']})

iotbx.file_reader.sort_by_file_type(file_names, sort_order=None)

iotbx.file_reader.splitext(file_name)

Parameters:

file_name – A plain text string of the file path

Returns: A tuple of the base filename, the file format extension, and possibly a compression extension
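
A function with this kind of three-part return value could look like the sketch below, built on os.path.splitext. The name `split_compressed_ext`, the set of recognized compression suffixes, the choice to keep the leading dots, and the use of None when no compression suffix is present are all assumptions made for illustration; check the iotbx source for the actual behavior.

```python
import os

# Assumed set of compression suffixes; not taken from iotbx.
COMPRESSION_SUFFIXES = (".gz", ".bz2", ".Z")

def split_compressed_ext(file_name):
    """Split a path into (base, format extension, compression extension or None)."""
    base, ext = os.path.splitext(file_name)
    if ext in COMPRESSION_SUFFIXES:
        # Peel off the compression suffix, then split again for the format.
        base, format_ext = os.path.splitext(base)
        return base, format_ext, ext
    return base, ext, None

print(split_compressed_ext("model.pdb.gz"))
print(split_compressed_ext("data.mtz"))
```

Separating the compression suffix this way lets the format extension, rather than ".gz", drive the file type guess.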

iotbx.file_reader.strip_shelx_format_extension(file_name)