iotbx.file_reader - generic file input
The iotbx.file_reader module is intended to provide a single entry point
for reading most common crystallographic file formats. This allows the
programmer to use the underlying input functions without needing to know the
specific APIs in detail, although the resulting objects will still be
format-specific. It is also designed to support automatic file type
determination, first by guessing the format based on the file extension, then
by trying a succession of input methods until one finishes without an error.
This facility is used both for processing command-line arguments (especially
via the iotbx.phil extensions) and for handling file input in the Phenix GUI.
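The detection strategy described above can be sketched in plain Python. This is a minimal illustration, not the module's implementation: the three readers, `EXTENSION_HINTS`, `FALLBACK_ORDER`, and `detect_and_read` are all hypothetical names, and the format checks are deliberately crude.

```python
import os

# Hypothetical stand-ins for format-specific readers; each raises on bad input.
def read_as_pdb(text):
    records = ("ATOM", "HETATM", "CRYST1")
    if not any(line.startswith(records) for line in text.splitlines()):
        raise ValueError("no PDB records found")
    return ("pdb", text)

def read_as_seq(text):
    if not text.lstrip().startswith(">"):
        raise ValueError("no FASTA header found")
    return ("seq", text)

def read_as_txt(text):
    # Plain text accepts anything, so it goes last in the fallback order.
    return ("txt", text)

READERS = {"pdb": read_as_pdb, "seq": read_as_seq, "txt": read_as_txt}
EXTENSION_HINTS = {".pdb": "pdb", ".ent": "pdb", ".fa": "seq", ".seq": "seq", ".txt": "txt"}
FALLBACK_ORDER = ["pdb", "seq", "txt"]

def detect_and_read(file_name, text):
    """Try the type hinted by the extension first, then the remaining readers."""
    hinted = EXTENSION_HINTS.get(os.path.splitext(file_name)[1].lower())
    order = ([hinted] if hinted else []) + [t for t in FALLBACK_ORDER if t != hinted]
    for file_type in order:
        try:
            return READERS[file_type](text)
        except ValueError:
            continue  # this reader rejected the data; try the next one
    raise ValueError("unrecognized file format: %s" % file_name)
```

A misleading extension costs one failed parse before the correct reader is reached, which is the inefficiency that an explicit file type avoids.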
In the simplest case, reading a file requires only a single function call:
>>> import sys
>>> from iotbx.file_reader import any_file
>>> input_file = any_file(sys.argv[1])
>>> file_data = input_file.file_object
Note that if the extension does not imply a particular format to try first, or if parsing with the appropriate module fails because the file data are corrupted, automatic detection may be less efficient than explicitly specifying the file type, so it should be used only when the format is not known in advance. Alternatively, you can specify which input module to use:
>>> pdb_in = any_file("model.pdb", force_type="pdb")
>>> pdb_in.assert_file_type("pdb")
>>> hierarchy = pdb_in.file_object.construct_hierarchy()
>>> mtz_in = any_file("data.mtz", force_type="hkl")
>>> miller_arrays = mtz_in.file_server.miller_arrays
This will skip the automatic format detection and only try the specified
input method. Several options are available for error handling; the default
behavior when force_type is set is to pass through any exceptions
encountered when calling the underlying input method:
>>> from iotbx.file_reader import any_file
>>> f = any_file("model.pdb", force_type="hkl")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/nat/src/cctbx_project/iotbx/file_reader.py", line 197, in any_file
    raise_sorry_if_not_expected_format=raise_sorry_if_not_expected_format)
  File "/Users/nat/src/cctbx_project/iotbx/file_reader.py", line 251, in __init__
    read_method()
  File "/Users/nat/src/cctbx_project/iotbx/file_reader.py", line 341, in _try_as_hkl
    assert (hkl_file.file_type() is not None), "Not a valid reflections file."
AssertionError: Not a valid reflections file.
This is fine for internal use when an unexpected file parsing error is likely
to be a bug in the code, but less suitable when processing user input.
Alternatively, a libtbx.utils.Sorry exception may be raised instead:
>>> f = any_file("model.pdb", force_type="hkl", raise_sorry_if_errors=True)
Sorry: Couldn't read 'model.pdb' as file type 'hkl': Not a valid reflections file.
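The same conversion, from an internal exception to a user-facing message, can be applied in your own command-line code. The sketch below only illustrates the pattern: `UserInputError` is a hypothetical stand-in for libtbx.utils.Sorry, and `read_reflections`, `read_for_user`, and `main` are made-up names rather than part of iotbx.

```python
class UserInputError(Exception):
    """Hypothetical stand-in for libtbx.utils.Sorry."""

def read_reflections(file_name, text):
    # Hypothetical parser: accepts only content that starts with "MTZ ".
    if not text.startswith("MTZ "):
        raise ValueError("Not a valid reflections file.")
    return text

def read_for_user(file_name, text, file_type="hkl"):
    """Re-raise any parsing failure as a single user-facing error type."""
    try:
        return read_reflections(file_name, text)
    except Exception as e:
        message = "Couldn't read '%s' as file type '%s': %s" % (file_name, file_type, e)
        raise UserInputError(message) from e

def main(file_name, text):
    """Return the line a command-line tool would print for this input."""
    try:
        read_for_user(file_name, text)
    except UserInputError as e:
        return "Sorry: %s" % e
    return "OK"
```

Catching one well-defined exception type at the top level lets a program print a short message for bad input while still showing a full traceback for genuine bugs.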
For PDB, MTZ, and CIF files (the most commonly used formats in macromolecular
crystallography), it is also possible to get similar behavior by treating the
file extension as an implicit replacement for force_type:
>>> from iotbx.file_reader import any_file
>>> f = any_file("model.sca")
>>> print(f.file_type)
pdb
>>> f = any_file("model.sca", raise_sorry_if_not_expected_format=True)
Sorry: File format error:
Not a valid reflections file.
The allowed file types are specified in the module:
standard_file_descriptions = {
  'pdb' : "Model",
  'hkl' : "Reflections",
  'cif' : "Restraints",
  'seq' : "Sequence",
  'xplor_map' : "XPLOR map",
  'ccp4_map' : "CCP4 map",
  'phil' : "Parameters",
  'xml' : "XML",
  'pkl' : "Python pickle",
  'txt' : "Text",
  'mtz' : "Reflections (MTZ)",
  'aln' : "Sequence alignment",
  'hhr' : "HHpred alignment",
  'img' : "Detector image",
}
However, in most cases only a subset of these will be tried automatically.
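To see which described types fall outside the default search, the table can be compared with the default `valid_types` list from the `any_file` signature documented on this page. The sketch below copies both from this page and assumes that `valid_types` is what determines the subset tried automatically; the variable names other than `standard_file_descriptions` are illustrative.

```python
# Copied from the module listing on this page.
standard_file_descriptions = {
    'pdb': "Model",
    'hkl': "Reflections",
    'cif': "Restraints",
    'seq': "Sequence",
    'xplor_map': "XPLOR map",
    'ccp4_map': "CCP4 map",
    'phil': "Parameters",
    'xml': "XML",
    'pkl': "Python pickle",
    'txt': "Text",
    'mtz': "Reflections (MTZ)",
    'aln': "Sequence alignment",
    'hhr': "HHpred alignment",
    'img': "Detector image",
}

# Default valid_types, copied from the any_file signature on this page.
default_valid_types = ['pdb', 'hkl', 'cif', 'pkl', 'ncs', 'seq', 'phil',
                       'aln', 'a3m', 'txt', 'xplor_map', 'ccp4_map']

# Described types that are absent from the default list.
not_tried_by_default = sorted(
    t for t in standard_file_descriptions if t not in default_valid_types)

# Default types that have no entry in the description table.
undescribed = sorted(
    t for t in default_valid_types if t not in standard_file_descriptions)

print(not_tried_by_default)  # ['hhr', 'img', 'mtz', 'xml']
print(undescribed)           # ['a3m', 'ncs']
```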
API documentation
Generic file input module, used in Phenix GUI and elsewhere. This trades some loss of efficiency for a simplified API for loading any file type more or less automatically. It will first try to guess the format based on the extension; if this fails, it will then try other formats. This is used on the command line and the Phenix GUI to process bulk file input. In most other cases a specific file type is desired, and the force_type argument will ensure that only this format is attempted.
Examples
>>> import sys
>>> from iotbx.file_reader import any_file
>>> input_file = any_file(sys.argv[1])
>>> file_data = input_file.file_object
>>> pdb_in = any_file("model.pdb", force_type="pdb")
>>> pdb_in.assert_file_type("pdb")
>>> hierarchy = pdb_in.file_object.hierarchy
>>> mtz_in = any_file("data.mtz", force_type="hkl")
>>> miller_arrays = mtz_in.file_server.miller_arrays
- exception iotbx.file_reader.FormatError
Bases: Sorry
- iotbx.file_reader.any_file(file_name, get_processed_file=False, valid_types=['pdb', 'hkl', 'cif', 'pkl', 'ncs', 'seq', 'phil', 'aln', 'a3m', 'txt', 'xplor_map', 'ccp4_map'], extensions_absolutely_defining_type=['ccp4', 'mrc'], allow_directories=False, force_type=None, input_class=None, raise_sorry_if_errors=False, raise_sorry_if_not_expected_format=False)
Main input method, wrapper for any_file_input class.
- Parameters:
file_name – path to file (relative or absolute)
get_processed_file – TODO
valid_types – file types to consider
allow_directories – process directory if given as file_name
force_type – read as this format, don’t try any others
input_class – optional substitute for any_file_input, with additional parsers
raise_sorry_if_errors – raise a Sorry exception if parsing fails (used with force_type)
raise_sorry_if_not_expected_format – raise a Sorry exception if the file extension does not match the parsed file type
extensions_absolutely_defining_type – if the file has one of these extensions, only try the associated file type
- Returns:
any_file_input object, or an instance of the input_class param
- iotbx.file_reader.any_file_fast(file_name, get_processed_file=False, valid_types=['pdb', 'hkl', 'cif', 'pkl', 'ncs', 'seq', 'phil', 'aln', 'a3m', 'txt', 'xplor_map', 'ccp4_map'], allow_directories=False, force_type=None, input_class=None)
Mimics any_file, but without parsing; it instead guesses the file type from the extension. For most output files produced by cctbx/phenix this is relatively safe; for files of unknown provenance it is less reliable.
- class iotbx.file_reader.any_file_input(file_name, get_processed_file, valid_types, force_type, raise_sorry_if_errors=False, raise_sorry_if_not_expected_format=False)
Bases: object
Container for file data of any supported type. Usually obtained via the any_file() function rather than being instantiated directly.
- assert_file_type(expected_type)
Verify that the automatically determined file type is the expected format.
- check_file_type(expected_type=None, multiple_formats=())
Verify that the automatically determined file type is the expected format, with the option to consider multiple formats.
- crystal_symmetry()
Extract the crystal symmetry (if any). Only valid for model (PDB/mmCIF) and reflection files.
- property file_content
Return the underlying format-specific object containing file data.
- file_info(show_file_size=True)
Format a string containing the file type and size.
- property file_name
- property file_object
Synonym for the file_content property.
- property file_server
For reflection files only, returns an iotbx.reflection_file_utils.reflection_file_server object containing the extracted Miller arrays. Note that this will implicitly merge any non-unique observations.
- property file_type
Return a string representing the generic data type, for example ‘pdb’ or ‘hkl’. Note that this is not necessarily the same as the underlying format, for example ‘pdb’ can mean either PDB or mmCIF format, and ‘hkl’ could mean MTZ, CIF, XDS, Scalepack, or SHELX format.
- set_file_type(file_type)
- show_summary(out=sys.stdout)
Print out some basic information about the file.
- try_all_types()
- iotbx.file_reader.find_closest_base_name(file_name, base_name, templates)
- iotbx.file_reader.get_wildcard_string(format)
- iotbx.file_reader.get_wildcard_strings(formats, include_any=True)
- class iotbx.file_reader.group_files(file_names, template_format='pdb', group_by_directory=True)
Bases: object
- iotbx.file_reader.guess_file_type(file_name, extensions={'a3m': ['a3m'], 'aln': ['aln', 'ali', 'clustal'], 'ccp4_map': ['ccp4', 'map', 'mrc'], 'cif': ['cif', 'mmcif'], 'hhr': ['hhr'], 'hkl': ['mtz', 'hkl', 'sca', 'cns', 'xplor', 'cv', 'ref', 'fobs'], 'img': ['img', 'osc', 'mccd', 'cbf', 'nxs', 'h5', 'hdf5'], 'map': ['xplor', 'map', 'ccp4'], 'mtz': ['mtz'], 'ncs': ['ncs', 'ncs_spec'], 'pdb': ['pdb', 'ent'], 'phil': ['params', 'eff', 'def', 'phil', 'param'], 'pkl': ['pickle', 'pkl'], 'rosetta': ['gz'], 'sdf': ['sdf'], 'seq': ['fa', 'faa', 'seq', 'pir', 'dat', 'fasta'], 'smi': ['smi'], 'txt': ['txt', 'log', 'html', 'geo'], 'xml': ['xml'], 'xplor_map': ['xplor', 'map']})
- iotbx.file_reader.sort_by_file_type(file_names, sort_order=None)
- iotbx.file_reader.splitext(file_name)
- Parameters:
file_name – A plain text string of the file path
- Returns:
A tuple of the base filename, the file format extension, and possibly a compression extension
- iotbx.file_reader.strip_shelx_format_extension(file_name)
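The default `extensions` table in the `guess_file_type` signature maps each file type to its extensions, and several extensions appear under more than one type. The sketch below inverts a copy of that table to list them. `types_by_extension` is a hypothetical helper, and nothing here shows how the library itself resolves such cases.

```python
from collections import defaultdict

# Copied from the default `extensions` argument of guess_file_type on this page.
extensions = {
    'a3m': ['a3m'], 'aln': ['aln', 'ali', 'clustal'],
    'ccp4_map': ['ccp4', 'map', 'mrc'], 'cif': ['cif', 'mmcif'],
    'hhr': ['hhr'],
    'hkl': ['mtz', 'hkl', 'sca', 'cns', 'xplor', 'cv', 'ref', 'fobs'],
    'img': ['img', 'osc', 'mccd', 'cbf', 'nxs', 'h5', 'hdf5'],
    'map': ['xplor', 'map', 'ccp4'], 'mtz': ['mtz'],
    'ncs': ['ncs', 'ncs_spec'], 'pdb': ['pdb', 'ent'],
    'phil': ['params', 'eff', 'def', 'phil', 'param'],
    'pkl': ['pickle', 'pkl'], 'rosetta': ['gz'], 'sdf': ['sdf'],
    'seq': ['fa', 'faa', 'seq', 'pir', 'dat', 'fasta'], 'smi': ['smi'],
    'txt': ['txt', 'log', 'html', 'geo'], 'xml': ['xml'],
    'xplor_map': ['xplor', 'map'],
}

def types_by_extension(table):
    """Invert {file_type: [extensions]} into {extension: [file_types]}."""
    inverse = defaultdict(list)
    for file_type, exts in table.items():
        for ext in exts:
            inverse[ext].append(file_type)
    return dict(inverse)

by_ext = types_by_extension(extensions)

# Extensions that are listed under more than one file type.
ambiguous = {ext: sorted(types) for ext, types in by_ext.items() if len(types) > 1}

for ext in sorted(ambiguous):
    print(ext, ambiguous[ext])
```

With the table as shown, the extensions `ccp4`, `map`, `mtz`, and `xplor` each map to more than one type, which is one reason an extension-only guess is less reliable than parsing the file.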