cctbx.xray.structure_factors.from_scatterers_direct_parallel module

class cctbx.xray.structure_factors.from_scatterers_direct_parallel.direct_summation_cuda_platform(float_t='double')

Bases: direct_summation_simple

validate_platform_resources()

class cctbx.xray.structure_factors.from_scatterers_direct_parallel.direct_summation_simple

Bases: object

use_alt_parallel = True

class cctbx.xray.structure_factors.from_scatterers_direct_parallel.fcalc_container(fcalc)

Bases: object

f_calc()

class cctbx.xray.structure_factors.from_scatterers_direct_parallel.pprocess(instance, algorithm, verbose=False)

Bases: object

Class to represent a worker process

Software requirements for running direct summation on CUDA parallel processing units:

  1. numpy package (http://numpy.scipy.org); version 1.5.1 is OK

  2. pycuda package; version 2011.1 is OK

    • suggested install scheme for Linux (not necessarily exact):

      ../base/bin/python configure.py --cuda-root=/usr/common/usg/cuda/3.2 \
      --cudadrv-lib-dir=/usr/common/usg/nvidia-driver-util/3.2/lib64 \
      --boost-inc-dir=$HOME/boostbuild/include \
      --boost-lib-dir=$HOME/boostbuild/lib \
      --boost-python-libname=boost_python \
      --boost-thread-libname=boost_thread
      make install
      
  3. gcc 4.4.2 or higher is required for the Linux build of pycuda 2011.1

  4. boost_adaptbx.boost.python is required for pycuda. The cctbx-installed version is probably OK but has not been tested; tests were performed with a separately installed boost:

    cd boost_1_45_0
    ./bootstrap.sh --prefix=$HOME/boostbuild --libdir=$HOME/boostbuild/lib \
    --with-python=$HOME/build/base/bin/python \
    --with-libraries=signals,thread,python
    ./bjam variant=release link=shared install
    
  5. A separately installed CUDA 3.2 is required for pycuda 2011.1 (http://developer.nvidia.com/object/cuda_3_2_downloads.html)
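Before attempting a run, it can help to confirm that the two Python packages from the list above can be found for import. The following is a minimal sketch, not the check performed by validate_platform_resources(); the helper name missing_packages is hypothetical, and the sketch assumes a Python 3 interpreter with the standard importlib.util module. It looks only at importability, not at the versions listed above.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import.

    Hypothetical helper for illustration; it is not part of cctbx.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Package names taken from the requirements list above.
missing = missing_packages(["numpy", "pycuda"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("numpy and pycuda are importable")
```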

Suggested hardware as tested:
  • Nvidia Tesla C2050 (Fermi, compute capability 2.0): 225-fold performance improvement over CPU.

  • Nvidia Tesla C1060 (compute capability 1.3): 24-fold performance improvement.

prepare_gaussians_symmetries_cell(algorithm)
prepare_miller_arrays_for_cuda(algorithm, verbose=False)

The number of Miller indices and the number of atoms must each be an exact multiple of BLOCKSIZE (32), so zero-padding is employed.
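The padding rule can be illustrated with a short sketch. This is not the module's own implementation: the helper names padded_length and zero_pad are hypothetical, and plain Python lists stand in for whatever array type the module actually uses. Only the block size of 32 comes from the text above.

```python
BLOCKSIZE = 32  # block size quoted in the description above


def padded_length(n, blocksize=BLOCKSIZE):
    """Return the smallest multiple of blocksize that is >= n."""
    return ((n + blocksize - 1) // blocksize) * blocksize


def zero_pad(values, blocksize=BLOCKSIZE):
    """Return a copy of values extended with zeros to a multiple of blocksize."""
    values = list(values)
    n_extra = padded_length(len(values), blocksize) - len(values)
    return values + [0.0] * n_extra


# 70 entries are padded up to the next multiple of 32.
padded = zero_pad([1.0] * 70)
print(len(padded))  # 96
```

An input whose length is already a multiple of 32 is returned unchanged in length, so the helper can be applied unconditionally.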

prepare_scattering_sites_for_cuda(algorithm, verbose=False)
print_diagnostics()
validate_the_inputs(manager, cuda, algorithm)