Features

This module contains all the tools to compute feature values for molecular structures. Each submodule must subclass deeprank.features.FeatureClass to inherit the export functions. A few features have already been implemented. These are:
  • AtomicFeature : Coulomb, van der Waals interactions and atomic charges
  • BSA : Buried Surface Area
  • FullPSSM : Complete PSSM data
  • PSSM_IC : Information content of the PSSM
  • ResidueDensity : The residue density for polar/apolar/charged pairs

As you can see in the source, each Python file contains a __compute_feature__ function. This is the function called in deeprank.generate.

The classes in charge of the feature calculations are detailed below.

Atomic Feature

class deeprank.features.AtomicFeature.AtomicFeature(pdbfile, chain1='A', chain2='B', param_charge=None, param_vdw=None, patch_file=None, contact_cutoff=8.5, verbose=False)[source]

Compute the Coulomb, van der Waals interaction and charges.

Parameters:
  • pdbfile (str) – pdb file of the molecule
  • chain1 (str) – First chain ID, defaults to ‘A’
  • chain2 (str) – Second chain ID, defaults to ‘B’
  • param_charge (str) – file name of the force field file containing the charges, e.g. protein-allhdg5.4_new.top. Must be of the format:
    • CYM atom O type=O charge=-0.500 end
    • ALA atom N type=NH1 charge=-0.570 end
  • param_vdw (str) – file name of the force field file containing the vdw parameters, e.g. protein-allhdg5.4_new.param. Must be of the format:
    • NONBonded CYAA 0.105 3.750 0.013 3.750
    • NONBonded CCIS 0.105 3.750 0.013 3.750
  • patch_file (str) – file name of a valid patch file for the parameters e.g. patch.top. The way we handle the patching is very manual and should be made more automatic.
  • contact_cutoff (float) – the maximum distance in Å between 2 contact atoms.
  • verbose (bool) – whether to print information.

Examples

>>> import pkg_resources
>>> from deeprank.features.AtomicFeature import AtomicFeature
>>>
>>> pdb = '1AK4_100w.pdb'
>>>
>>> # get the force field included in deeprank
>>> # if another FF has been used to compute the ref
>>> # change also this path to the correct one
>>> FF = pkg_resources.resource_filename(
...     'deeprank.features', '') + '/forcefield/'
>>>
>>> # declare the feature calculator instance
>>> atfeat = AtomicFeature(pdb,
...     param_charge=FF + 'protein-allhdg5-4_new.top',
...     param_vdw=FF + 'protein-allhdg5-4_new.param',
...     patch_file=FF + 'patch.top')
>>>
>>> # assign parameters
>>> atfeat.assign_parameters()
>>>
>>> # only compute the pair interactions here
>>> # (test_name is not defined in this snippet; set it before running)
>>> atfeat.evaluate_pair_interaction(save_interactions=test_name)
>>>
>>> # close the db
>>> atfeat.sqldb._close()
read_charge_file()[source]

Read the .top file given as input.

This function creates:

  • self.charge: dictionary {(resname,atname):charge}
  • self.valid_resnames: list ['VAL', 'ALP', ...]
  • self.at_name_type_convertor: dict {(resname,atname):attype}
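
As an illustration of how the three containers relate to the file format, the sketch below parses lines of the form `CYM atom O type=O charge=-0.500 end`. It is not the DeepRank implementation: the function name `parse_top_lines` and the regular expression are assumptions, and lines that do not match the pattern are simply skipped.

```python
import re

# Matches lines such as "CYM atom O type=O charge=-0.500 end".
_TOP_LINE = re.compile(
    r"^\s*(?P<res>\S+)\s+atom\s+(?P<atom>\S+)\s+"
    r"type=(?P<type>\S+)\s+charge=(?P<charge>[-+]?\d*\.?\d+)\s+end\s*$",
    re.IGNORECASE,
)


def parse_top_lines(lines):
    """Return (charge, valid_resnames, at_name_type_convertor)."""
    charge = {}
    at_name_type_convertor = {}
    valid_resnames = []
    for line in lines:
        match = _TOP_LINE.match(line)
        if match is None:
            continue  # comments, blank lines, anything unrecognised
        key = (match["res"], match["atom"])
        charge[key] = float(match["charge"])
        at_name_type_convertor[key] = match["type"]
        if match["res"] not in valid_resnames:
            valid_resnames.append(match["res"])
    return charge, valid_resnames, at_name_type_convertor
```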
read_patch()[source]

Read the patchfile.

This function creates

  • self.patch_charge: Dict {(resName,atName): charge}
  • self.patch_type : Dict {(resName,atName): type}
read_vdw_file()[source]

Read the .param file.

The param file must be of the form:

NONBONDED ATNAME 0.10000 3.298765 0.100000 3.089222

  • First two numbers are for inter-chain interactions
  • Last two numbers are for intra-chain interactions (only the inter-chain terms are computed here)

This function creates

  • self.vdw: dictionary {attype:[E1,S1]}
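
A minimal sketch of this step, assuming whitespace-separated fields and a case-insensitive `NONBONDED` keyword (both spellings appear in this documentation). The function name `parse_param_lines` is hypothetical; only the first two numbers are kept, matching the {attype:[E1,S1]} layout above.

```python
def parse_param_lines(lines):
    """Build {attype: [E1, S1]} from NONBONDED lines, ignoring everything else."""
    vdw = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or fields[0].upper() != "NONBONDED":
            continue
        attype = fields[1]
        # fields[2:4] are the inter-chain values; fields[4:6] (intra-chain) are dropped.
        vdw[attype] = [float(fields[2]), float(fields[3])]
    return vdw
```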
get_contact_atoms()[source]

Get the contact atoms, selecting amino acids only.

The ligands are not considered.
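
The idea of a contact cutoff can be sketched in a few lines. This is a simplified illustration that assumes two plain lists of coordinates, one per chain, and a brute-force distance test; the real method works on the parsed pdb and its selection logic is not reproduced here. The name `contact_atoms` is hypothetical.

```python
import math


def contact_atoms(chain1_xyz, chain2_xyz, cutoff=8.5):
    """Return the indices of the atoms of each chain lying within cutoff (in Å) of the other chain."""
    idx1, idx2 = set(), set()
    for i, a in enumerate(chain1_xyz):
        for j, b in enumerate(chain2_xyz):
            if math.dist(a, b) <= cutoff:
                idx1.add(i)
                idx2.add(j)
    return sorted(idx1), sorted(idx2)
```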

_extend_contact_to_residue()[source]

Extend the contact atoms to the entire residue whenever one of its atoms is in contact.

assign_parameters()[source]

Assign to each atom in the pdb its charge and vdw interchain parameters.

Directly deals with the patch so that we don’t loop over the residues multiple times.

static _get_altResName(resName, atNames)[source]

Apply the patch data.

This is adapted from preScan.pl. This is very static and I don’t quite like it. The structure of the dictionary is as follows:

{NEWRESTYPE: [‘OLDRESTYPE’, [atom types that must be present], [atom types that must NOT be present]]}
Parameters:
  • resName (str) – name of the residue
  • atNames (list(str)) – names of the atoms
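
A lookup against a dictionary with this structure might look like the sketch below. The table entries (`NEWA`, `NEWB`, `OLD`, `X1`, `X2`) are placeholders invented for the example and are not real patch rules; `get_alt_res_name` is likewise a hypothetical name.

```python
# Hypothetical rules following the layout:
# {NEWRESTYPE: [OLDRESTYPE, [must be present], [must NOT be present]]}
PATCH_RULES = {
    "NEWA": ["OLD", ["X1", "X2"], []],
    "NEWB": ["OLD", ["X1"], ["X2"]],
}


def get_alt_res_name(res_name, at_names, rules=PATCH_RULES):
    """Return the first new residue type whose conditions are met, else None."""
    present = set(at_names)
    for new_type, (old_type, required, forbidden) in rules.items():
        if old_type != res_name:
            continue
        has_required = all(a in present for a in required)
        has_forbidden = any(a in present for a in forbidden)
        if has_required and not has_forbidden:
            return new_type
    return None
```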
_get_vdw(resName, altResName, atNames)[source]

Get vdw interaction terms.

Parameters:
  • resName (str) – name of the residue
  • altResName (str) – alternative name of the residue
  • atNames (list(str)) – names of the atoms
_get_charge(resName, altResName, atNames)[source]

Get the charge information.

Parameters:
  • resName (str) – name of the residue
  • altResName (str) – alternative name of the residue
  • atNames (list(str)) – names of the atoms
evaluate_charges(extend_contact_to_residue=False)[source]

Evaluate the charges.

Parameters:extend_contact_to_residue (bool, optional) – extend the contact atoms to entire residues
evaluate_pair_interaction(print_interactions=False, save_interactions=False)[source]

Evaluate the pair interactions (Coulomb and vdw).

Parameters:
  • print_interactions (bool, optional) – print data to screen
  • save_interactions (bool, optional) – save the interactions to file.
compute_coulomb_interchain_only(dosum=True, contact_only=False)[source]

Compute the coulomb interactions between the chains only.

Parameters:
  • dosum (bool, optional) – sum the interaction terms for each atom
  • contact_only (bool, optional) – consider only contact atoms
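
To make the role of `dosum` concrete, here is a sketch of an inter-chain Coulomb sum. It assumes a plain Coulomb law k·q1·q2/(ε·r) with a constant dielectric; the constant `k=332.0636` (kcal·Å/(mol·e²)) and `eps=1.0` are assumptions of the example, and any distance-dependent screening or cutoff used by DeepRank is not reproduced. The function name and the input layout, a list of (charge, (x, y, z)) per chain, are hypothetical.

```python
import math


def coulomb_interchain(atoms1, atoms2, dosum=True, k=332.0636, eps=1.0):
    """Pairwise Coulomb terms between two chains.

    With dosum=False the full matrix of pair terms is returned; with
    dosum=True the terms are summed per atom, one list per chain.
    """
    matrix = [
        [k * q1 * q2 / (eps * math.dist(p1, p2)) for q2, p2 in atoms2]
        for q1, p1 in atoms1
    ]
    if not dosum:
        return matrix
    per_atom1 = [sum(row) for row in matrix]
    per_atom2 = [sum(col) for col in zip(*matrix)]
    return per_atom1, per_atom2
```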
compute_vdw_interchain_only(dosum=True, contact_only=False)[source]

Compute the vdw interactions between the chains only.

Parameters:
  • dosum (bool, optional) – sum the interaction terms for each atom
  • contact_only (bool, optional) – consider only contact atoms
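
The sketch below shows one conventional way to turn per-type parameters such as {attype: [E1, S1]} into a pair energy. It assumes a 12-6 Lennard-Jones form with a geometric mean for the well depth and an arithmetic mean for the radius; these combination rules are assumptions of the example, not something this documentation states, and no distance-dependent prefactor is applied. `lj_pair` and `vdw_interchain` are hypothetical names.

```python
import math


def lj_pair(eps1, sigma1, eps2, sigma2, r):
    """12-6 Lennard-Jones energy for one atom pair at distance r."""
    eps = math.sqrt(eps1 * eps2)      # assumed combination rule
    sigma = 0.5 * (sigma1 + sigma2)   # assumed combination rule
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def vdw_interchain(atoms1, atoms2, vdw):
    """Sum the pair energies between two chains.

    atoms1, atoms2: lists of (attype, (x, y, z)); vdw: {attype: [eps, sigma]}.
    """
    total = 0.0
    for type1, pos1 in atoms1:
        for type2, pos2 in atoms2:
            e1, s1 = vdw[type1]
            e2, s2 = vdw[type2]
            total += lj_pair(e1, s1, e2, s2, math.dist(pos1, pos2))
    return total
```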
static _prefactor_vdw(r)[source]

Prefactor for vdw interactions.

deeprank.features.AtomicFeature.__compute_feature__(pdb_data, featgrp, featgrp_raw, chain1, chain2)[source]

Main function called in deeprank for the feature calculations.

Parameters:
  • pdb_data (list(bytes)) – pdb information
  • featgrp (str) – name of the group where to save xyz-val data
  • featgrp_raw (str) – name of the group where to save human readable data
  • chain1 (str) – First chain ID
  • chain2 (str) – Second chain ID

Buried Surface Area

class deeprank.features.BSA.BSA(pdb_data, chain1='A', chain2='B')[source]

Compute the buried surface area feature.

Freesasa is required for this feature. From Freesasa version 2.0.3 onwards, the Python bindings are released as a separate module. They can be installed with: pip install freesasa

Parameters:
  • pdb_data (list(bytes) or str) – pdb data or pdb filename
  • chain1 (str, optional) – name of the first chain
  • chain2 (str, optional) – name of the second chain

Example

>>> from deeprank.features.BSA import BSA
>>>
>>> bsa = BSA('1AK4.pdb')
>>> bsa.get_structure()
>>> bsa.get_contact_residue_sasa()
>>> bsa.sql._close()
get_structure()[source]

Get the pdb structure of the molecule.

get_contact_residue_sasa(cutoff=5.5)[source]

Compute the BSA feature.

It generates the following feature:
  • bsa
Raises:ValueError – No interface residues found.
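
As a rough illustration of what a buried surface area value is, the sketch below uses one common definition: for each interface residue, the solvent accessible surface area in the isolated chain minus the area in the complex. Whether DeepRank uses exactly this definition is not stated here, so treat it as an assumption. The SASA values would come from a tool such as Freesasa; in the example they are plain dictionaries, and `buried_surface_area` is a hypothetical name.

```python
def buried_surface_area(sasa_free, sasa_complex):
    """Per-residue buried area: SASA of the free chain minus SASA in the complex.

    Both arguments map (chainID, resSeq, resName) to an area in Å².
    """
    if not sasa_free:
        raise ValueError("No interface residues found.")
    return {res: [free - sasa_complex[res]] for res, free in sasa_free.items()}
```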
deeprank.features.BSA.__compute_feature__(pdb_data, featgrp, featgrp_raw, chain1, chain2)[source]

Main function called in deeprank for the feature calculations.

Parameters:
  • pdb_data (list(bytes)) – pdb information
  • featgrp (str) – name of the group where to save xyz-val data
  • featgrp_raw (str) – name of the group where to save human readable data
  • chain1 (str) – First chain ID
  • chain2 (str) – Second chain ID

FullPSSM

class deeprank.features.FullPSSM.FullPSSM(mol_name=None, pdb_file=None, chain1='A', chain2='B', pssm_path=None, pssm_format='new', out_type='pssmvalue')[source]

Compute all the PSSM data.

Simply extracts all the PSSM information and stores it as features.
Parameters:
  • mol_name (str) – name of the molecule. Defaults to None.
  • pdb_file (str) – name of the pdb_file. Defaults to None.
  • chain1 (str) – First chain ID. Defaults to ‘A’
  • chain2 (str) – Second chain ID. Defaults to ‘B’
  • pssm_path (str) – path to the pssm data. Defaults to None.
  • pssm_format (str) – “old” or “new” pssm format. Defaults to ‘new’.
  • out_type (str) – which feature to generate, ‘pssmvalue’ or ‘pssmic’. Defaults to ‘pssmvalue’. ‘pssm_format’ must be ‘new’ when out_type is ‘pssmic’.
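
The constraint between `pssm_format` and `out_type` can be written as a small check. This is only a sketch of the rule stated above: the helper `check_pssm_options` is hypothetical, and how the real class reacts to an invalid combination is not documented here.

```python
def check_pssm_options(pssm_format="new", out_type="pssmvalue"):
    """Validate the two options and return them unchanged."""
    if pssm_format not in ("old", "new"):
        raise ValueError(f"unknown pssm_format: {pssm_format!r}")
    if out_type not in ("pssmvalue", "pssmic"):
        raise ValueError(f"unknown out_type: {out_type!r}")
    if out_type == "pssmic" and pssm_format != "new":
        raise ValueError("out_type 'pssmic' requires pssm_format 'new'")
    return pssm_format, out_type
```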

Examples

>>> from deeprank.features.FullPSSM import FullPSSM
>>>
>>> path = '/home/test/PSSM_newformat/'
>>> pssm = FullPSSM(mol_name='2ABZ',
...                 pdb_file='2ABZ_1w.pdb',
...                 pssm_path=path)
>>> pssm.read_PSSM_data()
>>> pssm.get_feature_value()
>>> print(pssm.feature_data_xyz)
static get_ref_mol_name(mol_name)[source]

Get the bare molecule name.

read_PSSM_data()[source]

Read the PSSM data into a dictionary.

get_feature_value(cutoff=5.5)[source]

Get the feature value.

deeprank.features.FullPSSM.__compute_feature__(pdb_data, featgrp, featgrp_raw, chain1, chain2, out_type='pssmvalue')[source]

Main function called in deeprank for the feature calculations.

Parameters:
  • pdb_data (list(bytes)) – pdb information
  • featgrp (str) – name of the group where to save xyz-val data
  • featgrp_raw (str) – name of the group where to save human readable data
  • chain1 (str) – First chain ID
  • chain2 (str) – Second chain ID
  • out_type (str) – which feature to generate, ‘pssmvalue’ or ‘pssmic’.

PSSM Information Content

class deeprank.features.PSSM_IC.PSSM_IC(mol_name=None, pdb_file=None, chain1='A', chain2='B', pssm_path=None, pssm_format='new', out_type='pssmvalue')[source]

Compute all the PSSM data.

Simply extracts all the PSSM information and stores it as features.
Parameters:
  • mol_name (str) – name of the molecule. Defaults to None.
  • pdb_file (str) – name of the pdb_file. Defaults to None.
  • chain1 (str) – First chain ID. Defaults to ‘A’
  • chain2 (str) – Second chain ID. Defaults to ‘B’
  • pssm_path (str) – path to the pssm data. Defaults to None.
  • pssm_format (str) – “old” or “new” pssm format. Defaults to ‘new’.
  • out_type (str) – which feature to generate, ‘pssmvalue’ or ‘pssmic’. Defaults to ‘pssmvalue’. ‘pssm_format’ must be ‘new’ when out_type is ‘pssmic’.

Examples

>>> path = '/home/test/PSSM_newformat/'
>>> pssm = FullPSSM(mol_name='2ABZ',
...                 pdb_file='2ABZ_1w.pdb',
...                 pssm_path=path)
>>> pssm.read_PSSM_data()
>>> pssm.get_feature_value()
>>> print(pssm.feature_data_xyz)
deeprank.features.PSSM_IC.__compute_feature__(pdb_data, featgrp, featgrp_raw, chain1, chain2)[source]

Contact Residue Density

class deeprank.features.ResidueDensity.ResidueDensity(pdb_data, chain1='A', chain2='B')[source]

Compute the residue contacts between polar/apolar/charged residues.

Parameters:
  • pdb_data (list(bytes) or str) – pdb data or pdb filename
  • chain1 (str) – First chain ID. Defaults to ‘A’
  • chain2 (str) – Second chain ID. Defaults to ‘B’

Example

>>> from deeprank.features.ResidueDensity import ResidueDensity
>>>
>>> rcd = ResidueDensity('1EWY_100w.pdb')
>>> rcd.get(cutoff=5.5)
>>> rcd.extract_features()
get(cutoff=5.5)[source]

Get residue contacts.

Raises:ValueError – No residue contact found.
extract_features()[source]

Compute the feature of residue contacts between polar/apolar/charged residues.

It generates the following features:
  • RCD_apolar-apolar
  • RCD_apolar-charged
  • RCD_charged-charged
  • RCD_polar-apolar
  • RCD_polar-charged
  • RCD_polar-polar
  • RCD_total
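
The naming of these features can be illustrated with a simplified counter. The sketch below tallies contact pairs globally by residue type; it does not reproduce how the class stores values per residue. The residue classification is a small illustrative subset, and `pair_key` and `count_contacts` are hypothetical helpers.

```python
from collections import Counter

# Illustrative subset of a residue classification (assumption of this example).
RES_TYPE = {
    "ALA": "apolar", "LEU": "apolar",
    "SER": "polar", "ASN": "polar",
    "LYS": "charged", "ASP": "charged",
}

# Ordering that reproduces the feature names listed above.
ORDER = ("polar", "apolar", "charged")


def pair_key(type1, type2):
    """Return the feature name for a pair of residue types, independent of order."""
    first, second = sorted((type1, type2), key=ORDER.index)
    return f"RCD_{first}-{second}"


def count_contacts(pairs):
    """Count contact pairs per category; pairs is a list of (resName1, resName2)."""
    counts = Counter(pair_key(RES_TYPE[a], RES_TYPE[b]) for a, b in pairs)
    counts["RCD_total"] = sum(counts.values())
    return dict(counts)
```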

class deeprank.features.ResidueDensity.residue_pair(res, rtype)[source]

Ancillary class that holds information for a given residue.

deeprank.features.ResidueDensity.__compute_feature__(pdb_data, featgrp, featgrp_raw, chain1, chain2)[source]

Main function called in deeprank for the feature calculations.

Parameters:
  • pdb_data (list(bytes)) – pdb information
  • featgrp (str) – name of the group where to save xyz-val data
  • featgrp_raw (str) – name of the group where to save human readable data
  • chain1 (str) – First chain ID
  • chain2 (str) – Second chain ID

Generic Feature Class

class deeprank.features.FeatureClass.FeatureClass(feature_type)[source]

Master class from which all the other feature classes should be derived.

Parameters:feature_type (str) – ‘Atomic’ or ‘Residue’

Note

Each subclass must compute:

  • self.feature_data: dictionary of features in human readable format, e.g.
    • for atomic features:
      • {‘coulomb’: data_dict_clb, ‘vdwaals’: data_dict_vdw}
      • data_dict_clb = {atom_info: [values]}
      • atom_info = (chainID, resSeq, resName, name)
    • for residue features:
      • {‘PSSM_ALA’: data_dict_pssmALA, …}
      • data_dict_pssmALA = {residue_info: [values]}
      • residue_info = (chainID, resSeq, resName)
  • self.feature_data_xyz: dictionary of features in xyz-val format, e.g.
    • {‘coulomb’: data_dict_clb, ‘vdwaals’: data_dict_vdw}
    • data_dict_clb = {xyz_info: [values]}
    • xyz_info = (chainNum, x, y, z)
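
A concrete instance of the two containers may help. The values below are made up for illustration, and `to_rows` is a hypothetical helper showing how an xyz-val dictionary flattens into rows of chain number, coordinates and values.

```python
# Human readable format: keys identify the atom.
atom_info = ("A", 12, "LYS", "NZ")  # (chainID, resSeq, resName, name)
feature_data = {
    "coulomb": {atom_info: [-1.25]},
    "vdwaals": {atom_info: [-0.40]},
}

# xyz-val format: keys are the chain number and the position.
xyz_info = (0, 10.5, -3.2, 7.8)  # (chainNum, x, y, z)
feature_data_xyz = {
    "coulomb": {xyz_info: [-1.25]},
    "vdwaals": {xyz_info: [-0.40]},
}


def to_rows(data_dict):
    """Flatten {key_tuple: [values]} into a list of rows."""
    return [list(key) + list(values) for key, values in data_dict.items()]
```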
export_data_hdf5(featgrp)[source]

Export the data in human readable format in an HDF5 file group.

Parameters:featgrp (hdf5_group) – the HDF5 group of the feature

Note

  • For atomic features, the format of the data must be:
    {(chainID, resSeq, resName, name): [values]}
  • For residue features, the format must be:
    {(chainID, resSeq, resName): [values]}
export_dataxyz_hdf5(featgrp)[source]

Export the data in xyz-val format in an HDF5 file group.

Parameters:featgrp (hdf5_group) – the HDF5 group of the feature
static get_residue_center(sql, centers=['CB', 'CA', 'mean'], res=None)[source]

Compute the center of each residue by trying different options.

Parameters:
  • sql (pdb2sql) – the pdb2sql instance
  • centers (list) – list of strings. Defaults to ['CB', 'CA', 'mean'].
  • res (list) – list of residues to be considered
Raises:ValueError
Returns:list(res), list(xyz)
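
The fallback suggested by the `centers` argument can be sketched for a single residue. The sketch assumes the options are tried in the order given and that 'mean' is the average position of all atoms of the residue; it works on a plain dictionary rather than a pdb2sql instance, and `residue_center` is a hypothetical name.

```python
def residue_center(atoms, centers=("CB", "CA", "mean")):
    """Return the center of one residue.

    atoms maps atom names to (x, y, z). Each option in centers is tried
    in order: a named atom is used if present, 'mean' averages all atoms.
    """
    for option in centers:
        if option == "mean":
            if atoms:
                n = len(atoms)
                return tuple(sum(xyz[i] for xyz in atoms.values()) / n for i in range(3))
        elif option in atoms:
            return tuple(atoms[option])
    raise ValueError("Center cannot be determined for this residue.")
```

A glycine, which has no CB atom, would fall through to CA under this scheme.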