Features

This module contains all the tools needed to compute feature values for molecular structures. Each submodule must define a subclass of deeprank.features.FeatureClass to inherit the export functions. A few features have already been implemented:

AtomicFeature: Coulomb and van der Waals interactions and atomic charges
BSA: Buried surface area
FullPSSM: Complete PSSM data
PSSM_IC: Information content of the PSSM
ResidueDensity: The residue density for polar/apolar/charged pairs

As the source shows, each Python file contains a __compute_feature__ function. This is the function called by deeprank.generate.

The classes in charge of the feature calculations are detailed below.

Atomic Feature

class deeprank.features.AtomicFeature.AtomicFeature(pdbfile, chain1='A', chain2='B', param_charge=None, param_vdw=None, patch_file=None, contact_cutoff=8.5, verbose=False)

Compute the Coulomb and van der Waals interactions and the atomic charges.

Parameters:

pdbfile (str) – pdb file of the molecule
chain1 (str) – First chain ID, defaults to ‘A’
chain2 (str) – Second chain ID, defaults to ‘B’
param_charge (str) – file name of the force field file containing the charges, e.g. protein-allhdg5-4_new.top. Its lines must be of the format:
  CYM atom O type=O charge=-0.500 end
  ALA atom N type=NH1 charge=-0.570 end
param_vdw (str) – file name of the force field file containing the vdw parameters, e.g. protein-allhdg5-4_new.param. Its lines must be of the format:
  NONBonded CYAA 0.105 3.750 0.013 3.750
  NONBonded CCIS 0.105 3.750 0.013 3.750
patch_file (str) – file name of a valid patch file for the parameters, e.g. patch.top. The patching is currently handled in a very manual way and should be made more automatic.
contact_cutoff (float) – the maximum distance in Å between two contact atoms.
verbose (bool) – whether to print information.

Examples

>>> import pkg_resources
>>> from deeprank.features.AtomicFeature import AtomicFeature
>>>
>>> pdb = '1AK4_100w.pdb'
>>>
>>> # get the force field included in deeprank
>>> # if another FF has been used to compute the ref
>>> # change also this path to the correct one
>>> FF = pkg_resources.resource_filename(
...     'deeprank.features', '') + '/forcefield/'
>>>
>>> # declare the feature calculator instance
>>> atfeat = AtomicFeature(pdb,
...     param_charge=FF + 'protein-allhdg5-4_new.top',
...     param_vdw=FF + 'protein-allhdg5-4_new.param',
...     patch_file=FF + 'patch.top')
>>>
>>> # assign parameters
>>> atfeat.assign_parameters()
>>>
>>> # only compute the pair interactions here
>>> # (test_name is assumed to be defined beforehand)
>>> atfeat.evaluate_pair_interaction(save_interactions=test_name)
>>>
>>> # close the db
>>> atfeat.sqldb._close()

read_charge_file()

Read the .top file given as input.

This function creates:

self.charge: dictionary {(resname,atname):charge}
self.valid_resnames: list [‘VAL’, ‘ALP’, …]
self.at_name_type_convertor: dict {(resname,atname):attype}
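
A minimal sketch of how lines in the documented .top format could be turned into these three containers is given below. The function name parse_charge_lines and the choice to skip lines that do not match are assumptions made for illustration; this is not the library's own parser.

```python
def parse_charge_lines(lines):
    """Parse lines such as 'CYM atom O type=O charge=-0.500 end'."""
    charge = {}          # {(resname, atname): charge}
    at_type = {}         # {(resname, atname): attype}
    valid_resnames = []  # residue names in order of first appearance
    for line in lines:
        words = line.split()
        # Keep only lines of the form: RES atom NAME key=value ... end
        if len(words) < 5 or words[1] != "atom":
            continue
        resname, atname = words[0], words[2]
        fields = dict(w.split("=", 1) for w in words[3:] if "=" in w)
        if "charge" not in fields or "type" not in fields:
            continue
        charge[(resname, atname)] = float(fields["charge"])
        at_type[(resname, atname)] = fields["type"]
        if resname not in valid_resnames:
            valid_resnames.append(resname)
    return charge, valid_resnames, at_type
```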

read_patch()

Read the patch file.

This function creates:

self.patch_charge: dictionary {(resName,atName): charge}
self.patch_type: dictionary {(resName,atName): type}

read_vdw_file()

Read the .param file.

The param file must be of the form:

NONBONDED ATNAME 0.10000 3.298765 0.100000 3.089222

The first two numbers are for inter-chain interactions.

The last two numbers are for intra-chain interactions (only the inter-chain interactions are computed here).

This function creates:

self.vdw: dictionary {attype:[E1,S1]}
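
The sketch below shows one way such lines could be read into a dictionary of that shape, keeping only the first two (inter-chain) numbers. The name parse_vdw_lines is hypothetical, and the case-insensitive match on the keyword is an assumption prompted by the two spellings (NONBonded and NONBONDED) that appear on this page.

```python
def parse_vdw_lines(lines):
    """Parse lines such as 'NONBONDED ATNAME 0.10000 3.298765 0.100000 3.089222'."""
    vdw = {}  # {attype: [E1, S1]}
    for line in lines:
        words = line.split()
        if len(words) < 6 or words[0].upper() != "NONBONDED":
            continue
        attype = words[1]
        # First two numbers: inter-chain parameters. The last two
        # (intra-chain) are ignored, as only inter-chain terms are needed.
        vdw[attype] = [float(words[2]), float(words[3])]
    return vdw
```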

get_contact_atoms()

Get the contact atoms, selecting only amino acids.

The ligands are not considered.
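
The idea of a contact selection can be sketched as follows: an atom is kept when it lies within the cutoff of at least one atom of the other chain. The function contact_atoms and its input layout are hypothetical, the brute-force double loop is chosen for readability only, and whether the library uses a strict or inclusive comparison is not stated here, so the inclusive test is an assumption. The default of 8.5 Å comes from the contact_cutoff argument above.

```python
import math


def contact_atoms(atoms1, atoms2, cutoff=8.5):
    """Return the indices of the atoms of each chain that are in contact.

    atoms1, atoms2: lists of (index, (x, y, z)) for the two chains.
    """
    contact1, contact2 = set(), set()
    for i, pos_i in atoms1:
        for j, pos_j in atoms2:
            if math.dist(pos_i, pos_j) <= cutoff:
                contact1.add(i)
                contact2.add(j)
    return sorted(contact1), sorted(contact2)
```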

_extend_contact_to_residue(): Extend the contact atoms to the entire residue where one atom is contacting.

assign_parameters()

Assign to each atom in the pdb its charge and vdw interchain parameters.

Directly deals with the patch so that we don’t loop over the residues multiple times.

static _get_altResName(resName, atNames)

Apply the patch data.

This is adapted from preScan.pl. It is very static and not ideal. The structure of the dictionary is as follows:

{NEWRESTYPE: [‘OLDRESTYPE’, [atom types that must be present], [atom types that must NOT be present]]}

Parameters:

resName (str) – name of the residue
atNames (list(str)) – names of the atoms
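
A lookup over a dictionary of this shape could work roughly as sketched below. The two rules in PATCH_RULES are hypothetical entries written for the example and are not taken from the library, and returning None when no rule matches is likewise an assumption.

```python
# Hypothetical rules: {new type: [old type, must be present, must NOT be present]}
PATCH_RULES = {
    "HISP": ["HIS", ["HD1", "HE2"], []],
    "HISE": ["HIS", ["HE2"], ["HD1"]],
}


def get_alt_res_name(res_name, at_names, rules=PATCH_RULES):
    """Return the first patched residue type whose conditions are met."""
    present = set(at_names)
    for new_type, (old_type, required, forbidden) in rules.items():
        if res_name != old_type:
            continue
        if present.issuperset(required) and present.isdisjoint(forbidden):
            return new_type
    return None
```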

_get_vdw(resName, altResName, atNames)

Get the vdw interaction terms.

Parameters:

resName (str) – name of the residue
altResName (str) – alternative name of the residue
atNames (list(str)) – names of the atoms

_get_charge(resName, altResName, atNames)

Get the charge information.

Parameters:

resName (str) – name of the residue
altResName (str) – alternative name of the residue
atNames (list(str)) – names of the atoms

evaluate_charges(extend_contact_to_residue=False)

Evaluate the charges.

Parameters:

extend_contact_to_residue (bool, optional) – extend the contact atoms to the entire residue

evaluate_pair_interaction(print_interactions=False, save_interactions=False)

Evaluate the pair interactions (Coulomb and vdw).

Parameters:

print_interactions (bool, optional) – print data to screen
save_interactions (bool, optional) – save the interactions to file

compute_coulomb_interchain_only(dosum=True, contact_only=False)

Compute the Coulomb interactions between the chains only.

Parameters:

dosum (bool, optional) – sum the interaction terms for each atom
contact_only (bool, optional) – consider only contact atoms
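
To make the role of dosum concrete, the sketch below evaluates a plain Coulomb term q1*q2/r for every inter-chain pair and either returns the full pair matrix or the per-atom sums. It is an illustration only: the function coulomb_interchain, its input layout and the unit-free prefactor k are assumptions, and any dielectric model, cutoff or unit conversion used by the library is left out.

```python
import math


def coulomb_interchain(atoms1, atoms2, dosum=True, k=1.0):
    """Inter-chain Coulomb terms for atoms given as (charge, (x, y, z)).

    With dosum=False the pair matrix [i][j] is returned; with dosum=True
    the terms are summed per atom, for chain 1 and chain 2 respectively.
    """
    matrix = [
        [k * q1 * q2 / math.dist(p1, p2) for q2, p2 in atoms2]
        for q1, p1 in atoms1
    ]
    if not dosum:
        return matrix
    sum_chain1 = [sum(row) for row in matrix]
    sum_chain2 = [sum(col) for col in zip(*matrix)]
    return sum_chain1, sum_chain2
```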

compute_vdw_interchain_only(dosum=True, contact_only=False)

Compute the vdw interactions between the chains only.

Parameters:

dosum (bool, optional) – sum the interaction terms for each atom
contact_only (bool, optional) – consider only contact atoms
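
For orientation, a generic inter-chain van der Waals sum can be sketched with a 12-6 Lennard-Jones term built from per-type (epsilon, sigma) pairs such as those stored in self.vdw. The combination rules used here (geometric mean for epsilon, arithmetic mean for sigma) and both function names are assumptions for this example; the library's actual functional form and its _prefactor_vdw term are not reproduced.

```python
import math


def lj_pair_energy(eps_i, sigma_i, eps_j, sigma_j, r):
    """12-6 Lennard-Jones energy with assumed combination rules."""
    eps = math.sqrt(eps_i * eps_j)     # geometric mean (assumption)
    sigma = 0.5 * (sigma_i + sigma_j)  # arithmetic mean (assumption)
    ratio6 = (sigma / r) ** 6
    return 4.0 * eps * (ratio6 ** 2 - ratio6)


def vdw_interchain(atoms1, atoms2, vdw):
    """Sum the pair energies between two chains.

    atoms1, atoms2: lists of (attype, (x, y, z)); vdw: {attype: [eps, sigma]}.
    """
    total = 0.0
    for type1, pos1 in atoms1:
        for type2, pos2 in atoms2:
            r = math.dist(pos1, pos2)
            total += lj_pair_energy(*vdw[type1], *vdw[type2], r)
    return total
```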

static _prefactor_vdw(r): Prefactor for vdw interactions.

deeprank.features.AtomicFeature.__compute_feature__(pdb_data, featgrp, featgrp_raw, chain1, chain2)

Main function called in deeprank for the feature calculations.

Parameters:

pdb_data (list(bytes)) – pdb information
featgrp (str) – name of the group where to save xyz-val data
featgrp_raw (str) – name of the group where to save human readable data
chain1 (str) – First chain ID
chain2 (str) – Second chain ID

Buried Surface Area

class deeprank.features.BSA.BSA(pdb_data, chain1='A', chain2='B')

Compute the buried surface area feature.

Freesasa is required for this feature. From Freesasa version 2.0.3 the Python bindings are released as a separate module. They can be installed using:

pip install freesasa

Parameters:

pdb_data (list(bytes) or str) – pdb data or pdb filename
chain1 (str, optional) – name of the first chain
chain2 (str, optional) – name of the second chain

Example

>>> from deeprank.features.BSA import BSA
>>>
>>> bsa = BSA('1AK4.pdb')
>>> bsa.get_structure()
>>> bsa.get_contact_residue_sasa()
>>> bsa.sql._close()

get_structure(): Get the pdb structure of the molecule.

get_contact_residue_sasa(cutoff=5.5)

Compute the BSA feature.

It generates the following feature:

bsa

Raises:	`ValueError` – No interface residues found.
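
One common way to define a per-residue buried surface area is the solvent accessible surface area (SASA) of the residue in the isolated chain minus its SASA in the complex. The sketch below applies that definition to precomputed values; the function name, the dictionary layout and the example numbers are hypothetical, and in practice the SASA values would come from a tool such as Freesasa. Whether the library uses exactly this definition is an assumption.

```python
def buried_surface_area(sasa_free, sasa_complex):
    """Per-residue BSA from SASA values given as {residue: area}.

    sasa_free: SASA of each interface residue in its isolated chain.
    sasa_complex: SASA of the same residues in the complex.
    """
    bsa = {
        res: [sasa_free[res] - sasa_complex[res]]
        for res in sasa_complex
    }
    if not bsa:
        raise ValueError("No interface residues found.")
    return bsa
```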

deeprank.features.BSA.__compute_feature__(pdb_data, featgrp, featgrp_raw, chain1, chain2)

Main function called in deeprank for the feature calculations.

Parameters:

pdb_data (list(bytes)) – pdb information
featgrp (str) – name of the group where to save xyz-val data
featgrp_raw (str) – name of the group where to save human readable data
chain1 (str) – First chain ID
chain2 (str) – Second chain ID

FullPSSM

class deeprank.features.FullPSSM.FullPSSM(mol_name=None, pdb_file=None, chain1='A', chain2='B', pssm_path=None, pssm_format='new', out_type='pssmvalue')

Compute all the PSSM data.

Simply extracts all the PSSM information and stores it as features.

Parameters:

mol_name (str) – name of the molecule. Defaults to None.
pdb_file (str) – name of the pdb_file. Defaults to None.
chain1 (str) – First chain ID. Defaults to ‘A’
chain2 (str) – Second chain ID. Defaults to ‘B’
pssm_path (str) – path to the pssm data. Defaults to None.
pssm_format (str) – “old” or “new” pssm format. Defaults to ‘new’.
out_type (str) – which feature to generate, ‘pssmvalue’ or ‘pssmic’. Defaults to ‘pssmvalue’. ‘pssm_format’ must be ‘new’ when ‘out_type’ is ‘pssmic’.

Examples

>>> from deeprank.features.FullPSSM import FullPSSM
>>>
>>> path = '/home/test/PSSM_newformat/'
>>> pssm = FullPSSM(mol_name='2ABZ',
...                 pdb_file='2ABZ_1w.pdb',
...                 pssm_path=path)
>>> pssm.read_PSSM_data()
>>> pssm.get_feature_value()
>>> print(pssm.feature_data_xyz)

static get_ref_mol_name(mol_name): Get the bare molecule name.

read_PSSM_data(): Read the PSSM data into a dictionary.

get_feature_value(cutoff=5.5): Get the feature value.

deeprank.features.FullPSSM.__compute_feature__(pdb_data, featgrp, featgrp_raw, chain1, chain2, out_type='pssmvalue')

Main function called in deeprank for the feature calculations.

Parameters:

pdb_data (list(bytes)) – pdb information
featgrp (str) – name of the group where to save xyz-val data
featgrp_raw (str) – name of the group where to save human readable data
chain1 (str) – First chain ID
chain2 (str) – Second chain ID
out_type (str) – which feature to generate, ‘pssmvalue’ or ‘pssmic’.

PSSM Information Content

class deeprank.features.PSSM_IC.PSSM_IC(mol_name=None, pdb_file=None, chain1='A', chain2='B', pssm_path=None, pssm_format='new', out_type='pssmvalue')

Compute all the PSSM data.

Simply extracts all the PSSM information and stores it as features.

Parameters:

mol_name (str) – name of the molecule. Defaults to None.
pdb_file (str) – name of the pdb_file. Defaults to None.
chain1 (str) – First chain ID. Defaults to ‘A’
chain2 (str) – Second chain ID. Defaults to ‘B’
pssm_path (str) – path to the pssm data. Defaults to None.
pssm_format (str) – “old” or “new” pssm format. Defaults to ‘new’.
out_type (str) – which feature to generate, ‘pssmvalue’ or ‘pssmic’. Defaults to ‘pssmvalue’. ‘pssm_format’ must be ‘new’ when ‘out_type’ is ‘pssmic’.

Examples

>>> from deeprank.features.FullPSSM import FullPSSM
>>>
>>> path = '/home/test/PSSM_newformat/'
>>> pssm = FullPSSM(mol_name='2ABZ',
...                 pdb_file='2ABZ_1w.pdb',
...                 pssm_path=path)
>>> pssm.read_PSSM_data()
>>> pssm.get_feature_value()
>>> print(pssm.feature_data_xyz)

deeprank.features.PSSM_IC.__compute_feature__(pdb_data, featgrp, featgrp_raw, chain1, chain2)

Contact Residue Density

class deeprank.features.ResidueDensity.ResidueDensity(pdb_data, chain1='A', chain2='B')

Compute the residue contacts between polar/apolar/charged residues.

Parameters:

pdb_data (list(bytes) or str) – pdb data or pdb filename
chain1 (str) – First chain ID. Defaults to ‘A’
chain2 (str) – Second chain ID. Defaults to ‘B’

Example

>>> from deeprank.features.ResidueDensity import ResidueDensity
>>>
>>> rcd = ResidueDensity('1EWY_100w.pdb')
>>> rcd.get(cutoff=5.5)
>>> rcd.extract_features()

get(cutoff=5.5)

Get residue contacts.

Raises:	`ValueError` – No residue contact found.

extract_features()

Compute the feature of residue contacts between polar/apolar/charged residues.

It generates the following features:

RCD_apolar-apolar
RCD_apolar-charged
RCD_charged-charged
RCD_polar-apolar
RCD_polar-charged
RCD_polar-polar
RCD_total
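
How contact counts could be binned under these feature names is sketched below. The residue classification in RES_TYPE is a small illustrative subset chosen for the example, and the function names and input layout are hypothetical; only the RCD_* key names, including the order of the two types within a key, follow the list above.

```python
from collections import Counter

# Illustrative subset; the classification is an assumption for this sketch.
RES_TYPE = {
    "ALA": "apolar", "LEU": "apolar",
    "SER": "polar", "ASN": "polar",
    "LYS": "charged", "ASP": "charged",
}
# Order in which the types appear inside the documented key names.
TYPE_ORDER = ["polar", "apolar", "charged"]


def pair_key(type1, type2):
    """Build a key such as 'RCD_polar-apolar' regardless of argument order."""
    first, second = sorted((type1, type2), key=TYPE_ORDER.index)
    return f"RCD_{first}-{second}"


def residue_density(contacts):
    """Count contacts per residue.

    contacts: iterable of (res1, res2), each residue being a tuple
    (chainID, resSeq, resName). Returns {residue: Counter of RCD_* keys}.
    """
    density = {}
    for res1, res2 in contacts:
        key = pair_key(RES_TYPE[res1[2]], RES_TYPE[res2[2]])
        for res in (res1, res2):
            counts = density.setdefault(res, Counter())
            counts[key] += 1
            counts["RCD_total"] += 1
    return density
```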

class deeprank.features.ResidueDensity.residue_pair(res, rtype): Ancillary class that holds information for a given residue.

deeprank.features.ResidueDensity.__compute_feature__(pdb_data, featgrp, featgrp_raw, chain1, chain2)

Main function called in deeprank for the feature calculations.

Parameters:

pdb_data (list(bytes)) – pdb information
featgrp (str) – name of the group where to save xyz-val data
featgrp_raw (str) – name of the group where to save human readable data
chain1 (str) – First chain ID
chain2 (str) – Second chain ID

Generic Feature Class

class deeprank.features.FeatureClass.FeatureClass(feature_type)

Master class from which all the other feature classes should be derived.

Parameters:

feature_type (str) – ‘Atomic’ or ‘Residue’

Note

Each subclass must compute:

self.feature_data: dictionary of features in human readable format, e.g.
- for atomic features:
  
  {‘coulomb’: data_dict_clb, ‘vdwaals’: data_dict_vdw}
  
  data_dict_clb = {atom_info: [values]}
  
  atom_info = (chainID, resSeq, resName, name)
- for residue features:
  
  {‘PSSM_ALA’: data_dict_pssmALA, …}
  
  data_dict_pssmALA = {residue_info: [values]}
  
  residue_info = (chainID, resSeq, resName)
self.feature_data_xyz: dictionary of features in xyz-val format, e.g.
- {‘coulomb’: data_dict_clb, ‘vdwaals’: data_dict_vdw}
- data_dict_clb = {xyz_info: [values]}
- xyz_info = (chainNum, x, y, z)

export_data_hdf5(featgrp)

Export the data in human readable format in an HDF5 file group.

Parameters:

featgrp (hdf5 group) – the HDF5 group of the feature

Note

For atomic features, the format of the data must be:

{(chainID, resSeq, resName, name): [values]}

For residue features, the format must be:

{(chainID, resSeq, resName): [values]}

export_dataxyz_hdf5(featgrp)

Export the data in xyz-val format in an HDF5 file group.

Parameters:

featgrp (hdf5 group) – the HDF5 group of the feature

static get_residue_center(sql, centers=['CB', 'CA', 'mean'], res=None)

Compute the center of each residue by trying different options.

Parameters:

sql (pdb2sql) – the pdb2sql instance
centers (list) – list of strings. Defaults to [‘CB’, ‘CA’, ‘mean’].
res (list) – list of residues to be considered. Defaults to None.

Raises:	`ValueError` – [description]
Returns:	[type] – list(res), list(xyz)
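
The fallback logic suggested by the default centers list can be sketched for a single residue as follows: use the CB atom if present, otherwise the CA atom, otherwise the mean position of all atoms. The function residue_center and its plain dictionary input are hypothetical stand-ins; the real method works on a pdb2sql instance and returns lists for all residues.

```python
def residue_center(atoms, centers=("CB", "CA", "mean")):
    """Pick a center for one residue given {atom_name: (x, y, z)}."""
    if not atoms:
        raise ValueError("residue has no atoms")
    for option in centers:
        if option == "mean":
            # Average position over all atoms of the residue.
            n = len(atoms)
            return tuple(
                sum(pos[i] for pos in atoms.values()) / n for i in range(3)
            )
        if option in atoms:
            return atoms[option]
    raise ValueError("no center could be determined")
```

With the default order, a glycine (which has no CB) falls back to its CA atom, and a residue lacking both falls back to the mean position.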