Features
This module contains the tools to compute feature values for molecular structures. Each submodule must subclass deeprank.features.FeatureClass to inherit the export functions. The following features are currently implemented:
- AtomicFeature: Coulomb and van der Waals interactions, and atomic charges
- BSA: buried surface area
- FullPSSM: complete PSSM data
- PSSM_IC: information content of the PSSM
- ResidueDensity: residue density for polar/apolar/charged pairs
As the source shows, each Python file contains a __compute_feature__ function. This is the function called in deeprank.generate.
The classes in charge of the feature calculations are detailed below.
Atomic Feature
class deeprank.features.AtomicFeature.AtomicFeature(pdbfile, chain1='A', chain2='B', param_charge=None, param_vdw=None, patch_file=None, contact_cutoff=8.5, verbose=False)
Compute the Coulomb and van der Waals interactions and the atomic charges.
Parameters:
- pdbfile (str) – PDB file of the molecule.
- chain1 (str) – first chain ID. Defaults to 'A'.
- chain2 (str) – second chain ID. Defaults to 'B'.
- param_charge (str) – file name of the force field file containing the charges, e.g. protein-allhdg5-4_new.top. Each record must have the format:
  CYM atom O type=O charge=-0.500 end
  ALA atom N type=NH1 charge=-0.570 end
- param_vdw (str) – file name of the force field file containing the van der Waals parameters, e.g. protein-allhdg5-4_new.param. Each record must have the format:
  NONBonded CYAA 0.105 3.750 0.013 3.750
  NONBonded CCIS 0.105 3.750 0.013 3.750
- patch_file (str) – file name of a valid patch file for the parameters, e.g. patch.top. Patching is currently handled manually and should be made more automatic.
- contact_cutoff (float) – the maximum distance in Å between two contact atoms.
- verbose (bool) – whether to print information during the calculation.
Examples
>>> import pkg_resources
>>> from deeprank.features.AtomicFeature import AtomicFeature
>>>
>>> pdb = '1AK4_100w.pdb'
>>>
>>> # get the force field included in deeprank
>>> # if another FF has been used to compute the ref
>>> # change also this path to the correct one
>>> FF = pkg_resources.resource_filename(
...     'deeprank.features', '') + '/forcefield/'
>>>
>>> # declare the feature calculator instance
>>> atfeat = AtomicFeature(pdb,
...                        param_charge=FF + 'protein-allhdg5-4_new.top',
...                        param_vdw=FF + 'protein-allhdg5-4_new.param',
...                        patch_file=FF + 'patch.top')
>>>
>>> # assign parameters
>>> atfeat.assign_parameters()
>>>
>>> # only compute the pair interactions here
>>> atfeat.evaluate_pair_interaction(save_interactions=test_name)
>>>
>>> # close the db
>>> atfeat.sqldb._close()
- read_charge_file()
  Read the .top file given as input.
  This function creates:
  - self.charge: dictionary {(resname, atname): charge}
  - self.valid_resnames: list ['VAL', 'ALP', ...]
  - self.at_name_type_convertor: dictionary {(resname, atname): attype}
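The record layout read by read_charge_file can be illustrated with a small parser. This is a minimal sketch and not deeprank's implementation: parse_charge_lines is a hypothetical helper, and it assumes every record follows the `RES atom NAME type=TYPE charge=VALUE end` layout shown for param_charge.

```python
def parse_charge_lines(lines):
    """Parse topology-style records into charge and atom-type dictionaries.

    Hypothetical helper for illustration only; assumes records such as
    'CYM atom O type=O charge=-0.500 end'.
    """
    charge = {}
    at_name_type_convertor = {}
    for line in lines:
        words = line.split()
        # skip blank lines and anything that is not an atom record
        if len(words) < 5 or words[1] != 'atom':
            continue
        resname, atname = words[0], words[2]
        # collect the key=value fields (type=..., charge=...)
        fields = dict(w.split('=', 1) for w in words[3:] if '=' in w)
        charge[(resname, atname)] = float(fields['charge'])
        at_name_type_convertor[(resname, atname)] = fields['type']
    # residue names in order of first appearance
    valid_resnames = list(dict.fromkeys(res for res, _ in charge))
    return charge, valid_resnames, at_name_type_convertor


records = [
    'CYM atom O type=O charge=-0.500 end',
    'ALA atom N type=NH1 charge=-0.570 end',
]
charge, valid_resnames, convertor = parse_charge_lines(records)
print(charge[('ALA', 'N')], convertor[('ALA', 'N')])
```

Keying both dictionaries on (resname, atname) mirrors the structures listed above, so a lookup for any atom of the PDB file needs only its residue and atom names.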
- read_patch()
  Read the patch file.
  This function creates:
  - self.patch_charge: dictionary {(resName, atName): charge}
  - self.patch_type: dictionary {(resName, atName): type}
- read_vdw_file()
  Read the .param file.
  The param file must be of the form:
  NONBONDED ATNAME 0.10000 3.298765 0.100000 3.089222
  - The first two numbers are for inter-chain interactions.
  - The last two numbers are for intra-chain interactions (only the inter-chain interactions are computed here).
  This function creates:
  - self.vdw: dictionary {attype: [E1, S1]}
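The way a NONBONDED record maps onto the {attype: [E1, S1]} dictionary can be sketched as follows. parse_vdw_lines is a hypothetical helper, not deeprank's code, and it assumes E1 and S1 are simply the first two numbers of each record (the inter-chain pair).

```python
def parse_vdw_lines(lines):
    """Build {attype: [E1, S1]} from NONBONDED records.

    Hypothetical helper for illustration only.
    """
    vdw = {}
    for line in lines:
        words = line.split()
        # keep only complete NONBONDED records (keyword case may vary)
        if len(words) < 6 or words[0].upper() != 'NONBONDED':
            continue
        attype = words[1]
        # first two numbers: inter-chain parameters (kept)
        # last two numbers: intra-chain parameters (ignored here)
        vdw[attype] = [float(words[2]), float(words[3])]
    return vdw


vdw = parse_vdw_lines([
    'NONBONDED ATNAME 0.10000 3.298765 0.100000 3.089222',
    'NONBonded CYAA 0.105 3.750 0.013 3.750',
])
print(vdw)
```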
- get_contact_atoms()
  Get the contact atoms, selecting amino acids only.
  Ligands are not considered.
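A distance-based contact selection between two chains can be sketched as below. This is an illustration under stated assumptions, not the library's implementation: contact_atoms is a hypothetical function, coordinates are plain tuples rather than database rows, and an atom counts as a contact atom when at least one atom of the other chain lies within the cutoff (here 8.5 Å, the default contact_cutoff).

```python
import math


def contact_atoms(chain1_xyz, chain2_xyz, cutoff=8.5):
    """Return the indices of the atoms of each chain that are in contact.

    Hypothetical helper: brute-force search over all inter-chain pairs.
    """
    idx1, idx2 = set(), set()
    for i, a in enumerate(chain1_xyz):
        for j, b in enumerate(chain2_xyz):
            # Euclidean distance between the two atoms, in Å
            if math.dist(a, b) <= cutoff:
                idx1.add(i)
                idx2.add(j)
    return sorted(idx1), sorted(idx2)


chain_a = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
chain_b = [(5.0, 0.0, 0.0), (40.0, 0.0, 0.0)]
in_contact_a, in_contact_b = contact_atoms(chain_a, chain_b)
print(in_contact_a, in_contact_b)
```

The double loop is quadratic in the number of atoms; for full-size complexes a spatial index or a vectorized distance matrix would be the usual choice.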
- _extend_contact_to_residue()
  Extend the contact atoms to the entire residue as soon as one of its atoms is in contact.
- assign_parameters()
  Assign to each atom in the PDB file its charge and its inter-chain van der Waals parameters.
  The patch is handled directly here so that the residues are not looped over multiple times.
- static _get_altResName(resName, atNames)
  Apply the patch data.
  This is adapted from preScan.pl. The mapping is hard-coded, which is not ideal. The dictionary is structured as follows:
  {NEWRESTYPE: ['OLDRESTYPE', [atom types that must be present], [atom types that must NOT be present]]}
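How a lookup against a dictionary of this shape might work is sketched below. The table entries are illustrative only and are not taken from deeprank, and get_alt_resname is a hypothetical stand-in for the static method: it returns the first new residue type whose old type matches and whose required and forbidden atom lists are both satisfied.

```python
def get_alt_resname(res_name, at_names, table):
    """Return the patched residue name matching the atoms, or None.

    Hypothetical helper; table has the form
    {NEWRESTYPE: [OLDRESTYPE, [required atoms], [forbidden atoms]]}.
    """
    present = set(at_names)
    for new_type, (old_type, required, forbidden) in table.items():
        if old_type != res_name:
            continue
        # all required atoms must be present, no forbidden atom may be
        if set(required) <= present and present.isdisjoint(forbidden):
            return new_type
    return None


# illustrative rules, not deeprank's actual patch table
table = {
    'HISE': ['HIS', ['HE2'], ['HD1']],
    'HISD': ['HIS', ['HD1'], ['HE2']],
}

print(get_alt_resname('HIS', ['N', 'CA', 'HE2'], table))
```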
- evaluate_charges(extend_contact_to_residue=False)
  Evaluate the charges.
  Parameters: extend_contact_to_residue (bool, optional) – whether to extend the contact atoms to entire residues.
- evaluate_pair_interaction(print_interactions=False, save_interactions=False)
  Evaluate the pair interactions (Coulomb and van der Waals).
- compute_coulomb_interchain_only(dosum=True, contact_only=False)
  Compute the Coulomb interactions between the chains only.
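As a rough illustration of an inter-chain electrostatic sum, the sketch below applies the textbook Coulomb expression to every pair of atoms taken from different chains. The functional form, the dielectric constant, the conversion factor and the units used by deeprank are not stated here and may differ; coulomb_interchain and its arguments are hypothetical.

```python
import math

# approximate conversion factor giving kcal/mol for charges in units of e
# and distances in Å (an assumption, common in force-field conventions)
COULOMB_CONSTANT = 332.06


def coulomb_interchain(atoms1, atoms2, eps=1.0):
    """Sum k * q_i * q_j / (eps * r_ij) over all inter-chain atom pairs.

    Hypothetical helper; each atom is a (charge, (x, y, z)) tuple.
    """
    total = 0.0
    for q1, xyz1 in atoms1:
        for q2, xyz2 in atoms2:
            r = math.dist(xyz1, xyz2)
            total += COULOMB_CONSTANT * q1 * q2 / (eps * r)
    return total


chain_a = [(1.0, (0.0, 0.0, 0.0))]
chain_b = [(-1.0, (2.0, 0.0, 0.0))]
energy = coulomb_interchain(chain_a, chain_b)
print(energy)
```

Pairs within the same chain never enter the sum because the two loops run over different chains, which is the sense of "interchain only".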
Buried Surface Area
class deeprank.features.BSA.BSA(pdb_data, chain1='A', chain2='B')
Compute the buried surface area feature.
Freesasa is required for this feature. From Freesasa version 2.0.3 the Python bindings are released as a separate module. They can be installed with:
pip install freesasa
Example
>>> from deeprank.features.BSA import BSA
>>> bsa = BSA('1AK4.pdb')
>>> bsa.get_structure()
>>> bsa.get_contact_residue_sasa()
>>> bsa.sql._close()
- get_contact_residue_sasa(cutoff=5.5)
  Compute the BSA feature.
  It generates the following feature:
  - bsa
  Raises: ValueError – No interface residues found.
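The quantity behind the bsa feature can be illustrated with the usual definition of buried surface area: the solvent-accessible surface area (SASA) of a residue in the isolated chain minus its SASA in the complex. The sketch below assumes that definition and uses made-up SASA values in place of a Freesasa calculation; buried_surface_area is a hypothetical helper.

```python
def buried_surface_area(sasa_free, sasa_complex):
    """Per-residue BSA as SASA(free chain) - SASA(complex).

    Hypothetical helper; both inputs map (chainID, resSeq, resName)
    to a SASA value in Å².
    """
    return {
        res: sasa_free[res] - sasa_complex[res]
        for res in sasa_free
        if res in sasa_complex
    }


# made-up values standing in for a real SASA calculation
sasa_free = {('A', 12, 'LEU'): 100.0, ('A', 45, 'GLY'): 55.5}
sasa_complex = {('A', 12, 'LEU'): 40.0, ('A', 45, 'GLY'): 55.5}

bsa_values = buried_surface_area(sasa_free, sasa_complex)
print(bsa_values)
```

A residue far from the interface keeps the same SASA in both states and therefore gets a BSA of zero.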
FullPSSM
class deeprank.features.FullPSSM.FullPSSM(mol_name=None, pdb_file=None, chain1='A', chain2='B', pssm_path=None, pssm_format='new', out_type='pssmvalue')
Compute all the PSSM data.
Extracts all the PSSM information and stores it as features.
Parameters:
- mol_name (str) – name of the molecule. Defaults to None.
- pdb_file (str) – name of the PDB file. Defaults to None.
- chain1 (str) – first chain ID. Defaults to 'A'.
- chain2 (str) – second chain ID. Defaults to 'B'.
- pssm_path (str) – path to the PSSM data. Defaults to None.
- pssm_format (str) – 'old' or 'new' PSSM format. Defaults to 'new'.
- out_type (str) – which feature to generate, 'pssmvalue' or 'pssmic'. Defaults to 'pssmvalue'. pssm_format must be 'new' when out_type is 'pssmic'.
Examples
>>> from deeprank.features.FullPSSM import FullPSSM
>>> path = '/home/test/PSSM_newformat/'
>>> pssm = FullPSSM(mol_name='2ABZ',
...                 pdb_file='2ABZ_1w.pdb',
...                 pssm_path=path)
>>> pssm.read_PSSM_data()
>>> pssm.get_feature_value()
>>> print(pssm.feature_data_xyz)
deeprank.features.FullPSSM.__compute_feature__(pdb_data, featgrp, featgrp_raw, chain1, chain2, out_type='pssmvalue')
Main function called in deeprank for the feature calculations.
Parameters:
- pdb_data (list(bytes)) – PDB information.
- featgrp (str) – name of the group in which to save the xyz-val data.
- featgrp_raw (str) – name of the group in which to save the human-readable data.
- chain1 (str) – first chain ID.
- chain2 (str) – second chain ID.
- out_type (str) – which feature to generate, 'pssmvalue' or 'pssmic'.
PSSM Information Content
class deeprank.features.PSSM_IC.PSSM_IC(mol_name=None, pdb_file=None, chain1='A', chain2='B', pssm_path=None, pssm_format='new', out_type='pssmvalue')
Compute all the PSSM data.
Extracts all the PSSM information and stores it as features.
Parameters:
- mol_name (str) – name of the molecule. Defaults to None.
- pdb_file (str) – name of the PDB file. Defaults to None.
- chain1 (str) – first chain ID. Defaults to 'A'.
- chain2 (str) – second chain ID. Defaults to 'B'.
- pssm_path (str) – path to the PSSM data. Defaults to None.
- pssm_format (str) – 'old' or 'new' PSSM format. Defaults to 'new'.
- out_type (str) – which feature to generate, 'pssmvalue' or 'pssmic'. Defaults to 'pssmvalue'. pssm_format must be 'new' when out_type is 'pssmic'.
Examples
>>> from deeprank.features.FullPSSM import FullPSSM
>>> path = '/home/test/PSSM_newformat/'
>>> pssm = FullPSSM(mol_name='2ABZ',
...                 pdb_file='2ABZ_1w.pdb',
...                 pssm_path=path)
>>> pssm.read_PSSM_data()
>>> pssm.get_feature_value()
>>> print(pssm.feature_data_xyz)
Contact Residue Density
class deeprank.features.ResidueDensity.ResidueDensity(pdb_data, chain1='A', chain2='B')
Compute the residue contacts between polar/apolar/charged residues.
Example
>>> from deeprank.features.ResidueDensity import ResidueDensity
>>> rcd = ResidueDensity('1EWY_100w.pdb')
>>> rcd.get(cutoff=5.5)
>>> rcd.extract_features()
- get(cutoff=5.5)
  Get residue contacts.
  Raises: ValueError – No residue contact found.
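Counting contacts by residue class can be sketched as follows. The classification table is an assumption made for this example (the grouping of some residues, histidine in particular, varies between conventions), count_pair_types is a hypothetical helper, and pair types are treated as unordered, which may not match how deeprank labels them.

```python
from collections import Counter

# assumed classification, for illustration only
RESIDUE_TYPE = {
    'ALA': 'apolar', 'VAL': 'apolar', 'LEU': 'apolar', 'ILE': 'apolar',
    'MET': 'apolar', 'PHE': 'apolar', 'TRP': 'apolar', 'PRO': 'apolar',
    'GLY': 'apolar',
    'SER': 'polar', 'THR': 'polar', 'ASN': 'polar', 'GLN': 'polar',
    'TYR': 'polar', 'CYS': 'polar', 'HIS': 'polar',
    'ASP': 'charged', 'GLU': 'charged', 'LYS': 'charged', 'ARG': 'charged',
}


def count_pair_types(contacts):
    """Count inter-chain residue contacts per unordered pair of classes.

    Hypothetical helper; contacts is a list of (resName1, resName2).
    """
    counts = Counter()
    for res1, res2 in contacts:
        # sort so that ('polar', 'charged') and ('charged', 'polar') merge
        pair = tuple(sorted((RESIDUE_TYPE[res1], RESIDUE_TYPE[res2])))
        counts[pair] += 1
    return counts


contacts = [('LYS', 'ASP'), ('LEU', 'VAL'), ('SER', 'LYS'), ('GLU', 'THR')]
counts = count_pair_types(contacts)
print(dict(counts))
```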
class
deeprank.features.ResidueDensity.
residue_pair
(res, rtype)[source]¶ Ancillary class that holds information for a given residue.
Generic Feature Class
class deeprank.features.FeatureClass.FeatureClass(feature_type)
Master class from which all the other feature classes should be derived.
Arguments:
- feature_type (str): 'Atomic' or 'Residue'
Note
Each subclass must compute:
- self.feature_data: dictionary of features in human-readable format, e.g.
  - for atomic features:
    - {'coulomb': data_dict_clb, 'vdwaals': data_dict_vdw}
    - data_dict_clb = {atom_info: [values]}
    - atom_info = (chainID, resSeq, resName, name)
  - for residue features:
    - {'PSSM_ALA': data_dict_pssmALA, ...}
    - data_dict_pssmALA = {residue_info: [values]}
    - residue_info = (chainID, resSeq, resName)
- self.feature_data_xyz: dictionary of features in xyz-val format, e.g.
  - {'coulomb': data_dict_clb, 'vdwaals': data_dict_vdw}
  - data_dict_clb = {xyz_info: [values]}
  - xyz_info = (chainNum, x, y, z)
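The two dictionaries can be written out for a single atom as in the sketch below. All values are invented for the example, the convention that chainNum is 0 for the first chain is an assumption, and the final flattening into rows is only one plausible way to prepare xyz-val data for storage, not necessarily what the export methods do.

```python
# human-readable format: keyed by the identity of the atom
atom_info = ('A', 12, 'LYS', 'NZ')  # (chainID, resSeq, resName, name)
feature_data = {
    'coulomb': {atom_info: [-1.25]},
    'vdwaals': {atom_info: [0.30]},
}

# xyz-val format: keyed by chain number and coordinates
xyz_info = (0, 10.5, -3.2, 7.8)  # (chainNum, x, y, z), chainNum assumed 0-based
feature_data_xyz = {
    'coulomb': {xyz_info: [-1.25]},
    'vdwaals': {xyz_info: [0.30]},
}

# flatten one feature into rows [chainNum, x, y, z, value, ...]
rows = [list(key) + values
        for key, values in feature_data_xyz['coulomb'].items()]
print(rows)
```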
- export_data_hdf5(featgrp)
  Export the data in human-readable format in an HDF5 file group.
  Parameters: featgrp (hdf5_group) – the HDF5 group of the feature.
  Note
  - For atomic features, the format of the data must be:
    {(chainID, resSeq, resName, name): [values]}
  - For residue features, the format must be:
    {(chainID, resSeq, resName): [values]}
- export_dataxyz_hdf5(featgrp)
  Export the data in xyz-val format in an HDF5 file group.
  Parameters: featgrp (hdf5_group) – the HDF5 group of the feature.
- static get_residue_center(sql, centers=['CB', 'CA', 'mean'], res=None)
  Compute the center of each residue by trying each option in turn.
  Parameters:
  - sql (pdb2sql) – the pdb2sql instance.
  Keyword Arguments:
  - centers (list) – list of strings. Defaults to ['CB', 'CA', 'mean'].
  - res (list) – list of residues to be considered.
  Raises: ValueError
  Returns: list(res), list(xyz)
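The fallback logic suggested by the default centers list can be sketched for a single residue as below. This is an illustration and not the library's code: residue_center is a hypothetical function working on a plain dictionary instead of a pdb2sql instance, and it assumes the options are tried in the order given, with 'mean' denoting the average position of all atoms of the residue.

```python
def residue_center(atoms, centers=('CB', 'CA', 'mean')):
    """Return the center of one residue.

    Hypothetical helper; atoms maps atom names to (x, y, z) tuples.
    """
    for option in centers:
        if option == 'mean':
            if not atoms:
                continue
            n = len(atoms)
            # average each coordinate over all atoms of the residue
            return tuple(sum(xyz[i] for xyz in atoms.values()) / n
                         for i in range(3))
        if option in atoms:
            return atoms[option]
    raise ValueError('no center could be computed for this residue')


ala = {'CA': (1.0, 0.0, 0.0), 'CB': (1.5, 1.0, 0.0)}
gly = {'N': (0.0, 0.0, 0.0), 'CA': (1.0, 0.0, 0.0), 'C': (2.0, 0.0, 0.0)}
other = {'O1': (0.0, 0.0, 0.0), 'O2': (2.0, 2.0, 2.0)}

print(residue_center(ala), residue_center(gly), residue_center(other))
```

Glycine has no CB atom, so it falls back to CA; a residue with neither atom falls back to the mean position.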