Predicting protein pK(a) by environment similarity

STORCHI, LORIANO;
2009-01-01

Abstract

A statistical method for predicting protein pKa has been developed that uses the 3D structure of a protein and a database of 434 experimental protein pKa values. Each pKa in the database is associated with a fingerprint that describes the chemical environment around an ionizable residue. A computational tool, MoKaBio, has been developed to automatically identify ionizable residues in a protein, generate fingerprints that describe the chemical environment around such residues, and predict pKa from the experimental pKa values in the database by using a similarity metric. The method, which correctly retrieved the pKa of 429 of the 434 ionizable sites in the database, was cross-validated by leave-one-out and yielded a root mean square error (RMSE) of 0.95, a result superior to that obtained with the Null Model (RMSE 1.07) and other well-established protein pKa prediction tools. This novel approach is suitable for rationalizing protein pKa by comparing the region around the ionizable site with similar regions whose ionizable site pKa is known. The pKa of residues that have a unique environment not represented in the training set cannot be predicted accurately; however, the method offers the advantage of being trainable to increase its predictive power.
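The workflow described in the abstract (fingerprint, similarity lookup, leave-one-out validation, RMSE) can be illustrated with a minimal sketch. This is not the MoKaBio implementation: the fingerprint layout, the Tanimoto-style similarity, the similarity-weighted average over the k nearest entries, and every number in `TOY_DB` are assumptions made up for illustration. The constant-mean baseline is only a stand-in for comparison and is not the Null Model cited above.

```python
import math


def similarity(a, b):
    """Tanimoto-style similarity of two non-negative count vectors (assumed metric)."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 0.0


def predict_pka(query_fp, entries, k=2):
    """Similarity-weighted mean pKa of the k most similar entries (assumed scheme)."""
    scored = sorted(
        ((similarity(query_fp, fp), pka) for fp, pka in entries),
        key=lambda item: item[0],
        reverse=True,
    )[:k]
    weight = sum(s for s, _ in scored)
    if weight == 0.0:
        # No environment resembles the query: fall back to a plain mean.
        return sum(p for _, p in scored) / len(scored)
    return sum(s * p for s, p in scored) / weight


def rmse(predicted, observed):
    """Root mean square error between two equally long sequences."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def leave_one_out(entries, k=2):
    """Predict each entry from all the others, never from itself."""
    return [
        predict_pka(fp, entries[:i] + entries[i + 1:], k)
        for i, (fp, _) in enumerate(entries)
    ]


# Hypothetical (fingerprint, experimental pKa) pairs; not taken from the paper.
TOY_DB = [
    ([3, 0, 1, 2], 3.9),
    ([3, 1, 1, 2], 4.1),
    ([0, 4, 2, 0], 6.5),
    ([0, 3, 2, 1], 6.8),
]

observed = [pka for _, pka in TOY_DB]
loo_rmse = rmse(leave_one_out(TOY_DB), observed)

# Stand-in baseline: predict the mean pKa of the toy set for every site.
mean_pka = sum(observed) / len(observed)
baseline_rmse = rmse([mean_pka] * len(observed), observed)

print(f"leave-one-out RMSE: {loo_rmse:.2f}")
print(f"constant-baseline RMSE: {baseline_rmse:.2f}")
```

The sketch also shows the limitation noted in the abstract: a query whose fingerprint resembles nothing in the database gets low similarity scores everywhere, so its prediction is unreliable until comparable environments are added to the database.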

Use this identifier to cite or link to this document: https://hdl.handle.net/11564/225430
Citations
  • PubMed Central 7
  • Scopus 19
  • Web of Science (ISI) 17