Regulatory Network in Protein Phosphorylation
Version 1.0

Abstract

Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in intracellular signal transduction. As the number of experimental phosphorylation sites identified by mass spectrometry-based proteomics increases, so does the motivation to explore the networks of protein kinases and their substrates. Manning et al. identified 518 human kinase genes, which provide a starting point for a comprehensive analysis of protein phosphorylation networks. In this study, a knowledgebase is developed that integrates experimentally verified protein phosphorylation data and protein-protein interaction data to construct protein kinase-substrate phosphorylation networks in human. A total of 21110 experimentally verified phosphorylation sites within 5092 human proteins are collected. However, only 4138 phosphorylation sites (~20%) are annotated with catalytic kinases in public-domain resources. To investigate fully how protein kinases regulate intracellular processes, a published kinase-specific phosphorylation site prediction tool, KinasePhos, is incorporated to assign potential kinases to the remaining sites. Additionally, time-course microarray expression data are used to represent the degree of similarity in the expression profiles of network members. A case study demonstrates that the proposed scheme not only identifies the correct network of insulin signaling but also detects a novel signaling pathway that may cross-talk with the insulin signaling network. The proposed web-based system, RegPhos, lets users input a group of protein names and constructs the associated phosphorylation network, annotated with protein subcellular localization. The following figure shows the system flow of RegPhos.

System Flow

The system flow of RegPhos is shown in Figure 1. It comprises the collection of experimentally verified phosphorylation sites, the identification of kinase-substrate interactions, and the construction of intracellular phosphorylation networks. Microarray expression data are then used to validate the degree of similarity in the expression profiles of network members. The experimentally verified phosphorylation sites are extracted from dbPTM (Lee, Huang et al. 2006), which has integrated version 7.0 of Phospho.ELM (Diella, Cameron et al. 2004), release 55.0 of UniProtKB/Swiss-Prot (Farriol-Mathis, Garavelli et al. 2004), and version 1.0 of PHOSIDA (Gnad, Ren et al. 2007). In addition, the Human Protein Reference Database (HPRD) (Keshava Prasad, Goel et al. 2009), which integrates a wealth of information relevant to the function of human proteins in health and disease, is integrated in this work.
     Most of the experimental phosphorylation sites (~80%) are not annotated with a catalytic kinase. To identify the catalytic kinase for each experimentally verified phosphorylation site that lacks such an annotation, we propose a method that combines computational models with protein associations to assign the potential kinase. The association context of each kinase-substrate pair is investigated using manually curated protein-protein interactions, functional associations (physical protein interactions, curated pathways, co-occurrence in literature abstracts, mRNA expression studies, and genomic context), and cellular co-localization. Logistic regression is adopted to evaluate the confidence value of each protein-protein (kinase-substrate) interaction (Bebek and Yang 2007). Specifically, a modified version of the method of Sharan et al. (Sharan, Suthram et al. 2005) is used to evaluate the confidence values of the discovered kinase-substrate interactions. The logistic regression model incorporates four sets of variables for a given interaction set: (1) the prediction score of the kinase-specific SVM model, (2) the depth at which the interaction between kinase and substrate was observed, (3) the confidence score of the STRING functional association, and (4) the binary (0/1) protein subcellular localization data of the interacting pair.
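
     As a minimal sketch of how a logistic regression model could combine the four variables listed above into a single confidence value, consider the Python snippet below. The function name, the intercept, and all coefficient values are hypothetical and chosen purely for illustration; the fitted parameters of the actual model are not given here. The treatment of the interaction depth as a small non-negative integer with a negative weight is likewise an assumption.

```python
import math

# Hypothetical parameters for illustration only. The real coefficients would be
# estimated from training data and are not reported in this document.
INTERCEPT = -4.0
WEIGHTS = {
    "svm_score": 3.0,           # (1) kinase-specific SVM prediction score, assumed in [0, 1]
    "interaction_depth": -0.8,  # (2) depth at which the interaction was observed (assumed integer)
    "string_score": 2.5,        # (3) STRING functional association confidence, assumed in [0, 1]
    "colocalized": 1.0,         # (4) 1 if kinase and substrate share a subcellular location, else 0
}


def interaction_confidence(svm_score, interaction_depth, string_score, colocalized):
    """Return a confidence value in (0, 1) for one kinase-substrate pair."""
    z = (
        INTERCEPT
        + WEIGHTS["svm_score"] * svm_score
        + WEIGHTS["interaction_depth"] * interaction_depth
        + WEIGHTS["string_score"] * string_score
        + WEIGHTS["colocalized"] * colocalized
    )
    # Logistic (sigmoid) function maps the linear score to a probability-like value.
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # A strongly supported pair versus a weakly supported one (toy inputs).
    print(interaction_confidence(0.9, 1, 0.8, 1))
    print(interaction_confidence(0.2, 3, 0.1, 0))
```

     Under these assumed weights, a pair with a high SVM score, a strong STRING association, and shared localization receives a higher confidence than a pair supported by weak evidence only.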
     After the catalytic kinases of the experimental phosphorylation sites are identified, the enriched set of kinase-substrate interactions can be used to construct the complete phosphorylation network. A graph-based method is adopted that formalizes the construction of the intracellular phosphorylation network as a shortest path problem in graph theory. To examine the degree of similarity in the expression profiles of network members, human gene expression samples from the Affymetrix GeneChip Human Genome U133 Array Set HG-U133A platform (GPL96) and the Affymetrix GeneChip Human Genome U133 Plus 2.0 Array (GPL570) (Barrett, Troup et al. 2007), consisting of 22283 probe sets for 12678 genes and 54681 probe sets for 18433 genes, respectively, are used to explore the co-expression of kinase and substrate genes.
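
     The two ideas in this paragraph can be illustrated with a short Python sketch. It is not the RegPhos implementation. It assumes that each kinase-substrate edge carries a confidence value in (0, 1] and that edge weights are set to the negative logarithm of that confidence, so that the shortest path is the chain of interactions with the highest product of confidences. It also assumes that co-expression is measured with the Pearson correlation coefficient, which is not stated in the text. The function names and the toy protein labels A to D are hypothetical.

```python
import heapq
import math


def most_confident_path(edges, source, target):
    """Dijkstra's algorithm on a directed kinase -> substrate graph.

    edges: iterable of (kinase, substrate, confidence), confidence in (0, 1].
    Returns (path, path_confidence), or (None, 0.0) if target is unreachable.
    """
    graph = {}
    for kinase, substrate, conf in edges:
        # Assumed weighting: -log(conf), so summing weights multiplies confidences.
        graph.setdefault(kinase, []).append((substrate, -math.log(conf)))

    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for nxt, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))

    if target not in dist:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[target])


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Toy network with made-up confidences; A-D are placeholder protein names.
    toy_edges = [
        ("A", "B", 0.9), ("B", "D", 0.8),
        ("A", "C", 0.5), ("C", "D", 0.99),
        ("A", "D", 0.6),
    ]
    print(most_confident_path(toy_edges, "A", "D"))
    # Toy expression profiles across three samples.
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

     In this toy network the two-step route through B is preferred over the direct edge from A to D, because the product of its confidences is larger than the confidence of the direct edge. A correlation close to 1 between the profiles of a kinase and its substrate would indicate similar expression behavior across samples.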