Pengyi Yang - Omic Data Scientist

About me

I obtained my PhD in bioinformatics from the School of Information Technologies, The University of Sydney, in 2012. I then moved to the United States and completed an interdisciplinary Research Fellowship in the Systems Biology Group, ESCBL, at the National Institutes of Health, characterising transcriptomic and epigenomic regulation in embryonic stem cells (ESCs) using ultrafast sequencing data. I returned to Australia in late 2015 on a University of Sydney Postdoctoral Fellowship (DVCR) to pursue my own research in systems biology. I am now affiliated with the School of Mathematics and Statistics (SoMS) and the Charles Perkins Centre, The University of Sydney. I was offered a Lectureship in Statistics in April 2016 and am currently teaching STAT5003.


Our KinasePA Shiny app was recently published in Proteomics. Read more: Text

Research interests

My research interests are in the broad areas of Computational and Systems Biology with a focus on cell signaling, epigenetic, and transcriptional networks. Specifically, I am interested in developing computational methods and statistical models to reconstruct and characterize signaling cascades, and epigenetic and transcriptional networks that underlie cellular homeostasis, proliferation, differentiation, and cell-fate decisions.

Studying biological pathways at a systems level is essential because the behavior of a complex system cannot always be understood by scaling up the properties of its individual components. Studying complex interactions in biological networks from a global viewpoint allows us to discover fundamental principles that are not intuitive and to uncover global properties that emerge only when interactions between individual components are integrated. I use systems biology approaches to integrate and analyze heterogeneous high-throughput “–omics” data with the goal of generating testable hypotheses and predictions. My ultimate objective is to discover critical cell signaling pathways that regulate the epigenetic landscapes and gene expression programs controlling cell type identity. Results from these studies will contribute to a comprehensive understanding of the cross-talk among cell signaling, epigenetic, and transcriptional regulation.




Publications

✢: Co-first author
#: Corresponding/Co-corresponding author


Yang, P., Patrick, E., Humphrey, S., Ghazanfar, S., James, D., Jothi, R. & Yang, J. (2016). KinasePA: Phosphoproteomics data annotation using hypothesis driven kinase perturbation analysis. Proteomics, DOI: 10.1002/pmic.201600068 [Text]

Yang, P.#, Humphrey, S., James, D., Yang, J. & Jothi, R.# (2016). Positive-unlabeled ensemble learning for kinase substrate prediction from dynamic phosphoproteomics data. Bioinformatics, 32(2), 252-259. [Pubmed]


Yang, P.#, Zheng, X., Jayaswal, V., Hu, G., Yang, J. & Jothi, R. (2015). Knowledge-based analysis for detecting key signaling events from time-series phosphoproteomics data. PLoS Computational Biology, 11(8), e1004403. [Pubmed]

Pathania, R., Ramachandran, S., Elangovan, S., Padia, R., Yang, P., Cinghu, S., Veeranan-Karmegam, R., Fulzele, S., Pei, L., Chang, C., Choi, J., Shi, H., Manicassamy, S., Prasad, P., Sharma, S., Ganapathy, V., Jothi, R. & Thangaraju, M. (2015). DNMT1 is essential for mammary and cancer stem cell maintenance and tumorigenesis. Nature Communications, 6, 6910. [Pubmed]

Hoffman, N., Parker, B., Chaudhuri, R., Fisher-Wellman, K., Kleinert, M., Humphrey, S., Yang, P., Holliday, M., Trefely, S., Fazakerley, D., Stöckli, J., Burchfield, J., Jensen, T., Jothi, R., Kiens, B., Wojtaszewski, J., Richter, E. & James, D. (2015). Global phosphoproteomic analysis of human skeletal muscle reveals a network of exercise-regulated kinases and AMPK substrates. Cell Metabolism, 22(5), 922-935. [Pubmed]


Oldfield, A., Yang, P., Conway, A., Cinghu, S., Freudenberg, J., Yellaboina, S. & Jothi, R. (2014). Histone-fold domain protein NF-Y promotes chromatin accessibility for cell type-specific master transcription factors. Molecular Cell, 55(5), 708-722. [Pubmed]

Yang, P., Patrick, E., Tan, S., Fazakerley, D., Burchfield, J., Gribben, C., Prior, M., James, D. & Yang, J. (2014). Direction pathway analysis of large-scale proteomics data reveals novel features of the insulin action pathway. Bioinformatics, 30(6), 808-814. [Pubmed]

Ma, X., Yang, P., Kaplan, W., Lee, B., Wu, L., Yang, J., Yasunaga, M., Sato, K., Chisholm, D. & James, D. (2014). ISL1 regulates peroxisome proliferator-activated receptor γ activation and early adipogenesis via bone morphogenetic protein 4-dependent and -independent mechanisms. Molecular and Cellular Biology, 34(19), 3607-3617. [Pubmed]

Lackford, B., Yao, C., Charles, G., Weng, L., Zheng, X., Choi, E., Xie, X., Wan, J., Xing, Y., Freudenberg, J., Yang, P., Jothi, R., Hu, G. & Shi, Y. (2014). Fip1 regulates mRNA alternative polyadenylation to promote stem cell self‐renewal. EMBO Journal, 33(8), 878-889. [Pubmed]

Yang, P.#, Yoo, P., Fernando, J., Zhou, B., Zhang, Z. & Zomaya, A. (2014). Sample subset optimization techniques for imbalanced and ensemble learning problems in bioinformatics applications. IEEE Transactions on Cybernetics, 44(3), 445-455. [IEEE Xplore] [PDF]


Humphrey, S., Yang, G., Yang, P., Fazakerley, D., Stöckli, J., Yang, J. & James, D. (2013). Dynamic adipocyte phosphoproteome reveals that Akt directly regulates mTORC2. Cell Metabolism, 17(6), 1009-1020. [Pubmed]

Yang, P., Liu, W., Zhou, B., Chawla, S. & Zomaya, A. (2013). Ensemble-based wrapper methods for feature selection and class imbalance learning. In Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Lecture Notes in Artificial Intelligence 7818, Springer Berlin Heidelberg, 544-555. [Text]

Yang, P., Yang, J., Zhou, B. & Zomaya, A. (2013). Stability of feature selection algorithms and ensemble feature selection methods in bioinformatics. In Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data, Wiley, New Jersey, USA, 333-352. [PDF]


Yang, P., Humphrey, S., Fazakerley, D., Prior, M., Yang, G., James, D. & Yang, J. (2012). Re-fraction: a machine learning approach for deterministic identification of protein homologues and splice variants in large-scale MS-based proteomics. Journal of Proteome Research, 11(5), 3035-3045. [Pubmed]

Yang, P.#, Ma, J., Wang, P., Zhu, Y., Zhou, B. & Yang, J. (2012). Improving X! Tandem on peptide identification from mass spectrometry by self-boosted Percolator. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(5), 1273-1280. [Pubmed]

Wang, P., Yang, P. & Yang, J. (2012). OCAP: an open comprehensive analysis pipeline for iTRAQ. Bioinformatics, 28(10), 1404-1405. [Pubmed]


Yang, P.✢#, Ho, J., Yang, J. & Zhou, B. (2011). Gene-gene interaction filtering with ensemble of filters. BMC Bioinformatics, 12, S10. [Pubmed]

Yang, P., Zhang, Z., Zhou, B. & Zomaya, A. (2011). Sample subset optimization for classifying imbalanced biological data. In Proceedings of the 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Lecture Notes in Artificial Intelligence 6635, Springer Berlin Heidelberg, 333-344. [Text]


Yang, P.#, Ho, J., Zomaya, A. & Zhou, B. (2010). A genetic ensemble approach for gene-gene interaction identification. BMC Bioinformatics, 11(1), 524. [Pubmed]

Wang, P., Yang, P., Arthur, J. & Yang, J. (2010). A dynamic wavelet-based algorithm for pre-processing tandem mass spectrometry data. Bioinformatics, 26(18), 2242-2249. [Pubmed]

Yoo, P., Ho, Y., Ng, J., Charleston, M., Saksena, N., Yang, P. & Zomaya, A. (2010). Hierarchical kernel mixture models for the prediction of AIDS disease progression using HIV structural gp120 profiles. BMC Genomics, 11, S22. [Pubmed]

Yang, P.#, Zhang, Z., Zhou, B. & Zomaya, A. (2010). A clustering based hybrid system for biomarker selection and sample classification of mass spectrometry data. Neurocomputing, 73(13), 2317-2331. [Text]

Yang, P.#, Zhou, B., Zhang, Z. & Zomaya, A. (2010). A multi-filter enhanced genetic ensemble system for gene selection and sample classification of microarray data. BMC Bioinformatics, 11, S5. [Pubmed]

Yang, P., Yang, J., Zhou, B. & Zomaya, A. (2010). A review of ensemble methods in bioinformatics. Current Bioinformatics, 5(4), 296-308. [Text]

Li, L., Yang, P., Ou, L., Zhang, Z. & Cheng, P. (2010). Genetic algorithm-based multi-objective optimisation for QoS-aware web services composition. In Proceedings of the 4th International Conference on Knowledge Science, Engineering and Management (KSEM), Lecture Notes in Computer Science 6291, Springer Berlin Heidelberg, 549-554. [Text]


Yang, P.#, Xu, L., Zhou, B., Zhang, Z. & Zomaya, A. (2009). A particle swarm based hybrid system for imbalanced medical data sampling. BMC Genomics, 10, S34. [Pubmed]

Zhang, Z., Yang, P., Wu, X. & Zhang, C. (2009). An agent-based hybrid system for microarray data analysis. IEEE Intelligent Systems, 24(5), 53-63. [PDF]

Yang, P.# & Zhang, Z. (2009). An embedded two-layer feature selection approach for microarray data analysis. IEEE Intelligent Informatics Bulletin, 10(1), 24-32. [PDF]

Yang, P., Tao, L., Xu, L. & Zhang, Z. (2009). Multiagent framework for bio-data mining. In Proceedings of the 4th Rough Sets and Knowledge Technology (RSKT), Lecture Notes in Computer Science 5589, Springer Berlin Heidelberg, 200-207. [Text]


Zhang, Z. & Yang, P.# (2008). An ensemble of classifiers with genetic algorithm-based feature selection. IEEE Intelligent Informatics Bulletin, 9(1), 18-24. [PDF]

Yang, P. & Zhang, Z. (2008). A clustering based hybrid system for mass spectrometry data analysis. In Proceedings of the 3rd Pattern Recognition in Bioinformatics (PRIB), Lecture Notes in Bioinformatics 5265, Springer Berlin Heidelberg, 98-109. [Text]

Yang, P. & Zhang, Z. (2008). A hybrid approach to selecting susceptible single nucleotide polymorphisms for complex disease analysis. In Proceedings of BioMedical Engineering and Informatics (BMEI), IEEE, 214-218. [PDF]


Yang, P. & Zhang, Z. (2007). Hybrid methods to select informative gene sets in microarray data classification. In Proceedings of the 20th Australian Joint Conference on Artificial Intelligence (AI), Lecture Notes in Artificial Intelligence 4830, Springer Berlin Heidelberg, 811-815. [Text]


Contact

Level 5 West (5W83), D17
Charles Perkins Centre
School of Mathematics & Statistics
Faculty of Science
The University of Sydney
NSW, 2006

Mobile: +61-452536773
Email: pengyi DOT yang AT sydney DOT edu DOT au