Machine learning (ML) trained on large datasets is a powerful tool for predicting protein surface properties such as hydrophobicity and binding affinity. The large volumes of data generated by diverse experimental techniques and molecular simulations make it possible to build ML models that predict these properties accurately and also identify interaction hot spots and binding pockets. Protein surfaces have distinct chemical and topographical features, and these must be encoded in a suitable form before ML algorithms can exploit them. Geometric data representations such as point clouds, graphs, and voxel grids have proven effective in interaction prediction tasks. Generative models, such as generative adversarial networks (GANs) and variational autoencoders (VAEs), offer a complementary approach: they can generate new structures and interpolate between known structures that have specific properties.
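As a minimal sketch of two of the geometric representations mentioned above, the snippet below converts a labeled point cloud into a distance-cutoff graph and a sparse voxel grid. The coordinates, the hydrophobicity values, the cutoff, the voxel size, and the function names radius_graph and voxelize are all illustrative assumptions, not data or methods from the cited work.

```python
from collections import defaultdict
from math import dist, floor

# Hypothetical surface points: (x, y, z) in angstroms plus a made-up
# per-point hydrophobicity value. Real inputs would come from a structure
# file or a simulation trajectory.
points = [
    ((0.0, 0.0, 0.0), 0.8),
    ((1.5, 0.0, 0.0), 0.6),
    ((0.0, 1.5, 0.0), -0.2),
    ((6.0, 6.0, 6.0), -0.9),
]

def radius_graph(points, cutoff):
    """Connect every pair of points closer than `cutoff`; return an adjacency dict."""
    adj = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i][0], points[j][0]) < cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def voxelize(points, voxel_size):
    """Average the per-point feature inside each occupied voxel (sparse grid)."""
    sums = defaultdict(lambda: [0.0, 0])
    for xyz, feature in points:
        key = tuple(floor(c / voxel_size) for c in xyz)
        sums[key][0] += feature
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

graph = radius_graph(points, cutoff=2.0)   # graph representation
grid = voxelize(points, voxel_size=2.0)    # voxel-based representation
```

The same point cloud feeds both encodings, so the choice between them can depend on the downstream model: a graph suits message-passing networks, while a voxel grid suits 3D convolutions.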
Selected References:
1. Sinha, Imee. Characterization of Protein Surface Hydrophobicity Using Molecular Dynamics Simulations and Deep Learning. Diss. Rensselaer Polytechnic Institute, 2022.
2. Kingma, Diederik P., and Max Welling. "Auto-encoding variational Bayes." arXiv preprint arXiv:1312.6114 (2013).
3. https://lilianweng.github.io/posts/2018-08-12-vae/