BioAILab

Welcome to BioAi-Lab

Introduction

In this study, we propose AcidNetPro, a deep learning framework for classifying acidophilic proteins. The approach is a three-stage pipeline.

First, we use the ESM-C (ESMC) protein language model to generate embeddings that capture rich semantic representations of amino acid sequences. These embeddings form the basis of our feature representation.

Second, to address limited training data and improve generalization, we apply DCGAN-GP (Deep Convolutional Generative Adversarial Network with Gradient Penalty) for data augmentation. It generates synthetic protein embeddings that preserve the statistical properties of real acidophilic proteins while increasing the diversity of the training set.

Finally, we employ a Lightweight Sparse Mixture of Experts (LSMoE) transformer architecture for feature optimization and classification. The sparse MoE mechanism selectively activates the relevant expert networks for each input, which improves prediction accuracy while keeping computational cost low.

Our experimental results show that AcidNetPro significantly outperforms existing acidophilic protein prediction methods across multiple evaluation metrics. Combining a protein language model, generative data augmentation, and sparse expert networks yields a framework for protein property prediction that can be extended to other biological classification tasks.
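To make the sparse expert selection described above more concrete, here is a minimal, dependency-free sketch of top-k mixture-of-experts routing. It is not the AcidNetPro implementation: the class name `SparseMoE`, the linear experts, the random weights, and the settings `num_experts=4` and `top_k=2` are all assumptions chosen for illustration, and a real LSMoE layer would sit inside a transformer block and be trained end to end.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SparseMoE:
    """Hypothetical top-k mixture-of-experts layer (illustrative only)."""

    def __init__(self, dim, num_experts=4, top_k=2, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Gating matrix: one row of scores per expert (assumed linear gate).
        self.gate = [[rng.gauss(0, 0.1) for _ in range(dim)]
                     for _ in range(num_experts)]
        # Each expert is a dim x dim linear map (a stand-in for a small MLP).
        self.experts = [[[rng.gauss(0, 0.1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def route(self, x):
        """Return (expert_index, weight) pairs for the top-k experts."""
        logits = matvec(self.gate, x)
        top = sorted(range(len(logits)), key=lambda i: logits[i],
                     reverse=True)[:self.top_k]
        # Renormalize over the selected experts only.
        weights = softmax([logits[i] for i in top])
        return list(zip(top, weights))

    def forward(self, x):
        """Weighted sum of the selected experts; the others are never run."""
        out = [0.0] * len(x)
        for idx, weight in self.route(x):
            expert_out = matvec(self.experts[idx], x)
            out = [o + weight * v for o, v in zip(out, expert_out)]
        return out


# Toy input standing in for a (much larger) protein embedding.
moe = SparseMoE(dim=8)
embedding = [0.1 * i for i in range(8)]
routes = moe.route(embedding)
output = moe.forward(embedding)
```

Because only `top_k` of the experts are evaluated per input, the per-input compute stays roughly constant as more experts are added, which is the efficiency argument made for sparse MoE above.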



Framework