Avinash Yaganapu, Sai Phani Parsa, and Mingon Kang (all Computer Science) published a paper, "," in Bioinformatics.
Many biological interaction datasets (e.g., protein–compound interactions) share a fundamental problem: we often observe only positive examples. Reliable negative samples are rarely available, which makes it difficult to train conventional machine learning models. In our new work, we address this challenge by developing BIN-PU, a novel positive–unlabeled (PU) learning framework for predicting bacterial protein–compound interactions. Rather than requiring negative samples, our approach generates reliable pseudo labels so that deep learning models can learn effectively from positive-only datasets. On bacterial cytochrome P450 datasets, the framework shows substantial improvements over existing approaches and generalizes well across datasets.
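The announcement does not say how BIN-PU builds its pseudo labels. For readers new to PU learning, the sketch below shows one classic, generic way to obtain pseudo labels from positive-only data: the two-step "spy" heuristic. A plain logistic regression stands in for a deep model. This is not BIN-PU's procedure. The function names, the synthetic 2-D data, and the parameters `spy_frac` and `spy_quantile` are illustrative assumptions.

```python
import numpy as np

# Hypothetical helpers for illustration only; not taken from the BIN-PU paper.


def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit a plain logistic regression by batch gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))  # predicted P(y = 1)
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w


def predict_proba(w, X):
    """Score rows of X with a fitted weight vector (bias last)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def spy_reliable_negatives(P, U, spy_frac=0.15, spy_quantile=0.05, seed=0):
    """Step 1: return indices of U to pseudo-label as negatives."""
    rng = np.random.default_rng(seed)
    n_spy = max(1, int(spy_frac * len(P)))
    spy_idx = rng.choice(len(P), size=n_spy, replace=False)
    keep = np.ones(len(P), dtype=bool)
    keep[spy_idx] = False
    spies = P[spy_idx]
    # Remaining positives (label 1) vs. unlabeled plus hidden spies (label 0).
    X = np.vstack([P[keep], U, spies])
    y = np.concatenate([np.ones(keep.sum()), np.zeros(len(U) + n_spy)])
    w = train_logreg(X, y)
    # Spies are known positives, so a low quantile of their scores gives a
    # threshold below which an unlabeled example is probably negative.
    t = np.quantile(predict_proba(w, spies), spy_quantile)
    return np.flatnonzero(predict_proba(w, U) < t)


def pu_two_step(P, U, **kwargs):
    """Step 2: train on positives vs. the pseudo-labeled reliable negatives."""
    rn = spy_reliable_negatives(P, U, **kwargs)
    X = np.vstack([P, U[rn]])
    y = np.concatenate([np.ones(len(P)), np.zeros(len(rn))])
    return train_logreg(X, y), rn


# Synthetic demo: U hides 50 positives (rows 0-49) among 150 negatives.
rng = np.random.default_rng(1)
P = rng.normal(2.0, 1.0, size=(100, 2))
U = np.vstack([rng.normal(2.0, 1.0, size=(50, 2)),
               rng.normal(-2.0, 1.0, size=(150, 2))])
w, rn = pu_two_step(P, U)
print("reliable negatives:", len(rn))
print("share truly negative:", np.mean(rn >= 50))
print("scores:", predict_proba(w, np.array([[2.0, 2.0], [-2.0, -2.0]])))
```

Raising `spy_quantile` admits more pseudo-negatives, but also lets more hidden positives slip in. In a real protein–compound setting, the features would come from protein and compound representations and the second step would be a deep model. Neither is shown here.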
More broadly, this work highlights how AI methods can unlock biological insights even from incomplete datasets, which are common in many areas of computational biology.