Engineered Human Therapies

An AI Approach That Balances Genomic Discovery with Privacy

KAUST's AI approach balances faster genomic discovery with the need to protect individual privacy

Feb 29, 2024

A team at King Abdullah University of Science and Technology (KAUST) is working to reconcile two goals that often pull in opposite directions: using artificial intelligence (AI) to mine genomic data for medical discoveries, and keeping each individual's data confidential. Their approach aims to harness AI's potential in medical research without compromising the privacy of the people whose data it learns from. Findings from the new study were published recently in Science Advances.

“Omics data is a treasure trove that can reveal much about a person’s health,” says Xin Gao of KAUST. “But AI, especially deep learning, can betray the confidence of this data. We’re striving for a harmony between using this data and respecting individual privacy.”

Traditionally, researchers have encrypted data to protect privacy. But encryption is a bit like a heavy curtain: it has to be pulled back, with the data decrypted, before an AI model can be trained on it, and this process can be cumbersome. Even after training, the model can unintentionally retain private details, which limits its use to highly secure environments.

Another method, akin to dividing a dance into smaller steps, involves breaking the data into packets for separate training. Known as local training or federated learning, this approach still risks a misstep: private details can leak into the AI model through what each local training run shares. Here, differential privacy comes into play, adding random noise so that no individual's data can be singled out. But this technique, while safe, can lead to a “noisy” model, muddying the precision needed for gene-based research.
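To make the trade-off in the paragraph above concrete, here is a minimal sketch of how a differentially private update is commonly built in federated learning: each local update is clipped to a maximum size, then Gaussian noise is added before the updates are averaged. This is a generic illustration, not the KAUST team's code. The function names and the `max_norm` and `noise_std` parameters are hypothetical, and no formal privacy budget is computed.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [x * scale for x in update]


def privatize_update(update, max_norm, noise_std, rng):
    """Clip a local update, then add Gaussian noise to every coordinate.

    Illustrative assumption: noise_std is chosen by hand here. A real
    system would derive it from a target privacy budget.
    """
    clipped = clip_update(update, max_norm)
    return [x + rng.gauss(0.0, noise_std) for x in clipped]


def federated_average(updates):
    """Average a list of equally sized updates coordinate by coordinate."""
    n = len(updates)
    return [sum(column) / n for column in zip(*updates)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three hypothetical sites, each holding its own local update.
    local_updates = [[3.0, 4.0], [0.3, 0.4], [-1.0, 2.0]]
    noisy = [privatize_update(u, 1.0, 0.5, rng) for u in local_updates]
    print(federated_average(noisy))
```

Raising `noise_std` hides each site's contribution more thoroughly but pushes the averaged result further from the noise-free average, which is the "noisy" model problem described above.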

Juexiao Zhou, a Ph.D. student in Gao’s group, illuminates their novel solution. “We’ve added a new move to this privacy dance—a decentralized shuffling algorithm. It’s like having many dancers, each blindfolded, ensuring no one knows the other's steps, thus maintaining privacy.”
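The shuffling idea can be sketched in a few lines. The example below is only an illustration of the general principle, assuming that each participant's update arrives tagged with an identifier. It is not the decentralized algorithm used in PPML-Omics, and the names `shuffle_updates` and `average` are hypothetical. The identifiers are dropped and the order is randomized before aggregation, so the aggregator cannot tell which update came from whom, while the average itself is unchanged.

```python
import random


def shuffle_updates(tagged_updates, rng):
    """Drop participant identifiers and return the updates in random order."""
    anonymous = [update for _participant_id, update in tagged_updates]
    rng.shuffle(anonymous)
    return anonymous


def average(updates):
    """Average a list of equally sized updates coordinate by coordinate."""
    n = len(updates)
    return [sum(column) / n for column in zip(*updates)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical updates from three participants.
    tagged = [
        ("site_a", [1.0, 2.0]),
        ("site_b", [3.0, 4.0]),
        ("site_c", [5.0, 6.0]),
    ]
    shuffled = shuffle_updates(tagged, rng)
    # The aggregator sees only anonymous, reordered updates.
    print(average(shuffled))
```

Because averaging does not depend on order, shuffling by itself costs nothing in accuracy. In this sketch a single function does the shuffling, whereas a decentralized scheme would spread that step across participants so that no single party sees the original order.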

This approach, PPML-Omics, was tested on three multi-omics tasks using deep-learning models. It outperformed the comparison methods in efficiency and effectiveness under the same level of privacy guarantee, and it held up in privacy attack experiments.

“We applied PPML-Omics to analyze data from three sequencing technologies and addressed the privacy concern in three major tasks of omic data under three representative deep learning models,” the authors wrote. “We examined privacy breaches in depth through privacy attack experiments and demonstrated that PPML-Omics could protect patients’ privacy. In each of these applications, PPML-Omics was able to outperform methods of comparison under the same level of privacy guarantee, demonstrating the versatility of the method in simultaneously balancing the privacy-preserving capability and utility in omic data analysis.”

Gao adds a note of caution: “As we increasingly apply deep learning to biological data, we must remember it’s like a sponge—it can absorb a lot of private information. Our responsibility is to ensure that this doesn’t happen.”

In this dance of discovery and discretion, the KAUST team is setting a new rhythm, one where the power of AI in medical research can be fully realized without stepping on the toes of privacy.