[Studio Molekuul's Images/Canva]

Advanced AI Reveals How Proteins Act in Their Natural Contexts

Harvard researchers developed an AI model that overcomes the limitations of traditional protein analysis by considering cell and tissue environments
August 20, 2024

A fish on land may still move its fins, but the outcome is vastly different when that fish is in water. This analogy, attributed to the renowned computer scientist Alan Kay, highlights the importance of context in understanding complex phenomena.

This principle is embodied in a groundbreaking AI tool called PINNACLE, the first of its kind, which applies Kay’s insight to studying proteins in their natural environments—within the specific tissues and cells where they function. Unlike traditional AI models, which often analyze proteins in isolation, PINNACLE considers the broader context, overcoming key limitations of earlier methods.

The development of PINNACLE, led by researchers at Harvard Medical School, marks a significant advance in the field of artificial intelligence, as detailed in Nature Methods.

“The natural world is interconnected, and PINNACLE helps identify these linkages, which we can use to gain more detailed knowledge about proteins and safer, more effective medications,” said study senior author Marinka Zitnik, assistant professor of biomedical informatics in the Blavatnik Institute at HMS. “It overcomes the limitations of current, context-free models and suggests the future direction for enhancing analyses of protein interactions.”

This innovation could significantly advance the understanding of proteins’ roles in health and disease, leading to more precise and personalized treatments. Moreover, PINNACLE is freely accessible to scientists worldwide.

A Significant Advancement

Understanding the intricate interactions between proteins and their neighboring biological molecules is a complex task. Current tools, while useful for analyzing the structure and properties of individual proteins, do not account for the influence of the surrounding cellular environment. These tools generate context-free protein representations, lacking the crucial information provided by the specific cell and tissue types in which proteins exist.

Proteins, composed of twenty different amino acids, are fundamental to numerous biological processes, such as oxygen transport, muscle contraction, digestion, and immune defense. The human body contains anywhere from 20,000 to hundreds of thousands of different proteins, all of which interact in complex networks within cells and tissues.

PINNACLE’s strength lies in its ability to account for the varying functions of proteins across different cell and tissue types. For example, the same protein might behave differently in a healthy lung cell compared to a diseased colon cell. PINNACLE can identify how these proteins interact within various contexts, revealing insights that single-protein models miss.
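The difference between a context-free and a context-aware representation can be pictured as a difference in how a lookup table is keyed. The sketch below is only a toy illustration, not PINNACLE's actual code or data: the protein name, the cell-type labels, and the vectors are hypothetical placeholders chosen for the example.

```python
from typing import Dict, List, Tuple

# Context-free view: one vector per protein, whatever cell it sits in.
# The numbers are placeholders with no biological meaning.
context_free: Dict[str, List[float]] = {
    "PROTEIN_A": [0.1, 0.2, 0.3],
}

# Context-aware view: one vector per (protein, cell type) pair, so the
# same protein can be represented differently in different cells.
contextual: Dict[Tuple[str, str], List[float]] = {
    ("PROTEIN_A", "lung cell"): [0.9, 0.1, 0.0],
    ("PROTEIN_A", "colon cell"): [0.2, 0.7, 0.4],
}


def lookup_contextual(protein: str, cell_type: str) -> List[float]:
    """Return the placeholder vector for a protein in a given cell type."""
    return contextual[(protein, cell_type)]


lung = lookup_contextual("PROTEIN_A", "lung cell")
colon = lookup_contextual("PROTEIN_A", "colon cell")

# The context-free table has a single entry for the protein, while the
# contextual table distinguishes the two cell types.
print(len(context_free), len(contextual))
print(lung != colon)
```

Keying the table by protein and cell type together is what lets a model of this kind express that a protein's role in a lung cell need not match its role in a colon cell.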

This context-sensitive approach enables PINNACLE to predict precise drug targets for malfunctioning proteins that contribute to disease, complementing traditional models by analyzing protein interactions in their specific cellular environments.

By enhancing the understanding of protein functions, PINNACLE can help researchers decode critical cellular processes and disease mechanisms. This capability is particularly valuable for identifying “druggable” proteins and predicting how different drugs might affect various cell types, making it an essential tool for scientists and drug developers.

Improving the drug discovery process is vital, as Zitnik notes, given that developing a new drug can take 10 to 15 years and cost up to a billion dollars. The path from discovery to market is fraught with challenges: nearly 90 percent of drug candidates fail to become medicines.

Training and Expanding PINNACLE

The researchers trained PINNACLE using human cell data from a comprehensive multiorgan atlas, along with extensive networks of protein-protein interactions, cell type-to-cell type interactions, and tissues. As a result, PINNACLE can generate detailed protein representations that encompass 156 cell types across 62 tissues and organs, producing nearly 395,000 multidimensional representations—far surpassing the 22,000 representations possible with current models. Each cell type in PINNACLE’s dataset contains context-rich networks of about 2,500 proteins.
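A back-of-the-envelope check shows how the figures above fit together. It is a rough sketch rather than a calculation from the study: it assumes each cell type contributes one representation per protein in its network, and it treats "about 2,500" as exactly 2,500. The variable and function names are illustrative.

```python
# Figures quoted in the article (the per-cell-type count is approximate).
CELL_TYPES = 156
PROTEINS_PER_CELL_TYPE = 2_500
CONTEXT_FREE_REPRESENTATIONS = 22_000


def contextual_representation_count(cell_types: int, proteins_per_cell_type: int) -> int:
    """Assume one representation per (protein, cell type) pair."""
    return cell_types * proteins_per_cell_type


estimate = contextual_representation_count(CELL_TYPES, PROTEINS_PER_CELL_TYPE)
ratio = estimate / CONTEXT_FREE_REPRESENTATIONS

print(f"estimated contextual representations: {estimate:,}")  # 390,000
print(f"roughly {ratio:.1f}x the context-free count")         # 17.7x
```

The estimate of 390,000 is close to the reported figure of nearly 395,000, which suggests the total is largely the number of cell types multiplied by the size of each cell type's protein network.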

The model’s capacity is still growing. While it currently covers most human cell types, there are many more to be explored, including rare or difficult-to-study cells like brain neurons. To further expand PINNACLE’s cellular repertoire, Zitnik plans to leverage a data platform containing tens of millions of cells sampled from across the human body.
