Visual comparison of structure predictions from AlphaFold2 (orange) and BaseFold (cyan) for targets from the CASP15 and CAMEO competitions. For the two example targets shown, T1113 (bacteriophage T7 polymerase inhibitor, left) and 8SSD (methionine synthase, right), BaseFold’s predictions are much closer to the laboratory-validated structures (beige). White arrows highlight regions where AlphaFold2’s predictions are significantly inaccurate.

Basecamp Research Unveils BaseFold: Revolutionary Deep Learning Model for Protein Structure Prediction

Basecamp Research introduces BaseFold, a groundbreaking deep learning model designed to accurately predict 3D structures of large, complex proteins, surpassing industry benchmarks like AlphaFold2.
Emerging Technologies
March 12, 2024

Basecamp Research, a company working on AI-driven protein and biological system design, has unveiled BaseFold, a deep learning model that it reports predicts the 3D structures of large, complex proteins more accurately than existing AI tools. The results, including a comparison against the industry-standard AlphaFold2, are detailed in a recent preprint on bioRxiv.

BaseFold builds on AlphaFold2 by integrating it with BaseGraph, a curated dataset assembled by Basecamp Research. Derived from partnerships with over 25 biodiversity-rich nations, BaseGraph provides a broad foundation for biological AI and underpins BaseFold's gains in predictive accuracy. The company describes the reported improvements as a first step and says it will continue refining the model as its network of global biodiversity collaborations expands.

Moreover, Basecamp Research is set to collaborate with NVIDIA to optimize and deploy BaseFold within NVIDIA BioNeMo, a cutting-edge AI platform tailored for drug discovery endeavors.

While traditional methods like X-ray crystallography remain prevalent in determining protein structures, AlphaFold2's emergence in 2020 revolutionized the application of AI in biotechnology. Other structure prediction models, such as ColabFold and RoseTTAFold, have since emerged, but they rely on public protein databases that Basecamp Research considers inadequate for modern biotech AI applications.

Basecamp Research's BaseFold addresses this limitation head-on, striving to achieve crystallography-level accuracy, particularly for larger and more complex proteins. By harnessing the extensive evolutionary data within BaseGraph, BaseFold significantly improves upon existing models, as evidenced by its performance in predicting structures from the CASP15 and CAMEO projects.

Key findings from the publication underscore BaseFold's transformative impact:

  • Leveraging BaseGraph, BaseFold elevates the accuracy of predicted structures by up to six-fold compared to AlphaFold2.
  • Notably, BaseFold demonstrates up to a three-fold improvement in modeling small molecule interactions with protein targets.
  • By enabling more dependable 3D structure predictions and small molecule docking, BaseFold paves the way for advancements in drug discovery, particularly for proteins underrepresented in public datasets.

According to the company, these improvements could speed up drug discovery by giving researchers more precise, AI-driven insight into molecular interactions when developing new therapeutic molecules.

"We have redesigned and rebuilt the entire data acquisition process, making us the first team ever to collect and annotate biodiversity data with the same quality as human clinical genetic data — all purpose-built for the AI era," said Dr. Phil Lorenz, CTO of Basecamp Research. "BaseGraph, the most diverse and comprehensive dataset of its kind, is the core driver of our advances in AI. The results of this publication prove that more diverse, representative genomics data allows for step-change algorithm improvements without the need for extensive lab-in-the-loop infrastructure. Our database is growing every week, and as a result, BaseFold is improving every week, too."

"AlphaFold is one of the most useful AI tools in drug discovery, and for good reason. It enables researchers to better predict how medicines may interact with proteins in the body, shaving off years of work. However, AlphaFold still has significant room for improvement – particularly when being used to predict large, complex and underrepresented proteins, which are often the most critical for the development of new therapeutics. Even just a few percentage points of error can have major implications in accurately predicting protein-molecule interactions," said Dr. Glen Gowers, co-founder of Basecamp Research.

"We know that when it comes to AI, the best data produces the best outcomes, and it's rewarding to know that the new, purpose-built foundational dataset that we have built is already having widespread implications for drug development and human health," Dr. Gowers added. "We're not stopping here, though – we are continuing to scale our biodiversity partnerships and apply this data advantage across more and more biological AI models."

The full preprint can be found here: https://www.biorxiv.org/content/10.1101/2024.03.06.583325v1
