OpenFold SoloSeq model prediction vs experimentally resolved structure for PDB protein 4B9Z (Graphic: Business Wire)

OpenFold Biotech AI Research Consortium Unveils Groundbreaking Tools for Protein Research

February 22, 2024

In a notable advancement for the field of biotechnology, the OpenFold Biotech AI Research Consortium has announced the launch of two innovative tools: SoloSeq and OpenFold-Multimer. These developments mark significant strides in protein research, offering faster and more precise protein structure predictions, improved models of protein interactions, and enhancements in the design of therapeutic proteins.

SoloSeq, integrating a new protein Large Language Model (LLM) with the OpenFold structure prediction software, stands as the first fully open-source protein LLM/structure prediction AI tool. Developed on Amazon Web Services (AWS), SoloSeq distinguishes itself by releasing crucial training code, enabling other organizations to fine-tune or develop new models using their proprietary data. This approach opens new doors for scientific exploration that were previously hindered by the limitations of closed-source models.

OpenFold-Multimer, on the other hand, is open-source software for generating high-quality models of protein-protein complexes. It advances the modeling of how proteins interact with one another, which is crucial for the development of new therapeutics.

The collaborative effort led by Professor Mohammed AlQuraishi at Columbia University, alongside researchers Sachin Kadyan, Kevin Zhu, Christina Floristean, Dingquan Yu, Gustaf Ahdritz, and Jennifer Wei, underscores the consortium's commitment to advancing open science. By making these tools accessible, OpenFold aims to catalyze scientific progress and facilitate further enhancements to these potent research instruments.

“OpenFold-Multimer and SoloSeq are particularly useful for designed proteins that don’t exist in nature. These are the tools that we need to cure diseases,” said Brian Weitzner, Ph.D., Director of Computational and Structural Biology at Outpace and co-founder of OpenFold. “OpenFold’s commitment to open science includes releasing training code and data sets, making these tools the most accessible to the community in order to accelerate scientific advances and facilitate further improvements to these powerful tools.”

SoloSeq eliminates the pre-computation step that alignment-based structure predictors require (typically building a multiple sequence alignment), which significantly accelerates protein structure prediction. The speedup comes from the LLM's ability to rapidly summarize evolutionary information, much as AI models like ChatGPT generate text from patterns learned across vast training data.

The integration of LLM technology into SoloSeq offers several advantages, including the ability to handle inputs of non-natural proteins and the facilitation of large-scale screenings where speed is paramount. Additionally, SoloSeq's open-source nature, including the release of training code, sets it apart from previous models, enabling customization and new model development by a broad range of organizations.

Similarly, OpenFold-Multimer opens up new avenues for the creation and refinement of protein/protein complex models. With its fully open-source training code, users can not only generate new structures but also retrain or fine-tune models with proprietary data, further enhancing the tool's utility and applicability.
