Align and ATCC Partner to Launch World’s Largest AI-Ready Microbial Phenotyping Dataset

A new open-access collaboration aims to power AI-driven biology with standardized, high-quality microbial data across thousands of strains and conditions.
June 18, 2025

The Align Foundation, a nonprofit focused on accelerating predictive biology through AI-driven research infrastructure, has announced a landmark collaboration with ATCC, the leading global nonprofit supplier of microbial strains and biological standards. Together, they will create the world’s largest public, AI-ready microbial phenotyping dataset, designed to accelerate machine learning applications in biological research.

The project will generate high-quality phenotypic data for 1,000 phylogenetically diverse microbial strains across 1,000 cultivation conditions, forming a foundational dataset to help AI models more effectively link genotype to phenotype. This effort combines Align’s scalable, high-throughput experimental platform with ATCC’s authenticated microbial collection and deep genomics expertise, filling a critical gap in current research: the lack of large, standardized, and public datasets gathered under consistent conditions.

“Our vision at Align is to build the research infrastructure needed to make biological data collection frictionless, scalable, and shareable,” said Erika DeBenedictis, PhD, co-founder of Align. “Collaborating with ATCC, an organization synonymous with biological quality and reproducibility, is an incredible opportunity to create a large-scale, public resource that can help enable the next generation of AI-driven biological discovery. We’re honored to work alongside them, and this opportunity would not be possible without their diverse and trusted biomaterials.”

The dataset will include a wide array of cultivation conditions, such as atmospheric and temperature variations, as well as undefined, semi-defined, and defined media types. It will also incorporate a variety of metabolic supplements—including carbon sources, vitamins, cofactors, and metals. Each strain’s growth and morphology will be carefully measured and linked to genomic data, providing a training and validation foundation for predictive AI models focused on microbial physiology.

“The reliability of AI-driven biological insights depends entirely on the quality of the data—and ultimately, the source materials—used to train the predictive models. At ATCC, we are committed to providing reference datasets alongside our trusted biological reference materials so that these future insights can be physically reproduced and validated in the lab,” said Ruth Cheng, PhD, president and CEO of ATCC. “Our collaboration with Align is an important step towards enabling researchers to reliably apply AI in biology by building a dataset that is traceable to the authenticated microbial resources at ATCC.”

This pioneering initiative supports a shared commitment to reproducibility, scalability, and accessibility in science. Phenotypic data will be hosted on Align’s Phenome Portal, with links to ATCC’s Genome Portal, enabling researchers to seamlessly navigate between genotype and phenotype.

Why It Matters

Today’s biological datasets are often fragmented, inconsistent, or siloed. By creating an open, standardized microbial phenotyping resource, Align and ATCC are removing barriers for the research community and accelerating the development of predictive biological models. These models have wide-reaching implications for human health, sustainability, and economic innovation.

This collaboration marks a major step toward making data-driven biology a reality. Researchers and organizations are invited to explore and contribute to this growing open-access resource. To learn more or get involved, visit alignbio.org/datasets-microbes or email contact@alignbio.org.

About The Align Foundation
Founded in 2021, the Align Foundation is a nonprofit advancing life sciences by enabling large, open biological datasets. Align specializes in high-throughput experimentation and automation partnerships, and hosts open competitions to benchmark progress in the field. Supported by philanthropic funding, Align is building the infrastructure needed to unlock predictive, data-driven breakthroughs in biology. More at alignbio.org.

Media Contact:
Naomi Hagelund
The Align Foundation
comms@alignbio.org

About ATCC
ATCC is a globally recognized nonprofit standards organization and a leading provider of authenticated cell lines, microorganisms, and biological reference data. With over 100 years of scientific contributions, ATCC supports academia, industry, and government researchers by offering the largest and most diverse collection of biological materials and model systems. Headquartered in Manassas, Virginia, with research centers in Maryland, ATCC is committed to driving innovation in science and public health. Learn more at atcc.org.
