How Automation and Machine Learning Help Explore the Dark Corners of the Genome

Oct 22, 2019

If you synthesized every variant of a 79-nucleotide-long piece of DNA (4^79 sequences), their combined mass would be greater than the mass of the Earth. If you did the same thing, but with 126 nucleotides, their mass would be greater than that of the observable universe. I first read about these “hyper-astronomical” numbers in a review by Ard Louis, Professor of Theoretical Physics at the University of Oxford.

Given that the E. coli genome is five million bases long, and the human genome is 600 times longer, how can engineers possibly know which precise modifications are best to overproduce some chemical, or which edits to make to stop the spread of a cancer? Nature has barely begun to explore this “coding” space, and we might be naïve to think that we could do better.

In synthetic biology, the Design-Build-Test cycle is used to create organisms with desired properties. But, given the number of possible combinations of DNA sequences, it is nearly impossible to design the perfect DNA sequence for our desired application. A growing tide of computer-aided biology promises to accelerate this pipeline, using automated liquid handlers to Build (and in some cases Test), and feeding the resulting data into machine learning algorithms, which Design the next set of experiments.

Machine learning is just a set of algorithms that computers use to perform specific tasks without a set of direct instructions. Feed the computer lots of high-quality data, and it can suggest which experiment to perform next.

The Echo Liquid Handler developed by Labcyte, now part of Beckman Coulter Life Sciences, can help provide that data.

“Moving liquid with sound waves, the Echo Liquid Handler is contact-free and capable of transferring 2.5 or 25 nanoliter increments precisely and accurately. For the synthetic biologist, you can combine a range of fluids from oligonucleotides, master mixes with enzymes to lysates for TX-TL expression without the need to calibrate the instrument,” says John Lesnick, Senior Scientist at Beckman Coulter.

The whole process is also remarkably fast, with the Echo transferring hundreds of droplets per second. This speed and precision enable scientists to test a far greater number of variants than would ever be possible by hand. The data collected here, once fed into machine learning algorithms, can be used to answer challenging questions, like:

Which guide RNA will best edit this gene?
Which promoter should I use to express this protein?
How can I modify this enzyme to maximize its catalytic turnover?

Many high-powered synthetic biology companies are already using this “computer-aided” approach to rewire biology at breakneck pace. Genome editing company Inscripta used machine learning algorithms to develop an all-in-one platform that can make hundreds of precise genome edits simultaneously in living cells. Zymergen is producing high-performance materials not found anywhere in nature by using machine learning to dictate which genetic modifications to make in an organism.

We are entering an era of computer-aided synthetic biology, where machines can run experiments, analyze the data, and design the next experiments. It may help us explore the “darkest corners” of genomes, and create incredible chemicals and products once inaccessible to nature.
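
The “hyper-astronomical” figures quoted at the top of this article can be checked with a few lines of arithmetic. The sketch below is a rough estimate, not the calculation from the review: it assumes one single-stranded copy of each sequence, an average nucleotide mass of about 330 daltons, and rounded values for the mass of the Earth and of the ordinary matter in the observable universe. All constants and function names are illustrative assumptions.

```python
# Back-of-the-envelope check of the "hyper-astronomical" library masses.
# All constants below are rounded, assumed values, not figures from the article.

DALTON_KG = 1.66053907e-27   # one dalton (atomic mass unit) in kilograms
NT_MASS_DA = 330.0           # assumed average mass of one DNA nucleotide, in daltons
EARTH_MASS_KG = 5.97e24      # approximate mass of the Earth
UNIVERSE_MASS_KG = 1.5e53    # rough estimate: ordinary matter in the observable universe


def library_size(length: int) -> int:
    """Number of distinct DNA sequences of a given length (4 bases per position)."""
    return 4 ** length


def library_mass_kg(length: int) -> float:
    """Mass of one single-stranded copy of every sequence of the given length."""
    mass_per_molecule_kg = length * NT_MASS_DA * DALTON_KG
    return library_size(length) * mass_per_molecule_kg


def smallest_length_exceeding(target_mass_kg: float) -> int:
    """Shortest sequence length whose complete library outweighs the target mass."""
    length = 1
    while library_mass_kg(length) <= target_mass_kg:
        length += 1
    return length


if __name__ == "__main__":
    targets = [("Earth", EARTH_MASS_KG), ("observable universe", UNIVERSE_MASS_KG)]
    for label, target in targets:
        n = smallest_length_exceeding(target)
        print(f"{label}: outweighed from {n} nucleotides ({library_mass_kg(n):.2e} kg)")
```

Under these assumed constants the thresholds should come out at 79 and 126 nucleotides, in line with the numbers above. The second threshold is sensitive to the assumed mass of the universe and can shift by a nucleotide with a different estimate.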

Genome Editing and Protein Design Get Boosts from Machine Learning

Given the “hyper-astronomical” combinations that a DNA sequence can adopt, how can scientists find the best combination for their application?

Consider Inscripta’s digital genome engineering platform, which was recently used to create a 200,000-edit library of an E. coli biosynthesis pathway. The system, an enclosed device that was formally announced at the SynBioBeta conference, uses CRISPR/Cas9 to make thousands of parallel edits at specific regions of the genome.

“The basic CRISPR technology has at least two key features,” says Dr. Richard Fox, Executive Director of Data Science at Inscripta. “One is the ability to cut a gene, then paste and repair… So it's probably not a stretch to imagine that we're using data to derive the rules that optimize the editing processes.”

To determine which guide RNAs are best for each edit, Inscripta leveraged machine learning. “We generate a lot of data to figure out which designs work better than others, and part of improving our system performance is to use that data to empirically determine, along with statistics and machine learning, which guides are best for cutting,” explains Fox.

But the utility of machine learning doesn’t stop there. Inscripta is also using its gene editing platform to inform protein engineering and directed evolution, the same method that earned Frances Arnold, Professor at the California Institute of Technology, the 2018 Nobel Prize in Chemistry.

“We spent a fair bit of time working in the field of protein engineering, especially through methods like directed evolution. Specifically, we generated genotype and phenotype data around enzymes and other proteins,” says Fox, referring to how a DNA sequence encoding a protein can impact the observable characteristics of an organism (i.e., its phenotype).

Inscripta, for example, can make many different edits in the DNA sequence encoding an enzyme, and then run experiments to measure the phenotypes that result. If the enzyme is responsible for producing a bright pigment, then certain edits will cause it to produce more or less of that pigment. Fed into machine learning pipelines, that information lets an algorithm predict which edits will maximize the desired phenotype.

Forty minutes northwest of Inscripta, in Emeryville, California, Zymergen is using similar strategies to engineer the future of molecules and materials.
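
The pigment example above, in which measured variants are used to predict which edits maximize a phenotype, can be illustrated with a toy model. This is only a sketch under stated assumptions, not Inscripta's pipeline: the edit names and pigment readings are invented, and a simple additive estimate (mean signal with an edit minus mean signal without it) stands in for a real machine learning model. It also assumes edits act independently, which real proteins often violate.

```python
from itertools import combinations
from statistics import mean

# Hypothetical measurements: set of edits in a variant -> pigment signal (arbitrary units).
# Edit names and values are invented for illustration only.
measurements = {
    frozenset(): 1.0,
    frozenset({"A"}): 1.5,
    frozenset({"B"}): 0.8,
    frozenset({"C"}): 1.2,
    frozenset({"A", "B"}): 1.3,
    frozenset({"A", "C"}): 1.7,
}


def estimate_edit_effects(data):
    """Estimate each edit's effect as mean signal with the edit minus mean signal without it."""
    edits = sorted(set().union(*data))
    effects = {}
    for edit in edits:
        with_edit = [y for variant, y in data.items() if edit in variant]
        without_edit = [y for variant, y in data.items() if edit not in variant]
        effects[edit] = mean(with_edit) - mean(without_edit)
    return effects


def rank_variants(effects):
    """Score every combination of edits by summing effects (assumes independent edits)."""
    edits = sorted(effects)
    scored = []
    for size in range(len(edits) + 1):
        for combo in combinations(edits, size):
            scored.append((sum(effects[e] for e in combo), combo))
    return sorted(scored, reverse=True)


effects = estimate_edit_effects(measurements)
ranking = rank_variants(effects)
best_score, best_combo = ranking[0]
print("estimated effects:", {e: round(v, 2) for e, v in effects.items()})
print("predicted best combination:", best_combo)
```

The ranking also scores combinations that were never measured, which is the point of such a model: it suggests which variants are worth building next.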

Creating the (Un)natural with Machine Learning

“Biology is an incredibly powerful, multi-purpose tool, and it can be aimed at any number of different endpoints,” says Aaron Kimball, CTO of Zymergen.

Within a 310,000-square-foot space in Emeryville, Zymergen is using engineered organisms to produce a slew of chemicals and materials. Though inspired by nature, many of the materials they create are not found anywhere else on Earth. Just like Inscripta, their engineering process requires the exploration of hyper-astronomical spaces to find the DNA combinations that work best to produce a desired compound.

“To build the strains that we use for full-scale fermentation, we use lab automation systems. There’s a loop that we use…design, build, test and analyze. In the design phase, we design many genetic edits that we believe will be beneficial, then we physically make those edits in our strains, and then we test them for different responses,” says Kimball.

Once liquid handlers are used to build the strains, the resulting data informs the next round of experiments, says Kimball. “In the learning phase, we update any models that we made and feed that information into the next round of design…so the build phase and the test phase are, essentially, entirely performed on lab automation systems so that we can perform this work at scale.”

In Zymergen’s case, the objective is to engineer organisms that can produce custom designed materials and chemicals at scale. Often, seemingly innocuous edits in a single gene can lead to superior strains with higher production capabilities. But finding which edits to make requires a constant dialogue between data and algorithms.

“We have one of the largest libraries of DNA available in the world…as well as an electronic database that we can search. So we can use machine learning algorithms to search that database for homologs to genes that might be beneficial replacements over the genes that are naturally encoded in an organism,” says Kimball.

“We might also [use machine learning to] search more heavily into the ‘dark matter’ of the genome, the genes of no known function…and then combine beneficial edits or mutations.”

Although founded only in 2013, Zymergen has already used this automation-meets-machine-learning approach with remarkable success. Although the company hasn’t revealed its customers, co-founder and CEO Joshua Hoffman has previously stated that their clients “have sold half a billion dollars worth of products made with our bugs in the last couple of years.”
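
The design, build, test, and learn loop that Kimball describes can be sketched as a small simulation. This is an illustration under stated assumptions, not Zymergen's system: the edit names and titer values are hypothetical, a lookup table stands in for automated strain construction and assays, and a greedy "keep the best strain of each round" rule stands in for the learned models.

```python
# Hidden "ground truth" used only by the simulated assay below. In a real lab this is
# unknown; the edit names and titer values are hypothetical.
TRUE_EFFECTS = {"e1": 40, "e2": -20, "e3": 10, "e4": 0, "e5": 30}
BASELINE_TITER = 100  # arbitrary units for the unedited strain


def design(current_strain, candidate_edits):
    """Design: propose one new strain per unused edit, added to the current best strain."""
    return [current_strain | {edit} for edit in sorted(candidate_edits - current_strain)]


def build_and_test(strain):
    """Build + Test: stand-in for automated strain construction and assay."""
    return BASELINE_TITER + sum(TRUE_EFFECTS[edit] for edit in strain)


def learn(results, best_strain, best_titer):
    """Learn: keep the top performer of the round only if it beats the current best."""
    strain, titer = max(results, key=lambda item: item[1])
    if titer > best_titer:
        return strain, titer, True
    return best_strain, best_titer, False


def run_cycles(candidate_edits, max_rounds=10):
    """Repeat design, build, test, and learn until no round improves the titer."""
    best_strain = frozenset()
    best_titer = build_and_test(best_strain)
    history = [(best_strain, best_titer)]
    for _ in range(max_rounds):
        designs = design(best_strain, candidate_edits)
        if not designs:
            break
        results = [(strain, build_and_test(strain)) for strain in designs]
        best_strain, best_titer, improved = learn(results, best_strain, best_titer)
        if not improved:
            break
        history.append((best_strain, best_titer))
    return best_strain, best_titer, history


final_strain, final_titer, history = run_cycles(set(TRUE_EFFECTS))
print("final strain:", sorted(final_strain), "titer:", final_titer)
for round_number, (strain, titer) in enumerate(history):
    print(round_number, sorted(strain), titer)
```

Each round tests only a handful of strains instead of all 32 possible combinations of five edits, which is the practical appeal of letting data from one round steer the next.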

Synthetic biologists want to create a biological future that moves our civilization away from petrochemicals, demonstrates the promise of renewables, and produces high-quality products for less. Considering the hyper-astronomical possibilities of genetic sequences, it is clear that humans are poorly equipped to test this vast space.

But maybe high-throughput liquid handlers, like the Echo, coupled with machine learning algorithms, can help us learn from biology, sift through the dark corners of genomes, and continue advancing this ‘biologized’ industrial revolution.

All trademarks are properties of their respective owners.