ForceGen: Using a Diffusion Model to Help Design Novel Proteins

Although proteins are built from only a small number of distinct amino acids, this deceptive simplicity quickly vanishes when considering the many possible sequences a protein can have, not to mention the many ways in which a single 1D protein sequence can fold into a 3D shape with a specific functionality. Natural evolution has done much of the legwork here already, but figuring out new sequences and their functionality remains a daunting task, one to which deep learning algorithms are increasingly being applied. As [Bo Ni] and colleagues report in a research article in Science Advances, the hardest challenge is designing a protein sequence based on the desired functionality. They then demonstrate a way to use a generative model to speed up this process.

They set out to design proteins with specific mechanical properties, training a diffusion model on the known unfolding characteristics of various protein sequences. This approach is thus more akin to the technology behind image generation algorithms like DALL-E than to LLMs. With the trained diffusion model it was then possible to generate candidate sequences whose properties could be checked in simulation, with favorable results.
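For readers who have not met diffusion models before, the core trick is to gradually drown training data in noise and teach a network to reverse that process. Below is a minimal sketch of just the forward (noising) half, written in plain Python. It is not the ForceGen code: the one-hot encoding, the linear noise schedule, and every function name here are assumptions chosen for illustration, following the generic DDPM-style formulation rather than anything specified in the paper.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(sequence):
    """Encode a protein sequence as a list of 20-element one-hot rows."""
    return [[1.0 if aa == letter else 0.0 for letter in AMINO_ACIDS]
            for aa in sequence]


def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t.

    Uses a linear beta schedule (an assumed, commonly used choice);
    requires num_steps > 1.
    """
    product = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        product *= 1.0 - beta
    return product


def add_noise(x0, t, num_steps, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    a = alpha_bar(t, num_steps)
    signal, sigma = math.sqrt(a), math.sqrt(1.0 - a)
    return [[signal * v + sigma * rng.gauss(0.0, 1.0) for v in row]
            for row in x0]


def decode(x):
    """Map each row back to the amino acid with the largest value."""
    return "".join(AMINO_ACIDS[max(range(len(row)), key=row.__getitem__)]
                   for row in x)


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = one_hot("MKTAYIAKQR")  # hypothetical example sequence
    for t in (0, 100, 500, 1000):
        # The decoded sequence drifts further from the input as t grows.
        print(t, decode(add_noise(x0, t, 1000, rng)))
```

A real model would add a neural network trained to predict and remove this noise step by step, conditioned on the desired property (here, the mechanical unfolding behavior), so that sampling can start from pure noise and end at a plausible sequence.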

As an aid for working with large data sets, such a diffusion model could conceivably be very useful in fields even beyond protein synthesis, automating tedious tasks and speeding up discoveries.
