Protein-folding contest seeks next big breakthrough

Protein structure model of DNA polymerase I, an enzyme that participates in DNA replication.

A protein’s function is determined by its 3D shape. Credit: Leonid Andronov/Alamy

“In some sense the problem is solved,” computational biologist John Moult declared in late 2020. The London-based company DeepMind had just swept a biennial contest co-founded by Moult that tests teams’ abilities to predict protein structures — one of biology’s grandest challenges — with its revolutionary artificial-intelligence (AI) tool AlphaFold.

Two years later, Moult’s competition, the Critical Assessment of Structure Prediction (CASP), is still walking in AlphaFold’s long shadow. Results from this year’s edition (CASP15) — which were unveiled over the weekend at a conference in Antalya, Turkey — show that the most successful approaches to predicting protein structures from their amino-acid sequences incorporated AlphaFold, which relies on an AI approach called deep learning. “Everyone is using AlphaFold,” says Yang Zhang, a computational biologist at the University of Michigan in Ann Arbor.

Yet AlphaFold’s progress has opened the floodgates to new challenges in protein-structure prediction — some included in this year’s CASP — that might require new approaches and more time to fully tackle. “The low-hanging fruit has been picked,” says Mohammed AlQuraishi, a computational biologist at Columbia University in New York City. “Some of the next problems are going to be harder.”

Matchmaking

CASP started in 1994, aiming to bring rigor to the field of protein-structure prediction, in which progress would accelerate efforts to understand the building blocks of cells and advance drug discovery. During the year of a contest, teams are tasked with using computational tools to predict the structures of proteins whose shapes have already been determined using experimental methods such as X-ray crystallography and cryo-electron microscopy, but have not yet been released.

Entries are assessed according to how well predictions for entire proteins, or independently folding subunits called domains, match the experimental structures. Some of AlphaFold’s predictions at CASP14 were more or less indistinguishable from the experimental models — the first time such accuracy had been achieved.
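The idea of scoring a prediction against an experimental structure can be illustrated with a small sketch. A long-standing headline measure in CASP assessments is GDT_TS, which averages the fraction of residues that fall within 1, 2, 4 and 8 ångströms of their experimental positions. The Python below is a simplified illustration, not the assessors' actual pipeline: the function name and the toy coordinates are hypothetical, and it assumes the two structures have already been superimposed, whereas the full method searches for the best superposition at each cutoff.

```python
import math

# Distance cutoffs, in angstroms, averaged by the GDT_TS score.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts_presuperposed(predicted, experimental):
    """Approximate GDT_TS for two equal-length lists of (x, y, z) coordinates.

    Assumption: each entry is one residue's C-alpha position and the two
    structures are already superimposed. The real score also optimizes the
    superposition separately for every cutoff, which this sketch skips.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")

    # Per-residue deviation between predicted and experimental positions.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]

    # Fraction of residues within each cutoff, then the mean, as a percentage.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in GDT_TS_CUTOFFS
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Toy four-residue example (made-up coordinates, not real protein data).
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

score = gdt_ts_presuperposed(predicted, experimental)
print(f"Approximate GDT_TS: {score:.2f}")  # prints 56.25
```

In this toy case the four residues deviate by 0.5, 1.5, 3 and 10 ångströms, so one, two, three and three of them fall inside the successive cutoffs, giving a score of 56.25 out of 100. A perfect match would score 100.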

Since its unveiling at CASP14, AlphaFold has become omnipresent in life-sciences research. DeepMind released the software’s underlying code in 2021 so that anyone could run the program, and an AlphaFold database updated this year holds predicted structures — of varying quality — for almost every protein from all organisms represented in genome databases, a total of more than 200 million proteins.

AlphaFold’s success and newfound ubiquity presented a challenge to Moult, who is at the University of Maryland, Rockville, and his colleagues as they planned this year’s CASP. “People say, ‘Oh, we don’t need CASP anymore, the problem was solved.’ And I think that’s exactly the wrong way round,” he says.

At CASP15, the most successful teams were those that had adapted and built on AlphaFold in various ways, leading to modest gains in predicting the shape of individual proteins and domains. “The accuracy is already so high that it’s hard to get much better,” says Moult.

Protein complexes

To make the competition more relevant in a post-AlphaFold world, Moult and his team added new challenges and tweaked some existing ones. New tests include determining how proteins interact with other molecules such as drugs and predicting the multiple shapes that some proteins can assume. For the past decade, CASP has included ‘complexes’ of multiple interacting proteins, says Moult, but accurately predicting the structure of such molecules has taken on added emphasis this year.

“That is the right thing to do,” says Zhang, because predicting the structures of single proteins or domains, the bread and butter of past CASPs, has largely been solved by AlphaFold. Determining the shape of protein complexes, in particular, represents an important new challenge for the field, because there is a lot of room for improvement, says Arne Elofsson, a protein bioinformatician at Stockholm University.

AlphaFold was initially designed to predict the shape of individual proteins. But, within days of its public release, other scientists showed that the software could be ‘hacked’ to model how multiple proteins interact. In the months since then, researchers have come up with myriad approaches to improve AlphaFold’s ability to tackle complexes. DeepMind even released an update called AlphaFold-Multimer with that goal in mind.

Such efforts seem to have paid off, because CASP15 saw a marked increase in the number of accurate complexes, compared with previous contests, mainly due to methods that adapted AlphaFold. “It’s a new game for us to be close to experimental accuracy with complexes,” says Moult. “We’ve got some failures too.”

For instance, teams made stunningly accurate predictions of a viral molecule of unknown function made up of two identical intertwined proteins. This kind of shape bamboozled pre-AlphaFold tools, says Ezgi Karaca, a computational structural biologist at the Izmir Biomedicine and Genome Center in Turkey, who assessed the complex predictions. The standard version of AlphaFold failed to accurately model the shape of a giant, 20-chain bacterial enzyme, but some teams predicted the protein’s structure by applying extra hacks to the network, Karaca adds.

Meanwhile, teams struggled to predict complexes involving immune molecules called antibodies — including several attached to a SARS-CoV-2 protein — and related molecules called nanobodies. But there were glimpses of success in some teams’ predictions, says Karaca, suggesting that hacks to AlphaFold will be useful for predicting the shape of these medically important molecules.

Time out

This year’s CASP was also notable for the absence of DeepMind. The company did not state its reason for not participating, but released a short statement during CASP15 congratulating the teams that did. (At the same time, it rolled out an update to AlphaFold to help researchers benchmark their progress against the network.)

Other researchers say the competition is a considerable time commitment, which the company might have felt was better spent on other challenges. “It would have been nice for us if they had participated,” Moult says. But he adds that “because the methods are so good, they couldn’t do another big leap”.

Making big improvements to AlphaFold will take time, say researchers, and will probably require innovations in machine learning and protein-structure prediction. One area under development is the application of ‘language models’, such as those used in predictive-text tools, to the prediction of protein structures. But these methods, including one developed by the social-networking giant Meta, did not perform nearly as well at CASP15 as tools based on AlphaFold.

Such tools might, however, be useful for predicting how mutations alter a protein’s structure, one of several key challenges in protein-structure prediction that have emerged as a result of AlphaFold’s success. Because of that success, the field is no longer focused on one single goal, AlQuraishi says. “There’s a whole slew of these problems.”
