Predicting all of biology, down to proteins: how AI knows more about our bodies than we do

Humans are not the only ones who can predict a genetic disease and design a new medicine. Artificial intelligence can do it too, and much faster.

How do biologists study human proteins?

For more than 10 years, molecular biologist Martin Beck and his colleagues tried to build a detailed model of the nuclear pore complex, which Beck calls the largest molecular machine in human cells. The nuclear pore complex controls the flow of molecules in and out of the cell nucleus, where the genome resides. Hundreds of such complexes exist in every cell, and each of them consists of more than 1,000 proteins.

These more than 1,000 pieces are copies of over 30 different protein building blocks that intertwine in different ways, which makes the structure even harder to work out. In 2016, a team led by Beck at the Max Planck Institute of Biophysics (MPIB) reported a model that described about 30% of the nuclear pore complex and almost half of its building blocks, the Nup proteins.

How do algorithms study human proteins?

In late 2020, DeepMind, part of Alphabet, the parent company of Google, unveiled an AI called AlphaFold. The software predicts the three-dimensional shape of proteins from their amino acid sequences with high accuracy. This has substantially changed the work and priorities of biologists.

Using AlphaFold, scientists were able to predict the shapes of human Nup proteins. They later published a model that describes 60% of the nuclear pore complex. It helped explain how the complex stabilizes the pores and controls the flow of molecules through them.

AlphaFold mania: why has AI for studying the structure of proteins become so popular?

According to biologists, the entire scientific community has been gripped by AlphaFold mania. The AI has been used in a huge amount of research. In some cases, the algorithm saved time or provided new data. But it also has limitations: some scientists find its predictions too unreliable for their work. Nevertheless, there is broad agreement that the algorithm handles this kind of data better than any alternative.

Even the algorithm's developers cannot always keep up with the new areas where it can be applied, so they often simply watch as research teams use AlphaFold to make discoveries in fields ranging from drug discovery and protein design to the origin of complex life.

In mid-July 2021, DeepMind released the AlphaFold code as open source, along with other information to help scientists use it in their own research. A week later, DeepMind announced that it had used AlphaFold to predict the structures of nearly all the proteins humans make. In total, the team released predictions for 365,000 structures. Today, this database has grown to almost a million structures.

In 2022, DeepMind plans to release more than 100 million structure predictions in total. That is almost half of all known proteins and hundreds of times more than the number of protein structures determined experimentally.

The tool has already helped researchers find new potential protein partners. One group used AlphaFold to predict the structures of 65,000 pairs of human proteins. Another used AlphaFold and RoseTTAFold to model the interactions between almost every pair of proteins encoded by yeast, and identified more than 100 previously unknown protein complexes.

How does AlphaFold work?

AlphaFold is based on deep learning with neural networks: computing structures loosely inspired by the human brain that recognize patterns in data. The algorithm was trained on hundreds of thousands of experimentally determined protein structures. When the AI encounters a new sequence, it first searches databases for related sequences.
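To make the database search step more concrete, here is a minimal Python sketch of finding related sequences. It rests on a loud simplifying assumption: similarity is scored by the overlap of short substrings (k-mers), a deliberately crude stand-in for real alignment; AlphaFold's own pipeline uses dedicated search tools such as jackhmmer and HHblits to build multiple sequence alignments. The functions `kmers`, `similarity` and `find_related`, the toy database and the 0.3 threshold are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping length-k substrings of an amino acid sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two sequences' k-mer sets, from 0.0 (nothing shared) to 1.0."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def find_related(query: str, database: Dict[str, str],
                 threshold: float = 0.3) -> List[Tuple[str, float]]:
    """Return (name, score) for entries scoring at least `threshold`, best match first."""
    hits = [(name, similarity(query, seq)) for name, seq in database.items()]
    return sorted((h for h in hits if h[1] >= threshold),
                  key=lambda h: h[1], reverse=True)


# Hypothetical toy database; real searches run over millions of sequences.
TOY_DATABASE = {
    "protein_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "protein_B": "MKTAYIAKQRQISFVKAHFSRQ",  # one substitution relative to protein_A
    "protein_C": "GGGGSGGGGSGGGGS",         # unrelated, repetitive sequence
}

if __name__ == "__main__":
    for name, score in find_related("MKTAYIAKQRQISFVKSHFSRQ", TOY_DATABASE):
        print(f"{name}: {score:.2f}")
```

On the toy data, this ranks protein_A (identical, score 1.00) above protein_B (one substitution, about 0.74) and drops protein_C, which shares no k-mers with the query.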

AlphaFold then cycles through these clues, trying to model the 3D arrangement of the protein's amino acids. Experts say the software successfully applied new ideas from machine learning. In particular, they point to a mechanism called “attention”, which determines which connections between amino acids are most important for the task at any given moment.
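The general idea of attention can be illustrated in a few lines of Python. The sketch below is the generic scaled dot-product attention used across modern deep learning, not AlphaFold's own network, which applies considerably more elaborate attention variants over sequence alignments and pairs of residues. Each residue is represented by a tiny vector of made-up numbers (an assumption for illustration only); the returned weights show how strongly each residue “attends” to every other one.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Turn raw scores into positive weights that sum to 1 (numerically stable form)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention(queries: List[Vector], keys: List[Vector],
              values: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Scaled dot-product attention.

    For each query (one per residue), score every key, normalize the scores with
    softmax, and return the weighted average of the values together with the weights.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        weights.append(w)
        outputs.append([sum(wi * v[j] for wi, v in zip(w, values))
                        for j in range(len(values[0]))])
    return outputs, weights


# Hypothetical 2-dimensional features for three residues.
RESIDUES = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]

if __name__ == "__main__":
    # Self-attention: the residues serve as queries, keys and values at once.
    _, w = attention(RESIDUES, RESIDUES, RESIDUES)
    for i, row in enumerate(w):
        print(i, [round(x, 3) for x in row])
```

Here the first two residues attend almost entirely to themselves because their vectors align only with their own, while the third, with an all-zero vector, spreads its attention evenly across all three.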

What is the problem with AlphaFold?

AlphaFold's predictions depend on information about related protein sequences, which limits what the AI can do. For example, it is not designed to predict how mutations, including disease-causing ones, change the shape of a protein.

Nor can it show how proteins change shape when they interact with other proteins or with molecules such as drugs. However, the model does estimate its confidence in the prediction for each amino acid residue of the protein. As of August 2022, DeepMind reports that more than 400,000 people have used the AlphaFold database it runs with EMBL-EBI.
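AlphaFold expresses this per-residue confidence as a score called pLDDT, on a scale from 0 to 100; in the structure files it produces, the score is stored in the B-factor column. The Python sketch below assumes the scores have already been extracted into a plain list and sorts them into the four bands the AlphaFold database uses to color its models (above 90 very high, 70 to 90 confident, 50 to 70 low, below 50 very low). The function names, the inclusive treatment of band boundaries and the example scores are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Confidence bands as described for the AlphaFold Protein Structure Database.
# Treating each lower bound as inclusive is an assumption of this sketch.
BANDS = [(90.0, "very high"), (70.0, "confident"), (50.0, "low"), (0.0, "very low")]


def confidence_band(plddt: float) -> str:
    """Map one per-residue pLDDT score (0 to 100) to a named confidence band."""
    if not 0.0 <= plddt <= 100.0:
        raise ValueError(f"pLDDT must be between 0 and 100, got {plddt}")
    for lower, label in BANDS:
        if plddt >= lower:
            return label
    return BANDS[-1][1]  # only reached if BANDS is edited to start above 0


def summarize(scores: List[float]) -> Dict[str, int]:
    """Count how many residues fall into each confidence band."""
    counts = Counter(confidence_band(s) for s in scores)
    return {label: counts.get(label, 0) for _, label in BANDS}


# Hypothetical per-residue scores for a short protein.
EXAMPLE_SCORES = [95.2, 88.0, 71.5, 64.0, 42.3, 30.0]

if __name__ == "__main__":
    print(summarize(EXAMPLE_SCORES))
```

A summary like this makes it easy to see which parts of a predicted structure are trustworthy enough for further work and which should be treated with caution.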

How will such algorithms help ordinary people?

Researchers from pharmaceutical and biotech firms are excited about AlphaFold's potential and say the algorithm is helping to discover new drugs.

Karen Akinsanya, who leads therapeutics development at Schrödinger, says that she and her colleagues have used AlphaFold with encouraging results: they were able to develop compounds that could potentially become drugs.
