Anomaly hunters: how CERN searches for rare particles using Yandex algorithms

Andrey Ustyuzhanin — Head of the Research and Educational Laboratory of Big Data Analysis Methods at the National Research University Higher School of Economics.

Head of joint projects between Yandex and CERN. Participates in the development of the EventIndex and EventFilter services, which Yandex has provided for the LHCb experiment since 2011. 

He graduated from the Moscow Institute of Physics and Technology in 2000 and holds a Candidate of Physical and Mathematical Sciences degree. He has served as a judge of the Microsoft Imagine Cup international final; before that he mentored the MIPT team that won the cup in 2005.

How to look for anomalies in the data of the Large Hadron Collider

What are data anomalies?

— If we are talking about data obtained at the Large Hadron Collider (LHC), these may be observations that do not fit the standard picture of how particle decays proceed after proton collisions. Such observations would be anomalies.

For example, if we are talking about asset quotes on an exchange, anomalies there may arise because some hedge fund decided to pump an asset, or because Wall Street Bets decided to make some extra money and effectively set up its own distributed hedge fund. That is, the underlying "physics" is completely different, and the way it shows up in the data is also unlike other cases.

Therefore, if we talk about anomalies, we first need to understand what data and what physics we are talking about. 

— Then let's clarify with a focus on colliders.

— Here it is a little simpler, although a fork still arises. The fact is that there are data on what processes happen to particles inside the detector, and there are data on how the collider itself operates. People whose main interest is discovering new particles or laws care mostly about the first kind of data. But everything that happens in the physics passes through a rather long chain of collection and processing, and if any node in that chain starts behaving worse than we assumed, that is, goes beyond certain permissible limits, it distorts the measurements. We may then see anomalies where, in the physics itself, there were none.

Discoveries that do not fit the standard picture of how particle decays proceed after proton collisions are anomalies

To avoid such unpleasant surprises, people write special data quality control systems that monitor all the data coming from the measuring instruments and try to exclude from consideration those periods of time when there is a suspicion that something is going wrong.

One example that physicists at the LHC like to recall: in the early stages of operation, back when it was not yet the LHC but its predecessor, they noticed anomalies that did not fit physical concepts. In the end, physicists found a very strong correlation with the train schedule on a nearby railway line. Unless you correct for these fluctuations, you end up with an unphysical picture of the world.

It is necessary to take external factors into account and to understand which of them need to be compensated for correctly. The simplest solution is to throw out the data that do not fit the usual picture of the world. A more sophisticated approach is to try to bring these anomalies back to normal data using clear physical principles, and to extract some benefit from them.

Throwing out data is a waste of budget funds. Every kilobyte and megabyte has a certain price.

Andrey Ustyuzhanin, Head of the Research and Educational Laboratory for Big Data Analysis Methods at the National Research University Higher School of Economics

— And, accordingly, how can anomalies be detected in these data using machine learning?

— There are two groups of algorithms that work with anomalies. The first group, one-class classification methods, includes algorithms that use information only about the events marked as good. They essentially try to build a convex hull that encloses everything we consider correct. The logic is: whatever falls outside this shell, we will treat as an anomaly. For example, 99% of the data is covered by such a shell, and everything else looks suspicious.
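
A rough sketch of this first approach, assuming a Python and scikit-learn setup (not mentioned in the interview) and with an Isolation Forest standing in for the "convex hull" idea; a One-Class SVM would be another common choice:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
good_events = rng.normal(0.0, 1.0, size=(10_000, 5))  # events labelled as "good"
new_events = rng.normal(0.0, 3.0, size=(200, 5))      # fresh data to screen

# Fit only on the "good" class; allow ~1% of it to fall outside the boundary.
model = IsolationForest(contamination=0.01, random_state=0).fit(good_events)

# +1 = looks like the training data, -1 = outside the learned region (suspicious)
labels = model.predict(new_events)
print("candidate anomalies:", int(np.sum(labels == -1)))
```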

The other group of algorithms relies on partial labeling of what we consider wrong. Essentially, there is a set of events known to have undesirable outcomes, and the search for anomalies then reduces to a two-class classification problem: an ordinary classifier that can be built on neural networks or decision trees.

The nuance is that in anomaly-detection tasks the sample is usually not balanced: the number of positive examples greatly exceeds the number of negative ones. Under such conditions, standard classification algorithms may not work as well as we would like. The default loss function weighs all examples equally and may overlook the fact that among 10,000 correctly classified examples there are a hundred classified incorrectly, and that hundred is exactly the negative examples we care about most. This can be combated, for example, by assigning more weight to the negative examples, so that errors on them are penalized much more heavily.

Loss function: a function that, in statistical decision theory, characterizes the losses incurred from making incorrect decisions based on observed data.
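
A minimal sketch of the re-weighting idea mentioned above (scikit-learn assumed, synthetic data; the specific classifier is just an example):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)
# 10,000 ordinary events and only 100 anomalies, echoing the proportions above.
X_pos = rng.normal(0.0, 1.0, size=(10_000, 4))
X_neg = rng.normal(1.5, 1.0, size=(100, 4))
X = np.vstack([X_pos, X_neg])
y = np.array([0] * len(X_pos) + [1] * len(X_neg))  # 1 = rare "negative" class

# class_weight="balanced" makes an error on the rare class cost ~100x more.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
print(classification_report(y, clf.predict(X), digits=3))
```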

Our laboratory's contribution to the anomaly detection problem is to propose methods that combine features of the first and second approaches, that is, of one-class and two-class classification. Such a combination becomes possible if we build generative models of the anomalous examples.

Using approaches such as generative adversarial networks or normalizing flows, we can learn to reproduce the examples labeled as negative and generate an additional sample, so that an ordinary classifier can work more effectively with the augmented synthetic sample. This approach works well for both tabular data and images. There was an article about this last year that describes how such a system is built and gives practical examples of its use.
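
The article itself relies on GANs and normalizing flows; as a loose stand-in for the idea, the sketch below fits a simple density model (a Gaussian mixture) to the few labelled anomalies, samples synthetic ones from it, and then trains an ordinary two-class classifier on the augmented sample. This illustrates the combination of the two approaches, not the laboratory's actual method.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)
X_good = rng.normal(0.0, 1.0, size=(10_000, 4))   # events labelled as good
X_anom = rng.normal(2.0, 0.5, size=(50, 4))       # the few labelled anomalies

# Stand-in generative model of the anomalous class
# (the referenced work uses GANs / normalizing flows instead).
gen = GaussianMixture(n_components=2, random_state=0).fit(X_anom)
X_synth, _ = gen.sample(5_000)                    # extra synthetic anomalies

X = np.vstack([X_good, X_anom, X_synth])
y = np.array([0] * len(X_good) + [1] * (len(X_anom) + len(X_synth)))

clf = GradientBoostingClassifier().fit(X, y)      # ordinary two-class classifier
```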

— You mentioned working with images. How does it work in this case?

— There are examples in which we demonstrated how this algorithm works. We simply chose one class of images, for example handwritten digits, and declared that zero is an anomaly. The neural network then had to decide that zeros are unlike everything else and assign them to the negative class. Naturally, the anomaly can be not only zeros but also, say, digits that contain closed loops (0, 6, 8), digits with horizontal strokes, or simply images rotated by some angle relative to the rest of the sample.

“We can simulate physics under certain external parameters with good accuracy and say what observable characteristics will describe the correct signal events, for example, the decay of the Higgs boson”

There is a dataset called Omniglot: characters written in different alphabets. There is a huge variety of them: the fictional alphabet from Futurama, Gothic script, handwritten characters from less common writing systems such as Sanskrit or Hebrew. We can declare that the Sanskrit characters are an anomaly, or that the characters written in a particular handwriting are.

We ask the system to learn to distinguish these anomalous symbols from everything else. The key point is that there are far fewer of them than of everything else, and that is exactly what makes them difficult for conventional machine learning algorithms.
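
A toy version of the handwritten-digit example, using scikit-learn's small digits dataset rather than the datasets from the interview: zeros are declared the anomaly and only a handful of them are kept, so that the class is genuinely rare.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

X, digit = load_digits(return_X_y=True)
y = (digit == 0).astype(int)                     # 1 = "anomalous" zeros

# Keep only 15 zeros so that the anomalous class is rare.
keep = np.concatenate([np.where(y == 1)[0][:15], np.where(y == 0)[0]])
X, y = X[keep], y[keep]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print("ROC AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```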

Symbiosis of physics and IT: how machine learning is used in LHC research

— What tasks of the LHC are solved with the help of machine learning?

— One big task we are working on is accelerating the computations that simulate physical collisions and particle decays. The decision about whether given events resemble certain physical decays is made only after analyzing a fairly large number of simulated decays. We can simulate the physics under certain external parameters with good accuracy and say what observable characteristics will describe the correct signal events, for example the decay of the Higgs boson.

But there are caveats: we do not always know the parameters under which these decays need to be generated; as a rule, we only have a rough idea. And the challenge of finding the right physics is to distinguish signal events from background events, which may come either from the imperfect operation of reconstruction algorithms or from the physics of other processes that look very similar to what we are trying to find. Machine learning algorithms handle this well, but that is a well-known story.

But training such algorithms requires a rather large statistical sample of simulated events, and computing these synthetic data takes resources: simulating a single event takes about a minute, or even ten minutes, of computing time at modern computing centers. Since the number of real events physicists will work with is going to grow by orders of magnitude in the coming years, the number of synthesized events must grow as well, and computing resources are already barely enough to cover researchers' needs. The reason is that to simulate one event we have to calculate, with very high accuracy, how particles interact with the structure of the detector and what response we will see on its sensors.

The idea behind the speed-up is to train a neural network on events that were simulated with a certified package, Geant4, which simulates everything that happens inside the collider detectors. This network learns to map the inputs, the parameters of the particles we want to simulate, to the outputs, the observable characteristics that the detector produces. Neural networks today already cope quite well with data interpolation, and several projects in our laboratory are aimed at exactly this: reconstructing the characteristics of decays from the available synthetic sample, producing second-order synthetics, so to speak. There is one more nuance: the advantage of neural networks is that we can fine-tune them on real data, that is, make the model more accurate for a specific physical decay.
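
A minimal sketch of the surrogate idea (not the actual LHCb or Geant4 pipeline; the "slow" simulator here is a made-up placeholder function): a network learns the mapping from particle parameters to detector response once, and is then used in place of the expensive simulation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)

def slow_simulation(params):
    """Placeholder for the expensive detector simulation (Geant4 in real life)."""
    return np.sin(params) + 0.1 * params ** 2 + rng.normal(0.0, 0.01, params.shape)

# Produce a training sample with the full simulator once (the costly step)...
params_train = rng.uniform(-3, 3, size=(20_000, 3))
response_train = slow_simulation(params_train)

# ...then train a surrogate that maps particle parameters to detector response.
surrogate = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=200, random_state=0)
surrogate.fit(params_train, response_train)

# New events are now "simulated" far faster than with the honest simulation.
fast_response = surrogate.predict(rng.uniform(-3, 3, size=(1_000, 3)))
```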

People who do full-fledged physical simulation spend their time and effort on it, whereas with neural networks it turns out somewhat less labor-intensive. From the results we obtained for the LHCb experiment at CERN and for the project with the MPD experiment at the NICA accelerator in Dubna, it became clear that neural networks can cover the phase space of simulated events with very high accuracy. They also speed up the calculation considerably: by orders of magnitude, even hundreds of times faster than an honest simulation.

— How does the neural network itself learn? 

— The training process itself is no different. But there is one peculiarity: for such a neural network, in addition to the training sample, you have to formulate quality criteria, that is, set a loss function that best matches the task the network must handle well. Moreover, the quality of such a network is not something researchers can judge directly: it can only be assessed adequately through the computational steps that happen at later stages of data processing.

We can determine whether a simulation is good only after we pass the events through the chain of analysis and reconstruction and see that the same characteristics we originally put into them are recovered. This means that, for example, a simple metric such as MSE (Mean Squared Error) is not enough.

MSE (Mean Squared Error): measures the average squared difference between estimated values and the actual values.
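
To illustrate why a single pointwise number is not enough (the Kolmogorov-Smirnov check below is the editor's own example, not something named in the interview): a surrogate can match the mean of an observable almost perfectly while getting its distribution wrong, which the later reconstruction steps would feel.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(4)
true_obs = rng.normal(0.0, 1.0, 100_000)   # observable from the full simulation
fake_obs = rng.normal(0.0, 0.7, 100_000)   # surrogate: right mean, wrong spread

# A mean-level comparison looks fine...
print("difference of means:", abs(true_obs.mean() - fake_obs.mean()))

# ...but a distribution-level test shows the two samples clearly disagree.
stat, p_value = ks_2samp(true_obs, fake_obs)
print("KS statistic:", stat, "p-value:", p_value)
```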

The behavior of the neural network needs to be assessed further, in particular on parameter ranges that may not have been present in the training set. Building models that behave well beyond the parameter values known at the training stage is a large and largely theoretical task.

Neural networks are good in the regions where they saw something at the training stage. Outside those regions they can output anything at all. In our case this is especially sensitive, because the correctness of the physical interpretation of the reality around us depends on it.

“If a dark matter particle decays into particles with which we know how to interact, it can be assumed that this dark matter particle really was”

- That is, the neural network is looking for rare events that can occur at the collider?

— It is based on the work of generative models. First we are talking about synthesizing everything that can happen; we do this with such generative models. Then, on top of the output of these networks, we can build a model that will look for what we need: the kind of events we managed to generate with the generative neural network.

How to search for dark matter and why neural networks are needed for this

— Can a similar search principle be applied to dark matter?

— The point is that dark matter can be searched for in different ways. One way is to build a dedicated detector that is well isolated from the effects of ordinary matter, that is, one that blocks the signal coming from particles known to physicists. This is essentially a process of elimination: if the detector sees something other than noise, it sees something we have never seen before, and one possibility is that these are dark matter particles.

If, for example, a dark matter particle decays into particles we know how to interact with, and it is clear that the decay traces could not have come from anywhere else, then we can assume that this dark matter particle really existed.

Such experiments are being discussed and planned. One of them is called SHiP (Search for Hidden Particles). And, by the way, the approaches I spoke about also apply to such an experiment: it requires simulation and algorithms for recognizing rare events. But since the luminosity of this experiment is much lower (luminosity here meaning the number of particles expected to be detected per unit time), the need to simulate a large number of such events is not as acute as for the LHC detectors. Although, for example, the task of assessing how well the shielding system suppresses particles known to physics does require simulating a fairly large number of events, to make sure the shielding copes with the enormous number of incoming particles of various types.

SHiP is an experiment aimed at finding hidden particles, including dark matter particles, in a particle stream from the SPS accelerator that is filtered by magnetic fields and a five-meter layer of concrete and metal.

There are other ways to search for dark matter, related to observations of cosmic phenomena. One approach in particular is to build sensitive elements that recognize the direction of very weakly interacting particles from the angle at which a particle arrives. The logic of the experiment is to orient the sensitive elements along the vector of motion of the Solar System, that is, towards the constellation Cygnus. Then we can distinguish particles that move in the Earth's frame of reference from particles that move differently, rather like the old idea of a motionless ether spread through space according to its own laws, unconnected to the orientation and motion of the planets. Only instead of ether, it is assumed that there are dark matter particles. They can interact weakly with the sensors of the experiment, and by analyzing the readings one can derive the angular distributions of the interacting particles. If we see a substantial component that does not depend on the Earth's position in space, that will point to the existence of previously unknown particles, and perhaps these will be candidates for dark matter particles.

In such an experiment simulation is quite important, because to build an algorithm for recognizing signal events you need to know what the signal of interest looks like. So the tasks of fast simulation and anomaly search are relevant and applicable there too.

They speak different languages, but the goals are common

— Let's talk about working at CERN. What is it like for an IT person to work with physicists? What is specific about working in such a cross-disciplinary environment as the LHC?

— Good question. Indeed, people speak different languages: it goes as far as the same concepts being drawn differently. For example, the ROC curves that machine learning specialists are used to are usually drawn in physics rotated by 90 degrees. And the axes are not called True Positive Rate and False Positive Rate but Signal efficiency and Background rejection. Signal efficiency is essentially the True Positive Rate, while Background rejection is one minus the False Positive Rate.

ROC curve (receiver operating characteristic): a graph used to evaluate the quality of a binary classifier. It shows the fraction of objects that carry the attribute and are correctly classified as carrying it (the true positive rate) against the fraction of objects that do not carry the attribute but are incorrectly classified as carrying it (the false positive rate).
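
A small sketch of the translation between the two conventions (scikit-learn assumed; data are synthetic): the same ROC output is simply re-expressed as signal efficiency versus background rejection.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(5)
y_true = np.concatenate([np.ones(1_000), np.zeros(1_000)])   # 1 = signal, 0 = background
scores = np.concatenate([rng.normal(1, 1, 1_000), rng.normal(0, 1, 1_000)])

fpr, tpr, _ = roc_curve(y_true, scores)

signal_efficiency = tpr           # what ML people call the True Positive Rate
background_rejection = 1.0 - fpr  # one minus the False Positive Rate

# A physicist would plot background_rejection against signal_efficiency,
# which is the familiar ROC curve drawn in the rotated convention.
```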

Of course, such things lie on the surface and are relatively easy to get used to. The main challenges lie in understanding the basic assumptions researchers make when writing their papers, and those, as a rule, stay outside what they actually write. It is a kind of tacit knowledge that is passed on during graduate school and takes shape in a person's mind as they work on their research projects.

For people from another field of science it is like a different cultural environment, and these assumptions may not be at all obvious to them. Because the vocabularies turn out to be extensive and different, building a dialogue can drag on or even be unproductive. So as a recommendation, one can advise asking people to step outside what they are used to and formulate the problem in terms as abstracted from physics as possible. We do this in part when we organize competitions as part of the IDAO olympiad: in the course of the dialogue we find a setting that does not require deep immersion in physics but is still interesting for machine learning specialists.

This year we had a joint project with an Italian laboratory that is looking for dark matter. They provided synthetic data for the olympiad task of finding this dark matter. There is no real dark matter in the data, because decays of known physics were simulated: collisions of electrons and helium ions. But collisions of dark matter particles could look very similar to some of these collisions; they are very difficult to simulate and even harder to interpret. So, especially for people who are not specialists in this field, we decided not to bring in those data and to limit ourselves to the similar ones. The algorithms we will see work on approximate data but can also be applied to the real data.

Andrey Ustyuzhanin. Photo from the speaker's archives

To sum up, one way is to agree on terms that are clear to everyone; the other is to invest time and effort: attend summer schools and take part in practical research projects.

Books about machine learning and physical experiments recommended by Andrey Ustyuzhanin:

  • Deepak Kar, Experimental Particle Physics: Understanding the measurements and searches at the Large Hadron Collider.
  • Ilya Narsky, Statistical Analysis Techniques in Particle Physics: Fits, Density Estimation and Supervised Learning.
  • Giuseppe Carleo, Machine learning and the physical sciences.

— Are there any contradictions between the values of physicists and IT specialists: for example, is the nature of the interactions more important to one side, and accuracy to the other?

— If we talk specifically about accuracy, there is probably no real disagreement. Issues are more likely to come from IT specialists not understanding the nature of the data. If we measured the data to millimeter precision, there is no point in calculating an area to square microns. With complex neural networks we face the fact that they output numbers precise to the last digit of the mantissa, but there is no more meaning in those digits than in the precision that was present at the input.

Perhaps a general wish for people who evaluate model accuracy is to report not only the absolute figures but also the acceptable ranges, or the spread within which those values were obtained. That is a good recommendation not only for those who interact with physicists or biologists: it is, in principle, the right way to present results.
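
One simple way to follow that advice (an illustration by the editor, not a prescription from the interview) is to quote a metric together with a bootstrap spread:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)
y_true = rng.integers(0, 2, 2_000)
scores = y_true * 0.5 + rng.normal(0.0, 1.0, 2_000)

# Bootstrap the metric to report a spread rather than a single number.
aucs = []
for _ in range(1_000):
    idx = rng.integers(0, len(y_true), len(y_true))
    if len(np.unique(y_true[idx])) < 2:   # resample must contain both classes
        continue
    aucs.append(roc_auc_score(y_true[idx], scores[idx]))

low, high = np.percentile(aucs, [2.5, 97.5])
print(f"ROC AUC = {np.mean(aucs):.3f} (95% interval {low:.3f}-{high:.3f})")
```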

As for how different the expectations on the two sides can be, these are really just working issues. If there is interest on both sides, they can be resolved easily and well. Machine learning is now in demand among physicists in the broad sense, because it gives them more accurate tools for working with their data. And it works in the other direction too, because for machine learning specialists it can be far more interesting to see how their algorithms help discover new particles, as happens in our laboratory. We spent a long time building an algorithm that determines the type of a particle, and recently there was news about the discovery of new tetraquarks, in which our algorithms played a direct part.

So for people from IT, broadly speaking from Data Science and Computer Science, feeling the usefulness of the algorithms they develop is very important. That is why, for example, our faculty has an International Laboratory of Bioinformatics.

Such interactions are becoming more and more normal. I do not know whether they can already be considered mainstream or whether we still have to wait, but one way or another this trend is inevitable. Just look at the workshops organized at today's leading artificial intelligence conferences: the workshop on AI for the physical sciences is among the leaders in the number of people interested.
