How computer vision will beat queues and empty shelves in supermarkets - Valery Babushkin, X5 Retail Group

“It's easy to grow by 20% if you have opened one and a half times as many stores”

- In your talk you said that X5 Retail Group's revenue reached 1.286 trillion rubles in 2017, and that cutting costs by even a small fraction brings huge profits. How is X5 expanding?

- X5's turnover will keep growing. In general, the retail market tends toward consolidation. Right now the three leading retailers hold about 20% of the market, and judging by the developed capitalist countries, that share will be about 70–75%.

On average X5 opens six new stores every day. While we are talking, X5 is opening a new store (laughs). Things really are going well, unlike for some other market players. If you look at the open data, one of them increased its floor area by 12% while its turnover grew by only 84%. It is easy to do the math: they are starting to operate at a loss. There is an indicator called LFL, like for like: a year-over-year comparison of the same stores. For X5 it is positive, if only slightly. That is, it is easy to grow by 20% if you have opened one and a half times as many stores, but in fact that is negative growth. If you grow because the old stores perform better and new ones are opening as well, that is quite positive.
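The difference between headline growth and LFL growth can be illustrated with a minimal sketch. The store names and revenue figures below are invented for illustration and are not X5 data, and the LFL rule is simplified to "stores that traded in both years"; real retailers usually apply stricter criteria.

```python
from dataclasses import dataclass


@dataclass
class Store:
    name: str
    revenue_prev: float  # revenue last year (0 if the store did not exist yet)
    revenue_curr: float  # revenue this year


def total_growth(stores):
    """Headline growth: all stores, including newly opened ones."""
    prev = sum(s.revenue_prev for s in stores)
    curr = sum(s.revenue_curr for s in stores)
    return curr / prev - 1.0


def lfl_growth(stores):
    """Like-for-like growth: only stores that traded in both years."""
    same = [s for s in stores if s.revenue_prev > 0 and s.revenue_curr > 0]
    prev = sum(s.revenue_prev for s in same)
    curr = sum(s.revenue_curr for s in same)
    return curr / prev - 1.0


# Hypothetical chain: two old stores that each sell less, plus one new store.
stores = [
    Store("old-1", 100.0, 95.0),
    Store("old-2", 100.0, 95.0),
    Store("new-1", 0.0, 50.0),
]

print(f"total growth: {total_growth(stores):+.1%}")
print(f"LFL growth:   {lfl_growth(stores):+.1%}")
```

In this toy chain the number of stores grows one and a half times and headline revenue rises, yet the like-for-like figure is negative, which is the situation described above.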

- How much of the credit for this do you think belongs to your team?

- Not very much so far, because the team was formed not long ago. Let's be frank: X5's growth in 2017 can hardly be attributed to data analysis done by a team created in 2018.

The head of our directorate is Anton Mironenkov, who took part in the creation of X5. He worked on the merger of Perekrestok ("Crossroads") and Pyaterochka, out of which X5 emerged.

We consider big data a strategic direction. The future of retail depends on how quickly retailers learn to monetize and use the data that we generate in fairly large quantities every day, in order to optimize processes and improve the customer experience. That is why we decided to split all of this into a separate direction and give it more focus, so that it develops faster.

Anton Mironenkov, Head of Big Data X5

Within this directorate we have our own computing capacity, a cluster, developers, testers, analysts, projects, products: everything we need. We have already done some things, and that is a lot of progress for less than a year. We clearly understand that we will bring the company a fairly large profit, but again, those results will be visible only in a year.

All the information is in the receipt: if you bought vodka, you are over 18

- If I come to Perekrestok ("Crossroads") and make a purchase, what out of all this will you take for analysis?

- The receipt. Your purchases characterize you pretty well. If you buy diapers, you probably have a small child. If you buy vodka, you are over 18. Someone who buys chips is, with a certain probability, a 16-year-old teenager. And if you bought a school diary, then you or your family have a child aged seven to 17. That is a lot of information.

Imagine: you come into a store, look at some products, and work out whether the store is expensive, cheap, or mid-priced. Pyaterochka carries 4 to 8 thousand unique products. It is unlikely that you walk around with a notebook, write down the price of every item, then look up average prices in the city and draw a conclusion. You just look at five to ten products. Which products people look at is something we also work on.

The products people look at also change over time. A simple example: 20 years ago there were no products related to mobile communications. Now you can buy a SIM card, though not in every store. And 20 years ago times in Russia were generally a bit harder than they are now, and consumption was completely different.

- How do you build customer profiles in order to offer them discounts?

- There are two products: customer profile and loyalty. The customer profile is a task where you have no labels and use different approaches. We use various approaches to clustering, from standard statistics (computing Z-scores, robust deviations from the median) to Word2vec applied to receipts, which “translates” a person into a vector: Word2vec embeddings averaged with TF-IDF weights.

Z-score - a statistical measure that expresses the distance (in standard deviations) of a given value from the mean of the data set. In finance, a measure with the same name (the Altman Z-score) is an indicator of a company's creditworthiness and bankruptcy risk.

Robust deviations - from the English “robust”: the stability of estimates with respect to outliers in the data. They are measured relative to the median.

Word2vec - A tool that allows you to represent words in the form of vectors.

TF-IDF - a statistic that indicates how important a word is to a document within a collection of texts.
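The two ends of the range described above, simple statistics and receipt embeddings, can be sketched in a few lines. This is an illustration under stated assumptions, not X5's pipeline: the spending figures, the receipts, and the two-dimensional `item_vectors` are invented, and the toy vectors merely stand in for embeddings that a real Word2vec model would learn from receipts.

```python
import math
import statistics


def z_scores(values):
    """Distance of each value from the mean, in standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def robust_scores(values):
    """Distance from the median, in units of the median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [(v - med) / mad for v in values]


def idf_weights(receipts):
    """Inverse document frequency of each item; one receipt is one document."""
    n = len(receipts)
    doc_freq = {}
    for receipt in receipts:
        for item in set(receipt):
            doc_freq[item] = doc_freq.get(item, 0) + 1
    return {item: math.log(n / df) for item, df in doc_freq.items()}


def customer_vector(items, item_vectors, idf):
    """IDF-weighted average of item vectors.

    Each occurrence of an item adds its IDF weight once, so repeated
    purchases supply the term-frequency part of TF-IDF.
    """
    dim = len(next(iter(item_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for item in items:
        w = idf.get(item, 0.0)
        for i, x in enumerate(item_vectors[item]):
            total[i] += w * x
        weight_sum += w
    return [x / weight_sum for x in total] if weight_sum else total


# Hypothetical weekly spend of five customers; the last one is an outlier.
spend = [10, 12, 11, 13, 200]
print(z_scores(spend)[-1], robust_scores(spend)[-1])

# Hypothetical receipts and toy 2-d item embeddings.
receipts = [["bread", "diapers"], ["bread", "vodka"], ["bread"], ["bread", "diapers"]]
item_vectors = {"bread": [0.0, 1.0], "diapers": [1.0, 0.0], "vodka": [-1.0, 0.0]}
idf = idf_weights(receipts)
print(customer_vector(["bread", "diapers"], item_vectors, idf))
```

Because bread appears in every receipt, its IDF weight is zero and it contributes nothing to the customer vector; the rarer items decide where the customer lands. The resulting vectors could then be passed to any clustering algorithm.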

If you have a model that makes a personalized offer, then let us say that the clustering is successful if the quality of the model improves after these attributes are added. Here you can calculate both the economic effect and some kind of metric.

- In what part of the stores are your products used?

- In all of them. We tested the personalized discount on half a million users in order to understand its effect across all 14 thousand X5 stores. We collect online reports from all of these stores. We have a promo product, which is present in all stores. We have an assortment matrix and demand forecasting. They make sure that, first, the store has chicken and, second, the chicken does not run out.

Now we are starting on computer vision; at first it will not be in all stores. We will start with the largest ones, since it only makes sense to test there. The task is quite simple and the benefit is clear. A product may be missing from the shelf while it sits in the stockroom, and in that moment nobody buys it. That is very bad: the store has bought it but cannot sell it. At best the customer does not buy that product; at worst he turns around and leaves, because there is no point in coming to a place where he can buy two items out of three and has to go to another store for the third. He will go straight to the store where he can buy everything. This is solved with computer vision. A camera is installed and detects that little of the product is left. A notification goes to the person responsible, and he goes to the stockroom for the product.
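The notification step described above can be sketched as a simple rule on top of whatever the camera reports. Everything here is an assumption made for illustration: the `ShelfReading` fields, the 20% threshold, and the message format are hypothetical, and the vision model that would actually count units on the shelf is not shown.

```python
from dataclasses import dataclass


@dataclass
class ShelfReading:
    sku: str
    visible_units: int  # units a (hypothetical) detector counted on the shelf
    capacity: int       # units the shelf holds when full
    in_stockroom: int   # units recorded in the stockroom


def restock_alerts(readings, low_fraction=0.2):
    """Return a message for every shelf that is nearly empty while stock exists."""
    alerts = []
    for r in readings:
        fill = r.visible_units / r.capacity
        # Alert only if there is something in the stockroom to bring out.
        if fill <= low_fraction and r.in_stockroom > 0:
            alerts.append(
                f"Restock {r.sku}: {r.visible_units}/{r.capacity} on shelf, "
                f"{r.in_stockroom} in stockroom"
            )
    return alerts


readings = [
    ShelfReading("chicken", visible_units=1, capacity=10, in_stockroom=24),
    ShelfReading("milk", visible_units=8, capacity=10, in_stockroom=5),
    ShelfReading("bread", visible_units=0, capacity=12, in_stockroom=0),
]

for message in restock_alerts(readings):
    print(message)
```

In this sketch an empty shelf with nothing in the stockroom raises no alert, because that case is a supply problem rather than the "bought it but cannot sell it" case from the interview.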

The second task is queues. We know that there is a queue in the store. Either you stand in line, unhappy and wasting time, which nobody likes, or you walk in, look at the queue, turn around and leave. If the queue is there because the store is understaffed, nothing can be done about it. But suppose the problem is that the cashier is sitting in the back room, resting and drinking tea, and the manager calls her. The store already has a queue, and by the time she gets there, sits down at the terminal, switches it on and starts up the register, time will pass. Everyone is staring at her; she is nervous, and so are the customers. That cashier needs to come out before the queue forms, so that she is in place just as people start heading to the checkout. This is quite easy to solve with computer vision.

We will test it in about 150 stores, most likely in Moscow. First, we are based in Moscow ourselves, and second, there is more traffic here. Then it will become clear how to improve the user experience and what the benefit is for X5.

“I really don't like the word data scientist.”

- Is your department expanding?

- Of course. The managers see that we are delivering results. Nobody lets you double the team if you are not working well. That fact by itself speaks to our effectiveness.

- You said that you have 32 people working for you; how many more will you hire?

- About 20–30 more. We are now starting computer vision and speech technology within my department. There will be two new units, which is plus ten people, and I believe another 10–15 have been approved for next year. There are also so-called project positions. We expect an addition of 30–36, so somewhere over 60 people in total. These are specifically the people who work on data analysis and machine learning.

- Who are you looking to hire?

- I really do not like the term "data scientist" because it carries no information. You can go to ten companies that are looking for a data scientist, and those will be ten completely different positions. I like the word analyst. My department names speak for themselves: there is a machine learning department, a data analysis department, an R&D group (that is, research), a computer vision department, a speech technology department, and a non-product analytics group that handles tasks arriving from outside any existing product line.

I am looking for people who can program in Python and know probability theory and mathematical statistics; if I need modeling, then machine learning skills are required too. But the most important thing is a person's ability to think and analyze. I am increasingly coming to the view that analytical and critical thinking is something very difficult to teach. If a person already has a certain worldview by the age of 20–25, it is unlikely to change.

- Did you come to understand this at X5?

- It is not that X5 led me to this. I also watch people, talk to them, see how they work. As you know, the best interview is a trial period. And at some point you see that this is simply not for this person. That is, he seems to have graduated from mekhmat (the Faculty of Mechanics and Mathematics), he seems to be no fool, but it is not his thing. The right mindset is not there; he does not see things. This was in Daniel Kahneman's book “Thinking, Fast and Slow”, where he described what goes with critical thinking. That includes a pessimistic view of the world, and it is more an innate quality than an acquired one, unfortunately or fortunately.

- If an analyst joins and after the trial period you see that he is a fit, what can he expect?

- The standard grades in IT are intern, junior, middle, and senior. Anything above that is rare: staff or lead. I believe there is inflation of the senior title: we have a lot of seniors, but in fact they are often no more than an average middle.

If you take the market average, a junior gets somewhere between 120–150 thousand rubles a month before taxes, a middle up to 250 thousand, and seniors about 400 thousand rubles. The upper bound: I personally held in my hands an offer for a lead developer, and it was more than 600 thousand rubles.

“Data science is really a kind of ‘cherry on the cake’”

- How did you get started in machine learning?

- The university did not offer machine learning. I graduated in 2012, and around the same time there was another surge of interest in everything related to it. I did not catch it in time. I graduated from two universities; the last one was the University of Applied Sciences in Karlsruhe, a master's program in mechatronics. Before that I studied at the Moscow Institute of Chemical Engineering, which is now called the Moscow Polytechnic Institute. I did not study machine learning there.

Funny thing: I now interview people who have graduated in data science, and their level seems weaker than that of the people who graduated in physics, engineering, or computer science and then “bolted” machine learning on top. Maybe this is a slight bias, because the people who taught themselves were strong to begin with, learned something new and came in. And data science really is a kind of “cherry on the cake”, and if there is a “cherry” but no “cake”, it is not that interesting.

- How did you learn it?

- There is an old saying that Coursera has two serious courses, or even one and a half. These are Hinton's course on machine learning and neural networks (the course is no longer available on Coursera, but you can watch it on YouTube - "Hi-Tech") and Daphne Koller's course on probabilistic graphical models.

Koller's course consists of video lectures that she gives to graduate students at Stanford, so I cannot bring myself to call it anything less than serious. Hinton's course lasts 16 weeks, and Koller has three courses of five to six weeks each. I gathered my strength, got through the first one, and realized I was not ready to take the second and third.

But Coursera is not the only resource. I read a lot of books. Just now, by the way, I finished Bradley Efron's book on statistics (American statistician, recipient of the US National Medal of Science, the highest state award for American scientists - "Hi-Tech"). Before that it was Ian Goodfellow's book on deep learning (American machine learning specialist, works at Google Brain - "Hi-Tech"). It is a continuous learning process. Coursera is only one of the resources, Kaggle (an online community of computer science experts that regularly hosts competitions - "Hi-Tech") is another, but the main thing is reading, reading, reading and testing. If you read something and did not understand it, that is bad. If you understand how it works, you can do anything.

It is like the multiplication table. Imagine a person who does not understand the multiplication table but has learned it by heart. He is asked: "Six times six?" - "36". - "Seven times eight?" - "56". - “Good, last question: 10 times 11?” - And he says: “I don't know, that was not in the multiplication table”. Well, that's it. I often meet people like that. 10 times 11 is much easier to calculate, but it is not in the table; you need to understand the principle. If you understand the principles, everything is much easier.

Everything else depends on the person. It seems to me that we learn everything ourselves; other people can only help and not get in the way. It is all a matter of self-discipline.

- Tell us about your course on data science at HSE.

- It is a free course within the standard program. In it I cover basic, simple things that are a revelation for many people. For example, what metrics there are, why they exist at all, how they differ from each other, in which cases each is needed, how to test your idea, what an A/B test is. These are the things I have worked out for myself that are important for people to know and that they really need in their work.

- How do you see the future of retail in five to ten years?

- If we are talking about food retail, the hypermarket format will die off. You can see it now in the States, where large shopping malls are dying, and in Russia too, by the way. What was the consumption pattern before? We go to the mall, to the cinema, to the food court, and buy something else along the way. Now we come home: ivi, Okko, Netflix, Yandex.Eda, Delivery Club, restaurant delivery, online shopping. We have to move toward personalization.

- What will that mean for the consumer?

- What does a person use? What he can afford and what is convenient. Accordingly, you have to reduce costs while keeping quality the same or raising it. This is where personalization comes in.

- A person buys what he can afford. Now the real incomes of the population are falling, costs are being reduced.

- In that situation economy-format stores do better and grow. Retailers have two ways to solve many problems: automate, or hire ten more people. In the short term the second path is the winning strategy, because integration is expensive and slow, something can go wrong, and you can lose your bonus. Now imagine that you are a department director with a very large bonus that you could lose. Whether you will still be at the company in two years, when the result of the automation becomes known, and whether you will get any praise for it, is unclear. But the bonus you can get now. So we hire another ten people. But in the long run this leads to a big loss.