How computer vision will beat queues and empty shelves in supermarkets - Valery Babushkin, X5 Retail Group

“It's easy to grow by 20% if you have opened one and a half times more shops”

- In your talk, you said that X5 Retail Group's revenue reached 1.286 trillion rubles in 2017, and that reducing costs by even a small share leads to... How is X5 expanding?

- X5's turnover will continue to grow. In general, the retail market is moving toward consolidation. Our three leading retailers hold about 20% of the market, and judging by developed capitalist countries, that share will be about 70–75%.

On average, X5 opens six new stores every day. While we are talking, X5 is opening a new store (laughs). Things really are going well, unlike for some other market players. If you look at the open data, one of them increased its floor area by 12% while its turnover grew by only 8.4%. It is easy to do the math: they are starting to operate at a loss. There is an indicator called LFL (like-for-like), a year-over-year comparison of the same stores. At X5 it is positive, even if only slightly. That is, it is easy to grow by 20% if you have opened one and a half times more shops, but in fact that is negative growth. If you grow both because old stores perform better and because new ones open, that is genuinely positive.
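The arithmetic behind this answer can be sketched in a few lines. This is only an illustration: the function names and the store-level figures are hypothetical, and the only numbers taken from the interview are the 20% growth and the one-and-a-half-times increase in shops.

```python
def per_store_change(revenue_growth: float, store_growth: float) -> float:
    """Change in average revenue per store.

    Both arguments are fractional growth rates, e.g. 0.20 for +20%.
    """
    return (1 + revenue_growth) / (1 + store_growth) - 1


def lfl_growth(last_year: dict, this_year: dict) -> float:
    """Like-for-like growth: compare only stores that were open in both years."""
    common = last_year.keys() & this_year.keys()
    before = sum(last_year[store] for store in common)
    after = sum(this_year[store] for store in common)
    return after / before - 1


# The interview's example: revenue +20% with 1.5x as many shops.
change = per_store_change(0.20, 0.50)
print(f"Revenue per store: {change:+.0%}")

# Hypothetical store-level revenue; store "C" opened this year.
last_year = {"A": 100.0, "B": 120.0}
this_year = {"A": 95.0, "B": 114.0, "C": 80.0}
total_growth = sum(this_year.values()) / sum(last_year.values()) - 1
lfl = lfl_growth(last_year, this_year)
print(f"Total growth: {total_growth:+.1%}, LFL: {lfl:+.1%}")
```

In the hypothetical data the chain grows in total only because of the new store, while the like-for-like figure for the two existing stores is negative.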

- How much of this do you think is your team's doing?

- Not very much yet, because the team was formed not so long ago. Let's be honest: X5's growth in 2017 is unlikely to have been driven by data analytics from a team created in 2018.

The head of our directorate is Anton Mironenkov, a man who took part in the creation of the X5 company. He was involved in the merger of Perekrestok and Pyaterochka, after which X5 appeared. 

We consider big data a strategic direction. The future of retail depends on how quickly retailers learn to use and monetize the data we generate in large quantities every day in order to optimize processes and improve the customer experience. That is why we decided to spin all this off into a separate unit and give it more focus, so that it develops faster.

Anton Mironenkov, Head of Big Data X5

Within this directorate we have our own computing capacity, a cluster, developers, testers, analysts, projects, products: everything we need. We have already delivered some things, which is a lot of progress for less than a year. We clearly understand that we will bring the company a fairly large profit, but again, these results will be visible only in a year.

All the information is in the receipt: if you bought vodka, you are over 18 years old.

- If I come to Perekrestok and make a purchase, which part of all this will you take for analysis?

- The receipt. Your purchases characterize you pretty well. If you buy diapers, you probably have a small child. If you buy vodka, you are over 18. A person who buys chips is, with a certain probability, a 16-year-old teenager. And if you bought a school diary, then you or someone in your family has a child between seven and 17 years old. That is a lot of information.

Imagine: you come to a store, look at a few products and conclude that the store is expensive, cheap or mid-priced. Pyaterochka carries from 4 to 8 thousand unique products. It is unlikely that you walk around with a notebook, write down the prices of all those goods, then look up the average prices in the city and draw a conclusion. You just look at five to ten products. Working out which products you look at is also part of what we do.

The products people look at also change over time. A simple example: 20 years ago there were no products related to mobile communications. Now you can buy a SIM card, though not in every store. Twenty years ago times in Russia were generally a little harder than they are now, and consumption was completely different.

- How do you build customer profiles in order to offer discounts?

- There are two products: the customer profile and loyalty. The customer profile is a task where you have no labels and have to try different approaches. We use various approaches to clustering, from standard statistics (computing Z-scores and robust deviations from the median) to Word2vec applied to receipts, “translating” a person into a kind of vector by averaging Word2vec vectors with TF-IDF weights.

Z-score - a statistic that expresses the distance, measured in standard deviations, of a given value from the mean of the data set. A separate measure with a similar name, the Altman Z-score, indicates a company's creditworthiness and its risk of bankruptcy.

Robust deviations (from the English “robust”) - estimates that remain stable in the presence of outliers in the data. They are computed relative to the median.

Word2vec - a tool that represents words as vectors.

TF-IDF - a statistic that measures how important a word is to a document within a corpus of texts.
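The first two glossary entries can be illustrated with a short sketch. It assumes a hypothetical list of weekly spend values for one customer and uses the median absolute deviation as the robust measure of spread. The interview does not say which robust estimator the team actually uses.

```python
import statistics


def z_scores(values):
    """Distance of each value from the mean, in standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("all values are identical; z-score is undefined")
    return [(v - mean) / std for v in values]


def robust_scores(values):
    """Distance of each value from the median, in units of the median
    absolute deviation (MAD), which is much less sensitive to outliers."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust score is undefined")
    return [(v - med) / mad for v in values]


# Hypothetical weekly spend with one unusual week.
spend = [10, 12, 11, 13, 12, 100]
plain = z_scores(spend)
robust = robust_scores(spend)
print(f"z-score of the last week:      {plain[-1]:.2f}")
print(f"robust score of the last week: {robust[-1]:.2f}")
```

The outlier inflates both the mean and the standard deviation, so its ordinary z-score stays moderate, while the median-based score flags it clearly.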

If you have a model that makes personal offers, we can consider the clustering successful if the quality of that model improves after the cluster attributes are added. Here you can calculate both the economic effect and some metric.
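The step of “translating” a person into a vector, mentioned at the start of this answer, might look roughly like the sketch below. Everything in it is an assumption for illustration: the item vectors are tiny hand-written stand-ins for what a Word2vec model trained on receipts would produce, the IDF formula is one common smoothed variant, and the function names are hypothetical.

```python
import math
from collections import Counter


def idf_weights(receipts):
    """Inverse document frequency per item, treating each receipt as a document.
    The +1 keeps items that occur in every receipt from getting zero weight."""
    n = len(receipts)
    doc_freq = Counter(item for receipt in receipts for item in set(receipt))
    return {item: math.log(n / df) + 1.0 for item, df in doc_freq.items()}


def customer_vector(receipt, item_vectors, idf):
    """TF-IDF weighted average of the vectors of the items in one receipt."""
    dim = len(next(iter(item_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for item, count in Counter(receipt).items():
        if item not in item_vectors:
            continue  # skip items the embedding model has never seen
        weight = (count / len(receipt)) * idf.get(item, 1.0)
        for i, x in enumerate(item_vectors[item]):
            total[i] += weight * x
        weight_sum += weight
    if weight_sum == 0:
        return total
    return [x / weight_sum for x in total]


# Hypothetical receipts and 2-dimensional item vectors.
receipts = [["bread", "milk"], ["bread", "diapers"], ["bread", "vodka"]]
item_vectors = {
    "bread": [1.0, 0.0],
    "milk": [0.8, 0.2],
    "diapers": [0.0, 1.0],
    "vodka": [0.5, 0.5],
}
idf = idf_weights(receipts)
vec = customer_vector(["bread", "diapers"], item_vectors, idf)
print(vec)
```

Because bread appears in every receipt, it gets a lower weight, and the resulting vector leans toward the rarer and more informative item. Vectors like this could then be fed to a clustering algorithm.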

- In what share of stores are your products used?

- In all of them. We tested the personalized discount on half a million users to understand its effect across all 14 thousand X5 stores. We collect interactive reporting from all these stores. We have a promo product that is available in all stores. We have an assortment matrix and a demand forecast. They make sure that, first, there is chicken in the store and, second, that the chicken is not rotten.

Now we are starting on computer vision. It will not be available in all stores at first. We will start with the largest ones, since it only makes sense to test there. The task is quite simple and the benefits are clear. A product may be missing from the shelf while it is sitting in the warehouse, and at that moment nobody can buy it. This is very bad: the store has bought the product but cannot sell it. In the best case, the customer will simply not buy it; in the worst case, he will turn around and leave, because there is no point coming to a store where he can buy two of three products and has to go to another store for the third. He will go straight to the store where he can buy everything. Computer vision solves this. A camera is installed and detects that there is little product left on the shelf. A notification goes to the person responsible, who goes to the warehouse to fetch the product.
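The notification logic described above could be sketched as follows. This is a toy illustration, not X5's system: it assumes a vision model already reports how many units are visible on a shelf, and the class, field names and the 25% threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShelfObservation:
    sku: str
    visible_units: int    # hypothetical output of a vision model
    shelf_capacity: int   # how many units the shelf holds when full
    warehouse_units: int  # stock in the store's back room


def needs_restock(obs: ShelfObservation, min_fill: float = 0.25) -> bool:
    """True when the shelf is nearly empty and there is stock to bring out."""
    fill = obs.visible_units / obs.shelf_capacity
    return fill < min_fill and obs.warehouse_units > 0


def restock_alerts(observations, min_fill: float = 0.25):
    """Build one notification per shelf that should be refilled."""
    return [
        f"Restock {o.sku}: {o.visible_units}/{o.shelf_capacity} on shelf, "
        f"{o.warehouse_units} in warehouse"
        for o in observations
        if needs_restock(o, min_fill)
    ]


observations = [
    ShelfObservation("chicken", 2, 20, 40),   # low shelf, stock available
    ShelfObservation("milk", 15, 20, 10),     # shelf is fine
    ShelfObservation("sim-card", 0, 10, 0),   # empty, but nothing to bring
]
alerts = restock_alerts(observations)
for alert in alerts:
    print(alert)
```

The third case is deliberately excluded here: an empty shelf with an empty warehouse is a supply problem rather than a task for the person on the floor.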

The second task is queues. We know when there is a queue in the store. Either you stand in line, annoyed and wasting time, which nobody likes, or you walk in, see the queue, turn around and leave. If the queue is there because the store is understaffed, nothing can be done about it. But suppose the problem is that a saleswoman is sitting in the back, resting and drinking tea, and the director has to call her. There is already a queue, and by the time she gets there, sits down at the register, turns it on and opens the till, time will have passed. On top of that everyone is staring at her; she is nervous, and so are the customers. The cashier should come out before the queue forms, so that she is in place by the time people reach the checkout. This is quite easy to solve with computer vision.
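A minimal sketch of the "call the cashier before the queue forms" rule is shown below. It assumes cameras provide two counts, people already queuing and people heading toward the checkout. The function name, the inputs and the limit of three customers per register are hypothetical.

```python
import math


def registers_to_open(
    people_in_queue: int,
    people_approaching: int,
    open_registers: int,
    max_per_register: int = 3,
) -> int:
    """How many extra registers to open so that the expected load
    stays at or below max_per_register customers per register."""
    expected = people_in_queue + people_approaching
    needed = math.ceil(expected / max_per_register)
    return max(0, needed - open_registers)


# Two people queuing, five more walking toward the checkout, one register open.
extra = registers_to_open(people_in_queue=2, people_approaching=5, open_registers=1)
print(f"Call {extra} more cashier(s)")
```

Counting the people who are approaching, and not only those already queuing, is what lets the notification go out before the line has formed.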

We will be testing it in about 150 stores, most likely in Moscow. First, we are in Moscow ourselves, and second, there is more traffic here. Then it will become clear how to improve the user experience and what the benefit is for X5.

“I really don't like the word data scientist.”

- Is your department expanding?

- Of course. Management sees that we are delivering results. Nobody lets you double your team if you are not working well. That fact alone speaks to our effectiveness.

- You said that you have 32 people working for you. How many more will you be recruiting?

- Another 20-30 or so. We are now setting up computer vision and speech technology within my department. That is two new departments, which means plus ten people, and I believe another 10–15 have been approved for next year. There are also so-called project positions. We expect it to be 30–36 plus those, so somewhere over 60 people. These are specifically the people who do data analysis and machine learning.

- Who are you looking to hire?

- I really don't like the term “data scientist” because it carries no information. You can go to ten companies that are looking for a data scientist, and these will be ten completely different positions. I like the word “analyst”. The names of my departments speak for themselves: there is a machine learning department, a data analysis department, an R&D (research) group, a computer vision department, a speech technology department and a non-product analytics group for problems that fall outside any existing product line.

I am looking for people who can program in Python and know probability theory and mathematical statistics; if I need modeling, then machine learning skills are required too. But the most important thing is a person's ability to think and analyze. I am increasingly coming to the view that analytical and critical thinking are very difficult to teach. If a person has already formed a worldview by the age of 20-25, it is unlikely to change.

- Did you come to this conclusion at X5?

- It is not that X5 led me to it. I look at people, talk to them, see how they work. As you know, the best interview is a trial period. And at some point you see that this is simply not for this person. He seems to have graduated from Mekhmat (the Faculty of Mechanics and Mathematics), he seems to be no fool, and yet it is not for him. The right mindset is missing; he does not see things. Daniel Kahneman described what goes along with critical thinking in his book “Thinking, Fast and Slow”. It includes a pessimistic view of the world, and it is more an innate quality than an acquired one, unfortunately or fortunately.

- If an analyst joins and after the trial period you see that he is a good fit, what can that person expect?

- The standard grades in IT are trainee, junior, middle and senior. Anything higher is rare: that would be staff or lead. I believe the senior title is inflated: there are a lot of seniors, but in fact they rarely reach the level of an average middle.

If you take average salaries on the market, a junior gets somewhere between 120 and 150 thousand rubles a month before taxes, and a middle up to 250 thousand. Seniors get about 400 thousand rubles. As for the upper bound, I have personally held in my hands an offer for a lead developer position, and it was more than 600 thousand rubles.

“Data science is really a kind of ‘cherry on the cake’”

- How did you start doing machine learning?

- I had no computer science at university. I graduated in 2012, right around the time of another surge in everything related to it, so I missed it. I graduated from two universities; the last one was the Karlsruhe University of Applied Sciences, with a master's degree in mechatronics. Before that, I studied at the Moscow Institute of Chemical Engineering, now called Moscow Polytechnic. I was not involved in machine learning at either.

A funny thing: I now interview people who have graduated from data science programs, and their level seems weaker than that of people who graduated in physics, engineering or computer science and then “bolted” machine learning on top. Maybe this is a slight selection bias, because those who taught themselves were strong to begin with, learned something new and then came to us. Data science really is a kind of “cherry on the cake”, and if there is a “cherry” but no “cake”, it is not that interesting.

- How did you learn this?

- There is an old saying that there are two serious courses on Coursera, or even one and a half. They are Hinton's course on machine learning and neural networks (the course is no longer available on Coursera, but can be viewed on YouTube - Hi-Tech) and Daphne Koller's course on probabilistic graphical models.

Koller's course consists of the video lectures she gives to graduate students at Stanford, so I cannot bring myself to call it anything less than serious. Hinton's course lasts 16 weeks, and Koller has three courses of five to six weeks each. I pulled myself together, got through the first course and realized that I was not ready to take the second and third.

But Coursera is not the only option. I read a lot of books. By the way, I have just finished Bradley Efron's book on statistics (American statistician, winner of the US National Medal of Science, the highest state award for American scientists - Hi-Tech). Before that, it was Ian Goodfellow's book on deep learning (American machine learning specialist, works at Google Brain - Hi-Tech). It is a continuous learning process. Coursera is just one resource, Kaggle (an online computer science community that regularly hosts competitions - Hi-Tech) is another, but the main thing is to read, read, read and check. If you read something and don't understand it, that's bad. If you understand how it works, you can do anything.

It's like the multiplication table. Imagine a person who does not understand the multiplication table but has learned it by heart. They ask him: “Six by six?” - “36”. - “Seven by eight?” - “56”. - “Okay, last question: 10 by 11?” The man says: “I don't know, that wasn't in the multiplication table.” That's it. These are the people I often meet. 10 by 11 is much easier to calculate, but it is not in the table; you need to understand the principle. If you understand the principles, everything becomes much easier.

Everything else depends on the person. It seems to me that we learn everything ourselves. We can only help other people and not get in their way. It is all a matter of self-discipline.

- Tell us about your data science course at HSE.

- It is a free course within the standard program. In it I explain basic, simple things that are a revelation for many people: for example, what metrics are, why they exist at all, how they differ from each other, when each one is needed, how to test your idea, and what an A/B test is. This is what I have concluded people need to know and actually use in their work.
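As an illustration of the A/B test mentioned above, here is a minimal sketch of a two-proportion z-test. It is a generic textbook procedure, not material from the course, and the conversion counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference in conversion rate between
    control group A and test group B. Returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 1,000 customers per group,
# group B received a personalized discount.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the groups is unlikely to be due to chance alone. Whether the effect is worth its cost is a separate question that depends on the metric chosen.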

- How do you see the future of retail in five to ten years?

- If we are talking about food retail, the hypermarket format will die out. You can already see this in the States, where large shopping centers are dying out, and in Russia too, by the way. What was the consumption pattern before? We went to the mall, to the cinema, to the food court, and bought something else along the way. Now we come home, and there is ivi, Okko, Netflix, Yandex.Eda, Delivery Club, restaurant delivery, online shopping. We need to move toward personalization.

- What will it mean for the consumer?

- What does a person use? What he can afford and what is convenient. Accordingly, you have to reduce costs while keeping quality the same or raising it. This is where personalization comes in.

- A person buys what he can afford. Now the real incomes of the population are falling, costs are being reduced.

- In such a situation, economy-format stores do better and grow. Retailers have two ways to solve many of their problems: either automate or hire ten more people. In the short term, the second way is the winning strategy, because integration is expensive and time-consuming, and if something goes wrong, you can lose your bonus. Now imagine that you are the director of a department with a very large bonus, and you may lose it. It is unclear whether you will still be at the company in two years, when the results of this automation become known and someone might praise you for them. The bonus, meanwhile, is at stake right now. So we hire ten more people. But in the long run this leads to a big loss.