Let's talk: how AI quietly replaced call center operators

Smart "talkers"

Voice is a natural communication tool. Many people prefer to resolve issues by speaking rather than in writing, simply because it's faster. In business communication with customers, it is a convenient and natural way to interact. But not every company can expand its call center staff in proportion to the growth of its customer base. Automation is becoming an effective way to scale live communication with customers: it preserves the familiar channels and covers more contacts without sacrificing quality.

Voice technologies are used in many areas, and they suit any audience: children are drawn to an interactive "talker", young people appreciate voice control of smart devices, and an assistant reads the news to the elderly. But voice assistants are most in demand in industries with many individual customer contacts: finance, retail, and telecom.

Major companies have been using voice technology for years. Since 2017, Bank of America has been running Erica, a virtual assistant. Since 2018, Mercedes-Benz has been rolling out the Mercedes-Benz User Experience (MBUX) system, which understands voice commands. The retailer Walmart has launched an application with the Ask Sam voice assistant, which helps customers find products. According to Adobe Analytics, 91% of brands are already investing heavily in voice solutions and plan to increase that investment. Just AI predicts that the Russian speech AI market will grow from 38% to 81% in the next five years and reach $561 million in 2025.

Believe it or not

Businesses evaluate the effectiveness of implementing voice technologies by looking at customer satisfaction and brand loyalty. But many customers greet the innovation with restrained enthusiasm. According to Voicebot.ai, only 45% of users want to see voice assistants in mobile applications. The main reasons for the dislike, according to Neuro.net, are the poor quality of the answers and the synthetic-sounding speech of voice assistants. These problems are typical of interfaces built on the previous generation of technology. Modern machine learning algorithms make it possible to synthesize voices that no longer sound soulless.

Another limiting factor is that voice technologies have spread both in scenarios that are "good" from the client's point of view and in "bad" ones. There are still few companies on the market that specialize in developing voice interfaces, and the number of voices they can offer is limited. As a result, if a person is pestered by advertising or fraudulent calls today and receives a useful call tomorrow, the conversation will fail, because "all robots have one voice." Once the reputation of a voice assistant is damaged, the effectiveness of calls that are useful to the client drops to zero. This is why companies create a Brand Voice, a voice unique to the brand.

“A unique voice is as important a part of a brand as a logo or corporate font. More and more of our customers use this feature and talk to their own customers in unique voices. We record a set of phrases with a particular intonation in the voice of a company employee or an announcer. The many pieces of dynamic data, such as phone numbers or addresses, are generated automatically by the self-learning system, which reproduces the employee's voice and keeps the intonation realistic. This is how companies automate communications while retaining customer loyalty and increasing conversion: people are pleased to be addressed in a lively voice and are willing to hold a conversation.”

Ivan Artemiev, MTT Product Director

Speak model

The cost of a finished Brand Voice starts at 150,000 rubles and depends on the scope and complexity of the voice synthesis model. Creating the solution consists of two parts, technical and logical, and each is the responsibility of a separate product team.

An important step in the technical part is choosing the voice from which speech will be synthesized. The intonation of the voice should reflect the brand attributes that the company wants to promote. A professional announcer or dubbing actor has to record up to 40 hours of spoken language constructions. The recording must be of high quality, free of extraneous noise, and correctly pronounced, because the voice robot's model will be trained on this material.

Training the model and implementing full-fledged synthesis takes from one to six months, depending on the complexity. But the technology is advancing, and the required studio recording time is gradually shrinking. In the future, it may be possible to get a good voice robot from only 2-3 hours of original audio.

Training the artificial intelligence

When the recording is ready, training of the voice model begins. The model processes the recorded material, learns to reproduce the voice, and as a result can synthesize speech from any arbitrary text.

Problems of this class are solved with transformers, a deep neural network architecture introduced in 2017 by Google Brain researchers. The best-known transformers are the GPT (Generative Pre-trained Transformer) neural networks from OpenAI. This technology makes it possible, for example, to fill in a gap or predict the next word in a phrase from the preceding words with high accuracy.
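The idea of predicting the next word from the preceding ones can be illustrated with a deliberately tiny sketch. It is not a transformer: it only counts which word most often follows another in a made-up corpus, and all names here (`train_bigrams`, `predict_next`, the sample sentences) are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the training text."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up training phrases, for illustration only.
corpus = [
    "your order is on the way",
    "your order is ready",
    "your order is on hold",
]
model = train_bigrams(corpus)
print(predict_next(model, "order"))  # is
print(predict_next(model, "is"))     # on
```

A real transformer looks at the whole preceding context rather than a single word, but the training signal, guessing what comes next, is the same in spirit.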

Brand Voice solutions are built on the same principle. The model is trained on a huge amount of data: several models are launched with different parameters, and the best one is selected at the end. It is important that the robot "translates" text into voice correctly and makes no mistakes in pronunciation or intonation. To improve the quality of synthesis, the model is further trained for specific use cases, which yields the most natural-sounding voices.
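Selecting the best of several models trained with different parameters might look roughly like the sketch below. The parameter names and the `validation_error` formula are invented stand-ins; in practice the score would come from evaluating real synthesized audio, for example by counting pronunciation mistakes on held-out text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One model configuration; the fields are hypothetical examples."""
    name: str
    learning_rate: float
    hidden_size: int


def validation_error(candidate):
    """Toy stand-in for a real evaluation on held-out recordings (lower is better)."""
    return abs(candidate.learning_rate - 0.001) * 100 + 64 / candidate.hidden_size


def select_best(candidates, score=validation_error):
    """Pick the candidate with the lowest validation error."""
    return min(candidates, key=score)


candidates = [
    Candidate("a", learning_rate=0.01, hidden_size=256),
    Candidate("b", learning_rate=0.001, hidden_size=256),
    Candidate("c", learning_rate=0.001, hidden_size=512),
]
best = select_best(candidates)
print(best.name)  # c
```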

Where is the logic?

The robot's semantic content, its business logic, and its scenarios for interacting with people are created in close cooperation with the customer. For a voice assistant to bring maximum benefit to a business, you need a good understanding of how the business is organized, and with what questions and in what situations the client will turn to the assistant.

Inventing cases from scratch is a bad idea: the logic of interaction with the client must be real. If the assistant greets a person on the phone line, the scenario is based on a consulting, sales, or other script, that is, the sequence of actions a call center employee follows in a dialogue with a client. When preparing a scenario for a voice assistant, it helps to analyze requests from real users, interview the employees who talk to them regularly, or run UX experiments aimed at uncovering people's real needs.

Many customers want the voice assistant to help clients solve issues that are hard to handle on their own. For example, it makes sense to hand over to the robot functions that are buried deep or are not obvious in a mobile application.

Irina Stepanova, conversational interface designer and analyst at Just AI: “You need to understand that the client behaves differently in different channels: chat, application, phone. So first of all, you need to carefully study the customer journey map in the channels where you plan to introduce a voice assistant. In a visual interface, the client has fewer ways to make a mistake, because almost everything the service offers is in front of their eyes. In a voice interface, the user has a weaker sense of the service's limits, so you have to anticipate that a person may state a request to the assistant in a long phrase, from which the program must pick out the significant phrases that determine the essence of the request. A separate task is designing off-topic handling for requests that no ready-made script covers. The client can ask anything. What makes a robot human is the variability of its answers, when it answers the same question in different ways.”
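As a rough illustration of the two ideas in the quote, picking significant words out of a long phrase and varying the answers, here is a minimal keyword-based sketch. Production assistants typically use trained intent classifiers; the intents, keywords, and replies below are invented for the example.

```python
import random
import re

# Hypothetical intents with the keywords that signal them.
INTENT_KEYWORDS = {
    "balance": {"balance", "money", "account"},
    "delivery": {"delivery", "courier", "parcel"},
}

# Several phrasings per intent so the assistant does not repeat itself.
REPLIES = {
    "balance": ["Let me check your balance.", "One moment, I am looking up your account."],
    "delivery": ["I will find your delivery status.", "Let me see where your parcel is."],
}


def detect_intent(utterance):
    """Return the intent whose keywords occur most often in the phrase, or None."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    best_intent, best_hits = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_intent, best_hits = intent, hits
    return best_intent


def reply(utterance, rng=random):
    """Answer with a randomly chosen phrasing for the detected intent."""
    intent = detect_intent(utterance)
    if intent is None:
        return "Sorry, I am not sure I can help with that."
    return rng.choice(REPLIES[intent])


long_phrase = "Hi, I was wondering, could you tell me how much money is left on my account?"
print(detect_intent(long_phrase))  # balance
print(reply(long_phrase))
```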

One of the challenges in developing a voice interface is discoverability: how do you tell the user what the assistant can do and what it can help with? Here you need to act proactively: announce the assistant's skills and abilities, guide the user through the scenario by suggesting next steps, and help in dead-end branches when the user lands in the handling of unrecognized requests. You can also describe the assistant's abilities outside the assistant itself: in advertising, mailing lists, and other marketing tools.
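Proactive handling of unrecognized requests could be sketched as follows: on the first miss the assistant asks the user to rephrase, and on later misses it announces what it can do. The skill list, the wording, and the two-step policy are assumptions made for illustration.

```python
# Hypothetical skills the assistant wants users to discover.
SKILLS = ["check your balance", "track a delivery", "block a card"]


def describe_skills(skills):
    """Turn a list of skills into one spoken sentence."""
    if len(skills) == 1:
        return f"I can {skills[0]}."
    return "I can " + ", ".join(skills[:-1]) + " or " + skills[-1] + "."


def fallback_reply(misses, skills=SKILLS):
    """Choose a reply for an unrecognized request, given how many misses occurred so far."""
    if misses <= 1:
        return "Sorry, I did not catch that. Could you rephrase?"
    return "Let me help. " + describe_skills(skills) + " What would you like to do?"


print(fallback_reply(1))
print(fallback_reply(2))
```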

The voice assistant should not only be useful but also make for interesting conversation. Developers always try to put as much as possible into the "brain" of a Brand Voice, giving it character and personality.

Learning is a continuous process

Development of the voice model does not stop after it goes into production. After six months of work, the quality of the model improves, and after a year it has changed beyond recognition. If the client has allowed logging, that is, recording information about events during the assistant's operation, all error data is collected and used to retrain the model. Logging can be needed when the assistant fails to recognize specific words and phrases or mispronounces them, for example the names of medicines or the items in a delivery service's assortment.

A Brand Voice is usually created in a cloud environment and requires the use of personal data, which often raises security concerns among customers. Distrust of the cloud is an outdated stereotype, but if it matters to the client that the data never leaves the company's perimeter, it can be processed strictly within the organization's own IT infrastructure. Personal data is also used during logging; to keep it confidential, the data is anonymized.
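Anonymizing a log line before it is stored can be sketched with two regular expressions that mask phone numbers and email addresses. This is a simplified assumption about what anonymization involves; real pipelines also have to handle names, addresses, and other identifiers, and the patterns below are illustrative rather than exhaustive.

```python
import re

# Illustrative patterns only: they catch common formats, not every possible one.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s\-()]{8,}\d")


def anonymize(text):
    """Replace emails and phone numbers in a log line with placeholders."""
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)


line = "Call me at +7 (495) 123-45-67 or mail ivan@example.com"
print(anonymize(line))  # Call me at <PHONE> or mail <EMAIL>
```

Short digit runs such as order numbers are left untouched, because the phone pattern requires at least ten characters.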

Creating new scenarios and further training the models for a Brand Voice is an ongoing process. In effect, by ordering a ready-made voice solution, the client receives a service that is constantly being improved. A truly high-quality voice assistant can not only replace the staff of an entire call center but also become a bright accent that adds individuality to the company's image.
