Ruslan Sabitov, National Clearing Center, on big data in banking and the problems with legacy software

NCC is a non-bank credit organization that performs the functions of a clearing organization and central counterparty on the markets of the Moscow Exchange. As a central counterparty, NCC assumes the risks on transactions concluded by participants during exchange trading, acting as an intermediary between the parties. NCC's main function is to ensure stability in the financial market segments it serves by operating a modern risk management system that meets international standards.

Reporting is easier with big data

The Bank of Russia requires all financial market participants, including banks and other financial organizations such as NCC, to report on their activities. They report how much money is on the balance sheet, how it is allocated to individual accounts, and how many transactions have taken place. All of this has to be prepared and submitted to the Central Bank on a regular basis. To simplify the task, we implemented an automated system based on solutions from Neoflex, a company that has long specialized in building systems for financial reporting to regulators.

When I worked at Binbank, we also had Neoflex solutions, but they were built on older technology: an Oracle database on which all processing and reporting was done. NCC, by contrast, implemented a solution based on a big data technology, Hadoop, which we use as the main system for storing and processing data.

Oracle's databases are among the most popular in the world. MySQL, which is owned by Oracle, is used by IT market leaders such as Facebook, Twitter, and YouTube, and is often the embedded database of choice distributed by thousands of software vendors and hardware manufacturers.

Oracle Database, or Oracle RDBMS, is Oracle's object-relational database management system. For more than 40 years, Oracle has helped companies, governments, and other organizations around the world collect, organize, and use data.

Hadoop is an open source project managed by the Apache Software Foundation. It is used for reliable, scalable, distributed computing, and also as a general-purpose file store that can hold petabytes of data.
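Hadoop's classic processing model is MapReduce. As a rough illustration of the idea only, the sketch below imitates the map, shuffle, and reduce steps in plain Python on a single machine, counting transactions per account. It does not use any Hadoop API, and the record layout and function names are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit (account, 1) for every transaction record."""
    account, _amount = record  # hypothetical layout: (account_id, amount)
    yield (account, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)

def run_job(records):
    """Run the three phases in sequence, in one process."""
    pairs = (pair for rec in records for pair in map_phase(rec))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

if __name__ == "__main__":
    transactions = [("acc1", 10), ("acc2", 5), ("acc1", 7)]
    print(run_job(transactions))
```

On a real cluster the map and reduce steps run in parallel on many nodes, which is where the scalability comes from; here they run one after another purely to show the data flow.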

The paradigms of Hadoop and Oracle are completely different. You could, of course, call Hadoop a further development, but in fact it is a step in a different direction, because big data systems are fundamentally different from ordinary databases. Hadoop is not built for transactional workloads, but as a system for storing and processing information it is the most successful solution. License costs are lower, scalability is very good, and performance is correspondingly high.


Different banks use different reporting forms, but some basic ones are common to all. NCC, for example, submits about 200 reporting forms, of which about 20 have so far been implemented on Hadoop. Detailed data for reporting comes from two source systems. The first is the automated banking system. The second is the central counterparty system, which is not a banking product but one built specifically for us. The data from both is collected in Hadoop. First it is loaded in its original form, then it is processed, cleaned, prepared, and aggregated, and then moved to a separate layer. That layer holds a complete set of data for each client and for each data mart. On top of it sits an Oracle-based "wrapper" through which we retrieve the information that is already in the system. From this data we build full-fledged reports suitable for submission to the regulators.
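The layered flow described above (raw load, cleaning, aggregation into a separate layer) can be sketched in a few lines. This is a minimal single-machine illustration in plain Python, not the NCC or Neoflex implementation: the record fields, the sample data, and all function names are hypothetical, and a real pipeline would run on Hadoop tooling rather than in-memory lists.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical sample rows from the two source systems named in the text.
RAW_ABS = [  # automated banking system
    {"client": "A", "amount": "100.50"},
    {"client": "B", "amount": " 20 "},
    {"client": "A", "amount": None},  # dirty row, dropped during cleaning
]
RAW_CCP = [  # central counterparty system
    {"client": "A", "amount": "5"},
    {"client": "C", "amount": "7.25"},
]

def load_raw(*sources):
    """Raw layer: keep records unchanged, only tag each with its source."""
    return [dict(rec, source=name) for name, recs in sources for rec in recs]

def clean(raw):
    """Cleaned layer: drop rows without an amount and normalize the type."""
    cleaned = []
    for rec in raw:
        if rec["amount"] is None:
            continue
        cleaned.append({**rec, "amount": Decimal(rec["amount"].strip())})
    return cleaned

def aggregate(cleaned):
    """Aggregated layer: total amount per client across both sources."""
    totals = defaultdict(Decimal)  # Decimal() is 0
    for rec in cleaned:
        totals[rec["client"]] += rec["amount"]
    return dict(totals)

if __name__ == "__main__":
    raw = load_raw(("abs", RAW_ABS), ("ccp", RAW_CCP))
    print(aggregate(clean(raw)))
```

Keeping the raw layer untouched means the cleaning and aggregation rules can be changed and re-run later without going back to the source systems.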

A data lake is a centralized repository that can store data of any kind and structure. Data can be stored as is and used for different types of analytics, from dashboards and visualization to big data processing, real-time analytics, and machine learning, to support better decision making.

"We untie the hands of the analysts"

One of the reasons we chose Hadoop is that the Moscow Exchange Group had a project to implement a Data Lake. The product is still young and constantly evolving. It will hold data not only from financial departments, such as accounting, but also from technical departments, which will use Hadoop to process their own data, for example logs from financial systems. The Moscow Exchange generates a very large amount of data. Standard centralized databases are no longer suitable for processing it; they simply cannot cope with the flow.

The Data Lake solution simplifies this task and increases productivity. Right now we have Central Bank reporting; tax reporting and Rosfinmonitoring reporting will be added. That means storing another type of data, which may simply be scans of documents required for tax purposes. Storing binary files in a database is very expensive and unwise, so we chose a big data solution.

This solution has a future. We are a group of companies, and each has its own data processing tasks. Perhaps in time all of this will be merged into one cluster that handles every task within the group.


Another direction is reducing the cost of storing archival data. Storing it on Exadata is quite expensive. Once the data is migrated to Hadoop, it will be cheaper to store and easier to analyze, and analysts will get better performance. At the moment they are constrained because the resources allocated to their tasks are quite limited, due to the heavy load on the main system.

How legacy software can lead to bank default

In any bank, the software landscape is very diverse. It ranges from in-house developments to industrial solutions that remain as legacy software the bank can no longer get rid of. The software was chosen at the outset, and switching from it to other systems is very expensive, so you have to keep carrying and supporting it. Our solution makes it possible to eliminate some of these system problems by using new technologies.

For example, we used to prepare reports on the automated banking system. But it had limitations: limited performance and exclusive use of resources while a single report was being calculated. As a result, calculating a single form was extremely slow. Sometimes one form took six hours or more.


You can't simply walk away from situations like this: replacing an automated system in a bank is akin to a disaster. There are a lot of nuances that need to be taken into account, and it is simply impossible to migrate from one piece of software to another in a day. In my own experience, there was a case when a bank switched from two or three automated banking systems located in the regions to a single one at the head office. The process took several months.

The same was true at NCC: a lot of people were involved in implementing this project. They included analysts and finance staff who analyzed particular approaches. A large number of technical staff were also involved, since the infrastructure had to be prepared and deployed and maintenance regulations had to be written.

Banks that have been operating for a long time invariably have legacy software. I have worked in four banks, and all of them had it, from programs written for DOS to large systems that can no longer be abandoned because they are so deeply integrated into business processes. If an organization stays on old systems, its productivity and competitiveness decline and its risks grow: if not default, then revocation of its license.

Banks and companies that are just starting out have more freedom in choosing technologies. That includes NCC, since we are a relatively young company. Almost all of the software we use is modern.