Ruslan Sabitov, National Clearing Center - on big data in banking and the problems of legacy software

NCC is a non-bank credit organization that performs the functions of a clearing organization and a central counterparty on the markets of the Moscow Exchange. As the central counterparty, NCC assumes the risks of transactions concluded by participants during exchange trading, acting as an intermediary between the parties. NCC's main function is to ensure stability in the market segments it serves by operating a modern risk management system that meets international standards.

Reporting is easier with big data

The Bank of Russia requires all financial market participants - banks and other financial organizations, including NCC - to report on their activities: how much money is on the balance sheet, how it breaks down across individual accounts, how many transactions have been processed. All of this must be prepared and submitted to the Central Bank on a regular basis. To simplify the task, we implemented an automated system based on solutions from Neoflex, a company that has long specialized in building regulatory reporting systems for financial institutions.

When I worked at Binbank, we also used Neoflex solutions, but on older technologies: an Oracle database handled all the processing and reporting. At NCC we implemented a solution based on big data technology, Hadoop, which we use as our main data storage and processing system.

Oracle's databases are among the most popular in the world and are used by IT market leaders such as Facebook, Twitter, and YouTube. For example, MySQL, an Oracle database product, is often chosen as an embedded database and is distributed by thousands of software vendors and hardware manufacturers.

Oracle Database, or Oracle RDBMS, is an object-relational database management system from Oracle. For over 40 years, Oracle has been helping companies, governments, and other organizations around the world collect, organize, and use data.

Hadoop is an open-source project run by the Apache Software Foundation. Hadoop is used for reliable, scalable, distributed computing and also serves as general-purpose file storage that can hold petabytes of data.

The paradigms of Hadoop and Oracle are completely different. You could call Hadoop a further development, but in fact it is a step sideways, because the essence of big data differs from that of ordinary databases. Hadoop does not support transactions, but as a storage and information processing system it is a very successful solution: licensing costs drop, scalability is very good, and performance is correspondingly high.


Different banks use different reporting forms, but there is a basic set common to all. NCC, for example, submits about 200 reporting forms; of these, we have so far implemented about 20 on Hadoop. We collect detailed data for reporting from two source systems. The first is the automated banking system. The second is the central counterparty system, which is not a standard banking system but one specialized for our needs. The data is then aggregated in Hadoop: first it is loaded in its original form, then processed, cleaned, prepared, aggregated, and moved to a separate layer. That layer builds a complete data profile for each client and each data mart. On top of this sits an Oracle "wrapper" through which we retrieve the information already present in the system, and from that data we build full-fledged reports suitable for submission to the regulators.
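The layered flow just described - load data as-is, clean it, aggregate it, publish it to a mart layer - can be sketched in miniature. This is a hedged illustration, not NCC's actual pipeline; the record fields and layer names are hypothetical:

```python
# Minimal sketch of a layered reporting pipeline: raw records from two
# source systems land as-is, are cleaned, then aggregated into a
# per-client "data mart". All names and fields here are hypothetical.

RAW = [  # raw layer: records exactly as received from the source systems
    {"client": "A", "amount": "100.50", "source": "abs"},
    {"client": "A", "amount": "bad",    "source": "abs"},   # dirty record
    {"client": "B", "amount": "20.00",  "source": "ccp"},
]

def clean(records):
    """Cleaned layer: drop records whose amount does not parse."""
    out = []
    for r in records:
        try:
            out.append({**r, "amount": float(r["amount"])})
        except ValueError:
            continue  # a real pipeline would route this to a quarantine area
    return out

def aggregate(records):
    """Mart layer: one row per client with the total amount."""
    mart = {}
    for r in records:
        mart[r["client"]] = mart.get(r["client"], 0.0) + r["amount"]
    return mart

mart = aggregate(clean(RAW))
print(mart)  # {'A': 100.5, 'B': 20.0}
```

In a real deployment each layer would live in its own Hadoop directory rather than a Python list, but the separation of raw, cleaned, and aggregated data is the same idea.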

A data lake is centralized storage that can hold all data and structures. Data can be stored exactly as it arrives and used for various types of analytics, from dashboards and visualization to big data processing, real-time analytics, and machine learning for better decision making.
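The "store it right the way it is" property is often called schema on read: the lake keeps raw records untouched, and each consumer applies its own structure at query time. A minimal sketch with made-up records:

```python
import json

# Schema-on-read sketch: the "lake" stores raw JSON lines untouched;
# each consumer selects only the records matching its own schema.
# The records below are invented for illustration.

raw_lines = [
    '{"doc": "report.pdf", "size": 1024}',
    '{"trade_id": 7, "price": 99.5}',
]

def read_with_schema(lines, wanted_keys):
    """Return only records that contain all keys of a consumer's schema."""
    rows = []
    for line in lines:
        rec = json.loads(line)
        if wanted_keys <= rec.keys():  # set inclusion: schema applied on read
            rows.append(rec)
    return rows

trades = read_with_schema(raw_lines, {"trade_id", "price"})
print(trades)  # [{'trade_id': 7, 'price': 99.5}]
```

Contrast this with a relational database, where every record must fit a table schema before it can be written at all.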

"We free the analysts' hands"

One of the reasons we chose Hadoop was the project to introduce a data lake across the Moscow Exchange Group. The product is still young and constantly evolving. Data will be added to it not only by financial departments such as accounting, but also by technical departments, which will process their own data, such as logs from financial systems, on Hadoop's capacity. The Moscow Exchange generates a very large amount of data, and standard CDBs (central databases - "High-tech") are no longer suitable for processing it: they simply cannot cope with the stream.

The data lake solution simplifies this task and increases productivity. Today we have Central Bank reporting and tax reporting; Rosfinmonitoring reporting will be added here as well, which implies storing yet another type of data, such as plain scans of documents required for tax purposes. Storing binary files in a database is very expensive and unwise, so a big data solution was chosen.
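The point about binary scans can be illustrated with a toy content-addressed store: the blob itself goes to file storage (HDFS in the article; a temporary directory here), while the "database" keeps only a small metadata row. All names here are hypothetical:

```python
import hashlib
import os
import tempfile

# Toy illustration of keeping blobs out of the database: the scan goes to
# file storage, the catalog holds only path, hash, and size metadata.

catalog = {}  # stands in for a small relational metadata table

def store_scan(doc_id, payload, root):
    """Write the blob to file storage; record only metadata in the catalog."""
    digest = hashlib.sha256(payload).hexdigest()
    path = os.path.join(root, digest)
    with open(path, "wb") as f:
        f.write(payload)  # the large blob lives in file storage, not the DB
    catalog[doc_id] = {"path": path, "sha256": digest, "size": len(payload)}
    return path

with tempfile.TemporaryDirectory() as root:
    store_scan("tax-scan-1", b"%PDF-1.4 ...", root)
    meta = catalog["tax-scan-1"]
    assert meta["size"] == 12
```

Naming the file after its hash also deduplicates identical scans for free, one of the usual side benefits of content addressing.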

This solution may also develop further. We are a group of companies, each with its own data processing tasks. Perhaps in the future everything will be merged into one cluster that handles all tasks within the group.


Another direction is reducing the cost of storing archival data. Keeping it on Exadata is quite expensive. Once the data is moved to Hadoop, storage will be cheaper, analysis will be easier, and analysts will see higher performance. At the moment they are constrained because the resource allocated to their tasks is rather small, a result of the heavy load on the main system.

How legacy software can lead to bank default

The software landscape in any bank is very diverse, ranging from in-house developments to industrial solutions that remain as legacy software you cannot get rid of. It was chosen long ago, and switching to other systems is very expensive, so you have to keep dragging it along. Our solution lets us cure some of these systems' sores by applying new technologies.

For example, we used to prepare reports in the automated banking system, but it had limitations: poor performance and exclusive use of resources while calculating a single report. As a result, throughput for a single form was extremely low; sometimes one form took six hours or more to calculate.
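A toy timing experiment shows why exclusive, one-at-a-time calculation hurts throughput, and what running forms side by side buys. The sleep stands in for a heavy report job; the form names and durations are invented:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Toy model of the bottleneck described: when report forms must run one at
# a time (exclusive resource use), total time is the sum of all forms;
# running them side by side brings it down toward the slowest single form.

def build_form(form_id):
    time.sleep(0.05)  # stand-in for a heavy report calculation
    return f"form-{form_id} ready"

forms = range(4)

start = time.perf_counter()
serial = [build_form(f) for f in forms]  # exclusive: one by one
serial_t = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(build_form, forms))  # side by side
parallel_t = time.perf_counter() - start

assert serial == parallel       # same results
assert parallel_t < serial_t    # less wall-clock time
```

On a cluster the parallelism comes from distributing the work across nodes rather than threads, but the arithmetic is the same.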


There is no escaping such things: replacing an automated system in a bank is like a disaster. There are many nuances to take into account, and migrating from one piece of software to another in a single day is simply impossible. In my practice there was a case where a bank moved from two or three automated banking systems located in the regions to a single one at the head office. The process took several months.

At NCC, too, many people were involved in implementing this project: analysts and finance staff who evaluated each approach, plus a large number of technical personnel, since the infrastructure had to be prepared and deployed and maintenance procedures created.

Banks that have been operating for a long time invariably have legacy software. I have worked at four banks, and it was everywhere, ranging from software written for DOS to large systems that can no longer be abandoned because they are deeply integrated into business processes. If you stay on old systems, the financial institution's performance and competitiveness decline and the risks grow: if not default, then license revocation.

Banks and companies that are just starting out have more room for technology selection. That includes NCC, since we are a relatively young company: the software we use is modern almost everywhere.