Big data is a trendy topic right now, not only in financial services but across all industries. However, it’s certainly not a new topic. Over five years ago, The Economist featured an article called “The Data Deluge” that explained how quickly the quantity of information in the world is growing and how hard that data is to store and analyze. The author also described the benefits of monitoring data (e.g., credit card companies identifying fraud) and the challenges posed by the deluge (e.g., privacy infringement). The proposed solution: make more data available in the right way by requiring greater transparency, take information security more seriously, and require annual security audits. That author had a glimpse into the future, wouldn’t you say?
To further prove that big data is not a new concept…
- In 1965, Gordon Moore, cofounder of Intel, published a paper called “Cramming More Components onto Integrated Circuits.” His claim: the number of transistors that can be crammed onto a silicon die doubles every 18-24 months. This came to be known as Moore’s Law. To us, it means that computing power doubles every couple of years.
- In 1996, futurist George Gilder wrote a paper on resource scarcity and how network capacity would be abundant in the future (Gilder’s Law). He was right: network capacity has increased at a rate of 3x every year.
- In 2005, Mark Kryder, CTO of Seagate, wrote about disk capacity. His claim, known as Kryder’s Law, was that disk capacity increases 1000x every 10 years, which works out to roughly 2x every year (2^10 ≈ 1000).
- Around the same time (2004), Google published its paper on MapReduce, and the rest is history…
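For readers who have not run into it, MapReduce is at heart a simple programming model: records are mapped to key-value pairs, the pairs are grouped by key, and each group is reduced to a result, which makes the work easy to spread across many cheap machines. Below is a minimal sketch of that pattern in plain Python, using the classic word-count example; it is meant only to illustrate the idea and is not Google’s implementation or API.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in a document."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    """Collapse all counts emitted for the same word into one total."""
    return word, sum(counts)

def map_reduce(documents):
    # Shuffle step: group intermediate values by key.
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_phase(doc):
            grouped[key].append(value)
    # Reduce step: collapse each group to a single result.
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())

print(map_reduce(["the data deluge", "data is the new oil"]))
# {'the': 2, 'data': 2, 'deluge': 1, 'is': 1, 'new': 1, 'oil': 1}
```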
The capacity to create data continues to increase exponentially, from the systems already tapped for decision making (transaction systems and market data systems) to newer sources of data such as e-weather, e-news, blogs, and social media. Somewhere in the middle lie the systems within a bank that hold copious amounts of data but go largely untapped, such as email, weblogs, bilateral paper contracts, and phone transcripts. All of this data is relevant to decision making, and thus to analytics.
The financial services industry is one of the most data-driven industries. The right data can help retain current customers, win new ones, determine which customers are the most profitable, and so on. For Financial Crime and Compliance Management, this data can help trace fraudulent activity, terrorist financing, insider trading, and the like. And yet, analysts in this industry will tell you that banks leave 70-90% of their data unutilized.
Viewing data as an asset is the critical first step. The Chief Data Officer role was created specifically for this purpose: figuring out how to monetize data (as well as managing it for regulatory purposes). And since large-scale data management, computing, and predictive analytics are now taught as a “science” at universities, the supply of such talent is growing along with the demand, which means banks can hire people with these skills at a lower cost.
With the right people in place and a huge amount of data coming in, financial institutions are creating data lakes and loading data into them. Unlike a data warehouse, a data lake accepts raw extracts from all sources (traditional source system extracts, weblogs, emails, etc.) dumped directly into cheap storage, with no data transformation applied. Enterprise metadata servers identify which elements are stored where and which ones are authoritative. As a crucial initial step, implementing the right analytical platform adds governance and modeling capabilities, so analytical use cases can run directly on the data lake rather than replicating the data into separate data marts. With platforms in place to govern the access, quality, lineage, and curation of these big data repositories, Financial Crime and Compliance are emerging as some of the first use cases to be tackled on data lakes.
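To make that landing pattern concrete, here is a rough sketch in Python of what ingesting raw extracts and cataloging them might look like. The storage paths, the JSON-lines catalog, and the ingest_raw_extract helper are hypothetical conveniences for illustration, not any particular product’s API.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

LAKE_ROOT = Path("/data/lake/raw")          # hypothetical cheap-storage mount
CATALOG = Path("/data/lake/catalog.jsonl")  # hypothetical metadata store

def ingest_raw_extract(source_file: str, source_system: str, authoritative: bool = False):
    """Copy a raw extract into the lake as-is (no transformation) and record it in the catalog."""
    src = Path(source_file)
    dest = LAKE_ROOT / source_system / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # land the bytes untouched

    # Record where the element lives and whether this copy is authoritative.
    entry = {
        "source_system": source_system,
        "path": str(dest),
        "authoritative": authoritative,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    with CATALOG.open("a") as catalog:
        catalog.write(json.dumps(entry) + "\n")
    return entry

# Example: land a core-banking extract and an email archive side by side.
# ingest_raw_extract("txn_2015_q1.csv", "core_banking", authoritative=True)
# ingest_raw_extract("mailbox_dump.mbox", "email_archive")
```

The point of the design is that nothing is reshaped on the way in; schema, quality, and lineage decisions are deferred to the analytical platform that governs the lake.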
These use cases can be broadly classified into:
- Cases where a large amount of historical data needs to be processed, such as performing optimizations (below-the-line analysis) or look-back reviews. These can now be done at a fraction of what the same exercises used to cost in traditional relational systems.
- Cases where the sheer volume of data had made it impossible or cost-prohibitive for institutions to run models and deterministic rules on all of it. Financial institutions used to sample or otherwise reduce the volume of data to be processed, with some risk-based justification; now they are able to process ALL of the data (see the sketch after this list).
- Cases such as terrorist financing, human and drug trafficking, and trade finance, where bringing in external sources of data, most of it unstructured or semi-structured, improves detection accuracy beyond what financial transaction patterns alone can provide.
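To make the second case concrete, below is a hedged sketch of one deterministic rule run over every record in a transaction extract rather than over a risk-based sample. The rule (a simplified structuring check for clusters of just-under-threshold cash deposits), the column names, and the thresholds are illustrative assumptions, not a regulatory specification or any vendor’s detection scenario.

```python
import csv
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical rule parameters (illustrative only, not regulatory guidance).
THRESHOLD = 10_000          # reporting threshold in the account currency
MARGIN = 0.10               # "just under" means within 10% below the threshold
WINDOW = timedelta(days=7)  # look for clusters inside a rolling 7-day window
MIN_HITS = 3                # flag a customer after 3 such deposits in the window

def flag_structuring(transactions_csv: str):
    """Scan every row (no sampling) and flag customers with clustered,
    just-under-threshold cash deposits."""
    deposits = defaultdict(list)  # customer_id -> dates of qualifying deposits
    with open(transactions_csv, newline="") as fh:
        # Assumed columns: customer_id, date (ISO format), amount, type.
        for row in csv.DictReader(fh):
            amount = float(row["amount"])
            if row["type"] == "cash_deposit" and THRESHOLD * (1 - MARGIN) <= amount < THRESHOLD:
                deposits[row["customer_id"]].append(datetime.fromisoformat(row["date"]))

    flagged = set()
    for customer, dates in deposits.items():
        dates.sort()
        for i in range(len(dates) - MIN_HITS + 1):
            if dates[i + MIN_HITS - 1] - dates[i] <= WINDOW:
                flagged.add(customer)
                break
    return flagged

# flagged = flag_structuring("all_transactions.csv")  # every record, not a sample
```

On a data lake the same logic would normally run as a distributed job rather than a single-machine loop, but the principle is the same: the full population is scanned, not a sample chosen under a risk-based justification.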
As the quantity of available data continues to increase, especially in the financial services industry, implementing an analytical platform with governance and modeling capabilities will significantly help banks address financial crime and compliance. Governance principles are easier to apply now that data silos have been broken down and modelers are building datasets directly on the data lake. A cultural evolution in recent years has embedded analytics into decision and operational processes, making data management even more important. Hopefully, with the right analytical platform in place, financial institutions can create data-based products and services that satisfy consumers’ needs.
As always, I would love to hear your comments.