Data Archival with MongoDB using Tag Aware Sharding
I recently met a customer who uses a relational database as the back end for a popular core banking solution. They had a major challenge with the housekeeping of the General Ledger (or Transaction Posting) tables. In any core banking solution, these are the tables that grow the fastest as the bank operates: they get a new entry for every financial transaction, e.g. ATM Withdrawal, Account Transfer, Charges Collection, Interest Application, POS Transaction, etc. Large tables not only take up a lot of space but also slow down queries. And these are precisely the tables that receive the most inquiries, e.g. mini statements, monthly statements, charges accrual processes (which add up the total transaction count/value and apply charges) and many other processes in the End of Day batches or End of Day reports.
Hence, it’s important to keep these tables small. Many core banking solutions provide archiving/housekeeping features that purge these tables. But while these tables receive many queries for current, relevant data, they also have to serve most of the historical inquiries, e.g. a customer applying for a credit card may need the last few months, or in some cases the last few years, of account statements. To address these requirements, I have seen two common approaches used by technology teams:
- Restore a database from archives and generate the statement
- Fetch the historical statement from an Archive Database
The main issue with both approaches is that they tend to be slow, and they get slower as the database grows. Every time a customer requests an older statement, the bank’s IT department struggles to live up to the bank’s SLA.
This is where I see a case for using MongoDB, specifically its Tag Aware Sharding feature. I can tag my data as hot, warm and cold. For example, ‘hot’ data is data that is no more than 1 year old. Since I know that most of the historical inquiries will hit this data, I would store it on faster disks and more powerful machines. For most banks, ‘warm’ data would be between 1 and 3 years old, and I would store it on somewhat slower disks. ‘Cold’ data is the least frequently requested and can be kept on the cheapest, slowest storage.
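To make the age boundaries concrete, here is a minimal Python sketch that assigns a ledger posting to a tier using the 1-year and 3-year cut-offs described above. The constant and function names are hypothetical, and a real bank's rules (calendar vs. rolling years, time zones, business dates) may well differ.

```python
from datetime import date

# Hypothetical cut-offs mirroring the tiering policy described above.
HOT_MAX_AGE_YEARS = 1   # hot: no more than 1 year old
WARM_MAX_AGE_YEARS = 3  # warm: between 1 and 3 years old; anything older is cold


def years_before(ref: date, years: int) -> date:
    """Same calendar day `years` years earlier (Feb 29 falls back to Feb 28)."""
    try:
        return ref.replace(year=ref.year - years)
    except ValueError:
        # ref is Feb 29 and the target year is not a leap year
        return ref.replace(year=ref.year - years, day=28)


def tier_for(posting_date: date, today: date) -> str:
    """Return 'hot', 'warm' or 'cold' for a ledger posting, based on its age."""
    if posting_date >= years_before(today, HOT_MAX_AGE_YEARS):
        return "hot"
    if posting_date >= years_before(today, WARM_MAX_AGE_YEARS):
        return "warm"
    return "cold"
```

For example, with `today=date(2016, 3, 1)`, a posting dated `date(2014, 6, 15)` is classified as `"warm"`. The same boundaries are what the shard key ranges for each tier would be built from.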
What’s best is that as records move from the ‘hot’ tier to the ‘warm’ tier, MongoDB takes care of their movement: once the tag ranges are updated to reflect the new age boundaries (for example, by a scheduled job), the balancer migrates the affected chunks to the shards of the matching tier. This could be an effective data archival strategy, and it also offers horizontal scalability, i.e. you can add more nodes to the ‘warm’ or ‘cold’ tier as the archive data grows. Moreover, your IT team doesn’t need to connect to the individual nodes storing the data; they can simply query the central query router (mongos). If you would like to read more details and know-how on Tag Aware Sharding, you may find this [link: https://docs.mongodb.org/manual/core/tag-aware-sharding/] helpful.
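The tagging itself is done through admin commands. The sketch below builds the command documents for MongoDB 3.4+ zones (the later name for tags): `addShardToZone` assigns a shard to a tier, and `updateZoneKeyRange` maps a range of shard key values (min inclusive, max exclusive) to a zone, with `zone: null` removing a range whose bounds match exactly. It assumes a hypothetical collection `corebank.gl_postings` sharded on `{postingDate: 1}` and hypothetical shard names; older releases expose the same idea through the shell helpers `sh.addShardTag` and `sh.addTagRange`. It only builds the documents, so it runs without a cluster.

```python
from datetime import datetime

# Hypothetical names for illustration only.
NAMESPACE = "corebank.gl_postings"  # assumed sharded on {"postingDate": 1}
SHARD_KEY = "postingDate"           # assumed BSON date field


def years_before(ts: datetime, years: int) -> datetime:
    """Same calendar moment `years` years earlier (Feb 29 falls back to Feb 28)."""
    try:
        return ts.replace(year=ts.year - years)
    except ValueError:
        return ts.replace(year=ts.year - years, day=28)


def tier_ranges(as_of: datetime, min_key, max_key):
    """(zone, min, max) triples for the hot/warm/cold policy; min inclusive, max exclusive.

    min_key/max_key are the lowest/highest shard key sentinels; with pymongo these
    would be bson.min_key.MinKey() and bson.max_key.MaxKey().
    """
    hot_start = years_before(as_of, 1)   # hot: at most 1 year old
    warm_start = years_before(as_of, 3)  # warm: 1 to 3 years old
    return [
        ("cold", min_key, warm_start),
        ("warm", warm_start, hot_start),
        ("hot", hot_start, max_key),
    ]


def add_shard_commands(shards_by_zone):
    """addShardToZone documents, one per (shard, zone) pair."""
    return [
        {"addShardToZone": shard, "zone": zone}
        for zone, shards in shards_by_zone.items()
        for shard in shards
    ]


def range_commands(ranges, remove=False):
    """updateZoneKeyRange documents; zone=None removes a range with exactly these bounds."""
    return [
        {
            "updateZoneKeyRange": NAMESPACE,  # command name must be the first key
            "min": {SHARD_KEY: lo},
            "max": {SHARD_KEY: hi},
            "zone": None if remove else zone,
        }
        for zone, lo, hi in ranges
    ]


def rollover_commands(old_as_of, new_as_of, min_key, max_key):
    """Drop the ranges computed for old_as_of, then add the ones for new_as_of."""
    old = tier_ranges(old_as_of, min_key, max_key)
    new = tier_ranges(new_as_of, min_key, max_key)
    return range_commands(old, remove=True) + range_commands(new)
```

Because tag ranges are fixed shard key bounds rather than sliding windows, a job would run `rollover_commands` periodically (say, monthly), after which the balancer migrates the chunks that now fall into a different tier. To send these documents with pymongo, one would pass `bson.min_key.MinKey()` and `bson.max_key.MaxKey()` as the sentinels and call `client.admin.command(cmd)` for each one; Python dicts keep insertion order, so the command name stays first. Pausing the balancer while ranges are being swapped may also be prudent, so that no migrations run against a half-updated set of ranges.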
When I discussed this with my Business Development team, we realized that the problem isn’t unique to the banking industry. It is also relevant in the telecom segment, where bills and accounting details for many months have to be maintained. Whenever we work with our customers, we try to understand the underlying business problem and build a solution that is helpful and scalable for them. If this story sounds relevant to you, drop us a note at success@ashnik.com