Date: 2024-08-13

Cloud Fundis solved the problem by containerising data processing for Capitec Bank.

Data is the lifeblood of the digital age, and the ability to quickly and accurately analyse ever-increasing volumes of data is often what sets organisations apart from their competition.

The sheer volume of data, coming from various modern and legacy systems, can make analytics a complex, time-consuming and expensive task – even with the assistance of sophisticated cloud services.

Capitec Bank found itself having to process thousands of tables from its data lake every day. The content ranges from card processing data to financial reporting, and originates in a variety of systems, both legacy platforms and modern database management systems like PostgreSQL and Microsoft SQL Server.

The bank is migrating to the cloud, but it still runs some of its systems on-premises and on a broad range of technologies.

“There are a lot of interactions and dependencies between the various systems,” explains Hamish Whittal, founder of Cloud Fundis.

Because of the volume, scale and complexity of its data analytics requirements, Capitec Bank – like many other customers in the same position – found that data processing using standard cloud services was very costly, often contributing the lion’s share of the cloud costs for the business. Migration from on-premises to the cloud can be both challenging and expensive.

“It’s a reality that every customer is cost-aware and that the cost of processing large volumes of data can quickly become painful,” Whittal says.

Cloud Fundis is an IT consultancy in the analytics and data environment, offering AWS solutions. It developed a solution for Capitec Bank that not only makes the bank’s data processing quicker and easier, but has also reduced costs by a minimum of 50%, with savings upward of 80% in some cases.

“For any modern business, processing analytics data using standard AWS services like AWS Glue, AWS EMR and AWS EMR on EKS is expensive,” Whittal explains.

Cloud Fundis solved the problem by containerising data processing for Capitec Bank.

“We created a new service, based off the successes we had for Capitec Bank – dubbed BrightSpark – built on native services such as AWS ECS, AWS EKS, AWS Batch. Our big change was creating a simple-to-use, custom-written API (application programming interface) with which customers can interact to manage and run their jobs. The core components of BrightSpark can be deployed into a customer's existing AWS estate, allowing them full control over important aspects like data security, governance, etc,” explains Alex Morton, DevOps lead at Cloud Fundis.

“From the customer’s side, they simply send a Spark or Python script through our engine, which modifies and optimises the code. This transformed and optimised code can then be executed using BrightSpark.”
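As a rough illustration of the "send a script through an API" flow described above, the sketch below builds the kind of JSON body a client might submit. The article does not publish BrightSpark's API, so the field names (`job_name`, `job_type`, `script`), the `JobRequest` class and the `build_job_request` helper are all assumptions made for illustration, and no network call is made.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class JobRequest:
    """Hypothetical shape of a job submission; not BrightSpark's real schema."""
    job_name: str
    job_type: str  # "spark" or "python", the two job kinds the article mentions
    script: str    # source code of the job, sent as plain text


def build_job_request(job_name: str, job_type: str, script: str) -> str:
    """Validate the inputs and return the JSON body a client might submit."""
    if job_type not in ("spark", "python"):
        raise ValueError(f"unsupported job type: {job_type!r}")
    if not script.strip():
        raise ValueError("script must not be empty")
    return json.dumps(asdict(JobRequest(job_name, job_type, script)))


# Illustrative values only; the bucket and job name are made up.
body = build_job_request(
    job_name="daily-card-transactions",
    job_type="spark",
    script="df = spark.read.parquet('s3://example-bucket/cards/')",
)
print(body)
```

The point of the sketch is the division of labour the article describes: the customer supplies only the script and a little metadata, while optimisation and container orchestration happen on the service side.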

The difference with BrightSpark is that customer jobs complete at substantially lower cost, so significantly less is spent on cloud compute.

"Optimised code allows customers to process data faster, enabling quicker decision-making and enhancing competitiveness," Whittal adds.

The system was first conceptualised about eight months ago and went into proof of concept (POC) two months later. After various tweaks and modifications, it went live approximately six months after the first concept.

The results have exceeded expectations, with Capitec Bank’s cost to extract and transform thousands of datasets each day dropping by more than 50%. By automating and simplifying the process, customers save time and are also able to free up valuable human resources.
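To put the quoted percentages in concrete terms, the short sketch below applies the two savings figures mentioned in the article (50% and 80%) to a baseline spend. The baseline of 1000 is a made-up number in an unspecified currency, and `cost_after_saving` is a hypothetical helper, not part of any Cloud Fundis tooling.

```python
def cost_after_saving(baseline_cost: float, saving_percent: int) -> float:
    """Return the cost that remains after a percentage saving is applied."""
    if not 0 <= saving_percent <= 100:
        raise ValueError("saving_percent must be between 0 and 100")
    return baseline_cost * (100 - saving_percent) / 100


baseline = 1000.0  # hypothetical daily processing cost, currency unspecified
for percent in (50, 80):
    remaining = cost_after_saving(baseline, percent)
    print(f"{percent}% saving: {baseline:.2f} -> {remaining:.2f}")
```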

“We have taken away the complexity,” Morton says. “The customer doesn’t need to have the containerisation skills; they simply make an API call with the code and we take care of the technicalities and orchestration of services like Kubernetes, EKS and ECS.”

Importantly, BrightSpark also includes a full reporting capability, so customers can see exactly what each job costs in time, compute and money.
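A per-job report along those three dimensions could look something like the sketch below. The article does not describe the report format, so the `JobReport` fields, the `summarise` helper and every figure here are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobReport:
    """One row of a hypothetical per-job report (field names are assumptions)."""
    job_name: str
    runtime_seconds: int  # time
    vcpu_hours: float     # compute
    cost: float           # money, in whatever currency the bill uses


def summarise(reports: list[JobReport]) -> dict[str, float]:
    """Total the three reported dimensions across all jobs."""
    return {
        "runtime_seconds": sum(r.runtime_seconds for r in reports),
        "vcpu_hours": sum(r.vcpu_hours for r in reports),
        "cost": sum(r.cost for r in reports),
    }


# Made-up figures, for illustration only.
reports = [
    JobReport("card-processing", runtime_seconds=420, vcpu_hours=1.5, cost=0.25),
    JobReport("financial-reporting", runtime_seconds=900, vcpu_hours=4.0, cost=0.75),
]
totals = summarise(reports)
most_expensive = max(reports, key=lambda r: r.cost)
print(totals)
print(most_expensive.job_name)
```

Keeping time, compute and money as separate fields lets a customer spot jobs that are slow but cheap, or fast but expensive, rather than seeing a single blended number.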

With such promising early results, Cloud Fundis is now working on expanding the scope of BrightSpark. “Today, it is generalisable to Apache Spark and standard Python jobs,” Morton says. “But it is a fairly universal platform and, in future, will be able to handle additional data science processes, streaming jobs and Jupyter notebooks. And this will pave the way for more cost savings across many workload applications.”

Whittal emphasises that the re-platformed solution is still built on standard AWS services such as AWS ECS, AWS EKS and AWS Batch.

Cloud Fundis only works with AWS as its cloud provider. “It is a great platform, easy to use and with great support,” he says.

While developing BrightSpark, the Cloud Fundis team found AWS to be massively supportive and responsive.

“We had to lean on AWS support a bit to get the project across the line,” Morton says. “Whenever we ran into issues during our build they were fast and efficient in fixing them. Some of the functionality that we needed isn’t a common use case for these services, but AWS was proactive in helping us to do it.”

The company believes its relationship with First Distribution will help it develop and further market its BrightSpark service.