Lightning-fast analytics unlocks immense power of big data

By Paul Morgan, Business Unit Lead at Altron Karabina

Johannesburg, 09 Nov 2022

Our bodies are made up of amazing systems that, when healthy, work seamlessly towards the common goal of keeping us alive and reacting appropriately to environmental stimuli. Our senses feed a variety of information into our brain and our nervous system does the rest. The junction across which electrical nerve impulses are transmitted is called a synapse – a crucial element in our bodies, connecting various systems and enabling fast insights and decision-making.

It’s no surprise that Microsoft chose the word synapse for Azure Synapse Analytics, the vendor’s flagship data platform for hyper-fast analytics. Synapse is a full analytics ecosystem, with the ability to process massive volumes of input quickly and turn data into insights, much like the synapses in our brains.

Many organisations have historically used Microsoft’s SQL Server data platform for their data warehouses. It’s been a reliable and trustworthy platform for hosting both application databases and data warehouses. In many ways, SQL Server is the Land Rover of database platforms. Synapse, on the other hand, is the Ferrari of the analytics world and is designed only for analytics workloads.

With most legacy data warehouses, data engineering teams dealt with structured data, complete with rows and columns in a fairly fixed form. These days, we have a wide range of data sources, including structured data, documents, web application APIs, streams from devices and sensors, text files from various sources and much more. There are entirely new families of data that a platform such as SQL Server cannot easily ingest or manage – and this is where Synapse is positioned. Synapse uses Synapse Pipelines, which have a wide array of connectors, to bring in a massive variety of data formats. In addition, the Spark engine built into Synapse can use languages such as Python to ingest more complicated source types, such as PDF or even video files.
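As a rough illustration, a single Synapse Spark notebook might combine both approaches: the built-in reader for structured files and plain Python for an awkward format such as PDF. This is a minimal sketch, assuming a Spark pool with the pypdf package installed; the paths, container names and columns are placeholders, not real workspace values.

```python
# Hypothetical Synapse Spark notebook cell. The pre-defined `spark` session
# is available in Synapse notebooks; paths below are placeholders.
from pypdf import PdfReader  # assumes pypdf is installed on the Spark pool

# Structured data: Spark parses rows and columns straight from the lake.
sales_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("abfss://raw@examplelake.dfs.core.windows.net/retailer_a/sales.csv")
)

# Awkward formats: extract PDF text with ordinary Python, then wrap the
# result in a DataFrame so it joins the same pipeline as everything else.
report = PdfReader("promo_report.pdf")  # placeholder local path
pages = [(i, page.extract_text()) for i, page in enumerate(report.pages)]
pdf_df = spark.createDataFrame(pages, ["page", "text"])

# Land both in the lake in an analytics-friendly format.
sales_df.write.mode("overwrite").parquet(
    "abfss://curated@examplelake.dfs.core.windows.net/sales/"
)
```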

As well as dealing with many different types of data, the Synapse platform can be used for both traditional analytics and advanced analytics, such as machine learning. In traditional data environments, businesses would often extract data from their data warehouse and move it into a statistical analysis environment (using tools such as SAS). The statistics team would feed the data into their machine learning model, which would run overnight and produce a forecast. That forecast would then be extracted from the SAS system, brought into the operational ERP tool and used for demand and supply planning, before being fed back into the data warehouse. In the Synapse environment, machine learning models can run on data stored in the Azure data lake or in the Synapse SQL environment. There is no longer any need to move data between environments – it is all one Synapse Analytics environment.
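To illustrate that consolidation, the sketch below trains a simple regression model with Spark MLlib directly against Parquet data in the lake and writes its forecast straight back, so no data ever leaves the environment. The schema (units_prev_week, on_promotion, units_next_week) is an assumption for illustration, not any client's actual model.

```python
# Hypothetical notebook cell: train and score a demand forecast in place.
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

history = spark.read.parquet(
    "abfss://curated@examplelake.dfs.core.windows.net/sales/"
)

# Combine the predictor columns into the single vector column MLlib expects.
assembler = VectorAssembler(
    inputCols=["units_prev_week", "on_promotion"], outputCol="features"
)
train = assembler.transform(history).select("features", "units_next_week")

model = LinearRegression(labelCol="units_next_week").fit(train)

# Write predictions back to the lake, where Synapse SQL and Power BI can
# read them – no export to a separate statistics environment required.
model.transform(train).select("prediction").write.mode("overwrite").parquet(
    "abfss://curated@examplelake.dfs.core.windows.net/forecast/"
)
```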

Consider a real-world example: a Synapse build for an FMCG manufacturer that partners with Altron Karabina.

The manufacturer supplied products to different retailers across South Africa, and the data generated by each of these retailers was different. The client wanted clear visibility and the ability to predict and make supply chain decisions based on real insights. This required Altron Karabina to ingest huge volumes of data from the various retailers, each in a different format. Additional data sets included promotional activity data, a large portion of which was in Excel format.
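As a hedged sketch of how such Excel workbooks might be folded into the same lake, the cell below reads one with pandas inside a Synapse notebook and hands it to Spark; the file name, sheet name and columns are hypothetical.

```python
# Hypothetical cell: pull promotional activity out of an Excel workbook and
# land it next to the retailer data. Requires the openpyxl package.
import pandas as pd

promos = pd.read_excel(
    "promo_calendar.xlsx",  # placeholder file name
    sheet_name="Promotions",
    usecols=["sku", "store", "week", "discount_pct"],
)

# Convert to a Spark DataFrame and write it to the lake in Parquet form.
promo_df = spark.createDataFrame(promos)
promo_df.write.mode("overwrite").parquet(
    "abfss://curated@examplelake.dfs.core.windows.net/promotions/"
)
```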

With the Synapse build, Altron Karabina was able to bring the different data sources together and use historical and current data to answer forecasting questions such as: how many units of this SKU will be sold in this particular store over the next six weeks? This capability fundamentally changed thinking and planning around efficient supply chain management, which in turn could be synchronised with promotional activities across regions.
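For a feel of the per-SKU, per-store arithmetic, here is a deliberately simplified baseline: the average of the last six weeks of sales for each SKU and store, computed with Spark window functions. The real build used trained forecasting models; the column names here are assumptions.

```python
# Illustrative baseline only – a six-week average per SKU and store.
from pyspark.sql import functions as F
from pyspark.sql.window import Window

sales = spark.read.parquet(
    "abfss://curated@examplelake.dfs.core.windows.net/sales/"
)

# Rank weeks newest-first within each SKU/store, keep the latest six.
recent = Window.partitionBy("sku", "store").orderBy(F.col("week").desc())

baseline = (
    sales.withColumn("rn", F.row_number().over(recent))
    .filter(F.col("rn") <= 6)
    .groupBy("sku", "store")
    .agg(F.avg("units_sold").alias("forecast_units_per_week"))
)
baseline.show()
```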

In summary, the business could now pull in a vast array of datasets, both internal and external, use machine learning to produce forecasts, and use Power BI to visualise the results so that product managers could make sense of them and act. All components of the solution – ingestion, transformation, machine learning notebooks and Power BI dashboards – were accessed through the same Synapse environment.

Synapse can also enable cost savings compared with the complexity of running multiple big data environments. The savings come not only from a simpler environment design but also from lower labour costs, as IT shops no longer have to upskill separate teams to handle separate functions. Because the platform is consumption-based and can scale up or down on demand, it also enables greater cost control. Time savings, and the ability to respond to crises or opportunities more quickly, add up to cumulative savings over time.

Speed and performance are where the Ferrari analogy comes into its own: quicker decision-making, quicker insights, a quicker build phase and, in a live environment, the ability to work through millions of rows of data and return results immediately – even when changing parameters or scenarios. The engine revs up and entire datasets are churned into insights within seconds. This is a game-changer: the ability to produce far more insights, from different data sets, in far less time, all in one place.
