Data-driven decision-making made possible using a modern data stack


Johannesburg, 08 Mar 2019

WeDoTech is a team of individuals whose mission is to simplify life using modern technology.

Norah Wulff, Head of Technology at WeDoTech, is a staunch advocate of the modern data stack tools entering the South African market. "WeDoTech's core focus is on providing companies with smart, effective solutions and, when it comes to analytics, Snowflake, Fivetran and Looker fit perfectly with this mantra."

Wulff has been delivering BI solutions to companies for more than 15 years, and has witnessed first-hand how traditional data warehousing technologies cause projects to lose momentum, mostly because of factors that can now be mitigated.

Some common hurdles to overcome are:

* Planning the required hardware around estimated load and usage is often a thumb-suck exercise and demands significant upfront capital expenditure. All too often, data requirements are curbed to reduce the storage and computing power needed. The opposite approach is also common: high-spec servers are bought to handle the heavy load that occurs on a few days of the month, and then sit idle for the rest of the time.
* Finding and retaining the database administration skills needed to keep data readily available and properly indexed is challenging and costly.
* Developing the processes that extract, transform and load data into a data warehouse can be costly and extremely time-consuming.
* Maintaining these processes also takes a lot of effort, especially when critical processes fail because of changes to the source systems or hardware limitations.
* The impressive visualisations and insights you were sold on when you bought your BI tool of choice become a distant memory when you have to wait three to six months (if you're lucky) to see the results, and then hope they are what you were expecting.

With a modern data stack, companies can quickly realise value from their data initiatives. The traditional ETL (extract, transform and load) process gives way to a faster, more agile ELT (extract, load and transform) approach, in which all data is loaded into the warehouse first and the relevant data is then transformed into useful information.
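To make the difference concrete, here is a minimal ELT sketch in Python. It uses the standard library's sqlite3 module as a stand-in for a cloud warehouse such as Snowflake, and the table names and order data are hypothetical. The raw data is loaded untouched, dirty value included, and the transformation runs afterwards as SQL inside the database.

```python
import sqlite3

# Hypothetical raw extract: every field arrives as text, exactly as the source sent it.
RAW_ORDERS = [
    ("1001", "2019-03-01", "ZA", "250.00"),
    ("1002", "2019-03-01", "ZA", "99.50"),
    ("1003", "2019-03-02", "KE", "120.00"),
    ("1004", "2019-03-02", "ZA", "not captured"),  # dirty value is kept in the raw layer
]


def load(conn):
    """L: land the raw data as-is; no cleaning happens before loading."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, order_date TEXT, country TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", RAW_ORDERS)


def transform(conn):
    """T: derive a clean, aggregated model inside the database using SQL."""
    conn.execute(
        """
        CREATE TABLE daily_revenue AS
        SELECT order_date, country,
               SUM(CAST(amount AS REAL)) AS revenue,
               COUNT(*) AS orders
        FROM raw_orders
        WHERE amount GLOB '[0-9]*'  -- skip values that are not numeric
        GROUP BY order_date, country
        """
    )


def run_elt():
    conn = sqlite3.connect(":memory:")
    load(conn)
    transform(conn)
    return conn.execute(
        "SELECT order_date, country, revenue, orders FROM daily_revenue "
        "ORDER BY order_date, country"
    ).fetchall()


if __name__ == "__main__":
    for row in run_elt():
        print(row)
```

Because the raw table is kept, the transformation can be changed and rerun later without going back to the source system, which is much of what makes the ELT approach more agile.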

The new data stack: Fivetran, Snowflake and Looker

The shift to loading all of the data has been made possible by modern cloud data warehousing solutions such as Snowflake. Snowflake's architecture has significantly reduced the cost of running a high-performance data warehouse by separating the storage component from the computing engine.

Storage on Snowflake is relatively cheap, enabling companies to load all of their data assets (structured and semi-structured) into a single platform, and Snowflake's state-of-the-art compression technology makes this even more affordable. The main cost of Snowflake lies in computing power, which is billed per second (with a one-minute minimum each time a warehouse starts). Companies pay for a computing engine, referred to as a virtual warehouse, only while it is running, and a warehouse can be set to suspend automatically after a period of inactivity and to resume when the next query arrives. This significantly reduces the time companies pay for an idle data warehouse, resulting in substantial cost savings.
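The saving from auto-suspend is easy to estimate. The Python sketch below is a simplified billing model, not Snowflake's actual metering: it assumes per-second billing with a 60-second minimum whenever a warehouse starts or resumes, and takes the published credit rates for standard warehouse sizes as assumed inputs. The query timings are hypothetical, and cloud services and storage charges are ignored.

```python
# Credits per hour for standard Snowflake warehouses, as published at the time of writing.
# Treat these as assumptions and check current pricing before relying on them.
CREDITS_PER_HOUR = {"X-Small": 1, "Small": 2, "Medium": 4, "Large": 8}

MINIMUM_BILLED_S = 60  # assumed minimum billed each time a warehouse starts or resumes


def billed_seconds(queries, auto_suspend_s):
    """Total billed seconds for a list of (start_s, duration_s) queries.

    Simplified model: the warehouse resumes when a query arrives while it is
    suspended, keeps running until it has been idle for auto_suspend_s seconds,
    and each running session is billed per second with a 60-second minimum.
    """
    sessions = []
    for start, duration in sorted(queries):
        suspend_at = start + duration + auto_suspend_s
        if sessions and start <= sessions[-1][1]:
            # Warehouse is still running: extend the current session.
            sessions[-1][1] = max(sessions[-1][1], suspend_at)
        else:
            # Warehouse was suspended: a new billed session begins.
            sessions.append([start, suspend_at])
    return sum(max(MINIMUM_BILLED_S, end - begin) for begin, end in sessions)


def credits(seconds, size):
    """Convert billed seconds on a given warehouse size into credits."""
    return seconds * CREDITS_PER_HOUR[size] / 3600


if __name__ == "__main__":
    # Hypothetical day: two quick reports in the morning, one more two hours later.
    day = [(0, 20), (30, 20), (7200, 10)]
    on_demand = credits(billed_seconds(day, auto_suspend_s=60), "Small")
    always_on = credits(24 * 3600, "Small")
    print(f"auto-suspend: {on_demand:.2f} credits, always on: {always_on:.2f} credits")
```

For this hypothetical day the model bills 180 seconds, or 0.1 credits on a Small warehouse, against 48 credits for leaving the same warehouse running around the clock.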

Snowflake architecture (https://www.snowflake.com/product/architecture/)

Companies can allocate differently sized computing engines based on usage requirements. Standard reports that run off aggregated information can use a smaller warehouse, while super-users such as data scientists can be allocated extremely powerful ones so that time is not wasted waiting for the results of complex queries. Data can be shared securely with third parties within minutes, without having to move it. Another major benefit of adopting Snowflake is that companies can 'test the waters' and assess the usage and value of the warehouse without a large upfront capital expenditure.
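The trade-off between warehouse sizes can be sketched in a few lines of Python. Each step up in size doubles the credits billed per hour, so if a query scaled perfectly with compute, a larger warehouse would return it faster at the same cost. Perfect scaling is an assumption made for illustration; real queries rarely scale that cleanly. The sketch also assumes the warehouse starts cold for each query, so the 60-second minimum applies, and the runtimes are hypothetical.

```python
SIZES = ["X-Small", "Small", "Medium", "Large", "X-Large"]
MINIMUM_BILLED_S = 60  # assumed minimum billed when a suspended warehouse resumes


def estimate(size, xsmall_runtime_s):
    """Return (runtime_s, credits) for a query that takes xsmall_runtime_s on X-Small.

    Assumes perfect scaling: every size step doubles the hourly credit rate
    (X-Small = 1 credit per hour) and halves the runtime.
    """
    factor = 2 ** SIZES.index(size)
    runtime_s = xsmall_runtime_s / factor
    billed_s = max(MINIMUM_BILLED_S, runtime_s)
    return runtime_s, billed_s * factor / 3600


if __name__ == "__main__":
    for label, runtime in [("data science query", 3600), ("standard report", 30)]:
        for size in ("X-Small", "Large"):
            secs, cost = estimate(size, runtime)
            print(f"{label} on {size}: {secs:.0f} s, {cost:.3f} credits")
```

Under these assumptions, the hour-long query drops to 450 seconds on a Large warehouse for the same single credit, while a 30-second report costs eight times as much on Large as on X-Small because of the minimum charge. That is the reasoning behind giving routine reports a small warehouse and data scientists a large one.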

If this all sounds too good to be true, consider that Fivetran lets companies start loading data into the warehouse after a five-minute configuration. A recent case study on the Durban-based Ignition Group estimated that the company will save R6 million thanks to its implementation of Fivetran and Snowflake. The technology has cut the timeframe of its data science initiatives from two to three months down to two to three weeks.

Using Fivetran's available connectors, companies can load data from third-party cloud applications (such as Salesforce, Xero and Google Analytics), internal company databases (MySQL, Oracle, SQL Server, Postgres), files, and events (such as Segment, Google Analytics 360, Kafka, Snowplow and Webhooks). Fivetran creates and maintains the schema and tables for you and keeps the data up to date. It encrypts the data end to end, in transit and at rest, and deletes the data from its own systems after 24 hours.
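The following Python sketch is not Fivetran's implementation; it is a generic illustration of one common way a pipeline can keep a destination table up to date: a cursor-based incremental sync that copies only rows changed since the last run and upserts them by primary key. The function name, the "id" and "updated_at" fields, and the dict standing in for a warehouse table are all hypothetical.

```python
from datetime import datetime


def incremental_sync(source_rows, destination, cursor):
    """Upsert rows changed since `cursor` into `destination` and return the new cursor.

    source_rows: iterable of dicts with hypothetical "id" and "updated_at" keys.
    destination: dict mapping id -> row, standing in for a warehouse table.
    cursor: the latest updated_at seen on the previous run, or None for a full load.
    """
    new_cursor = cursor
    for row in source_rows:
        # >= rather than > so rows sharing the cursor timestamp are never skipped;
        # copying them again is harmless because the upsert is idempotent.
        if cursor is None or row["updated_at"] >= cursor:
            destination[row["id"]] = dict(row)  # insert new rows, overwrite changed ones
            if new_cursor is None or row["updated_at"] > new_cursor:
                new_cursor = row["updated_at"]
    return new_cursor


if __name__ == "__main__":
    crm = [
        {"id": 1, "name": "Acme", "updated_at": datetime(2019, 3, 1)},
        {"id": 2, "name": "Globex", "updated_at": datetime(2019, 3, 2)},
    ]
    table = {}
    cursor = incremental_sync(crm, table, None)  # first run: full load
    crm[0] = {"id": 1, "name": "Acme Ltd", "updated_at": datetime(2019, 3, 3)}
    cursor = incremental_sync(crm, table, cursor)  # later run: only recent changes
    print(cursor, table[1]["name"], len(table))
```

A production connector also has to handle deleted rows, schema changes and edits that do not update the timestamp column; this sketch ignores all three.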

The ease and efficiency of modern data tools such as Snowflake and Fivetran are complemented by the data platform Looker. Looker is a complete data platform that enables companies to integrate, explore, visualise and deliver data across the organisation. It offers an excellent data governance semantic layer that ensures a single source of truth. Looker's lean technology pushes queries down to high-performing data warehouses such as Snowflake, using the power of the database to deliver fast results.
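A semantic layer with query push-down can be illustrated with a toy model. This is not LookML or Looker's API; it is a hypothetical Python sketch in which each business term (a dimension or measure) is defined once as a SQL expression, a user's field selection is translated into SQL, and the aggregation runs inside the database, here Python's built-in sqlite3 standing in for Snowflake. The model, table and data are all hypothetical.

```python
import sqlite3

# Hypothetical semantic model: each business term is defined once, in one place.
MODEL = {
    "table": "orders",
    "dimensions": {"country": "country", "order_month": "substr(order_date, 1, 7)"},
    "measures": {"revenue": "SUM(amount)", "order_count": "COUNT(*)"},
}


def generate_sql(model, dimensions, measures):
    """Translate a selection of model fields into SQL for the database to run.

    Only names defined in the model are accepted (unknown names raise KeyError),
    so every user computes "revenue" the same way.
    """
    dim_exprs = [model["dimensions"][d] for d in dimensions]
    select = [f"{expr} AS {name}" for expr, name in zip(dim_exprs, dimensions)]
    select += [f"{model['measures'][m]} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {model['table']}"
    if dim_exprs:
        grouping = ", ".join(dim_exprs)
        sql += f" GROUP BY {grouping} ORDER BY {grouping}"
    return sql


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, country TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("2019-02-14", "ZA", 100.0), ("2019-03-01", "ZA", 250.0), ("2019-03-02", "KE", 120.0)],
    )
    sql = generate_sql(MODEL, ["country"], ["revenue", "order_count"])
    # Push-down: the aggregation runs inside the database, not in Python.
    return sql, conn.execute(sql).fetchall()


if __name__ == "__main__":
    sql, rows = demo()
    print(sql)
    print(rows)
```

Because the model, rather than each individual report, owns the definition of revenue, every dashboard built on it returns the same figure, and the generated SQL is the kind of query advanced users can inspect.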

Looker architecture (https://looker.com/platform/overview)

Looker makes data-driven decision-making possible by embedding data into all of the organisation's processes. Users can access the row-level detail behind the numbers, and advanced users can view the SQL queries that are generated to deliver the results. All of the source code is stored securely in a git repository, making version control easy. Out-of-the-box models are available through Looker Blocks, accelerating access to analytics in the organisation. While Looker is not cheap, it is an excellent enterprise offering and one worth evaluating.

WeDoTech will be showcasing these tools at the upcoming BI Summit on 12 and 13 March 2019. If you are not able to attend the event, WeDoTech will be hosting two free morning workshops in Cape Town (4 April 2019, Workshop17 Watershed) and Johannesburg (5 April 2019, Workshop17 Sandton).

Book your spot now to test-drive the tools yourself. Register via the Snowflake Web site, or e-mail norah@wedotech.co.
