What is a data fabric?

Johannesburg, 21 Sep 2021

Bhavesh Bhana, DataOps Technical Specialist, IBM.

Before digital technologies can genuinely support people and processes, companies must facilitate access to data. Yet, for such a small word, data represents an enormous challenge. Not only is there a lot of it, but the quantities keep growing - and extend well beyond organised formats into unstructured data. Beyond those concerns, access is not enough. A data environment must also cultivate a data-centric culture. The right data needs to reach the right people and systems in the right context.

"Data is an integral element of digital transformation for enterprises," says Bhavesh Bhana, DataOps Technical Specialist at IBM. "But as organisations seek to leverage their data, they encounter challenges resulting from diverse data sources, types, structures, environments and platforms."

The multidimensional data predicament becomes more complicated when organisations adopt hybrid and multi-cloud architectures, he adds. "For many enterprises today, operational data has largely remained siloed and hidden, leading to an enormous amount of dark data."

One thread to tie data together

These predicaments are very current, prompting modern solutions. Fittingly, data fabric, a concept coined less than a decade ago, is rising as the best answer to data access and movement issues.

To understand what a data fabric is, we must look at what else organisations have tried. The two most common ways to address data access are point-to-point integration and the introduction of data hubs. But according to Bhana, neither is suitable when data is highly distributed and siloed.

"Point-to-point integrations add exponential cost for any additional endpoint that needs to be connected, meaning this is a non-scalable approach. Data hubs allow for easier integration of applications and sources but exacerbate the cost and complexity to maintain quality and trust of data within the hub."

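The scaling argument can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the worst case, in which every endpoint needs a direct integration with every other endpoint, and it assumes a hub needs exactly one connection per endpoint. The function names are hypothetical and do not come from IBM. Under those assumptions, point-to-point links grow with the square of the number of endpoints, while hub connections grow in step with it.

```python
def point_to_point_links(endpoints: int) -> int:
    """Links needed if every endpoint integrates directly with every other (assumed worst case)."""
    return endpoints * (endpoints - 1) // 2


def hub_links(endpoints: int) -> int:
    """Links needed if every endpoint connects once to a central hub (assumed)."""
    return endpoints


# Compare the two approaches for a few landscape sizes.
for n in (5, 10, 50):
    print(f"{n} endpoints: {point_to_point_links(n)} point-to-point links, {hub_links(n)} hub links")
```

The hub keeps the link count low, which matches the quote above: its cost shows up elsewhere, in maintaining the quality and trust of the data held in the hub.
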
As the name suggests, a data fabric covers different data sources in a blanket fashion, particularly across hybrid- and multi-cloud landscapes, agnostically creating business-ready data access that can scale yet keep costs and complexity down. Data fabric architectures exist specifically to address the challenges of hybrid data environments. A data fabric effectively lets you have your cake and eat it: striking a balance between decentralisation and globalisation by acting as the virtual connective tissue among data endpoints.

Data fabrics are made dynamic by platforms, automation and federated governance - all thoroughly cloud-era innovations. If deployed successfully, a data fabric creates a network of instantly available information to power a business.

The anatomy of data fabrics

Suffice it to say, data fabric architecture is incredibly useful for businesses that want to exploit distributed data of different types and from different sources, including unstructured data and metadata, all managed by a central platform, Bhana adds.

"The core of the data fabric architecture is a data management platform that enables the full breadth of integrated data management capabilities - including discovery, governance, curation and orchestration. However, a data fabric advances and evolves from traditional data management concepts such as DataOps, which only focuses on establishing practices, to increase the level of data operationalisation. It is built upon a distributed architecture and advanced technology able to address the needs that arise from extreme diversity and distribution of data assets."

We can categorise data fabrics through four capabilities:

Knowledge, insights and semantics: The data fabric provides a data marketplace and shopping experience. It automatically enriches discovered data assets with knowledge and semantics for improved user experiences.

Unified governance and compliance: It allows local management and governance of metadata but supports a global unified view and policy enforcement. Policy application, data asset classification and curation, and queryable access routes for catalogued assets are automated.

Intelligent integration: The data fabric platform provides automated flow and pipeline creation across distributed data sources, enables self-service ingestion and data access (with enforcement of data protection policies), automatically determines best-fit execution for optimal workload distribution, and corrects for changes in data classification (also known as schema drift).

Orchestration and lifecycle: Finally, the data fabric must enable the composition, testing, operation and monitoring of data pipelines. It must infuse artificial intelligence to automate tasks, and self-tune, self-heal and detect source data changes.
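
As one small illustration of what detecting source data changes can involve, the sketch below compares two snapshots of a source's column layout and reports what was added, removed or changed in type. It is a minimal example under stated assumptions, not a description of how any particular data fabric product works: the function name, the column names and the idea of representing a schema as a simple name-to-type mapping are all hypothetical.

```python
from __future__ import annotations


def detect_schema_changes(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two column-name -> type mappings and report the differences."""
    shared = previous.keys() & current.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "retyped": sorted(col for col in shared if previous[col] != current[col]),
    }


# Hypothetical snapshots of one source table, taken at two points in time.
last_week = {"customer_id": "int", "email": "str", "signup_date": "date"}
today = {"customer_id": "str", "email": "str", "region": "str"}

changes = detect_schema_changes(last_week, today)
print(changes)
```

A pipeline could run a check like this before each load and pause, alert or adapt when the result is not empty.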

There is an even more straightforward way to identify a data fabric architecture's value, Bhana concludes.

"A data fabric connects data anywhere and removes data limits on where workloads can run. It's managed and scaled through an AI-supported platform and makes curated data available with the optimum balance of cost, performance and compliance. Ultimately, a successful data fabric takes much of the complexity and headaches of enabling diverse data sources into a cohesive landscape that serves the business and its people."

To find out more about how the data fabric addresses data access problems, click here to attend an in-depth webinar on the subject.