Metadata solves big data juggernaut
As organisations try to get their arms around an ever-increasing amount of data in their networks, some are turning to metadata to more effectively manage their big data.
So says Greg Milliken, vice president of marketing at enterprise content management solutions provider M-Files, who believes the "data within the data" can be analysed to help companies better organise and find documents, so that users no longer have to search manually through folders on the network.
Wikipedia defines metadata as "structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use or manage an information resource", says Milliken.
Examples of the most basic metadata associated with documents include informational properties such as author, date created, date modified and file size, he explains.
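As a rough illustration of these basic properties, the sketch below reads a file's size and modification date from the filesystem using Python's standard library. The function name `basic_metadata` and the sample file are hypothetical, chosen only for this example. Author is left out because it is normally stored inside the document or by the document management system rather than by the filesystem, and creation date is omitted because its availability varies between operating systems.

```python
import os
import tempfile
from datetime import datetime, timezone


def basic_metadata(path):
    """Return a few filesystem-level properties for the file at `path`."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        # st_mtime is the time of the last content modification.
        "date_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Demonstrate on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "report.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("hello")
    sample = basic_metadata(sample_path)

print(sample["name"], sample["size_bytes"], sample["date_modified"].isoformat())
```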
"Traditional libraries provide an early example of metadata applied in the physical form to books, but in our increasingly digital world, the primary purpose of metadata remains the same - to more efficiently find, manage and track information. This, in fact, continues to be the essence of what makes metadata so critical to today's most widely used applications," Milliken says.
He points out that in the era of big data, where organisations are amassing large and disconnected document and data repositories, metadata serves as the bridge that connects a company's structured data (information within ERP, CRM and other database applications) with its unstructured content (Word documents, Excel files, PDFs, photos, videos, etc.). The result is a powerful and more effective way to access, organise and track large amounts of business information.
"In addition to dealing with massive volumes of information, big data problems also encompass when data is being processed at high velocity, or when data varies widely. Metadata helps address all these problems in that it helps identify important characteristics of information, so in addition to identifying the desired important information more quickly and effectively, it also quickly helps identify what information is not desired, effectively reducing the amount of information that has to be dealt with, thus speeding up the entire process and eliminating the wasted time trying to process information that is not applicable."
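The idea of using metadata to rule information in or out before processing it can be sketched in a few lines of Python. Everything here is illustrative: the records, the property names `doc_type` and `customer`, and the helper `select` are assumptions made for the example, not part of any particular product.

```python
# Hypothetical metadata records; in practice these would come from a
# document management system rather than being typed in by hand.
documents = [
    {"name": "contract_acme.docx", "doc_type": "contract", "customer": "Acme"},
    {"name": "invoice_0042.pdf", "doc_type": "invoice", "customer": "Acme"},
    {"name": "site_photo.jpg", "doc_type": "photo", "customer": "Globex"},
]


def select(records, **criteria):
    """Keep only the records whose metadata matches every given criterion."""
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Only matching documents need to be opened; the rest are skipped unread.
wanted = select(documents, customer="Acme", doc_type="invoice")
print([record["name"] for record in wanted])
```

The filter inspects only the small metadata records, so documents that do not match are discarded without their contents ever being read.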
According to Milliken, metadata also provides clarity about the origin and history of data, as well as helping to ensure that workflows and business processes are properly followed and administered.
For example, he explains, metadata may include information on the development and life cycle of a document, including the users, processes and applications involved in its creation and revision, as well as its ultimate archival, retention and destruction. This can include granular details that drill down to the exact timestamp of changes and actions, such as reviews and approvals, as well as the access permissions involved in performing them, he adds.
"In other words, metadata organises and tracks the entire digital life cycle of important business information, including the processes, procedures and users that affect it, providing a precise audit trail that can prove invaluable - or mandatory, in highly-regulated industries - to a business at any point in time."
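One way to picture such an audit trail is as a list of timestamped events attached to each document. The sketch below is a minimal illustration under that assumption; the class names `AuditEvent` and `DocumentRecord`, their fields and the sample users are hypothetical and do not describe how any specific system stores its history.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    """One immutable entry in a document's history."""
    timestamp: datetime
    user: str
    action: str


@dataclass
class DocumentRecord:
    """A document name plus the ordered list of events that affected it."""
    name: str
    history: list = field(default_factory=list)

    def record(self, user, action, when=None):
        """Append an event, stamping it with the current UTC time by default."""
        event = AuditEvent(when or datetime.now(timezone.utc), user, action)
        self.history.append(event)
        return event


doc = DocumentRecord("policy.docx")
doc.record("alice", "created")
doc.record("bob", "reviewed")
doc.record("carol", "approved")

actions = [event.action for event in doc.history]
print(actions)
```

Making each event immutable reflects the purpose of an audit trail: entries are appended as things happen and are not edited afterwards.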
He also notes that businesses increasingly rely on information to make smarter and faster decisions for competitive advantage, and success often relies upon quick access to the most accurate versions of documents and data sets.
However, it has become more difficult in the era of big data for companies to give employees quick and easy access to the precise information they need to do their jobs, because the amount of information, and the systems for storing and managing it, are growing larger and more complex every day, he states.
"In this environment, more and more businesses are challenged with providing their staff with quick access to information that resides within their various internal business applications and databases. As a result, employees waste enormous amounts of time every day searching for the information they need," says Milliken.
"Massive amounts of structured data and unstructured content often reside within multiple and disconnected platforms, applications, locations and devices, often with different interfaces or varying levels of complexity to learn and remember. This not only makes it harder for employees to find and access the information they need to perform their jobs, but also creates compliance and security risks," he concludes.