Making sense of the terrain

Text analytics helps businesses make sense of unstructured data and gain a better understanding of the customer.
By Anne Botha, Technical manager of PBT.
Johannesburg, 07 Apr 2008

Value as a selling proposition is as much a technique as an understanding of what a customer wants. A lot of what we understand about a customer is sourced from traditional operational systems.

We look at orders that are placed, colours that are chosen, sizes that are taken (and at times returned for others), and we draw conclusions about what is "wanted" or not. It is the length and breadth of our captured engagement with the customer and forms the basis of our analysis of their behaviour and our understanding of the relationship.

That picture is sparse on detail, considering the wealth of things we don't know about someone when they choose to buy something (or decide against it).

We are not part of the conversations that describe how things really went, how the product or service could have been better, or what the customer would have preferred instead and why.

These aspects are important to understand because they enable us to develop better brand awareness, investigate new product innovation, conduct market research for new services and products, and build a deeper picture of recurring issues and problems.

The challenge in digging out and revealing this information is interesting because it is captured in an unstructured way. Unstructured data groups together all the things that are not captured in tabular or comma-delimited form. It is usually found as a document, a comment somewhere on a forum, the rant that ensues when things go wrong, an e-mail sent to a colleague to ask if the Web site is back up, the video file copied from the server to your computer to watch a webinar on the latest BI trends in 2008, or a gadget blog read every day because you love mobile technology.

Lacking structure, needing analysis

Unstructured data is precisely that: unstructured. It doesn't come "prepacked" as traditional data does, with existing tables and the context apparent in the data and its associated tables. It poses an interesting modelling challenge: the transaction aspect represents the dimensions (or, if you prefer, the context of the event), while the event itself (which would be the fact, to compare with standard dimensional modelling technique) is encased within an informal structure.

Thus, a different analytic approach is needed, because what is lacking is appropriate structure and semantic association. Text analytics (or text mining, if you prefer) gives us a view into what exists in an unstructured state.

In essence, it is a variation on data mining, where the focus is on semantics as opposed to number association and formula calculation. Take the update of a document after it has been uploaded into a document management system as an example: the transaction in the database captures the metadata of the moment, such as the date and time of upload, the person who uploaded the document, the name of the document and any change in size. It does not tell you what changed, and that is the fundamental difference.
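To make the distinction concrete, here is a minimal Python sketch. It assumes a hypothetical upload record and two short versions of a made-up returns policy; none of the names or values come from a real document management system. The record is all the database sees, while a plain line comparison (using the standard library's difflib) recovers what actually changed in the text.

```python
import difflib

# Hypothetical upload record: roughly what the database transaction captures.
upload_metadata = {
    "document": "returns_policy.txt",
    "uploaded_by": "jsmith",
    "uploaded_at": "2008-04-07T09:15:00",
}

# Two invented versions of the same document, one line per list entry.
old_text = ["Returns are accepted within 30 days.", "Refunds are paid in cash."]
new_text = ["Returns are accepted within 14 days.", "Refunds are paid in cash."]


def changed_lines(before, after):
    """Return (removed, added) lines between two versions of a document."""
    removed, added = [], []
    for line in difflib.ndiff(before, after):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added


removed, added = changed_lines(old_text, new_text)
print("metadata:", upload_metadata)  # who, what file, when
print("removed:", removed)           # the content that disappeared
print("added:", added)               # the content that replaced it
```

A line comparison is only the first step: it shows that "30 days" became "14 days", but judging what that change means for the customer is where the semantic side of text analytics comes in.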

At a basic level, text analytics can make something like finding a document easier through the creation of a taxonomy and classification structure. If integrated into a formal document management system, it can further streamline this discovery process with audits on documents to ensure (and at times enforce) a "one source of truth" document, as opposed to having several copies lying everywhere.
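As a rough illustration of classification against a taxonomy, the Python sketch below matches the words in a piece of text against a small keyword list per category. The categories, keywords and sample comment are all invented for the example, and real text analytics tools use far richer techniques than keyword matching; this only shows the basic idea of attaching structure to free text.

```python
import re

# Hypothetical taxonomy: category name -> keywords that signal it.
TAXONOMY = {
    "billing": {"invoice", "refund", "payment"},
    "delivery": {"courier", "late", "shipping"},
    "product": {"size", "colour", "quality"},
}


def classify(text, taxonomy=TAXONOMY):
    """Return matching categories, strongest match first (ties alphabetical)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    hits = {cat: len(words & keywords) for cat, keywords in taxonomy.items()}
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [cat for cat, count in ranked if count > 0]


comment = "The courier was late and I still have not had my refund."
print(classify(comment))
```

Once each document or comment carries one or more category labels, finding it becomes a lookup against the taxonomy rather than a search through free text.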

Beyond this, there are discovery systems that do funky things like generate metadata from documents and classify the documents "automagically". There are also platforms for content applications that isolate application functionality from the data, so that change can be managed within the application layer without affecting the data; data integration that allows access to repositories and application-specific formats; and integration with enterprise-level applications.

Fundamentally, businesses and customers have grown apart, in that many of the interactions between the business and the customer are conducted through some sort of system. In a lot of ways we lack that human step in the process that listens and absorbs both the good and bad feedback. Now that feedback sits on some sort of system, either within the "walls" of the business or on an outside forum (which the business might not even be aware of).

There is a massive competitive advantage to be gained from being able to pull that type of information out, surface it and analyse it, to get a more realistic picture of what is happening with customers.
