Data quality is not the real issue: intelligent search software is

By Geoff Holloway
Johannesburg, 09 Jun 2003

Expanding databases, increasing identity fraud, the shift toward customer relationship management and the proliferation of data on the web are all putting pressure on search and matching systems.

That's the message from Geoff Holloway, president and CEO of (SSA). Holloway was in South Africa to deliver a presentation at a conference hosted by , SSA's South African distributor.

SSA develops and markets software products which significantly enhance an organisation's ability to search, find, match and group identity data within its computer systems and network databases. Its South African customer base includes many large corporates.

"The trick is not to have perfectly accurate data - in the real world, that's impossible - it's to get the benefit out of existing data despite its poor quality," Holloway said. "Intelligent search software is able to make matches of data even in the face of errors and variations. The system needs to behave like a first-class user, finding all the relevant data, assessing it and deciding whether a match can be made."

In addition to the natural error and variation that unavoidably occur in identification data such as names, addresses and dates of birth, systems also need to overcome fraudulent modification of data. This type of error is more aggressive because it is introduced deliberately to defeat or control matching systems.

"Whether the process is an online inquiry, a batch-matching process or a criminal record search, the system needs to mimic the human expert in finding all candidate records. The solution lies in finding a balance between performance and quality, between under-matching and over-matching," he said.
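The balance between under-matching and over-matching can be illustrated with a small sketch. It is not SSA's method: the labelled name pairs are hypothetical, and the similarity measure is Python's standard `difflib.SequenceMatcher`, chosen only because it ships with the language. Raising the acceptance threshold can only reduce false matches and can only increase missed ones, which is the trade-off being described.

```python
from difflib import SequenceMatcher

# Hypothetical labelled pairs: (name_a, name_b, refers_to_same_person).
PAIRS = [
    ("Jonathan Smith", "Jonathon Smith", True),
    ("Jonathan Smith", "Jon Smyth", True),
    ("Maria Garcia", "Maria Garcia-Lopez", True),
    ("Jonathan Smith", "Joan Smithers", False),
    ("Maria Garcia", "Mario Garza", False),
]


def similarity(a, b):
    """Case-insensitive similarity in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def evaluate(threshold):
    """Return (missed_matches, false_matches) at a given threshold."""
    missed = sum(1 for a, b, same in PAIRS
                 if same and similarity(a, b) < threshold)
    false = sum(1 for a, b, same in PAIRS
                if not same and similarity(a, b) >= threshold)
    return missed, false


def sweep(thresholds):
    """Evaluate several thresholds; each row is (threshold, missed, false)."""
    return [(t,) + evaluate(t) for t in thresholds]


if __name__ == "__main__":
    for threshold, missed, false in sweep([0.0, 0.5, 0.7, 0.9, 1.01]):
        print(f"threshold={threshold:.2f} missed={missed} false={false}")
```

A very low threshold accepts everything (over-matching), while a threshold above 1.0 rejects everything (under-matching); a usable setting lies somewhere in between and depends on the data.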

Holloway offered the following rules of thumb for making the most of your data:

* Retain and use original raw data wherever possible
* Don't automatically believe the order, format or parsing of the data you are offered
* Use smart indexing that supports your data
* Use intelligent matching software that works in line with the real world

He also warned against a number of techniques which are popular but often ineffective. These include exact name searches, searching with wildcards, keying partial words to save time, text retrieval software and name search and match codes. Although these methods will produce some results, they tend to be inefficient, inaccurate and incomplete.
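A small sketch shows why exact and wildcard searches tend to be incomplete. The records below are hypothetical, and the similarity search uses Python's standard `difflib.SequenceMatcher` with an assumed threshold of 0.8; it stands in for "intelligent matching" in general and is not a description of any SSA product.

```python
from difflib import SequenceMatcher
from fnmatch import fnmatchcase

# Hypothetical records; none of these names come from the article.
RECORDS = ["Jonathan Smith", "Jonathon Smith", "Jonathan Smyth", "Maria Garcia"]


def exact_search(query, records):
    """Return only records identical to the query."""
    return [r for r in records if r == query]


def wildcard_search(pattern, records):
    """Return records matching a shell-style wildcard pattern."""
    return [r for r in records if fnmatchcase(r, pattern)]


def similarity_search(query, records, threshold=0.8):
    """Return records whose similarity to the query meets the threshold."""
    def score(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return [r for r in records if score(query, r) >= threshold]


if __name__ == "__main__":
    print(exact_search("Jonathan Smith", RECORDS))
    print(wildcard_search("Jonathan Sm*", RECORDS))
    print(similarity_search("Jonathan Smith", RECORDS))
```

Here the exact search finds one record and the wildcard search misses the "Jonathon" spelling, because the variation falls before the wildcard. The similarity search returns all three variants while still excluding the unrelated name.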

"The answer is to overcome the error and variation in the identity data while maintaining acceptable performance and not missing candidates or generating too many false matches," Holloway said. "Such a solution needs intelligent, scalable algorithms which, through the use of fuzzy keys and search strategies, return all the candidates an expert user would select."
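The idea of fuzzy keys plus a search strategy can be sketched roughly as follows. Everything here is an assumption made for illustration: the `fuzzy_key` function is a crude consonant-skeleton key invented for this example, the records are hypothetical, and the 0.6 threshold is arbitrary. It is not the algorithm used in SSA-NAME3 or IDS. The index narrows the search to a few candidates, which keeps performance acceptable, and a separate scoring step then decides which candidates to keep.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher


def words(name):
    """Split a name into lowercase alphabetic words, ignoring punctuation."""
    return re.findall(r"[a-z]+", name.lower())


def fuzzy_key(word):
    """Illustrative key: first letter plus later consonants, at most 4 chars.

    Vowels, y, h and w are skipped and repeated letters are collapsed, so
    spelling variants such as "smith" and "smyth" share one key.
    """
    key = word[0]
    for ch in word[1:]:
        if ch in "aeiouyhw":
            continue
        if ch != key[-1]:
            key += ch
    return key[:4]


def build_index(records):
    """Map each word's fuzzy key to the ids of records containing it."""
    index = defaultdict(set)
    for rec_id, name in enumerate(records):
        for w in words(name):
            index[fuzzy_key(w)].add(rec_id)
    return index


def search(query, records, index, threshold=0.6):
    """Fetch candidates by fuzzy key, then rank them by similarity."""
    candidate_ids = set()
    for w in words(query):
        candidate_ids |= index.get(fuzzy_key(w), set())
    # Sorting the words makes the comparison independent of word order.
    target = " ".join(sorted(words(query)))
    scored = []
    for rec_id in candidate_ids:
        candidate = " ".join(sorted(words(records[rec_id])))
        score = SequenceMatcher(None, target, candidate).ratio()
        if score >= threshold:
            scored.append((score, records[rec_id]))
    return [name for score, name in sorted(scored, reverse=True)]


if __name__ == "__main__":
    # Hypothetical records with spelling and word-order variation.
    records = ["Jonathan Smith", "Smith, Jonathon", "Jon Smyth",
               "Maria Garcia", "Joan Smithers"]
    index = build_index(records)
    print(search("Jonathan Smith", records, index))
```

Because records are indexed under every word's key, a record is still found when the words are reordered or one of them is misspelled, which is in the spirit of not trusting the order, format or parsing of incoming data.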

SSA's product range includes SSA-NAME3, Identity Systems (IDS) and a Data Clustering Engine (DCE). Holloway's presentation focused on IDS, which provides online and batch searching, matching and duplicate discovery for all types of identification data. High-performance indexes are automatically maintained without changes to existing application programs. The design of IDS makes it the ideal solution for organisations implementing the latest technologies, including web-based applications.

Editorial contacts

Petra Peacock
C-Cubed Communications
(011) 794 4665
petrap@iafrica.com
Laura Selmer-Olsen
Bateleur
(011) 463 5519