'Intelligent' searching, matching software offers hope to embattled insurance fraud investigators

Johannesburg, 11 Jun 2002

Explosive growth in the volume of historical claims data presents insurers with a major obstacle in their fight against insurance fraud. Sam Selmer-Olsen, a director of , exclusive SA agent for (SSA) identification software products, analyses the problem and offers a cost-effective solution.

The fight against insurance fraud focuses primarily on two areas: non-disclosure in policy applications, and claims processing, where the aim is to reduce the "one incident, multiple claims" or "one identity/one address, multiple claims" scenarios.

In this fight, insurance fraud investigation systems depend upon data about the names, addresses and other identification attributes of the people and organisations involved in policy applications and insurance claims, including applicants, the insured, payees, claimants, physicians, lawyers, witnesses, agents, employees and other parties.

Historical claims data is typically loaded into large-scale claims repositories. This data is then made available to insurance fraud investigators for case-led inquiries and claims audits, to claims processors for claims screening, and to policy application systems for pre-screening new applications.

It is a fact of life that all such identification data suffers from unavoidable error and variation. Spelling and typing errors occur, nicknames and abbreviations are used, and words are missing or out of order. Often the entity committing the fraud or perpetrating the crime is in fact trying to defeat existing matching algorithms by subjecting the identification data to deliberate, abnormal or extreme variation.
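The kinds of variation described above can be illustrated with a minimal sketch. It is not the algorithm used by any particular product: the nickname table, the function names and the use of Python's standard `difflib` are all assumptions made for illustration. It compares two names token by token, so that word order, punctuation, a known nickname or a small typing error does not prevent a match.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a real system would need a far larger one.
NICKNAMES = {"bobby": "robert", "bob": "robert", "bill": "william"}

def tokens(name):
    """Lower-case, drop punctuation and map nicknames, for comparison only."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return [NICKNAMES.get(t, t) for t in cleaned.split()]

def token_similarity(a, b):
    """Similarity of two words between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()

def name_score(query, candidate):
    """Average best-match similarity per token, ignoring word order.

    Each token of the longer name is paired with its closest token in the
    shorter name, so missing words lower the score instead of blocking it.
    """
    q, c = tokens(query), tokens(candidate)
    if not q or not c:
        return 0.0
    longer, shorter = (q, c) if len(q) >= len(c) else (c, q)
    total = sum(max(token_similarity(t, u) for u in shorter) for t in longer)
    return total / len(longer)
```

Under these assumptions, "Robert Smith" and "Smith, Bobby" score as identical, while "Jon Smyth" and "John Smith" score high without being equal. The stored data is never altered; the nickname mapping is applied only while comparing.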

Despite this error and variation, insurance fraud investigators must be able to search, match, link and examine pieces of information from multiple sources, both internal and external, to discover connections that would otherwise remain hidden. Policy application systems must be able to thoroughly search claims history and previous application data to determine whether the organisation has had prior dealings with this identity.

There may also be a need, for compliance reasons, to screen policy applications against national watch and alert lists, such as OFAC (Office of Foreign Assets Control) and the DPL (Denied Persons List). Underwriters, brokers, agents, primary insurers and others are prohibited from engaging in transactions that involve various OFAC blocked countries and entities. These types of lists require extremely thorough screening, especially for commercial transactions where the penalties can be high.

All of this identity data requires sophisticated indexing that performs well regardless of the quality, format or country of origin of the data. It must be possible to reliably search or mine the large-scale databases using the names of people and companies, as well as their addresses and other identity data.

The technology that supports such searching and matching must be able to ensure candidates are found despite the unavoidable or deliberate variation and error in name, address and other identity data. The technology must cope well with data of any quality and completeness, as the source of such data can vary greatly in reliability.

Such search technology must not require that the data be cleaned or formatted, for reasons that include:

* The data may not be legally changed without approval of the customer or source organisation.

* Statistical techniques for enhancing data are "good for statistics" but introduce error that can be destructive for matching.

* Many cleaning techniques are not reversible, eg changing Bobby to Robert; changing St to Street when it is possible that it could be Saint.

* The user may believe the transformed data is true and base decisions on it.

* Rejecting invalid data simply means it cannot be used for any purpose, and all business value of that data is lost.
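One way to respect these constraints is to leave the stored value untouched and resolve ambiguity only at comparison time. The sketch below is illustrative, not a description of any product: the expansion table and the names `match_keys` and `may_match` are hypothetical. It keeps every plausible reading of an ambiguous abbreviation such as "St" rather than committing to one.

```python
from itertools import product

# Hypothetical expansion table: an ambiguous token maps to every plausible reading.
EXPANSIONS = {"st": ("street", "saint"), "rd": ("road",), "ave": ("avenue",)}

def match_keys(raw):
    """Return every plausible normalised reading of raw; raw itself is not changed."""
    words = "".join(c if c.isalnum() else " " for c in raw.lower()).split()
    options = [EXPANSIONS.get(w, (w,)) for w in words]
    return {" ".join(choice) for choice in product(*options)}

def may_match(raw_a, raw_b):
    """True if the two raw values share at least one plausible reading."""
    return not match_keys(raw_a).isdisjoint(match_keys(raw_b))
```

Here "12 St Johns St" yields four readings, one of which equals "12 saint johns street", so it can still be found by a search for "12 Saint Johns Street". Because nothing is overwritten, a wrong guess about "St" never becomes part of the record.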

It is also highly desirable, and often essential, that the search technology be able to search and match data from any country, and potentially in any character set.

Some solutions to the searching and matching requirements of such systems require skilled investigators who know when and how to vary a search or change the search data to cause the system to work more successfully. Boolean-based and wild-card searches are examples of these. A far better solution uses automated search strategies that satisfy all permutations and variations of the search. The real solution needs to be designed to find all the candidates regardless of the way the search data was entered, regardless of the quality of the data stored in the database, and regardless of the experience of the user. Such search strategies must of course provide real-time searching of all name and identity data. Online usage must satisfy the investigators' and source systems' need for fast response without any loss of quality of search, despite the quality of the data.
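The contrast with wild-card searching can be sketched as follows. This is a deliberately crude illustration under stated assumptions, not a production technique: the `skeleton` key and the `NameIndex` class are hypothetical. Each record is indexed under a simplified key for every word in the name, so the user types the name once and the system, not the user, supplies the variations.

```python
from collections import defaultdict

def skeleton(word):
    """Crude key: first letter plus later consonants, adjacent repeats collapsed."""
    word = "".join(c for c in word.lower() if c.isalpha())
    if not word:
        return ""
    key = word[0]
    for c in word[1:]:
        # Vowels and the weak letters y, h, w are ignored after the first letter.
        if c not in "aeiouyhw" and c != key[-1]:
            key += c
    return key

class NameIndex:
    """Index records under one key per name word, so word order does not matter."""

    def __init__(self):
        self._by_key = defaultdict(set)

    def add(self, record_id, name):
        for word in name.split():
            key = skeleton(word)
            if key:
                self._by_key[key].add(record_id)

    def candidates(self, query):
        """Return ids of all records sharing at least one key with the query."""
        found = set()
        for word in query.split():
            found |= self._by_key.get(skeleton(word), set())
        return found
```

With records "John Smith", "Smyth, Jon" and "Alice Jones" indexed, a search for "Jon Smith" returns the first two without the user resorting to wild cards. A lookup of this kind is a dictionary access per word, which suits the real-time requirement; the candidates it returns would still need to be scored and ranked.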

Another aspect of this problem area is the ranking of the results returned to the searcher. While diligent investigators can use sophisticated search tools well, it is not possible for the average user to spend day after day simply browsing historical data and do a good job of selecting candidate matches; even the diligent user can become ineffectual if the job is a continuous activity. To better automate the searching, matching and screening process, computer systems must be designed to "mimic" the very best users when choosing among the possible matches. In the same way as human operators use names, addresses, dates, identity numbers and other data, the system must be able to use matching algorithms that effectively rank, score or eliminate the candidates.
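A minimal sketch of such ranking follows. The field names, weights and threshold are hypothetical values chosen for illustration, and `difflib` stands in for whatever comparison a real system would use. Each candidate is scored across several attributes, weak candidates are eliminated, and the rest are returned best first.

```python
from difflib import SequenceMatcher

# Hypothetical weights expressing how much each attribute counts.
WEIGHTS = {"name": 0.5, "address": 0.3, "birth_date": 0.2}

def similarity(a, b):
    """Similarity between 0.0 and 1.0, or None if either value is missing."""
    if not a or not b:
        return None
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def score(query, candidate):
    """Weighted average over the attributes present on both sides."""
    total = weight_used = 0.0
    for field, weight in WEIGHTS.items():
        s = similarity(query.get(field), candidate.get(field))
        if s is None:
            continue  # assumption: a missing attribute is skipped, not penalised
        total += weight * s
        weight_used += weight
    return total / weight_used if weight_used else 0.0

def rank(query, candidates, threshold=0.6):
    """Drop candidates scoring below threshold; return the rest, best first."""
    scored = [(score(query, c), c) for c in candidates]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

Skipping missing attributes rather than penalising them is a design choice, made here because incomplete records are common in claims data; a system that distrusts sparse records could instead count a missing attribute as a partial mismatch.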

The volume of data available to these systems is growing explosively. It is time to invest in the core objective of these systems: making sure that the highly valuable data stored in them can in fact be found, despite its error and variation. Similarly, the value of high-end investigation tools that provide "link analysis", "data clustering" or "visualisation" can be significantly improved if they make use of the very best search and matching algorithms.
