Is your database a data dumping ground?


Johannesburg, 07 Oct 2002

Alan Raubenheimer, CEO of Embarcadero Africa, examines some of the ways the value of corporate databases can be preserved.

The relational database remains the foundation on which most large corporations build their business processing. Many now find themselves with a database that, however well scoped up-front, soon begins to exhibit problems:

* A database design that no longer fulfils business needs;

* Uncontrolled data growth and resultant poor performance; and

* Shrinking batch windows and increased processing requirements.

Vendors have introduced various innovations to help companies live with growing legacy databases, but few companies can keep moving to the latest versions or introduce the changes required to leverage the benefits newer versions promise.

Consequently, they opt to live with what they have, throw more hardware at the problem or change their business to work around these problems - none of which is ideal in a competitive market.

Clearly, it is not good enough to allow the database to become a dumping ground for data, and there has to be a consistent strategy for managing its growth and performance.

Know your systems

After a while, many companies cannot identify redundant data or objects (such as stored procedures and packages). The reason is that systems have been allowed to evolve without reconciling them to the data design. At the very least, the fundamental database (entity relationship, or ER) design must be established, along with the design of the major applications. An ER and UML modelling solution will assist in this.

Pruning

Once the interaction between front-end and database systems is established, the bold step of getting rid of the dead wood must be taken - with insight and caution.

This involves backing up and dropping tables and stored procedures that are no longer used and possibly implementing an archiving strategy which allows free access to historical data.

Examine the processes that write large amounts of data: it is often surprisingly simple to change processes that are dumping megabytes of unused data into the corporate database.
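As a minimal sketch of the pruning step described above, the following uses Python's built-in sqlite3 module as a stand-in for a production database. The table names are hypothetical; the point is the pattern: copy a disused table into an archive copy before dropping the original, so the data remains recoverable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);
    -- legacy_audit stands in for a table no process reads any more
    CREATE TABLE legacy_audit (id INTEGER PRIMARY KEY, note TEXT);
    INSERT INTO legacy_audit (note) VALUES ('old entry');
""")

# Back up the dead table's rows into an archive table, then drop the original.
con.execute("CREATE TABLE archive_legacy_audit AS SELECT * FROM legacy_audit")
con.execute("DROP TABLE legacy_audit")
con.commit()

tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # legacy_audit is gone; archive_legacy_audit holds its rows
```

On a real system the archive copy would of course go to separate, cheaper storage rather than the same database, but the back-up-then-drop discipline is the same.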

Indexing

Effective indexing is still the single most important - and cheapest - factor in providing optimal performance.

Unfortunately, there seems to be a common perception that indexes are cast in stone, and few DBAs are willing to allow their indexes to be changed (with good reason - poor indexes can lead to disastrous under-performance). Many database developers also do not fully understand the right approaches to indexing.

Here are a few suggestions for using indexes properly:

* Document every index access per table: this is a tedious task, but once completed, adding new indexes, making index changes and setting standards can be undertaken from a position of knowledge.

* Impose a standard of performance: no database code should be taken live if it does not use indexes effectively.

* Index temporary tables: much processing happens around temporary tables. These should be indexed the same as any permanent table.

* Index for the situation: despite the fact that most indexes are a permanent feature in the database, one can still build and drop indexes to improve performance for specific tasks, such as large, long running batch processes and complex sorting. This simple approach often yields dramatic performance improvement.

* A related issue is that of declarative referential integrity (building primary and foreign key constraints in the design). This is often one of the prime contributors to poor performance and an unwillingness to change indexes. Simply put, primary and foreign key constraints build indexes that impose a very rigid construct in the database. With just a little forethought it is possible to make the indexing approach more flexible while maintaining referential integrity.
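The "index for the situation" tip can be sketched concretely. Below, again using sqlite3 as a hedged stand-in, an index is built just before a batch-style query and dropped afterwards so it does not slow routine inserts; the query plan confirms the index is actually used. Table and index names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE txn (id INTEGER PRIMARY KEY, acct INTEGER, amount REAL)")
con.executemany("INSERT INTO txn (acct, amount) VALUES (?, ?)",
                [(i % 50, float(i)) for i in range(1000)])

# Build an index just for the batch run...
con.execute("CREATE INDEX idx_txn_acct ON txn (acct)")

# ...verify the optimiser will use it for the batch query...
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(amount) FROM txn WHERE acct = 7").fetchall()
print(plan)

# ...then drop it so everyday write traffic does not pay for maintaining it.
con.execute("DROP INDEX idx_txn_acct")
```

The same build/verify/drop rhythm applies to large month-end jobs or complex sorts on any engine; only the plan-inspection command differs.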

When it comes to performance, my philosophy is "Never accept the mundane". If a process is performing badly, fix it. It will only deteriorate and ultimately cost you big. To this end, there are a number of powerful solutions that allow developers to debug, identify bottlenecks and build optimal database code.

Data marts

A well-designed data mart provides a good alternative to a costly data warehouse. Properly implemented, the data mart can be used as the target for long, analytical or reporting queries, alleviating some of the load on the transactional database.

However, without the right tools and proper commitment, the data mart will not succeed.
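In its simplest form, a data mart is a pre-aggregated copy of transactional data that reporting queries hit instead of the live tables. A minimal sketch, with sqlite3 standing in for both systems and hypothetical table names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
con.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)",
                [("north", 10.0), ("north", 15.0), ("south", 7.5)])

# Periodically refresh a small summary table; long analytical queries
# read from it rather than contending with the transactional table.
con.executescript("""
    DROP TABLE IF EXISTS mart_sales_by_region;
    CREATE TABLE mart_sales_by_region AS
        SELECT region, SUM(amount) AS total, COUNT(*) AS n
        FROM sales
        GROUP BY region;
""")

rows = {r[0]: r[1] for r in con.execute(
    "SELECT region, total FROM mart_sales_by_region")}
print(rows)  # {'north': 25.0, 'south': 7.5}
```

A production mart would live on a separate server and be refreshed on a schedule, but the load-shifting principle is the one shown here.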

Hardware

Larger servers, faster disks and processors and more memory remain the first haven for most companies because of their relatively low cost (when compared to months of consultancy with uncertain results) and high-tech lure. This is always a safe, short-term solution, but the problems of unmanaged data will undoubtedly resurface in time, probably with greater complexity.

Archiving

Archiving historical data is often a scary proposition. Companies are afraid that the impact on the business can be harmful. Industry constraints also force companies to adopt the view that a certain number of years' worth of data must be immediately available.

While there is no such thing as a simple archiving strategy, it should be on the agenda of every company with a growing database that is a few years old. A consistent archiving approach will provide a means of ensuring that database growth does not continue unchecked.

The best approach is one that is tailored to the specific needs of the company, the primary target being transactional data that does not need to be accessed readily or frequently. It is not always necessary to take archived data offline - there are various methods (and industry solutions) that allow for archived data to be readily accessible.
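One common pattern that keeps archived data accessible is to move old rows into a parallel archive table by cut-off date, so historical queries still work against the archive while the live table stays lean. A sketch under those assumptions, with illustrative names and dates:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE txn (id INTEGER PRIMARY KEY, booked TEXT, amount REAL)")
con.executemany("INSERT INTO txn (booked, amount) VALUES (?, ?)",
                [("1998-03-01", 100.0),
                 ("2001-06-15", 50.0),
                 ("2002-09-30", 75.0)])

# An empty structural copy of the live table to hold archived rows.
con.execute("CREATE TABLE txn_archive AS SELECT * FROM txn WHERE 0")

cutoff = "2000-01-01"
# Move historical rows out of the live table but keep them queryable.
con.execute("INSERT INTO txn_archive SELECT * FROM txn WHERE booked < ?", (cutoff,))
con.execute("DELETE FROM txn WHERE booked < ?", (cutoff,))
con.commit()

live = con.execute("SELECT COUNT(*) FROM txn").fetchone()[0]
archived = con.execute("SELECT COUNT(*) FROM txn_archive").fetchone()[0]
print(live, archived)  # 2 1
```

In practice the two steps would run in a single transaction (or use an engine's partitioning features), and the archive table would sit on cheaper storage, but the division between live and historical data is the essence.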

For archiving to succeed, careful planning and complete business buy-in are imperative, since it will fundamentally change the way the business accesses data and also what data is available.

A well-defined business systems and data interaction model will provide a clear indication of the path to follow.

Conclusion

Corporate data remains possibly the most valuable asset any company possesses. Managing, protecting and using this data is vital - no company can afford to allow its corporate databases to simply become a dumping ground for data that is not used properly and a breeding ground for problems such as poor performance and unchecked space utilisation.

If your company is serious about its competitive edge, it should be serious about caring for its data.
