Viewed from a high level, poor-quality data can affect a company’s bottom line in two ways: first, through the cost of scrap and rework, and second, through missed opportunities.
An example of scrap and rework costs might be when an agent errs in recording a customer’s address details, and consequently a marketing premium is sent to the wrong address. Later, the customer calls to complain.
The complaint needs to be handled (extra call center time), the address details then need to be entered a second time (rework), and a second premium needs to be sent. The initial premium is scrapped.
An example of missed-opportunity costs might be a credit card application that is rejected because the calculated credit score (erroneously) falls below the cutoff score. The opportunity to make a sale is lost, even though the marketing costs have already been incurred.
In this whitepaper, I attempt to supply a comprehensive list of potential data quality costs.
Cost Categories of Information Quality
The costs of data quality can be broken down into three categories:
1. Immediate costs of non-quality data. These arise when the primary process breaks down as a result of erroneous data, or when immediately apparent errors or omissions in the data must be circumvented to support the primary business process (information scrap and rework). For example, entering an invalid ZIP code requires back-office staff to look up the correct code and fix the record before a product can be sent out.
2. Information quality assessment or inspection costs. These are the costs and efforts expended to (re)assure that processes work properly. Every time a ‘suspect’ data source is handled, the time spent seeking reassurance about its quality is an irrecoverable expense.
3. Information quality process improvement and defect prevention costs. Broken business processes need to be improved to eliminate unnecessary information costs. When a data capture or processing operation malfunctions, it requires fixing. This is the long-term investment needed to avoid further losses.
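The invalid ZIP code example from the first category is the kind of defect the third category aims to prevent at the point of capture. The sketch below is a minimal illustration, assuming US ZIP codes in either five-digit or ZIP+4 form; the function names are hypothetical. It checks format only, so a well-formed but non-existent ZIP code would still need a lookup against postal reference data.

```python
import re

# Assumption: US ZIP codes, either "12345" or ZIP+4 "12345-6789".
# This is a format check only; it cannot confirm that the code exists.
ZIP_PATTERN = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")


def is_valid_zip_format(zip_code: str) -> bool:
    """Return True if zip_code matches the US ZIP or ZIP+4 format."""
    return ZIP_PATTERN.fullmatch(zip_code.strip()) is not None


def capture_address(street: str, city: str, zip_code: str) -> dict:
    """Hypothetical entry routine: reject a malformed ZIP code at capture
    time instead of leaving back-office staff to correct it later."""
    if not is_valid_zip_format(zip_code):
        raise ValueError(f"Invalid ZIP code format: {zip_code!r}")
    return {"street": street, "city": city, "zip": zip_code.strip()}


if __name__ == "__main__":
    for candidate in ["02139", "02139-4307", "2139", "0213A"]:
        print(candidate, is_valid_zip_format(candidate))
```

Rejecting the entry while the agent is still talking to the customer costs seconds; correcting it afterwards costs a lookup, a correction, and possibly a scrapped shipment.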
1. Immediate costs of non-quality data
These costs arise, for example, when erroneous customer data such as addresses, contact information, or account details are captured.
– Irrecoverable costs; e.g. premiums sent in vain to non-existing customer addresses.
– Liability and exposure costs; for instance, credit risk losses when data quality problems lead to credit being offered to a customer who would not have been considered creditworthy on the basis of the information they supplied.
– Recovery costs of unhappy customers; time spent handling complaints.
Information Scrap and Rework
– Redundant data handling; because many processes are ‘known’ to rely on inaccurate data, front-line and back-office staff customarily maintain little private “lists” of all sorts. These serve merely as a backup for, or an improved version of, what is available in the primary database. Apart from further problems, such as the lack of proper maintenance and recovery for these private lists, such activities are redundant and non-value-adding.
– Costs of chasing missing information; a field that has not been filled out properly, or at all, needs to be looked up later in the process. This means excess time and cost, inefficiency, and, not least, aggravation. Time spent looking up missing information is time not spent serving the customer better.
– Business rework costs; e.g. reissuing a credit card that was sent out with a misspelled customer name.
– Workaround costs; when a primary key is missing or faulty, laborious fuzzy matches need to be performed to match records. This kind of work is challenging and consumes the precious time of the most highly skilled database workers.
– Data verification costs; e.g. costs of reworking data entry. But knowledge workers must also begin every analysis by checking the correctness of the available data.
– Program rewrite costs; rewriting programs that fail to run because of invalid entries in the data. For example, pre- or post-conversion scripts sometimes need to be written to deal with the content of source systems before loading it into a Data Warehouse environment.
– Data cleansing and correction costs; when feeds are processed for loading into the Data Warehouse, the data need to be transformed for reasons that stem from quality issues. Any data cleansing and scrubbing performed in the ETL process is essentially redundant and unnecessary insofar as it is caused by faulty initial data entry. For example, when a mailing is done on the basis of a problematic customer file, dedicated scripts need to be run to deal with the (known!) errors in the address fields. This process needs to be repeated for every mailing. Since such customer files are often shared across departments and systems, source changes need to be negotiated with all end users of these data.
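To make the workaround costs above concrete, the sketch below shows what matching records without a reliable primary key can look like, using Python's standard-library `difflib.SequenceMatcher`. It is a minimal illustration under stated assumptions: the record layout (dicts with `name` and `address` keys), the normalization, and the 0.85 threshold are all hypothetical, and real record-linkage work usually needs blocking, field-specific comparisons, and manual review of borderline pairs.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(left, right, threshold=0.85):
    """Pair each record in `left` with its most similar record in `right`,
    comparing name + address, and keep only pairs scoring >= threshold.
    Hypothetical record layout: dicts with 'name' and 'address' keys."""
    matches = []
    for rec_l in left:
        key_l = f"{rec_l['name']} {rec_l['address']}"
        best, best_score = None, 0.0
        # Brute-force comparison against every candidate: O(len(left) * len(right)).
        for rec_r in right:
            score = similarity(key_l, f"{rec_r['name']} {rec_r['address']}")
            if score > best_score:
                best, best_score = rec_r, score
        if best is not None and best_score >= threshold:
            matches.append((rec_l, best, best_score))
    return matches


if __name__ == "__main__":
    crm = [{"name": "Jon Smith", "address": "12 Elm Street"}]
    billing = [
        {"name": "John Smith", "address": "12 Elm St."},
        {"name": "Jane Doe", "address": "4 Oak Ave"},
    ]
    for l, r, score in match_records(crm, billing):
        print(l["name"], "->", r["name"], round(score, 3))
```

Even this toy version compares every record against every candidate and depends on a threshold someone has to tune and defend, which is why such work eats up skilled staff time that a correct primary key would have saved.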