

7 Sources of Poor Data Quality
By William McKnight, Partner, Information Management, Lucidity Consulting Group

In recent years, corporate scandals, regulatory changes, and the collapse of major financial institutions have brought much warranted attention to the quality of enterprise information. We have seen the rise and assimilation of tools and methodologies that promise to make data cleaner and more complete. Best practices have been developed and discussed in print and online. Data quality is no longer the domain of just the data warehouse. It is accepted as an enterprise responsibility. If we have the tools, experiences, and best practices, why, then, do we continue to struggle with the problem of data quality?

The answer lies in the difficulty of truly understanding what quality data is and in quantifying the cost of bad data. It isn't always understood why or how to correct this problem because poor data quality presents itself in so many ways. We plug one hole in our system, only to find more problems elsewhere. If we can better understand the underlying sources of quality issues, then we can develop a plan of action to address the problem that is both proactive and strategic.

Each instance of a quality issue presents challenges in both identifying where problems exist and in quantifying the extent of the problems. Quantifying the issues is important in order to determine where our efforts should be focused first. A large number of missing email addresses may well be alarming but could present little impact if there is no process or plan for communicating by email. It is imperative to understand the business requirements and to match them against the assessment of the problem at hand. Consider the following seven sources of data quality issues.
1. Entry quality: Did the information enter the system correctly at the origin?

2. Process quality: Was the integrity of the information maintained during processing through the system?

3. Identification quality: Are two similar objects identified correctly to be the same or different?

4. Integration quality: Is all the known information about an object integrated to the point of providing an accurate representation of the object?

5. Usage quality: Is the information used and interpreted correctly at the point of access?

6. Aging quality: Has enough time passed that the validity of the information can no longer be trusted?

7. Organizational quality: Can the same information be reconciled between two systems based on the way the organization constructs and views the data?
A plan of action must account for each of these sources of error. The sources differ in how easily they can be detected and how easily they can be corrected, and each carries its own costs and its own degree of difficulty to address.

Entry Quality
Entry quality is probably the easiest problem to identify but is often the most difficult to correct. Entry issues are usually caused by a person entering data into a system. The problem may be a typo or a willful decision, such as providing a dummy phone number or address. Identifying these outliers or missing data is easily accomplished with profiling tools or simple queries.
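The "simple queries" approach to spotting missing or dummy entries can be sketched in a few lines. This is a minimal, illustrative Python example, not a reference to any particular profiling tool; the record fields and the list of known placeholder phone numbers are invented assumptions.

```python
from collections import Counter

# Hypothetical customer records; field names are illustrative assumptions.
records = [
    {"name": "Ann Lee", "phone": "555-0101", "email": "ann@example.com"},
    {"name": "Bob Ray", "phone": "999-9999", "email": ""},  # dummy phone, missing email
    {"name": "Cal Kim", "phone": "",         "email": "cal@example.com"},
]

DUMMY_PHONES = {"999-9999", "000-0000", "123-4567"}  # assumed placeholder values

def profile(records, field):
    """Count missing, dummy, and populated values for one field."""
    stats = Counter()
    for r in records:
        value = r.get(field, "").strip()
        if not value:
            stats["missing"] += 1
        elif field == "phone" and value in DUMMY_PHONES:
            stats["dummy"] += 1
        else:
            stats["populated"] += 1
    return dict(stats)

print(profile(records, "phone"))  # {'populated': 1, 'dummy': 1, 'missing': 1}
print(profile(records, "email"))  # {'populated': 2, 'missing': 1}
```

A real profiling pass would run the same kind of count against every column and feed the ratios into the cost assessment described above.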

The cost of entry problems depends on the use. If a phone number or email address is used only for informational purposes, then the cost of its absence is probably low. If instead a phone number is used for marketing and driving new sales, then the opportunity cost across a large percentage of records may be significant.

Addressing data quality at the source can be difficult. If data was sourced from a third party, there is usually little the organization can do. Likewise, applications that provide internal sources of data might be old and too expensive to modify. And there are few incentives for the clerks at the point of entry to obtain, verify, and enter every data point.

Process Quality
Process quality issues usually occur systematically as data moves through an organization. They may result from a system crash, a lost file, or any other technical mishap that arises when systems are integrated. These issues are often difficult to identify, especially if the data has undergone a number of transformations on the way to its destination. Process quality can usually be remedied easily once the source of the problem is identified. Proper checks and quality control at each touchpoint along the path help ensure that problems are rooted out, but such checks are often absent in legacy processes.
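One lightweight form of such a touchpoint check is to compare a row count and an order-independent checksum before and after each hop. The sketch below is a hand-rolled illustration under that assumption, not a feature of any specific ETL product; the sample rows are invented.

```python
import hashlib

def batch_fingerprint(rows):
    """Row count plus an order-independent checksum for a batch of rows."""
    digest = 0
    for row in rows:
        # Hash each row's sorted fields; XOR makes the result order-independent.
        h = hashlib.sha256(repr(sorted(row.items())).encode()).digest()
        digest ^= int.from_bytes(h[:8], "big")
    return len(rows), digest

source = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
loaded = [{"amt": 20, "id": 2}, {"id": 1, "amt": 10}]  # same data, reordered

# A mismatch here signals rows lost or altered between the two touchpoints.
assert batch_fingerprint(source) == batch_fingerprint(loaded), "batch drifted in transit"
```

Emitting and comparing such fingerprints at each stage makes a silent drop or truncation visible at the hop where it happened, rather than at the final destination.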

Identification Quality
Identification quality problems result from a failure to recognize the relationship between two objects. For example, two records that describe the same product under different SKUs are incorrectly judged to be different items.

Identification quality may have significant associated costs, such as mailing the same household more than once. Data quality processes can largely eliminate this problem by matching records, identifying duplicates and placing a confidence score on the similarity of records. Ambiguously scored records can be reviewed and judged by a data steward. Still, the results are never perfect, and determining the proper business rules for matching can involve trial and error.
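The matching-and-scoring process can be sketched with a simple string-similarity measure. This example uses Python's standard-library difflib as a stand-in for a real matching engine; the thresholds and sample records are illustrative assumptions that would need the trial-and-error business tuning mentioned above.

```python
from difflib import SequenceMatcher

AUTO_MATCH = 0.90  # illustrative thresholds; real rules need business tuning
REVIEW = 0.70

def match_confidence(a, b):
    """Similarity score between two name+address records, 0.0 to 1.0."""
    key_a = f"{a['name']} {a['address']}".lower()
    key_b = f"{b['name']} {b['address']}".lower()
    return SequenceMatcher(None, key_a, key_b).ratio()

def classify(a, b):
    """Route a candidate pair: auto-merge, steward review, or keep separate."""
    score = match_confidence(a, b)
    if score >= AUTO_MATCH:
        return "duplicate"       # confident enough to merge automatically
    if score >= REVIEW:
        return "steward review"  # ambiguous: route to a data steward
    return "distinct"

r1 = {"name": "John Smith", "address": "12 Oak St"}
r2 = {"name": "Jon Smith",  "address": "12 Oak Street"}
r3 = {"name": "Mary Jones", "address": "4 Elm Ave"}

print(classify(r1, r2))  # "steward review"
print(classify(r1, r3))  # "distinct"
```

The middle band is the key design point: records too similar to ignore but not similar enough to merge are exactly the ones a data steward should see.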

Integration Quality
Integration quality, or quality of completeness, can present big challenges for large organizations. Integration quality problems occur because information is isolated by system or departmental boundaries. It might be important for an auto claims adjuster to know that a customer is also a high-value life insurance customer, but if the auto and life insurance systems are not integrated, that information will not be available.

While the desire to have integrated information may seem obvious, the reality is that it is not always apparent. Business users who are accustomed to working with one set of data may not be aware that other data exists or may not understand its value. Data governance programs that document and promote enterprise data can facilitate the development of data warehousing and master data management systems to address integration issues.

MDM enables the process of identifying records from multiple systems that refer to the same entity. The records are then consolidated into a single master record. The data warehouse allows the transactional details related to that entity to be consolidated so that its behaviors and relationships across systems can be assessed and analyzed.
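The consolidation step can be illustrated with a simple "most recent non-null value wins" survivorship rule. The two system records and field names below are invented for the sketch; real MDM tools offer much richer, configurable precedence rules.

```python
# Illustrative records for one customer held in two unintegrated systems;
# field names and the precedence rule are assumptions, not a specific MDM product.
auto_system = {"name": "J. Smith",   "phone": "555-0101", "updated": "2008-11-02"}
life_system = {"name": "John Smith", "phone": None,       "updated": "2009-01-15"}

def consolidate(*records):
    """Build a master record: for each field, keep the most recently
    updated non-null value (a simple 'most recent wins' survivorship rule)."""
    ordered = sorted(records, key=lambda r: r["updated"])  # oldest first
    master = {}
    for record in ordered:  # later records overwrite earlier non-null values
        for field, value in record.items():
            if value is not None:
                master[field] = value
    return master

print(consolidate(auto_system, life_system))
# {'name': 'John Smith', 'phone': '555-0101', 'updated': '2009-01-15'}
```

Note how the newer record contributes the name while the older record survives as the only source of a phone number, which is the essence of survivorship.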

Usage Quality
Usage quality often presents itself when data warehouse developers lack access to legacy source documentation or subject matter experts. Without adequate guidance, they are left to guess the meaning and use of certain data elements. Another scenario occurs in organizations where users are given the tools to write their own queries or create their own reports. Incorrect usage may be difficult to detect and quantify in cost.

Thorough documentation, robust metadata, and user training are helpful and should be built into any new initiative, but gaining support for a post-implementation metadata project can be difficult. Again, this is where a data governance program should be established and a grassroots effort made to identify and document corporate systems and data definitions. This metadata can be injected into systems and processes as it becomes part of the culture to do so. This may be more effective and realistic than a big-bang approach to metadata.

Aging Quality
The most challenging aspect of aging quality is determining at which point the information is no longer valid. Usually, such decisions are somewhat arbitrary and vary by usage. For example, maintaining a former customer's address for more than five years is probably not useful. If customers haven't been heard from in several years despite marketing efforts, how can we be certain they still live at the same address? At the same time, maintaining customer address information for a homeowner's insurance claim may be necessary and even required by law. Such decisions need to be made by the business owners and the rules should be architected into the solution. Many MDM tools provide a platform for implementing survivorship and aging rules.
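Usage-specific aging rules of the kind described above can be sketched in a few lines. The retention periods below echo the article's examples but are illustrative assumptions only, not legal guidance.

```python
from datetime import date

# Usage-specific retention rules in years; the figures are illustrative,
# loosely echoing the article's examples, not actual business or legal rules.
RETENTION_YEARS = {"marketing_address": 5, "claim_address": 10}

def is_stale(last_verified: date, usage: str, today: date) -> bool:
    """An attribute is stale once its age exceeds the retention rule for its usage."""
    age_years = (today - last_verified).days / 365.25
    return age_years > RETENTION_YEARS[usage]

today = date(2009, 6, 1)
# The same address can be stale for marketing yet still valid for a claim.
print(is_stale(date(2003, 1, 1), "marketing_address", today))  # True  (~6.4 years old)
print(is_stale(date(2003, 1, 1), "claim_address", today))      # False (rule allows 10)
```

Keying the rule on usage rather than on the attribute alone is what lets one record be expired for marketing while remaining valid for compliance.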

Organizational Quality
Organizational quality, like entry quality, is easy to diagnose and sometimes very difficult to address. It shares much with process quality and integration quality but is less a technical problem than a systemic one that arises in large organizations. Organizational issues occur when, for example, marketing tries to "tie" its calculations to finance. Financial reporting systems generally take an account view of information, which may be very different from how the company markets the product or tracks its customers. These business rules may be buried in many layers of code throughout multiple systems. The biggest challenge to reconciliation, however, is getting the various departments to agree that their A equals the other's B equals the other's C plus D.

A Strategic Approach
The first step to developing a data strategy is to identify where quality problems exist. These issues are not always apparent, and it is important to develop methods for detection. A thorough approach requires inventorying the systems, documenting the business and technical rules that affect data quality, and conducting data profiling and scoring activities that give us insight into the extent of the issues.

After identifying the problem, it is important to assess the business impact and cost to the organization. The downstream effects are not always easy to quantify, especially when it is difficult to detect an issue in the first place. In addition, the cost associated with a particular issue may be small at a departmental level but much greater when viewed across the entire enterprise. The business impact will drive business involvement and investment in the effort.

Finally, once we understand the issues and their impact on the organization, we can develop a plan of action. Data quality programs are multifaceted. A single tool or project is not the answer. Addressing data quality requires changes in the way we conduct our business and in our technology framework. It requires organizational commitment and long-term vision.

The strategy for addressing data quality issues requires a blend of analysis, technology, and business involvement. When viewed from this perspective, an MDM program is an effective approach. MDM provides the framework for identifying quality problems, cleaning the data, and synchronizing it between systems. However, MDM by itself won't resolve all data quality issues.

An active data governance program empowered by chief executives is essential to making the organizational changes necessary to achieve success. The data governance council should set the standards for quality and ensure that the right systems are in place for measurement. In addition, the company should establish incentives for both users and system developers to maintain the standards.

The end result is an organization where attention to quality and excellence permeates everything it does. Such an approach to enterprise information quality takes dedication and requires a shift in the organization's mindset, but the results are both achievable and profitable.

Source: Information Management, June 2009. William McKnight is a partner at Lucidity Consulting Group.



