What is Data Quality
Data quality is a measure of the condition of data based on factors such as accuracy, completeness, consistency, reliability, and whether it’s up to date.
There are many definitions of data quality. The two predominant ones are:
- Data is of high quality if the data is fit for the intended purpose of use
- Data is of high quality if the data correctly represents the real-world construct that the data describes
Why is Data Quality Crucial
Bad data can have significant business consequences for companies. Poor-quality data is often pegged as the source of operational snafus, inaccurate analytics, and ill-conceived business strategies. Examples of the economic damage that data quality problems can cause include added expenses when products are shipped to the wrong customer addresses, lost sales opportunities because of erroneous or incomplete customer records, and fines for improper financial or regulatory compliance reporting.
From a financial standpoint, maintaining high levels of data quality enables organizations to reduce the cost of identifying and fixing bad data in their systems. Companies are also able to avoid operational errors and business process breakdowns that can increase operating expenses and reduce revenues.
In addition, good data quality increases the accuracy of analytics applications, which can lead to better business decision-making that boosts sales, improves internal processes, and gives organizations a competitive edge over rivals. High-quality data can also help expand the use of BI dashboards and analytics tools: if analytics data is seen as trustworthy, business users are more likely to rely on it instead of basing decisions on feelings or assumptions.
5 Dimensions of Data Quality
|Dimension|Question|
|---|---|
|Accuracy|Is the information correct in every detail?|
|Completeness|How comprehensive is the information?|
|Reliability|Does the information contradict other trusted resources?|
|Relevance|Do you really need this information?|
|Timeliness|How up-to-date is the information? Can it be used for real-time reporting?|
Accuracy means that the information is correct. To judge whether data is accurate, ask whether it reflects the real-world situation it describes.
Defining the set of valid values makes invalid values easy to spot, but a value that passes a validity check is not necessarily accurate.
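The difference between a valid value and an accurate one can be sketched in a few lines. This is only an illustration: the `VALID_COUNTRIES` set, the function names, and the sample values are all hypothetical, and in practice the "true" value would have to come from some trusted reference.

```python
# Hypothetical set of allowed values for a "country" field.
VALID_COUNTRIES = {"US", "CA", "MX"}


def is_valid(value: str) -> bool:
    """Validity: the value belongs to the set of allowed values."""
    return value in VALID_COUNTRIES


def is_accurate(value: str, true_value: str) -> bool:
    """Accuracy: the value matches the real-world fact (from a trusted reference)."""
    return value == true_value


recorded = "CA"  # what the database says (made-up sample)
actual = "US"    # what is true in the real world (assumed known here)

print(is_valid(recorded))             # True: "CA" is an allowed value
print(is_accurate(recorded, actual))  # False: it is still the wrong country
```

A validity check is cheap and catches obvious errors, but only a comparison against the real world, or a trusted proxy for it, can confirm accuracy.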
Completeness refers to how comprehensive the information is. When looking at data completeness, think about whether all of the data you need is available and how to deal with missing data. Data can be complete even if optional data is missing. For example, you might need a customer’s first and last name, but the middle name may be optional.
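A completeness check along these lines might look like the sketch below. The field names and the choice of which fields are required are assumptions made for illustration; a real system would define its own rules.

```python
from typing import List

# Assumed rule for this sketch: first and last name are required,
# while "middle_name" is optional and never counted as missing.
REQUIRED_FIELDS = ("first_name", "last_name")


def missing_required(record: dict) -> List[str]:
    """Return the required fields that are absent, None, or blank."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]


def is_complete(record: dict) -> bool:
    """A record is complete when no required field is missing."""
    return not missing_required(record)


# Made-up sample records.
customer_a = {"first_name": "Ada", "last_name": "Lovelace"}  # no middle name
customer_b = {"first_name": "Ada", "middle_name": "King", "last_name": ""}

print(is_complete(customer_a))       # True: only optional data is missing
print(missing_required(customer_b))  # ['last_name']
```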
Reliability is a vital data quality characteristic. When pieces of information contradict each other, you can’t trust the data. In data quality terms, reliability means that a piece of information doesn’t contradict the same information in a different source or system. For example, if a patient’s date of birth is June 13, 1980, in one database but July 13, 1985, in another, the records conflict and the information is not reliable.
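One simple way to surface this kind of contradiction is to compare the same field across two sources. The sketch below assumes both sources can be loaded as dictionaries keyed by a shared patient ID; the IDs, the function name, and the second patient are hypothetical.

```python
from datetime import date


def find_conflicts(source_a: dict, source_b: dict) -> dict:
    """Return {key: (value_a, value_b)} for keys in both sources whose values differ."""
    conflicts = {}
    for key in source_a.keys() & source_b.keys():
        if source_a[key] != source_b[key]:
            conflicts[key] = (source_a[key], source_b[key])
    return conflicts


# Made-up dates of birth held in two different databases.
db1 = {"patient-001": date(1980, 6, 13), "patient-002": date(1975, 1, 2)}
db2 = {"patient-001": date(1985, 7, 13), "patient-002": date(1975, 1, 2)}

conflicts = find_conflicts(db1, db2)
print(conflicts)  # only patient-001 is reported, because its two dates disagree
```

The check only reports that the sources disagree. Deciding which one is right is an accuracy question and needs a trusted reference.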
Relevance asks whether the information is actually needed. Before we start collecting data and information, we should consider which of it is relevant, which is not, and whether we really need it. Spending time and resources on irrelevant data is wasteful.
The timeliness of information is an important data quality characteristic because information that isn’t timely can lead people to make the wrong decisions, which can cause serious damage. Timeliness refers to whether the information is available when it is expected and needed. For example:
- Companies that are required to publish their quarterly results within a given time frame
- Customer service teams providing up-to-date information to customers
- Credit systems checking credit card account activity in real time
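Timeliness requirements like these usually reduce to comparing a record's age against a maximum allowed age. The following is a minimal sketch under that assumption; the function name, the one-minute threshold, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_timely(last_updated: datetime, max_age: timedelta, now: datetime = None) -> bool:
    """Return True if the data was updated within max_age of now."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_updated <= max_age


# A fixed "now" keeps the example reproducible; omit it in real use.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = datetime(2024, 1, 1, 11, 59, 30, tzinfo=timezone.utc)  # 30 seconds old
stale = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)        # 3 hours old
max_age = timedelta(minutes=1)  # assumed threshold for a near-real-time check

print(is_timely(fresh, max_age, now))  # True
print(is_timely(stale, max_age, now))  # False
```

The right `max_age` depends on the use case: a quarterly report can tolerate data that is days old, while a real-time credit check cannot.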