Reportedly, 85 per cent of businesses fail to leverage big data effectively to power their digital transformation initiatives. While the causes of failure are diverse, ranging from process issues to people issues, poor data quality is often the underlying root cause of a failed digital transformation or big data project. An Experian survey found that 68 per cent of businesses experience the impact of poor data quality on their data and transformation initiatives.
All too often, key data quality issues are overlooked until they become a severe bottleneck that causes an initiative to fail. Only at that point do businesses realise they have been building their data foundations on sand. In this article I’ll highlight some of the key problems businesses face and how to rectify them.
Simply put, data quality refers to the “health” of your data and whether it is fit for its intended use.
This means your data must be:
For most organisations, the problem with data only comes to light when a migration or digital transformation initiative is halted because the data is not ready or not good enough.
Mergers are where companies often struggle most with the consequences of poor data. When one company’s Customer Relationship Management (CRM) system is in disarray, it affects the entire migration process: time and effort that should be spent understanding and implementing the new system is spent sorting out data instead.
What exactly constitutes poor data? Well, if your data suffers from:
…then it’s considered to be flawed data.
These are surface issues, and they are inevitable and universal: as long as you have humans formulating and inputting the data, errors will occur.
However, poor data quality goes beyond surface issues. If data is siloed, hard to access, or duplicated, you’ve got serious trouble. Indeed, data duplication is a key challenge most organisations find difficult to tackle.
Let’s understand this further.
On average, enterprises have some 400 different data sources. Companies are drowning in data, especially duplicate data.
There are multiple ways duplicate data can be created, of which some of the most common are:
Data duplication occurs primarily because of a lack of data governance and general data mismanagement. As organisations grow, they focus on simply gathering data: more leads, more buyers, more sales. Vanity metrics are used to measure success.
If businesses really sorted their data, they would see a drastic difference between what they think they have and what they actually have.
Consider this example:
An organisation’s employees are often at the receiving end of bad data. Day in, day out, marketers, sales reps, and customer service reps attempt to fix data problems, but despite using a powerful CRM like HubSpot, they are still not able to get clean, reliable data.
When executives demand insight reports, the reps show data at a superficial level. In fact, they often only discover that emails or phone numbers are missing when they run a report. Executives don’t look into the nitty-gritty: managers are satisfied as long as their signup, lead and sales targets are met.
All day, employees whose job should be to analyse data and contribute to strategic decision-making are frustrated. They know the data is flawed, but management isn’t taking the problem seriously enough to invest in a solution.
As a result, bad data quality causes:
That sounds alarming, right? Well, luckily, there are positive steps you can take.
A data quality framework is essentially a repeatable lifecycle that lets companies fix issues with their data and end up with data they can trust and use.
The framework consists of:
Integration of data sources for real-time or batch cleansing: This allows companies to connect their data sources such as databases, social media platforms, CRMs, emails and any other cloud source to the third-party platform for data profiling and cleansing.
Profiling data to give an overview of problems: This gives you an overview of the quality of your data. You can discover the percentage of data that is missing, invalid, corrupt, or flawed and find out the ‘health’ of your data fields. Data profiling will help you gauge the complexity of problems and the kind of standards you will need to put in place to ensure such problems don’t recur.
Cleaning data of errors, typos and format issues: Once you get an idea of the problems plaguing your data in the data profiling stage, you begin with data cleansing to fix those problems. For example, if data profiling shows that the [Phone] field contains letters of the alphabet or punctuation, data cleansing will remove them – automatically and with no manual intervention required.
Removing duplicates and merging data sources with data matching: The most important part of the data quality framework, data matching does a number of things. It helps you:
Implementing standards with data governance: As you clean, match and remove duplicate data, you’re much better equipped to understand the steps required to prevent such errors from happening. For example, you could employ a mechanism to ensure that all phone numbers start with + (country code) followed by (city code). Furthermore, you could also categorise phone numbers into mobiles, landlines or VOIP numbers.
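To make the profiling, cleansing and standardisation steps above concrete, here is a minimal sketch in plain Python. The contact records, the field names and the UK-style phone normalisation rule are all illustrative assumptions, not taken from any particular CRM or tool:

```python
import re

# Hypothetical contact records pulled from several sources; the field
# names ("name", "email", "phone") are illustrative assumptions.
records = [
    {"name": "Ada Lovelace", "email": "ada@example.com", "phone": "+44 (20) 7946-0958"},
    {"name": "Alan Turing", "email": "", "phone": "020.7946.0958 ext 2"},
    {"name": "Grace Hopper", "email": "grace@example", "phone": ""},
]

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def profile(rows):
    """Profiling: report, per field, the share of records that are missing,
    plus the share of emails that are present but malformed."""
    n = len(rows)
    report = {}
    for field in ("name", "email", "phone"):
        missing = sum(1 for r in rows if not r[field].strip())
        report[field + "_missing_pct"] = round(100 * missing / n, 1)
    invalid = sum(1 for r in rows if r["email"] and not EMAIL_RE.match(r["email"]))
    report["email_invalid_pct"] = round(100 * invalid / n, 1)
    return report

def clean_phone(raw):
    """Cleansing + standardisation: strip letters and punctuation, then
    rewrite the UK trunk prefix '0' as country code 44 (an assumed rule)."""
    digits = re.sub(r"[^\d]", "", re.sub(r"ext.*$", "", raw, flags=re.I))
    if not digits:
        return ""
    if digits.startswith("0"):
        digits = "44" + digits[1:]
    return "+" + digits

print(profile(records))
# Both source formats now standardise to the same value:
print(clean_phone("+44 (20) 7946-0958"), clean_phone("020.7946.0958 ext 2"))
```

The point of the profiling step is exactly this kind of per-field report: it tells you *which* rules (like the phone-format rule encoded in `clean_phone`) are worth enforcing as governance standards.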
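The data matching step can likewise be sketched with nothing more than the standard library’s fuzzy string comparison. The toy contact list and the 0.85 similarity threshold are assumptions chosen to illustrate the idea; real matching engines use more sophisticated, field-aware scoring:

```python
from difflib import SequenceMatcher

# Toy contact list containing one near-duplicate ("Jon" vs "John").
contacts = [
    "Jon Smith, jon.smith@example.com",
    "John Smith, jon.smith@example.com",
    "Mary Jones, mary@example.org",
]

def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_duplicates(rows, threshold=0.85):
    """Return index pairs whose similarity exceeds the threshold.
    The threshold is an assumption you would tune on your own data."""
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if similarity(rows[i], rows[j]) >= threshold:
                pairs.append((i, j))
    return pairs

print(find_duplicates(contacts))  # the "Jon"/"John" pair is flagged
```

In practice you would compare cleansed, normalised fields (name, email, phone) separately and weight them, rather than treating each record as a single string, which is why cleansing comes before matching in the framework.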
Ever since data was first collected, stored and processed, data quality has been a consistent problem in the business world. However, in our current age, companies are so occupied with making grand plans for data that, ironically, they often miss out on the very basics.
While there are tools that can be used to manage and fix data, they will not be effective if there is no process in place. Getting to grips with data issues and instituting reliable data management processes provides a strong, efficient foundation for digital transformation and the kinds of projects many companies have today.