In today’s world, data is the fuel organizations run on. Big Data is a component of every key decision we make. Unfortunately, poor data quality is widespread and costly. It is estimated that 20% to 30% of the data we use is erroneous. Many files contain duplicate entries, outdated information, and input errors. Our intentions to clean and transform the data are admirable, but in this era of cutbacks and reductions, the work is seldom done. As a result, we base decisions on inaccurate data.
Data is traditionally collected to discover trends and patterns that assist with forecasting and serve as early warning signals. Big Data includes not only traditional database-style information (in columns and rows), but also social media, video, audio, and other forms of communication.
Collecting Big Data does not ensure that you will obtain relevant or sufficiently specific results, yet we are constantly increasing the amount of data that we collect. What really matters is how we compile the statistics and which analytical techniques we use to reach conclusions and recommendations.
Understanding the purpose of the collection of your data and the impact on your organization is crucial. We must determine how much time and how many resources should be used.
Care must be taken to ensure the integrity of your data. What outcome are you looking for: new product information, employment statistics, or sales results? Why do you need this data, and why now? Are you analyzing results, or are you hoping to justify resources? Understanding why you are collecting all this data will tell you how much to collect and when to collect it.
So what should you emphasize in the collection of Big Data?
Start by analyzing the source, even when it is internal. Is it valid for the intended use, or is it outdated or erroneous?
Next, consider whether it is reliable. What procedures do you have in place to test your data or to reduce manipulation and errors? Have you considered theft, mishaps, or hackers? Relevance must also be considered. Your data must represent a fair picture of the issue at hand so that accurate decisions can be made.
Finally, is your data timely and updated on a frequent basis?
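Some of these questions can be turned into simple automated checks. The sketch below is only an illustration, not a complete audit: it counts duplicate entries, records with missing values, and records that have not been updated recently. The function name `audit_records`, the field names, and the one-year freshness limit are all assumptions chosen for the example.

```python
from datetime import date, timedelta


def audit_records(records, key_field, date_field, max_age_days, today):
    """Count duplicate keys, records with missing values, and stale records."""
    seen_keys = set()
    duplicates = 0
    missing = 0
    stale = 0
    cutoff = today - timedelta(days=max_age_days)

    for record in records:
        # Duplicate check: the same key appearing more than once.
        key = record.get(key_field)
        if key in seen_keys:
            duplicates += 1
        else:
            seen_keys.add(key)

        # Missing-value check: any field left empty or set to None.
        if any(value in (None, "") for value in record.values()):
            missing += 1

        # Timeliness check: no update date, or last update before the cutoff.
        updated = record.get(date_field)
        if updated is None or updated < cutoff:
            stale += 1

    return {
        "total": len(records),
        "duplicates": duplicates,
        "missing": missing,
        "stale": stale,
    }


# Hypothetical sample data, invented for illustration only.
records = [
    {"customer_id": 1, "email": "a@example.com", "updated": date(2024, 5, 1)},
    {"customer_id": 2, "email": "", "updated": date(2024, 4, 20)},
    {"customer_id": 1, "email": "a@example.com", "updated": date(2022, 1, 15)},
]

report = audit_records(
    records,
    key_field="customer_id",
    date_field="updated",
    max_age_days=365,
    today=date(2024, 6, 1),
)
print(report)
```

A report like this does not say whether the data is relevant or reliable; it only flags the records that deserve a closer look before they feed a decision.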
Remember, the goal of Big Data is to find insights, answer questions, and analyze results that were previously beyond reach. Yet we seldom analyze where our data comes from or what the potential problems are. This leads to one of the top three reasons why projects fail: misleading or confusing data.
Having the right data, collected at the right time with the right tools, is important. If users are unable to access the data for corrections and testing, their analyses become useless. It is essential that every organization consider the integrity of its Big Data along with its collection.
Big Data is affected by the way data is entered, stored, and managed. Create a policy that assesses its accuracy, accessibility, completeness, and consistency. Your policy should include a process that defines how the data is to be stored, archived, and protected from mishaps or attacks. Do not forget procedures to ensure compliance with government regulations.
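Part of such a policy can be written down as measurable thresholds. The following is a minimal sketch under stated assumptions: it scores only completeness and consistency (accuracy and accessibility usually need other methods, such as comparison against a trusted source), and the policy keys, field names, region codes, and the 0.9 threshold are hypothetical values chosen for the example.

```python
def completeness(records, required_fields):
    """Fraction of records in which every required field is populated."""
    if not records:
        return 0.0
    complete = sum(
        all(record.get(field) not in (None, "") for field in required_fields)
        for record in records
    )
    return complete / len(records)


def consistency(records, field, allowed_values):
    """Fraction of records whose coded field uses an agreed value."""
    if not records:
        return 0.0
    consistent = sum(record.get(field) in allowed_values for record in records)
    return consistent / len(records)


def evaluate_policy(records, policy):
    """Score the data against the policy and list the checks that fall short."""
    scores = {
        "completeness": completeness(records, policy["required_fields"]),
        "consistency": consistency(
            records, policy["coded_field"], policy["allowed_values"]
        ),
    }
    failures = [name for name, score in scores.items() if score < policy["min_score"]]
    return scores, failures


# Hypothetical policy and sample data, invented for illustration only.
policy = {
    "required_fields": ["id", "region", "amount"],
    "coded_field": "region",
    "allowed_values": {"NA", "EU", "APAC"},
    "min_score": 0.9,
}

sales = [
    {"id": 1, "region": "NA", "amount": 100},
    {"id": 2, "region": "EU", "amount": None},     # incomplete: amount missing
    {"id": 3, "region": "europe", "amount": 50},   # inconsistent region code
    {"id": 4, "region": "APAC", "amount": 75},
]

scores, failures = evaluate_policy(sales, policy)
print(scores, failures)
```

Keeping the thresholds in one policy object means the rules can be reviewed and changed without touching the checking code.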
Big Data is expanding every day. Make sure your data is trustworthy so that effective decisions can be made.
VP of Operations
Contributing Author – Lauri Sowa