Avoiding Big Data mistakes in your business

Whilst governments and regulatory bodies try to come to grips with the public-interest issues surrounding Big Data, such as privacy and fraud, those with fiduciary and managerial accountability for their own organisation’s enterprise data (Big, or otherwise) face a specific set of challenges that have little to do with the data itself. Avoiding Big Data mistakes in the real world of your business is a challenge, despite the hype and promise associated with Big Data.

Before you and your organisation launch yourselves into the Big Data world, it may be worth cross-checking some of your fundamental assumptions and capabilities, especially if the analysis is used to drive important strategic or operational decisions on which your organisation depends.

In this article, I will focus on a few of the key factors: correlation, understanding the system you are dealing with, cross-functional collaboration within your organisation, and the importance of the discipline of Data Science.

Avoiding big data mistakes #1: Correlation risks.

Big Data is fertile ground for finding correlations. For example, are your sales volumes correlated with the speed of the forklift trucks in your warehouse or with the reduction in selling price? How do you really know?

Case in point: In 2009, the much-publicised Google Flu Trends (GFT) analysis, based on flu-related web queries, was reported as being a faster and more accurate predictor of flu trends than the U.S. Centers for Disease Control and Prevention (CDC). However, despite the hype, the journal Science (Science 14 March 2014: Vol. 343 no. 6176 pp. 1203-1205) recently showed that GFT overestimated the prevalence of flu in 93% of weeks over a period of more than two years. Essentially, GFT has been consistently wrong since August 2011.

Correlation is a statistical measure that describes the magnitude and direction of a relationship between two or more variables; however, it does not mean that one variable causes the other.

  • There is a positive correlation between wearing size XXXL clothing and cardiovascular disease; however, wearing XXXL clothing does not cause heart disease.

Causation, on the other hand, indicates that one event is the result of the other event; i.e. there is a causal relationship between the two events. This is also referred to as cause and effect.

  • Chronic obesity is a cause of cardiovascular disease. Obesity results in individuals needing to wear XXXL sized clothing.

Bottom line: Simple systems tend to make the difference between correlation and causation easy to see; complex systems do not. Apply rigour to your Big Data analyses to reduce the risk of mistaking correlation for causation. Making important strategic, managerial, investment or operational decisions on the basis of misread correlations could be costly.
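
The clothing-size example can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not a real health dataset: the variable names, the noise levels and the sample size are all assumptions made for the example. A hidden common cause (body mass) drives both clothing size and heart risk, so the two correlate strongly, yet the correlation all but disappears once the common cause is controlled for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000

# Hypothetical, standardised synthetic variables (assumptions, not real data).
body_mass = [rng.gauss(0, 1) for _ in range(n)]              # hidden common cause
clothing_size = [b + rng.gauss(0, 0.5) for b in body_mass]   # driven by body mass
heart_risk = [b + rng.gauss(0, 0.5) for b in body_mass]      # also driven by body mass

# Raw correlation: strong (about 0.8 by construction), despite no causal link.
raw = pearson(clothing_size, heart_risk)

# Partial correlation: correlate what remains after removing the effect of body mass.
partial = pearson(residuals(clothing_size, body_mass),
                  residuals(heart_risk, body_mass))

print(f"raw correlation:     {raw:.2f}")
print(f"partial correlation: {partial:.2f}")
```

In a real analysis the common cause is rarely handed to you; identifying candidate confounders is exactly where an understanding of the system and domain expertise come in.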

Avoiding big data mistakes #2: Size doesn’t matter – Complexity does.

Whilst the volume of data is one of the many dimensions of the Big Data landscape, the intrinsic complexity of enterprise data should not be glossed over. Whether 1 MB or 10 PB of data is being massaged, analysed and interpreted, some of the more critical factors that contribute to a truly successful Big Data initiative relate to:

  1. Your understanding of the system being modelled, and
  2. The design, selection and use of the appropriate analytical processes.

Case in point: In the mid-1980s, scientists researching the large hole in the ozone layer that appears seasonally over Antarctica failed to detect the critical trend for a number of years because their computers had been programmed to automatically reject data outliers beyond certain limits. The reality was that the actual ozone losses were far beyond the error range of existing predictive models. The correct data was being ignored. It was years before the error was eventually picked up, fortunately for humanity.

Bottom line: Size does not count. Spend time in the detail, and test hypotheses and results carefully and rigorously if the results are important to your organisation.
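
The ozone story can be illustrated with a small sketch. The readings below are synthetic and the plausibility limit is a hypothetical threshold chosen for the example; neither reflects the actual measurements or software involved. The point is only that an automatic "reject anything outside the expected range" rule can hide precisely the trend that matters, and that checking what was rejected, and when, exposes the problem.

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


years = list(range(1975, 1986))

# Synthetic readings (assumption): flat at 300 until 1979, then falling by 12 a year.
readings = [300 - 12 * max(0, year - 1979) for year in years]

# Hypothetical rule: anything below this limit is treated as a faulty reading.
LOWER_LIMIT = 280

kept = [(y, r) for y, r in zip(years, readings) if r >= LOWER_LIMIT]
rejected = [(y, r) for y, r in zip(years, readings) if r < LOWER_LIMIT]

slope_all = slope(years, readings)
slope_kept = slope([y for y, _ in kept], [r for _, r in kept])

print(f"trend using all readings:  {slope_all:.1f} per year")
print(f"trend after auto-reject:   {slope_kept:.1f} per year")

# Sanity check on the filter itself: are the rejections clustered in time?
rejected_years = [y for y, _ in rejected]
print(f"rejected years: {rejected_years}")
```

When the rejected points are all recent and all on the same side of the limit, they look less like instrument noise and more like a signal the model did not anticipate. Flagging such readings for review, rather than silently dropping them, is the safer design.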

Disrupt from within

Anyone who has implemented an ERP system will know all too well that one of the critical success factors is how well the various business units and disparate silos play together, without being territorial or defensive.

When you are dealing with enterprise-wide Big Data, you are potentially dealing with the operation of the entire organisation. As such, having your Big Data efforts hijacked by a particular function or influential manager may lead to suboptimal results or incorrect decisions being made.

Bottom line: If your Big Data involves major parts of your organisation, be prepared to challenge the entrenched and potentially defensive silos. Success in this area has more to do with managing organisational and cultural change through effective cross-functional, multi-disciplinary teamwork than with the data itself. In doing so, avoiding big data mistakes is far more likely.

Do not trivialise the science.

Data science is an emerging discipline, and one that is crucial to resolving complex data problems that relate to transforming data into information and knowledge. Essentially, it is a modern-day form of alchemy: turning base metals into gold. Data science is the result of drawing together a range of specialised skills to work on data, and is not necessarily a job description. These skills include applied mathematics and statistics, computer science, pattern recognition, machine learning, natural language processing and operations research, to name but a few.

Additionally, Big Data requires domain expertise if you are to avoid mistakes involving complex systems or processes. For example, if you are in the medical field, having a domain expert as part of the Big Data initiative should be a prerequisite.

Recognise that the science of data (irrespective of size) is rarely two-dimensional.  The 2D world of the traditional finance, sales and marketing functions is a far cry from the multi-dimensional arrays of multivariate and volatile data of varying quality and structure.

More importantly, finding key decision makers in your organisation who are able to grasp potentially complex mathematical and statistical mechanics can be a challenge, as many managers lack the necessary mathematical and statistical skills.

Bottom line: Do not trivialise the skills associated with data science.  You are potentially dealing with complex, multidimensional data that cannot be modelled using spreadsheet-like thinking.

Managing Big Data is analogous to being an airline pilot. Good airline pilots are well trained in the physics of flight, understand engineering systems and, most importantly, understand the system they are operating (the aircraft) as well as its context at any point in time, be that during take-off, flying around a storm or landing.

Big Data has Big Benefits if well managed, as well as Big Pitfalls if not effectively managed.

The bottom line is to invest the time and resources needed to adequately understand what you are dealing with if you are to have a high chance of avoiding big data mistakes in your business.