Anyone who has actually worked in a company with Big Data knows that data in itself has very little value.
There are actually tons of companies with massive amounts of data. But very few derive disproportionate value from that data.
Ultimately, big data only helps if you do something different with it that produces actionable insight. And there are 2 ways to do this.
Option #1 is to have a unique data set. One might have tons and tons of data points, but if lots of other providers have similar data points, you can extract relatively little value. Just go to an AdTech conference and you will hear many, many companies talking about their wealth of user data that drives better targeting. But how many of those vendors really have unique data that helps in ways that can’t be replicated through other sources or providers? It’s hard to say.
Option #2 is to derive unique insight from data. But in many cases, BIG data doesn’t help. In fact, it can hurt.
This is an out-of-date example, but when I was at eBay 10+ years ago, we tried to address a shocking fact: something like 20% of search queries on the site yielded zero results. That just felt wrong in a marketplace with extraordinarily broad selection. The question was how to figure out a) in how many cases those searches actually SHOULD have yielded something and b) what the main problems with our finding process were that led to empty results.
One terrific solution ended up being proposed by Louis Monier’s advanced technology team. But the solution was surprisingly low-tech and small-data. Rather than develop some sort of crazy algorithm to analyze our monstrous volume of data, we instead manually looked at a small set of buying sessions that yielded zero results. I think it was literally on the order of ~100 buying sessions. The process was basically to manually determine whether these queries were actually focused on items that did exist on the marketplace, and to categorize the problems that led to no results. It turns out that in this case, a small dataset was sufficient to identify the top 5 problems with our finding experience. It also turns out that a human being was way more efficient at determining our rate of “false negatives” than a machine (and our rate of false negatives turned out to be something on the order of 60-80%).
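For anyone who wants to try the same kind of exercise, here is a minimal sketch of the bookkeeping in Python. It is not what we actually built at the time (the review itself was done by hand); it just shows how a small random sample plus manual labels turns into a false-negative rate and a ranked list of problems. The function names, the label format, and the problem categories ("misspelling", "wrong category") are all made up for illustration, and the 95% Wilson interval is my addition to show how much uncertainty a ~100-session sample carries.

```python
import math
import random
from collections import Counter


def sample_sessions(session_ids, k=100, seed=0):
    """Draw a simple random sample of zero-result sessions for manual review."""
    rng = random.Random(seed)
    return rng.sample(session_ids, min(k, len(session_ids)))


def summarize(labels, top_n=5, z=1.96):
    """Summarize hand-assigned labels.

    labels: list of (item_existed, problem) pairs, where item_existed is True
    when the reviewer found that matching items really were on the site (a
    "false negative") and problem is the reviewer's category for the failure
    (hypothetical category names; None when there was nothing to find).
    """
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one labeled session")

    problems = [problem for existed, problem in labels if existed]
    rate = len(problems) / n

    # Wilson score interval for the false-negative rate (z=1.96 is ~95%).
    denom = 1 + z * z / n
    center = (rate + z * z / (2 * n)) / denom
    half = z * math.sqrt(rate * (1 - rate) / n + z * z / (4 * n * n)) / denom

    return rate, (center - half, center + half), Counter(problems).most_common(top_n)


if __name__ == "__main__":
    # Entirely invented labels, just to exercise the functions.
    demo = (
        [(True, "misspelling")] * 40
        + [(True, "wrong category")] * 30
        + [(False, None)] * 30
    )
    rate, interval, top_problems = summarize(demo)
    print(f"false-negative rate: {rate:.0%}")
    print(f"95% interval: {interval[0]:.0%} to {interval[1]:.0%}")
    print("top problems:", top_problems)
```

The point of the interval is that even with only ~100 sessions, the estimate is tight enough to tell you whether you have a big problem, and the ranked counts tell you where to start fixing it.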
We ultimately worked to build a series of product enhancements that addressed these issues. Some were extremely successful and still exist today. In an environment with really, really BIG data, the best solution was one that used a very small amount of data to extract a huge amount of value. I think there are more opportunities like this than folks think, and that makes me think that too much focus on “big data” confuses where the real opportunities lie.
Side Note: I’m clearly talking about a subset of big data opportunities. On the actual storage and processing side of big data, I think we have made huge strides, which have in turn allowed more value to be extracted from the data that is being generated. For a broader summary, see this (somewhat old) post from Roger Ehrenberg at IA Ventures on opportunities in the big data ecosystem.