‘Big Data’ Has Come to Mean ‘Small Sampled Data’

Of course, many other factors influence how quickly data on disk can actually be consumed by a CPU in practice. But no matter how you slice it, the ratio between the performance of our storage systems and the performance of our CPUs has not undergone a transformative shift in the decades since I sat at my first desktop.

At the same time, a company’s data science division could query multi-petabyte datasets non-stop, 24 hours a day, for a single fixed cost, removing cost as a limiting factor in performing absolute population-scale analyses. The problem is that, when it comes to big data analyses, there seems to be a tremendous gulf between the companies performing population-scale analyses with tools like BigQuery, analyzing the totality of their datasets and returning absolute results, and the rest of the “big data” world, in which estimations and random samples seem to dominate.
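To make the contrast concrete, here is a minimal sketch of the two approaches against a BigQuery table, using the google-cloud-bigquery Python client. The project, dataset, and table names are hypothetical placeholders, and the 1 percent sampling rate is arbitrary; the point is only the difference between an exact full scan and a sampled extrapolation.

    from google.cloud import bigquery

    client = bigquery.Client()  # assumes application-default credentials are configured

    # Population-scale: scan every row and return an exact, absolute count.
    exact_sql = """
    SELECT COUNT(*) AS total_events
    FROM `my_project.my_dataset.events`
    """

    # Sample-based: scan roughly 1% of the table's storage blocks and
    # extrapolate, trading accuracy for a ~100x smaller scan.
    sampled_sql = """
    SELECT COUNT(*) * 100 AS estimated_total_events
    FROM `my_project.my_dataset.events` TABLESAMPLE SYSTEM (1 PERCENT)
    """

    exact = next(iter(client.query(exact_sql).result()))
    sampled = next(iter(client.query(sampled_sql).result()))
    print(f"exact count:      {exact.total_events}")
    print(f"sampled estimate: {sampled.estimated_total_events}")

Under BigQuery’s on-demand pricing, which bills by bytes scanned, the exact query pays for the whole table while the sampled query pays for roughly 1 percent of it, which is exactly why sampling dominates wherever cost remains a limiting factor; under flat-rate pricing that incentive disappears.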

Source: www.forbes.com