You Are Not Google (2017)
Consider my recent conversation with a company that briefly considered using Cassandra for a read-heavy workflow over data that was loaded in nightly:
Having read the Dynamo paper, and knowing Cassandra to be a close derivative, I understood that these distributed databases prioritize write availability (Amazon wanted the “add to cart” action to never fail). Even Google Is Not Google
Use of large scale dataflow engines like Hadoop and Spark can be particularly funny: very often a traditional DBMS is better suited to the workload, and sometimes the volume of data is so small that it could even fit in memory. Hard drives prices are now much lower than they were in 2003, the year the GFS paper was published.Perhaps you have read the GFS and MapReduce papers and appreciate that part of the problem for Google wasn’t capacity but throughput: they distributed storage because it was taking too long to stream bytes off disk.
Source: blog.bradfieldcs.com