Efficient Personal Search with Vespa, the Open Source Big Data Serving Engine
The maximum data size per node is then a trade-off between latency for such users and the overall cost of executing their queries (fewer nodes per query is cheaper). Implement a fully functional search and relevance engine on top of the raw data store, one that distributes each query to the right set of nodes for the user and merges the results. In addition to the standard indexing mode, Vespa includes a streaming mode for documents that provides this solution, implemented by layering the full search engine functionality over the raw data store built into Vespa.
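To make the idea concrete, here is a minimal sketch in Python of per-user routing and result merging. It is not Vespa's API or implementation: the `Node` and `Cluster` classes, the `max_docs_per_user_per_node` cap, the placement rule, and the term-count scoring are all hypothetical and chosen only for illustration. The sketch stores raw documents grouped by user, spills a large user's data onto additional nodes once a cap is reached, and answers a query by scanning only that user's documents on only the nodes that hold them.

```python
import heapq
from collections import defaultdict


class Node:
    """Hypothetical storage node: keeps raw documents grouped by user id."""

    def __init__(self):
        self.docs_by_user = defaultdict(list)

    def put(self, user_id, text):
        self.docs_by_user[user_id].append(text)

    def stream_search(self, user_id, term):
        # Scan only this user's raw documents; no index is consulted.
        hits = []
        for text in self.docs_by_user.get(user_id, []):
            score = text.lower().split().count(term.lower())  # toy relevance score
            if score > 0:
                hits.append((score, text))
        return hits


class Cluster:
    """Hypothetical dispatcher: routes each query to the user's nodes and merges hits."""

    def __init__(self, node_count, max_docs_per_user_per_node):
        self.nodes = [Node() for _ in range(node_count)]
        self.max_docs_per_user_per_node = max_docs_per_user_per_node
        self.nodes_for_user = defaultdict(list)  # user_id -> node indexes holding data

    def put(self, user_id, text):
        placed = self.nodes_for_user[user_id]
        if not placed:
            # Deterministic first node for the user (assumed placement rule).
            placed.append(sum(user_id.encode()) % len(self.nodes))
        elif (
            len(self.nodes[placed[-1]].docs_by_user[user_id])
            >= self.max_docs_per_user_per_node
            and len(placed) < len(self.nodes)
        ):
            # The cap is reached: spill this user's data onto the next node.
            placed.append((placed[-1] + 1) % len(self.nodes))
        self.nodes[placed[-1]].put(user_id, text)

    def search(self, user_id, term, limit=10):
        partial = []
        # Only the nodes holding this user's data take part in the query.
        for index in self.nodes_for_user.get(user_id, []):
            partial.extend(self.nodes[index].stream_search(user_id, term))
        # Merge the per-node hits into one ranked list.
        return heapq.nlargest(limit, partial)
```

In this sketch a lower cap spreads a large user over more nodes, so each node scans less data per query but more nodes are involved, which mirrors the latency versus cost trade-off described above.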
Source: yahoodevelopers.tumblr.com