kenju's blog

About Programming and Mathematics

Reading Notes of "Designing Data-Intensive Applications" Chapter 6: Partitioning

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Following "Reading Notes of "Designing Data-Intensive Applications" Chapter 5: Replication", here are my reading notes for Chapter 6. I am not the one presenting this chapter, but summarizing helps me organize my thinking, so I decided to write notes again.

Reading Notes

My Impressions

Partitioning is a much harder problem than replication (Chapter 5), since we have to deal with a cluster of distributed machines, not just a single host.

I used to think that hash partitioning or simple round-robin partitioning was enough for most cases. Such simple schemes do suffice sometimes, but more sophisticated methods are required to solve the problems of today's distributed systems, especially in the era of cloud computing.

For example, a simplistic algorithm can lead to hot spots if used without much consideration. Which key to use for hash partitioning depends on each application's domain (timestamp, user ID, UUID, etc.).

Rebalancing is also an interesting topic. Implementing rebalancing may be easy, but its cost is what makes it hard: if rebalancing takes too long, the application may become too slow for everyday use.


There are 2 types of partitioning:

  • Key range partitioning
  • Hash partitioning

| type | pros | cons |
| --- | --- | --- |
| Key range partitioning | efficient range queries are possible | only works when keys can be sorted; risk of hot spots |
| Hash partitioning | may distribute load more evenly | range queries are not possible |
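The difference between the two schemes can be sketched in a few lines. This is a toy illustration, not any real database's implementation; the boundary values and partition count are made up:

```python
import bisect
import hashlib

# Key-range partitioning: each partition owns a contiguous range of keys,
# defined here by its lower boundary. Range queries hit only a few partitions.
RANGE_BOUNDARIES = ["a", "h", "p"]  # partition 0: [a, h), 1: [h, p), 2: [p, ...)

def range_partition(key: str) -> int:
    # Find the rightmost boundary that is <= key.
    return max(bisect.bisect_right(RANGE_BOUNDARIES, key) - 1, 0)

# Hash partitioning: a hash of the key picks the partition, which spreads
# skewed key distributions evenly but destroys the key ordering.
NUM_PARTITIONS = 3

def hash_partition(key: str) -> int:
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

print(range_partition("apple"))  # 0, since "apple" falls in [a, h)
print(range_partition("zebra"))  # 2
print(hash_partition("apple"))   # some partition in [0, 3)
```

With `range_partition`, keys that are adjacent in sort order land on the same partition, which is why range scans are cheap; with `hash_partition`, adjacent keys are scattered, which is why they are not.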

Secondary indexes

There are 2 ways to implement secondary indexes:

  • Document-partitioned indexes (local indexes)
  • Term-partitioned indexes (global indexes)
| type | index position | read | write |
| --- | --- | --- | --- |
| Document-partitioned | in the same partition as the primary key | scatter/gather across all partitions | write only to the single partition being updated |
| Term-partitioned | partitioned separately, by the indexed values | a read can be served from a single partition | a write may need to update several partitions |
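The read paths of the two designs can be contrasted with a toy in-memory layout. The data, the "color" attribute, and the term-to-partition mapping below are all hypothetical:

```python
# Two partitions, each holding some documents plus a LOCAL index on "color"
# (document-partitioned design).
partitions = [
    {"docs": {1: "red", 3: "blue"}, "local_index": {"red": [1], "blue": [3]}},
    {"docs": {2: "red", 4: "green"}, "local_index": {"red": [2], "green": [4]}},
]

def query_document_partitioned(color):
    # Local indexes: scatter the query to every partition, then gather.
    results = []
    for p in partitions:
        results.extend(p["local_index"].get(color, []))
    return sorted(results)

# Term-partitioned (GLOBAL) index: the term itself determines the partition,
# so a read touches only the partition that owns that term.
def term_owner(term: str) -> int:
    return ord(term[0]) % 2  # hypothetical term-to-partition mapping

global_index = {0: {"blue": [3], "red": [1, 2]}, 1: {"green": [4]}}

def query_term_partitioned(color):
    return global_index[term_owner(color)].get(color, [])

print(query_document_partitioned("red"))  # [1, 2] — touched both partitions
print(query_term_partitioned("red"))      # [1, 2] — touched one partition
```

The trade-off in the table falls out directly: the local-index query loops over every partition, while the global-index query reads exactly one; conversely, updating document 1's color would touch one local index but possibly several term partitions.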


Rebalancing Partitions

  • How not to do it: hash mod N
  • The problem with the mod N approach is that if the number of nodes N changes, most of the keys will need to be moved from one node to another
  • There is a fairly simple solution: create many more partitions than there are nodes, and assign several partitions to each node
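A quick experiment makes the mod N problem concrete. The key set and partition counts below are arbitrary; the point is only the proportion of data that has to move:

```python
import hashlib

def stable_hash(key: str) -> int:
    # md5 as a deterministic stand-in for whatever hash the database uses
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"user-{i}" for i in range(10_000)]

# Naive "hash mod N": growing from 4 nodes to 5 reassigns most keys.
moved = sum(1 for k in keys if stable_hash(k) % 4 != stable_hash(k) % 5)
print(f"mod-N: {moved / len(keys):.0%} of keys move")  # roughly 80%

# The fix from the book: create many more partitions than nodes and move
# whole partitions between nodes; keys never change partition.
NUM_PARTITIONS = 256
old_owner = {p: p % 4 for p in range(NUM_PARTITIONS)}  # 4 nodes
new_owner = dict(old_owner)
for p in range(0, NUM_PARTITIONS, 5):                  # hand ~1/5 to new node 4
    new_owner[p] = 4
moved_partitions = sum(
    1 for p in range(NUM_PARTITIONS) if old_owner[p] != new_owner[p]
)
print(f"fixed partitions: {moved_partitions}/{NUM_PARTITIONS} partitions move")
```

With mod N, around 4 out of 5 keys change nodes when N goes from 4 to 5 (a key stays put only when its hash gives the same remainder mod 4 and mod 5), whereas with fixed partitions only the handful of reassigned partitions move, and every key's partition stays stable forever.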

Request Routing

  • Many distributed data systems rely on a separate coordination service such as ZooKeeper to keep track of this cluster metadata
  • LinkedIn's Espresso uses Helix (which in turn relies on ZooKeeper); HBase, SolrCloud, and Kafka also use ZooKeeper
  • Cassandra and Riak use a gossip protocol among the nodes to disseminate any changes in cluster state
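The routing-tier idea can be sketched with a plain dict standing in for the cluster metadata that a coordination service such as ZooKeeper would hold; the node names and the key-to-partition hash here are made up:

```python
# Authoritative partition-to-node map (in reality, kept in ZooKeeper or
# gossiped between nodes; here just a local dict).
partition_to_node = {0: "node-a", 1: "node-b", 2: "node-c"}

def route(key: str, num_partitions: int = 3) -> str:
    # The routing tier hashes the key to a partition, then looks up which
    # node currently owns that partition.
    partition = sum(key.encode()) % num_partitions  # toy hash
    return partition_to_node[partition]

print(route("abc"))

# When rebalancing moves partition 1 to another node, updating the map
# (e.g. via a ZooKeeper watch) redirects all subsequent requests.
partition_to_node[1] = "node-d"
```

The key design point is that clients (or the routing tier) only need a cheap lookup per request, plus some mechanism to learn about map changes, whether that is a watch on a coordination service or a gossip protocol.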