- Author: Martin Kleppmann
- Publisher: O'Reilly Media
- Release date: 2017/04/02
- Media: Paperback
Following "Reading Notes of "Designing Data-Intensive Applications" Chapter 5: Replication" (http://itiskj.hatenablog.com/entry/2018/09/04/050000), here are my reading notes for Chapter 6. I'm not the presenter this time either, but writing these notes helps me organize my thinking, so I decided to write them up again.
Partitioning is a much more difficult problem than replication (Chapter 5), since we have to deal with a cluster of distributed machines rather than just a single host.
I used to think that hash partitioning or simple round-robin partitioning would be enough for most cases. Such simple schemes do suffice sometimes, but more sophisticated methods are required to solve the problems of today's distributed systems, especially in the era of cloud computing.
For example, a naive algorithm can lead to hot spots if applied without much consideration. Which key to use for hash partitioning depends on each application's domain (timestamp, user ID, UUID, etc.).
Rebalancing is also an interesting topic. Implementing rebalancing may be easy, but what makes it hard is its cost: if rebalancing takes too long, the application may become too slow for daily use.
There are 2 types of partitioning:
- Key range partitioning
- Hash partitioning
|Approach|Pros|Cons|
|---|---|---|
|Key range partitioning|efficient range queries are possible|available only when keys can be sorted; risk of hot spots|
|Hash partitioning|may distribute load more evenly|range queries are not possible|
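The trade-off in the table can be seen in a small sketch. The partition boundaries and the use of MD5 below are my own illustrative choices, not from the book:

```python
import hashlib

NUM_PARTITIONS = 4

# Key range partitioning: boundaries are hypothetical, chosen by hand.
# Partition 0 holds keys < "g", partition 1 keys < "n", and so on.
BOUNDARIES = ["g", "n", "t"]

def range_partition(key: str) -> int:
    for i, bound in enumerate(BOUNDARIES):
        if key < bound:
            return i
    return len(BOUNDARIES)

def hash_partition(key: str) -> int:
    # A stable hash (MD5 here) keeps the mapping consistent across processes.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

keys = ["apple", "banana", "kiwi", "mango", "zebra"]

# Range partitioning keeps adjacent keys together, so a range scan
# over "apple".."banana" touches only partition 0.
print([range_partition(k) for k in keys])  # → [0, 0, 1, 1, 3]

# Hash partitioning scatters adjacent keys across partitions, so the
# same range scan would have to query every partition.
print([hash_partition(k) for k in keys])
```

Note how the range scheme preserves key order at the cost of hot spots (e.g. all timestamp keys for today land in one partition), while the hash scheme spreads load but destroys order.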
There are 2 ways to implement secondary indexes:
- Document-partitioned indexes (local indexes)
- Term-partitioned indexes (global indexes)
|Index type|Where the index lives|Reads|Writes|
|---|---|---|---|
|Document-partitioned (local)|in the same partition as the primary key|scatter/gather across all partitions|write to only the single updated partition|
|Term-partitioned (global)|partitioned separately, by the indexed values|a read can be served from a single partition|several partitions may need to be written|
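A minimal sketch of the document-partitioned (local) case, with made-up data: each partition indexes only its own documents, so a query by the indexed field must scatter to every partition and gather the results.

```python
from collections import defaultdict

# Two hypothetical partitions, each holding a few documents.
partitions = [
    {"doc1": {"color": "red"}, "doc2": {"color": "green"}},
    {"doc3": {"color": "red"}, "doc4": {"color": "blue"}},
]

# Build one local secondary index per partition (field value -> doc ids).
local_indexes = []
for part in partitions:
    idx = defaultdict(list)
    for doc_id, doc in part.items():
        idx[doc["color"]].append(doc_id)
    local_indexes.append(idx)

def query_by_color(color: str) -> list[str]:
    # Scatter/gather: every partition's local index must be consulted,
    # because matching documents can live in any partition.
    results = []
    for idx in local_indexes:
        results.extend(idx.get(color, []))
    return results

print(query_by_color("red"))  # doc1 and doc3 come from different partitions
```

A term-partitioned (global) index would instead partition the index itself by the indexed value (e.g. all `color: red` entries on one partition), making this read a single-partition lookup at the cost of multi-partition writes.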
- How not to do it: hash mod N
- The problem with the mod N approach is that if the number of nodes N changes, most of the keys will need to be moved from one node to another
- there is a fairly simple solution: create many more partitions than there are nodes, and assign several partitions to each node
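Both points can be demonstrated with a quick sketch; the key format, node counts, and round-robin assignment below are illustrative assumptions, not from the book:

```python
import hashlib

def stable_hash(key: str) -> int:
    # MD5 gives a hash that is stable across processes, unlike Python's hash().
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

# --- The "hash mod N" problem: changing N remaps most keys. ---
def node_for(key: str, n_nodes: int) -> int:
    return stable_hash(key) % n_nodes

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(1 for k in keys if node_for(k, 10) != node_for(k, 11))
# Roughly 10/11 of keys move when going from 10 to 11 nodes.
print(f"{moved / len(keys):.0%} of keys change nodes")

# --- The fix: many fixed partitions, assigned to nodes as whole units. ---
NUM_PARTITIONS = 1024

def partition_for(key: str) -> int:
    # A key's partition never changes, regardless of how many nodes exist.
    return stable_hash(key) % NUM_PARTITIONS

def assignment(n_nodes: int) -> dict[int, int]:
    # Round-robin partition-to-node assignment (illustrative only).
    # Rebalancing moves whole partitions between nodes; keys stay put
    # within their partition, so only the moved partitions' data travels.
    return {p: p % n_nodes for p in range(NUM_PARTITIONS)}
```

With fixed partitions, adding a node only requires transferring some partitions to it; the key-to-partition mapping is untouched.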
- Many distributed data systems rely on a separate coordination service such as ZooKeeper to keep track of this cluster metadata
- LinkedIn's Espresso uses Helix (which in turn relies on ZooKeeper); HBase, SolrCloud, and Kafka also use ZooKeeper
- Cassandra and Riak use a gossip protocol among the nodes to disseminate any changes in cluster state