kenju's blog

About Programming and Mathematics

Reading Notes of "Designing Data-Intensive Applications" Chapter 5: Replication

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

社内の輪読会で、DDIA を読んでいる。Chapter 5 は自分が担当だったので、そのときに社内向けに書いたまとめの一部抜粋。

グローバルチームのメンバーの輪読会に混ぜてもらっている形なのだが、なかなかレベルが高いものの、DDIA の内容が自分の興味関心にぴったりなこともあって、モチベーション高く維持できており、刺激の多い最高の学びの機会となっている。最初の回は、ネイティブ話者に混じって技術的な議論をする機会がなかなかなかったため、聞き役に徹することも多かったのだが、今回くらいから徐々に発言する場面を増やすことができて、非常に楽しく参加している。毎週月曜日に、早く参加したくてしょうがない予定があるというのも、なかなか良いものだ。

今回は、Replication が題材の回。興味があったので、References に紹介されていた、AWS Redshift, DynamoDB, Apache Kafka のリーダー戦略や Replication について書かれた論文も読んでみた。DDIA は、現役の研究者が書かれたということもあって、質の高い情報が密度高く紹介されており、難解な箇所も多々あるが、非常に読んでいて勉強になる。今年であった中でもベスト3に入る良著。

ちなみに、AWS DynamoDB について書かれた論文を読んだときのメモは、別記事 "論文『Dynamo: Amazon's Highly Available Key-value Store』を読んだ"(http://itiskj.hatenablog.com/entry/2018/09/03/213643) にもまとめてあるので、そちらも参照されたい。

List of Sections

  • Leaders and Followers
    • Synchronous Versus Asynchronous Replication
    • Setting Up New Followers
    • Handling Node Outages
    • Implementation of Replication Log
  • Problems with Replication Log
    • Reading Your Own Writes
    • Monotonic Reads
    • Consistent Prefix Reads
    • Solutions for Replication Lag
  • Multi-Leader Replication
    • Use Cases for Multi-Leader Replication
    • Handling Write Conflicts
    • Multi-Leader Replication Topologies
  • Leaderless Replication
    • Writing to the Database When a Node is Down
    • Limitations of Quorum Consistency
    • Sloppy Quorums and Hinted Handoff
    • Detecting Concurrent Writes
  • Summary

Keywords

* eventual consistency
* leader-based replication/master-slave replication
* single-leader/multi-leader/leaderless
* synchronous/asynchronous/semi-synchronous
* chaing replication (Microsoft Azure Storage)
* replicatoin logs
* WAL(Write-ahead log)
* read-after-write-consistency/read-your-write-consistency
* monotonic reads
* consistent prefix reads
* replication topology

Summary

Why replication?

  • To achive high availability/scalability
  • To allow operation even while network disconnected
  • To increase latency

How to replicate?

NOTE: each approach has both pros/cons.

  • Single-leader replication
  • Multi-leader replication
  • Leaderless replication
type pros cons
Single-leader easy to understand, no conflict resolution less robust
Multi-leader robust against faulty, network interupptions and latency spikes complex and harder to maintain
Leaderless (same with the above) complext (especially the quorum model)

When to replicate?

type pros cons
synchronous easy to gurantee consistency slow (especially if there are many folowers)
asynchronous blazingly fast harder to gurantee consistency
semi-synchronous both advantages from synchronous & asynchrnous harder to implement

How to tackle Replication Lag Problem?

Here are some consistency models:

  • Read-after-write consistency
  • Monotonic reads
  • Consistent prefix reads

My Reading Notes 🔖

Semisync replication

Briefly looked through this blog post, introduced at references [7]

http://yoshinorimatsunobu.blogspot.com/2014/04/semi-synchronous-replication-at-facebook.html

Semisync replication was originated from Google in 2007. Official MySQL supported from 5.5. Actual implementation algorithm was substantially different from Google's.

  • there are 2 types of Semisync
    • Normal Semisync
    • Loss-Less Semisync
  • Facebook use MySQL's mysqlbinlog for Semisync

Amazon Redshift replication

  • Amazon Redshift seems to support single-leader replication
    • but, the leader nodes seem to exist more than one, so maybe "multi-leader replication"?
    • "follower nodes" are called "compute nodes" instead
    • see "Deep Dive on Amazon Redshift" slide also
      • p7, 10,
      • The leader node also has responsibility for task scheduling, parser & rewriter, planner & optimizer, compiler, code generation, etc...

Amazon DynamoDB replication

Briefly looked through this thesis, introduced at references [37]

https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

Especially, 4.2 Partitioning Algorithm, 4.3 Replication and 4.8.1 Ring Membership.

  • Dynamo follows "Leaderless" design
  • Dynamo's partitioniong scheme relies on consistent hashing
  • "every node in the sytem can determine which nodes should be in this list for any particular key", because the partitioned key can be calculated via the consistent hashing algorithm (= Leaderless)

Kafka replication

I like Pub/Sub middlewares and read below slides from references [5]

  • https://www.slideshare.net/junrao/kafka-replication-apachecon2013
  • p9 for Kafka architecture. almost same with AWS Kinesis Streams
  • p13~ for Kafka replication design.
    • Kafka Replication gurantee CA of CAP theorum.
    • Kafka distribute message brokers into some partition, and each partition has replicas of topics (p17)
    • Kafka follows Single-leader replication
      • One of the replicas is leader, and all writes go to leader
      • The leader propagates writes to followers in push style (not pull or polling style, so the leaders decide when to commit messages)
  • p21~22 for Data Flow in Replication
    • the leader push messages to each followers, commit, and then return the result to consumers
    • the leaders are evenly spread among brokers