I am developing a realtime log architecture for an advertising delivery system. While researching best practices and common solutions, I found a bunch of useful resources to read through. Here is a memo to myself about those resources.
These two essays are must-reads.
- The world beyond batch: Streaming 101
- a glossary of the key terms in streaming architecture
- Batch vs Streaming
- Processing Time vs Event Time
- The world beyond batch: Streaming 102
- how to handle late logs (discarding, watermarks, triggers, and accumulation)
A watermark is a timestamp that marks progress in event time: it asserts that all log records with event times up to that point have been processed.
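
To make the idea concrete for myself, here is a minimal sketch of a heuristic watermark. It is not taken from any of the resources in this memo: the `WatermarkTracker` class, the `allowed_lateness` parameter, and the integer timestamps are all hypothetical names and values chosen for illustration. The assumption is that the watermark trails the largest event time seen so far by a fixed delay, and that a record whose event time falls behind the watermark counts as late.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WatermarkTracker:
    """Heuristic watermark: largest event time seen minus a fixed delay.

    Hypothetical helper for illustration only, not any library's API.
    """
    allowed_lateness: int                  # seconds a record may lag behind
    max_event_time: Optional[int] = None   # largest event time seen so far

    @property
    def watermark(self) -> Optional[int]:
        # No records yet means no watermark.
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def observe(self, event_time: int) -> bool:
        """Record one log entry; return True if on time, False if late."""
        wm = self.watermark
        is_late = wm is not None and event_time < wm
        # The watermark only moves forward, so late records never pull it back.
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        return not is_late


tracker = WatermarkTracker(allowed_lateness=10)
# Event times (in seconds) in arrival order; 95 arrives far out of order.
results = [(t, tracker.observe(t)) for t in [100, 105, 120, 112, 95, 130]]
late = [t for t, on_time in results if not on_time]
print(late, tracker.watermark)
```

What to do with a late record (discard it, or re-emit an updated result) is a separate policy decision, which is what the triggers and accumulation modes in Streaming 102 are about.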
- Watermarks - Measuring Time and Progress in Streaming Pipelines
- practical examples of handling late data with watermarks
- How to beat the CAP theorem
- the first post in which Nathan Marz introduced the idea of the Lambda Architecture
- official page ?
- Questioning the Lambda Architecture
- the first post in which Jay Kreps introduced the idea of the Kappa Architecture
- The Log: What every software engineer should know about real-time data's unifying abstraction
- another great essay by Jay Kreps
- KAPPA ARCHITECTURE USING MANAGED CLOUD SERVICES (PART II)
- Applying the Kappa architecture in the telco industry
- another example of the Kappa Architecture in practice
- Case Study: Stream Processing on AWS using Kappa Architecture
- one more example
- Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
- practical examples of building streaming pipelines with Apache Flink
- Structured Streaming Programming Guide
- full of hints for streaming programming, especially on implementing watermarks