Lambda Architecture
Batch Layer (Spark engine): Stores the master dataset, which contains all historical data. Periodically recomputes views from scratch to ensure data correctness.
For example, user click events are collected continuously and used to compute user preference scores. Later, the company improves the recommendation algorithm. The new algorithm must be applied to all historical events, which requires recomputing the views from the beginning of the master dataset.
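The recomputation described above can be sketched in a few lines. This is only an illustration under assumptions: the event fields (user, category), the two scoring functions, and the in-memory MASTER_DATASET are hypothetical stand-ins for a real Spark job over durable storage.

```python
from collections import defaultdict

# Hypothetical master dataset: an append-only list of raw click events.
MASTER_DATASET = [
    {"user": "alice", "category": "books"},
    {"user": "alice", "category": "books"},
    {"user": "alice", "category": "music"},
    {"user": "bob", "category": "music"},
]

def score_v1(events):
    """Original algorithm (assumed): raw click count per (user, category)."""
    view = defaultdict(int)
    for event in events:
        view[(event["user"], event["category"])] += 1
    return dict(view)

def score_v2(events):
    """Improved algorithm (assumed): each category's share of the user's clicks."""
    counts = score_v1(events)
    totals = defaultdict(int)
    for (user, _category), n in counts.items():
        totals[user] += n
    return {key: n / totals[key[0]] for key, n in counts.items()}

def recompute_batch_view(algorithm):
    """Rebuild the whole view from scratch over every historical event."""
    return algorithm(MASTER_DATASET)

old_view = recompute_batch_view(score_v1)
new_view = recompute_batch_view(score_v2)
```

Because the raw events are never modified, switching from score_v1 to score_v2 only requires rerunning the batch job. No stored view has to be patched in place.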
Speed Layer (Streaming Framework): Produces real-time views. Usually processes only recent data. (Apache Flink, Spark Streaming, Google Cloud Dataflow)
Serving Layer: Merges results from the batch layer and speed layer. Queries are answered using both batch views (accuracy) and speed views (freshness).
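A minimal sketch of the merge, assuming both views are simple per-key counters and that the speed view only holds events that arrived after the last batch run. The names batch_view, speed_view, and query are hypothetical.

```python
def query(batch_view, speed_view, key):
    """Answer a query by combining the batch view with the speed view.

    Assumes the two views cover disjoint time ranges, so counts can be added.
    """
    return batch_view.get(key, 0) + speed_view.get(key, 0)

# Batch view: accurate, but only as fresh as the last batch run.
batch_view = {"alice": 120, "bob": 45}
# Speed view: covers only events since that batch run.
speed_view = {"alice": 3, "carol": 1}

alice_total = query(batch_view, speed_view, "alice")
carol_total = query(batch_view, speed_view, "carol")
```

For aggregates that are not additive (for example, distinct counts), the merge step needs a different combining function. Addition is used here only because it is the simplest case.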
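Lambda architecture tolerates failures in the speed layer because batch recomputation can correct errors introduced by stream processing. However, it roughly doubles development and maintenance complexity, since the same logic must be implemented in both layers -> Kappa architecture is used to simplify this.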
*** Database lookup is not part of Lambda architecture; it is part of data enrichment. Data enrichment adds external context to events. For example, when a credit card transaction event arrives, the system checks whether the card is on a blocked list. The blocked list may be stored in a database or cache. This enrichment step can exist in both Lambda and Kappa architectures. It is independent of the architecture style.
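The enrichment step might look like the sketch below. The card numbers, the field names, and the in-memory BLOCKED_CARDS set are assumptions for illustration. In practice the lookup would go to a database or cache.

```python
# Hypothetical blocked list; in a real system this lives in a database or cache.
BLOCKED_CARDS = {"4111-0000-0000-0001"}

def enrich(transaction, blocked_cards=BLOCKED_CARDS):
    """Return a copy of the event with external context attached."""
    enriched = dict(transaction)  # leave the original event untouched
    enriched["blocked"] = transaction["card"] in blocked_cards
    return enriched

ok_event = enrich({"card": "4111-0000-0000-0002", "amount": 25.0})
bad_event = enrich({"card": "4111-0000-0000-0001", "amount": 900.0})
```

The function does not care whether it is called from a batch job or a streaming job, which is why enrichment is independent of the architecture style.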
*** If the speed layer fails temporarily, real-time results may be missing. However, the batch layer will eventually recompute everything correctly. This correct batch result will replace the incomplete speed-layer results. Therefore the system eventually becomes correct [EVENTUAL CORRECTNESS].
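A toy simulation of this behavior, assuming events carry integer timestamps, the speed layer silently drops events during an outage, and every event still reaches the master dataset. All names and values here are hypothetical.

```python
# Master dataset: (user, timestamp) click events, all durably stored.
EVENTS = [("alice", 1), ("alice", 2), ("alice", 3), ("alice", 4)]

def batch_view(events, up_to):
    """Count every stored event with timestamp <= up_to."""
    view = {}
    for user, ts in events:
        if ts <= up_to:
            view[user] = view.get(user, 0) + 1
    return view

def speed_view(events, after, outage=frozenset()):
    """Count events newer than the last batch run, minus those lost in an outage."""
    view = {}
    for user, ts in events:
        if ts > after and ts not in outage:
            view[user] = view.get(user, 0) + 1
    return view

def serve(user, batch, speed):
    return batch.get(user, 0) + speed.get(user, 0)

OUTAGE = frozenset({2, 3})  # the speed layer missed these timestamps

# Last batch run covered timestamps <= 1; the speed layer lost two events.
before = serve("alice", batch_view(EVENTS, 1), speed_view(EVENTS, 1, OUTAGE))

# The next batch run covers timestamps <= 4 and supersedes the speed view.
after = serve("alice", batch_view(EVENTS, 4), speed_view(EVENTS, 4, OUTAGE))
```

Here before undercounts (2 instead of 4), while after is correct once the batch layer has reprocessed the affected window.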