DE_4. Google Cloud Data Engineering Stack
Example Architecture
- Streaming architecture:
  - Application logs / mobile app events
  - Pub/Sub (ingestion)
  - Dataflow streaming pipeline (data processing engine)
  - BigQuery (analytics warehouse)
  - BI dashboard
- Batch architecture:
  - ERP database
  - Cloud Storage (raw dataset)
  - Dataproc Spark job (ETL job / ML preprocessing)
  - BigQuery warehouse
  - Business analytics
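To make the difference between the two paths concrete, here is a toy sketch that uses only the Python standard library. It makes no real GCP calls: a `queue.Queue` stands in for Pub/Sub, a CSV string stands in for a raw file in Cloud Storage, and plain lists stand in for BigQuery tables. The function names and sample records are made up for illustration.

```python
import csv
import io
import queue


def run_streaming(events: queue.Queue, table: list) -> None:
    """Handle events one at a time as they arrive (Pub/Sub -> Dataflow -> BigQuery)."""
    while True:
        event = events.get()
        if event is None:  # sentinel used only to stop this toy loop
            break
        table.append({"user": event["user"], "action": event["action"].lower()})


def run_batch(raw_csv: str, table: list) -> None:
    """Handle a whole raw file in one job (Cloud Storage -> Dataproc -> BigQuery)."""
    rows = csv.DictReader(io.StringIO(raw_csv))
    table.extend(
        {"order_id": int(r["order_id"]), "amount": float(r["amount"])} for r in rows
    )


# Streaming path: events are pushed onto the queue and consumed individually.
events = queue.Queue()
for e in [{"user": "u1", "action": "CLICK"}, {"user": "u2", "action": "VIEW"}, None]:
    events.put(e)
events_table = []
run_streaming(events, events_table)

# Batch path: the full dataset already exists and is processed in one pass.
raw_file = "order_id,amount\n1,9.50\n2,20.00\n"
orders_table = []
run_batch(raw_file, orders_table)
```

The streaming function never sees the whole dataset, while the batch function starts only after the file is complete. That is the main practical difference between the two architectures.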
Apache Beam (Framework) & Spark (Processing Engine)
- If we use only Spark, without Apache Beam, for data processing, we need to build and maintain two separate jobs:
  - A batch job (e.g., log analysis once a day)
  - A streaming job (e.g., real-time user event analysis)
- If we use Beam, we write one pipeline and pick a runner for it:
  - Beam pipeline (Read -> Transform -> Filter -> Write)
  - Batch runner (Spark)
  - Streaming runner (Dataflow)
- A single Beam pipeline can run in both batch and streaming modes.
- Only the choice of runner changes; the pipeline code stays the same.
- The same code can run on multiple engines (Spark, Flink, Dataflow).
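The idea of defining a pipeline once and handing it to different runners can be sketched without Beam itself. The toy below is not the Beam API: `Pipeline`, `batch_runner`, and `streaming_runner` are hypothetical names, and the log lines are invented. It only shows how the pipeline definition is kept separate from the engine that executes it.

```python
import itertools
from typing import Callable, Iterable, Iterator, List


class Pipeline:
    """Hypothetical pipeline definition: an ordered list of lazy steps."""

    def __init__(self) -> None:
        self.steps: List[Callable[[Iterator], Iterator]] = []

    def map(self, fn: Callable) -> "Pipeline":
        self.steps.append(lambda items: (fn(x) for x in items))
        return self

    def filter(self, pred: Callable) -> "Pipeline":
        self.steps.append(lambda items: (x for x in items if pred(x)))
        return self

    def apply(self, items: Iterator) -> Iterator:
        for step in self.steps:
            items = step(items)
        return items


def batch_runner(pipeline: Pipeline, dataset: list) -> list:
    """Run over a bounded dataset and return every result at once."""
    return list(pipeline.apply(iter(dataset)))


def streaming_runner(pipeline: Pipeline, source: Iterable) -> Iterator:
    """Run over a possibly unbounded source, yielding results as they appear."""
    yield from pipeline.apply(iter(source))


# One definition: Transform (strip, upper-case) -> Filter (keep errors).
pipeline = (
    Pipeline()
    .map(str.strip)
    .map(str.upper)
    .filter(lambda line: line.startswith("ERROR"))
)

# Batch: a finished daily log.
daily_log = [" error: disk full ", "info: ok", "error: timeout"]
batch_result = batch_runner(pipeline, daily_log)


# Streaming: an endless event source; only the first alert is pulled here.
def live_events() -> Iterator[str]:
    for i in itertools.count():
        yield "error: lost connection" if i % 2 else "info: heartbeat"


first_alert = next(streaming_runner(pipeline, live_events()))
```

In real Beam code the same separation usually shows up as a pipeline option rather than a different function call. For example, the Python SDK accepts a `--runner` flag with values such as `DirectRunner`, `DataflowRunner`, `SparkRunner`, or `FlinkRunner`.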