DE_3. Data Lake VS. Data Warehouse
Data Lake
- Storage system designed to hold large volumes of raw data.
- Stores data in its original format (structured, semi-structured, or unstructured)
- e.g. JSON, CSV, logs, images, videos
- Massive scalability & low storage cost
- The storage layer itself does not provide native SQL query capabilities
- To analyze data in cloud storage, you must use a separate compute engine (e.g. Spark or Dataflow)
- SCHEMA-ON-READ
- Amazon S3, Google Cloud Storage, Azure Data Lake Storage, HDFS
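Schema-on-read can be illustrated with a minimal sketch in plain Python. The sample records, the field names (`user_id`, `amount`, `country`), and the helper `read_with_schema` are all hypothetical; in practice an engine such as Spark would do this work over files in object storage. The point is only that the raw data is stored untouched and the structure is imposed when it is read.

```python
import json

# Raw records kept exactly as they arrived, the way a data lake stores them.
# (Hypothetical sample data for illustration only.)
RAW_LINES = [
    '{"user_id": "1", "amount": "19.99", "country": "KR"}',
    '{"user_id": "2", "amount": "5.00"}',                    # missing country
    '{"user_id": "3", "amount": "oops", "extra": true}',     # bad amount
]


def read_with_schema(lines):
    """Apply a schema at read time: cast fields, default missing ones, skip bad rows."""
    rows = []
    for line in lines:
        record = json.loads(line)
        try:
            rows.append({
                "user_id": int(record["user_id"]),
                "amount": float(record["amount"]),
                "country": record.get("country", "UNKNOWN"),
            })
        except (KeyError, ValueError):
            # The raw line stays in storage; it is only excluded from this read.
            continue
    return rows


rows = read_with_schema(RAW_LINES)
total = sum(r["amount"] for r in rows)
print(len(rows), "rows readable, total amount:", round(total, 2))
```

Because nothing was rejected at write time, a different reader could apply a different schema to the same raw lines later, for example one that keeps the `extra` field.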
Data Warehouse
- Optimized for analytics and querying
- Data is typically cleaned, structured, and transformed before loading.
- Supports fast queries across large datasets.
- SCHEMA-ON-WRITE
- BigQuery, Snowflake, Amazon Redshift, Azure Synapse
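Schema-on-write can be sketched the same way. This example uses Python's built-in `sqlite3` module as a small stand-in for a warehouse, which is an assumption made for illustration; the `orders` table and its columns are hypothetical. The table's schema is declared up front, and rows that do not fit it are rejected at load time instead of at query time.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse (illustration only).
conn = sqlite3.connect(":memory:")

# The schema is fixed before any data is loaded.
conn.execute("""
    CREATE TABLE orders (
        user_id INTEGER NOT NULL,
        amount  REAL    NOT NULL CHECK (typeof(amount) = 'real' AND amount >= 0),
        country TEXT    NOT NULL
    )
""")

# Hypothetical rows coming out of a cleaning/transform step.
candidates = [
    (1, 19.99, "KR"),
    (2, 5.00, "US"),
    (3, "oops", "KR"),   # wrong type for amount
    (4, 7.50, None),     # missing country
]

loaded, rejected = 0, 0
for row in candidates:
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)", row)
        loaded += 1
    except sqlite3.IntegrityError:
        # The row violates the declared schema, so it never enters the table.
        rejected += 1

# Queries can now rely on every stored row matching the schema.
totals = conn.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall()
print(loaded, "loaded,", rejected, "rejected,", totals)
```

The trade-off is the mirror image of schema-on-read: loading is stricter and requires the transform step first, but every query afterwards works on data that is already clean and typed.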
Lakehouse
- Combines the benefits of a data lake (low-cost storage of raw files) & a data warehouse (structured tables, fast queries)
- Delta Lake, Apache Iceberg, Apache Hudi