[reading review] Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics

In this paper, the authors design and implement a new generation of data platform, the lakehouse, which combines the data warehouse and data lake architectures. The lakehouse architecture resolves the data staleness problem seen in data lake architectures and makes it convenient to apply non-SQL advanced analytics workloads (such as machine learning model training) to the same data. The lakehouse also provides transactional features and highly competitive query performance.

  • Strengths: The lakehouse is a unified data platform: different kinds of data live on one platform and can be accessed through the same set of operations.

  • Future work: We can explore more kinds of open data formats, as well as new combinations of storage formats, metadata layers, and access APIs.



