Mastering Your Data with Medallion Architecture: The Three-Layer Design for Data Organization and Management
February 1, 2023
Data is the backbone of any organization, and organizing and managing it properly is critical to putting it to practical use. One way to do this is with a medallion architecture.
A medallion architecture organizes data into three layers: Bronze, Silver, and Gold. Data flows from one layer to the next, progressively transforming from raw, as-delivered source data into high-quality, refined data that is ready for use. Let's take a closer look at each layer:
Bronze Layer (Raw Data) - The Bronze layer stores all raw data from external source systems. Table structures in this layer match the source system tables as-is, plus additional metadata columns that capture the load date/time, process ID, and similar details. The focus here is on fast Change Data Capture and on keeping a historical archive (cold storage) of the source data, which supports lineage, auditability, and reprocessing when needed without rereading the data from the source system.
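To make the Bronze idea concrete, here is a minimal sketch in plain Python, with lists of dicts standing in for lakehouse tables. The `ingest_to_bronze` helper and the metadata column names (`_load_ts`, `_process_id`, `_source_system`) are hypothetical, chosen for illustration; a real pipeline would typically write to tables managed by an engine such as Spark. What it shows is that source columns are kept exactly as delivered and that each load is appended rather than overwritten.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def ingest_to_bronze(
    bronze: List[Row],
    source_rows: List[Row],
    source_system: str,
    process_id: str,
    load_ts: Optional[datetime] = None,
) -> List[Row]:
    """Append source rows to a Bronze table as-is, plus audit metadata columns.

    The table is append-only: earlier loads are never modified, so the full
    history of what the source delivered stays available for reprocessing.
    """
    ts = load_ts or datetime.now(timezone.utc)
    for row in source_rows:
        record = dict(row)  # copy so the source structure and values stay untouched
        # Hypothetical metadata column names for load time, run, and origin.
        record["_load_ts"] = ts
        record["_process_id"] = process_id
        record["_source_system"] = source_system
        bronze.append(record)
    return bronze


# Usage: two loads from a hypothetical CRM extract land side by side.
bronze: List[Row] = []
crm_rows = [{"CUST_ID": "42", "NAME": " Ada Lovelace ", "EMAIL": "ADA@EXAMPLE.COM"}]
ingest_to_bronze(bronze, crm_rows, "crm", "run-001")
ingest_to_bronze(bronze, crm_rows, "crm", "run-002")
```

Because nothing is rewritten in place, replaying a past load only requires filtering on `_process_id` or `_load_ts` rather than going back to the source system.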
Silver Layer (Cleansed and Conformed Data) - In the Silver layer, data from the Bronze layer is matched, merged, conformed, and cleansed to provide an "Enterprise view" of the organization's key business entities, concepts, and transactions. Departmental analysts, data engineers, and data scientists use the Silver layer as the source for further analyses and for the enterprise and departmental data projects built in the Gold layer. The Silver layer follows the ELT methodology: only minimal transformations and cleansing rules are applied during loading. The focus is on speed and agility in ingesting and delivering data into the data lake, with complex transformations and business rules deferred to the move from the Silver to the Gold layer.
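The Silver step can be sketched the same way. This is an illustration under assumed names, not a reference implementation: two hypothetical sources (`crm` and `web`) expose customers under different column names, `COLUMN_MAP` conforms them to one schema, light cleansing trims and normalizes values, and a merge keeps the most recently loaded version of each customer. Heavier business rules are deliberately left out, in line with the ELT approach.

```python
from datetime import datetime
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical per-source column mappings onto one conformed customer schema.
COLUMN_MAP = {
    "crm": {"CUST_ID": "customer_id", "NAME": "name", "EMAIL": "email"},
    "web": {"userId": "customer_id", "fullName": "name", "mail": "email"},
}


def to_silver_customers(bronze_rows: List[Row]) -> List[Row]:
    """Conform, lightly cleanse, and merge Bronze rows into one customer table."""
    latest: Dict[str, Row] = {}
    for raw in bronze_rows:
        mapping = COLUMN_MAP[raw["_source_system"]]
        # Conform: rename source-specific columns to the shared schema.
        row = {target: raw.get(src) for src, target in mapping.items()}
        # Light cleansing only: cast the key, trim text, normalize email case.
        row["customer_id"] = str(row["customer_id"]).strip()
        row["name"] = (row["name"] or "").strip()
        row["email"] = (row["email"] or "").strip().lower() or None
        row["_load_ts"] = raw["_load_ts"]
        # Match and merge: keep the most recently loaded version per customer.
        key = row["customer_id"]
        if key not in latest or row["_load_ts"] >= latest[key]["_load_ts"]:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: r["customer_id"])


# Usage: the same customer arrives from two systems on different days.
bronze = [
    {"CUST_ID": 42, "NAME": " Ada Lovelace ", "EMAIL": "ADA@EXAMPLE.COM",
     "_source_system": "crm", "_load_ts": datetime(2023, 1, 1)},
    {"userId": "42", "fullName": "Ada King", "mail": " ada@example.com",
     "_source_system": "web", "_load_ts": datetime(2023, 1, 2)},
    {"userId": "7", "fullName": "Alan Turing", "mail": None,
     "_source_system": "web", "_load_ts": datetime(2023, 1, 2)},
]
silver = to_silver_customers(bronze)
```

Keeping the merge rule simple ("latest load wins") is itself an assumption; real Silver tables often need source-priority or survivorship rules instead.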
Gold Layer (Curated Business-Level Tables) - The Gold layer organizes data into consumption-ready, function-specific databases. Because this layer serves reporting, it uses more denormalized, read-optimized data models with fewer joins. The final data transformations and data quality rules are applied here, making it the presentation layer for projects such as customer analytics, product quality analytics, inventory analytics, customer segmentation, product recommendations, and marketing/sales analytics. The Gold layer often uses Kimball-style star schema data models or Inmon-style data marts.
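A Gold table is then a pre-joined, pre-aggregated view built for reading. This sketch assumes hypothetical Silver `customers` and `orders` inputs and an illustrative business rule (cancelled orders are excluded); it produces one wide, denormalized row per customer so that a report needs no joins at query time. It is a simple summary table rather than a full star schema.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Any, Dict, List

Row = Dict[str, Any]


def build_gold_customer_sales(customers: List[Row], orders: List[Row]) -> List[Row]:
    """Pre-join and pre-aggregate Silver data into one wide reporting table."""
    names = {c["customer_id"]: c["name"] for c in customers}
    totals: Dict[str, Decimal] = defaultdict(Decimal)  # Decimal() starts at 0
    counts: Dict[str, int] = defaultdict(int)
    for o in orders:
        # Illustrative business rule applied at this layer: skip cancelled orders.
        if o["status"] == "cancelled":
            continue
        totals[o["customer_id"]] += o["amount"]
        counts[o["customer_id"]] += 1
    # Denormalize: copy the customer name onto each row so reports avoid joins.
    return [
        {
            "customer_id": cid,
            "customer_name": names.get(cid, "unknown"),
            "order_count": counts[cid],
            "total_revenue": totals[cid],
        }
        for cid in sorted(totals)
    ]


# Usage with hypothetical Silver inputs.
customers = [
    {"customer_id": "42", "name": "Ada King"},
    {"customer_id": "7", "name": "Alan Turing"},
]
orders = [
    {"order_id": 1, "customer_id": "42", "amount": Decimal("19.99"), "status": "shipped"},
    {"order_id": 2, "customer_id": "42", "amount": Decimal("5.01"), "status": "shipped"},
    {"order_id": 3, "customer_id": "7", "amount": Decimal("100.00"), "status": "cancelled"},
]
gold = build_gold_customer_sales(customers, orders)
```

Using `Decimal` for money avoids floating-point rounding in the totals that a report would display.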
In some cases, data marts and enterprise data warehouses (EDWs) from traditional RDBMS technology stacks are also ingested into the lakehouse, letting enterprises run advanced analytics and ML on that data for the first time, something that was not possible or was cost-prohibitive in the traditional stack.
The medallion architecture is compatible with the concept of a data mesh, where Bronze and Silver tables can be joined together in a "one-to-many" fashion, meaning that the data in a single upstream table can be used to generate multiple downstream tables.
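The one-to-many relationship can be illustrated with a single upstream Silver table feeding several downstream tables, each owned by a different domain, as a data mesh would suggest. The downstream table names (`finance.daily_revenue`, `marketing.orders_by_channel`) and the `fan_out` helper are hypothetical; the sketch only shows the shape of the dependency, not a scheduling or ownership framework.

```python
from collections import Counter, defaultdict
from datetime import date
from decimal import Decimal
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def daily_revenue(orders: List[Row]) -> List[Row]:
    """Finance view: total revenue per order date."""
    totals: Dict[date, Decimal] = defaultdict(Decimal)
    for o in orders:
        totals[o["order_date"]] += o["amount"]
    return [{"order_date": d, "revenue": totals[d]} for d in sorted(totals)]


def orders_by_channel(orders: List[Row]) -> List[Row]:
    """Marketing view: number of orders per sales channel."""
    counts = Counter(o["channel"] for o in orders)
    return [{"channel": ch, "orders": n} for ch, n in sorted(counts.items())]


# One upstream table, many downstream consumers (hypothetical table names).
DOWNSTREAM: Dict[str, Callable[[List[Row]], List[Row]]] = {
    "finance.daily_revenue": daily_revenue,
    "marketing.orders_by_channel": orders_by_channel,
}


def fan_out(upstream: List[Row]) -> Dict[str, List[Row]]:
    """Build every registered downstream table from the same upstream rows."""
    return {name: build(upstream) for name, build in DOWNSTREAM.items()}


# Usage: a hypothetical Silver orders table feeds both domains.
silver_orders = [
    {"order_date": date(2023, 2, 1), "channel": "web", "amount": Decimal("10.00")},
    {"order_date": date(2023, 2, 1), "channel": "store", "amount": Decimal("5.50")},
    {"order_date": date(2023, 2, 2), "channel": "web", "amount": Decimal("7.25")},
]
outputs = fan_out(silver_orders)
```

Adding another consumer means registering one more builder in `DOWNSTREAM`; the upstream table itself does not change.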
The medallion architecture is a powerful tool for organizations looking to manage and use their data effectively. Whether you are a small start-up or a large multinational corporation, implementing a medallion architecture can help you unlock the full potential of your data.