Demo 01: From Landing Files to Iceberg and Delta Tables on MinIO (Using Trino)

Demo 01: From Landing Files to Iceberg and Delta Tables on MinIO (Using Trino)
Medallion Architecture
Parquet is a file format, while Delta and Iceberg are table formats that add transactional metadata layers on top of Parquet to provide ACID guarantees, schema evolution, time travel, and reliable data engineering workflows.

In this demonstration, we will implement a classic Lakehouse promotion pipeline:

Landing (raw parquet files on MinIO)
→ Bronze (Iceberg table)
→ Bronze (Delta table)

We will use:

  • MinIO as object storage
  • Trino as the query engine
  • Hive connector to read raw parquet files
  • Iceberg connector to create managed Iceberg tables
  • Delta Lake connector to create managed Delta tables

This setup mirrors a Databricks-style Lakehouse architecture — but fully self-hosted.


1️⃣ Architecture Overview

Assume the following:

  • MinIO bucket: demo
  • Raw files already uploaded to:
s3a://demo/landing/nyc_taxi/2026/02/26/*.parquet

Your Trino catalogs:

CatalogConnectorPurpose
hivehiveRead external parquet files
icebergicebergManaged Iceberg tables
deltadelta_lakeManaged Delta tables

We will:

  1. Register landing parquet files via Hive connector
  2. Promote them to an Iceberg Bronze table
  3. Promote them to a Delta Bronze table

2️⃣ Step 1 — Register Landing Parquet Files (Hive External Table)

This does not move or modify data.
It only tells Trino how to read the files.

CREATE SCHEMA IF NOT EXISTS hive.landing;

DROP TABLE IF EXISTS hive.landing.fhvhv_tripdata_202601_landing;

CREATE TABLE hive.landing.fhvhv_tripdata_202601_landing (
    hvfhs_license_num        VARCHAR,
    dispatching_base_num     VARCHAR,
    originating_base_num     VARCHAR,
    request_datetime         TIMESTAMP(6),
    on_scene_datetime        TIMESTAMP(6),
    pickup_datetime          TIMESTAMP(6),
    dropoff_datetime         TIMESTAMP(6),
    PULocationID             INTEGER,
    DOLocationID             INTEGER,
    trip_miles               DOUBLE,
    trip_time                BIGINT,
    base_passenger_fare      DOUBLE,
    tolls                    DOUBLE,
    bcf                      DOUBLE,
    sales_tax                DOUBLE,
    congestion_surcharge     DOUBLE,
    airport_fee              DOUBLE,
    tips                     DOUBLE,
    driver_pay               DOUBLE,
    shared_request_flag      VARCHAR,
    shared_match_flag        VARCHAR,
    access_a_ride_flag       VARCHAR,
    wav_request_flag         VARCHAR,
    wav_match_flag           VARCHAR,
    cbd_congestion_fee       DOUBLE
)
WITH (
    format = 'PARQUET',
    external_location = 's3a://demo/landing/nyc_taxi/2026/02/26/'
);

What happens in MinIO?

Nothing new is created.
Trino simply reads the existing parquet files.


3️⃣ Step 2 — Create Iceberg Bronze Table

Now we create a managed Iceberg table.

CREATE SCHEMA IF NOT EXISTS iceberg.bronze;

DROP TABLE IF EXISTS iceberg.bronze.fhvhv_tripdata_raw;

CREATE TABLE iceberg.bronze.fhvhv_tripdata_raw (
    hvfhs_license_num        VARCHAR,
    dispatching_base_num     VARCHAR,
    originating_base_num     VARCHAR,
    request_datetime         TIMESTAMP(6),
    on_scene_datetime        TIMESTAMP(6),
    pickup_datetime          TIMESTAMP(6),
    dropoff_datetime         TIMESTAMP(6),
    PULocationID             INTEGER,
    DOLocationID             INTEGER,
    trip_miles               DOUBLE,
    trip_time                BIGINT,
    base_passenger_fare      DOUBLE,
    tolls                    DOUBLE,
    bcf                      DOUBLE,
    sales_tax                DOUBLE,
    congestion_surcharge     DOUBLE,
    airport_fee              DOUBLE,
    tips                     DOUBLE,
    driver_pay               DOUBLE,
    cbd_congestion_fee       DOUBLE,
    shared_request_flag      VARCHAR,
    shared_match_flag        VARCHAR,
    access_a_ride_flag       VARCHAR,
    wav_request_flag         VARCHAR,
    wav_match_flag           VARCHAR,

    _ingest_timestamp        TIMESTAMP(6),
    _source_prefix           VARCHAR,
    _ingestion_id            VARCHAR
)
WITH (
    format = 'PARQUET',
    location = 's3a://demo/bronze/nyc_fhvhv_iceberg/',
    partitioning = ARRAY['day(pickup_datetime)']
);

4️⃣ Step 3 — Insert Landing Data into Iceberg

INSERT INTO iceberg.bronze.fhvhv_tripdata_raw
SELECT
    hvfhs_license_num,
    dispatching_base_num,
    originating_base_num,
    request_datetime,
    on_scene_datetime,
    pickup_datetime,
    dropoff_datetime,
    PULocationID,
    DOLocationID,
    trip_miles,
    trip_time,
    base_passenger_fare,
    tolls,
    bcf,
    sales_tax,
    congestion_surcharge,
    airport_fee,
    tips,
    driver_pay,
    cbd_congestion_fee,
    shared_request_flag,
    shared_match_flag,
    access_a_ride_flag,
    wav_request_flag,
    wav_match_flag,
    current_timestamp,
    's3a://demo/landing/nyc_taxi/2026/02/26/',
    to_hex(md5(cast(current_timestamp as varchar) || cast(random() as varchar)))
FROM hive.landing.fhvhv_tripdata_202601_landing;

What happens in MinIO?

New structure appears:

demo/bronze/nyc_fhvhv_iceberg/
├── data/
│   ├── *.parquet
└── metadata/
    ├── v1.metadata.json
    ├── snapshot-*.avro
    └── manifest files

This is a fully managed Iceberg table.


5️⃣ Step 4 — Create Delta Bronze Table

Now we do the same promotion — but into Delta format.

CREATE SCHEMA IF NOT EXISTS delta.bronze
WITH (location = 's3a://demo/bronze/_schemas/bronze/');

DROP TABLE IF EXISTS delta.bronze.fhvhv_tripdata_raw;

CREATE TABLE delta.bronze.fhvhv_tripdata_raw (
    hvfhs_license_num        VARCHAR,
    dispatching_base_num     VARCHAR,
    originating_base_num     VARCHAR,
    request_datetime         TIMESTAMP(6),
    on_scene_datetime        TIMESTAMP(6),
    pickup_datetime          TIMESTAMP(6),
    dropoff_datetime         TIMESTAMP(6),
    PULocationID             INTEGER,
    DOLocationID             INTEGER,
    trip_miles               DOUBLE,
    trip_time                BIGINT,
    base_passenger_fare      DOUBLE,
    tolls                    DOUBLE,
    bcf                      DOUBLE,
    sales_tax                DOUBLE,
    congestion_surcharge     DOUBLE,
    airport_fee              DOUBLE,
    tips                     DOUBLE,
    driver_pay               DOUBLE,
    cbd_congestion_fee       DOUBLE,
    shared_request_flag      VARCHAR,
    shared_match_flag        VARCHAR,
    access_a_ride_flag       VARCHAR,
    wav_request_flag         VARCHAR,
    wav_match_flag           VARCHAR,
    _ingest_timestamp        TIMESTAMP(6),
    _source_prefix           VARCHAR,
    _ingestion_id            VARCHAR
)
WITH (
    location = 's3a://demo/bronze/nyc_fhvhv_delta/'
);

6️⃣ Step 5 — Insert Landing Data into Delta

INSERT INTO delta.bronze.fhvhv_tripdata_raw
SELECT
    hvfhs_license_num,
    dispatching_base_num,
    originating_base_num,
    request_datetime,
    on_scene_datetime,
    pickup_datetime,
    dropoff_datetime,
    PULocationID,
    DOLocationID,
    trip_miles,
    trip_time,
    base_passenger_fare,
    tolls,
    bcf,
    sales_tax,
    congestion_surcharge,
    airport_fee,
    tips,
    driver_pay,
    cbd_congestion_fee,
    shared_request_flag,
    shared_match_flag,
    access_a_ride_flag,
    wav_request_flag,
    wav_match_flag,
    current_timestamp,
    's3a://demo/landing/nyc_taxi/2026/02/26/',
    to_hex(md5(cast(current_timestamp as varchar) || cast(random() as varchar)))
FROM hive.landing.fhvhv_tripdata_202601_landing;

What happens in MinIO?

demo/bronze/nyc_fhvhv_delta/
├── _delta_log/
│   ├── 00000000000000000000.json
│   ├── ...
└── part-*.parquet

This is now a fully managed Delta table.


7️⃣ Iceberg vs Delta — Physical Differences

IcebergDelta
metadata/ directory_delta_log/ directory
Snapshot-based metadataJSON transaction log
Engine-agnostic designSpark-native ecosystem
Strong multi-engine supportStrong Spark/Databricks integration

8️⃣ Final Result

You now have:

Raw immutable files

demo/landing/nyc_taxi/2026/02/26/*.parquet

Managed Iceberg Bronze

demo/bronze/nyc_fhvhv_iceberg/

Managed Delta Bronze

demo/bronze/nyc_fhvhv_delta/

You have successfully implemented a Lakehouse promotion pipeline on MinIO.


Why This Matters

This workflow demonstrates:

  • Separation of landing vs managed layers
  • Catalog-aware design (Hive vs Iceberg vs Delta)
  • Object storage physical layout understanding
  • Table-format-level governance

This is not “just writing SQL”.
This is platform engineering.