The Linux Foundation Projects
Delta Lake

Delta Lake Blogs

Delta Lake Small File Compaction with OPTIMIZE

By Matthew Powers

This post shows how to compact small files in Delta tables with OPTIMIZE.

Adding and Deleting Partitions in Delta Lake tables

By Matthew Powers, Ryan Zhu

This post shows how to add partitions to and remove partitions from Delta Lake tables.

Remove old files with the Delta Lake Vacuum Command

By Matthew Powers, Nick Karpov

This blog post explains how to remove files marked for deletion from storage with the Delta Lake Vacuum command.

Reading Delta Lake Tables into Polars DataFrames

By Matthew Powers, Chitral Verma

This post shows how to read Delta Lake tables into Polars DataFrames.

Building a more efficient data infrastructure for machine learning with Open Source using Delta Lake, Amazon SageMaker, and EMR

By Vedant Jain, Denny Lee

In this blog, we’ll explore how connecting Delta Lake, Amazon SageMaker Studio, and Amazon EMR can simplify the end-to-end workflow required to support data engineering and data science projects.

How to Delete Rows from a Delta Lake Table

By Matthew Powers

This post teaches you how to delete rows from a Delta Lake table and how the operation is implemented under the hood.

Delta Lake Constraints and Checks

By Matthew Powers

This post shows how to add constraints to your Delta table to prevent certain types of values from being appended.

Delta Lake Schema Enforcement

By Matthew Powers

This post teaches you about schema enforcement in Delta Lake and why it's better than what's offered by data lakes.

Why PySpark append and overwrite write operations are safer in Delta Lake than Parquet tables

By Matthew Powers

This post shows you why PySpark overwrite operations are safer with Delta Lake and how the different save mode operations are implemented under the hood.

How to Create Delta Lake Tables

By Matthew Powers

This post shows you how to create Delta Lake tables with Python, SQL, and PySpark.

How to Version Your Data with pandas and Delta Lake

By Matthew Powers

This post shows you how to version your pandas datasets and the benefits you'll enjoy with versioned data.