Delta Lake Blogs

Delta Lake Small File Compaction with OPTIMIZE
This post shows how to compact small files in Delta tables with OPTIMIZE.

Adding and Deleting Partitions in Delta Lake tables
By Matthew Powers, Ryan Zhu
This post shows how to add partitions to and remove partitions from Delta Lake tables.

Remove old files with the Delta Lake Vacuum Command
By Matthew Powers, Nick Karpov
This blog post explains how to remove files marked for deletion from storage with the Delta Lake Vacuum command.

Reading Delta Lake Tables into Polars DataFrames
By Matthew Powers, Chitral Verma
This post shows how to read Delta Lake tables into Polars DataFrames.

Building a more efficient data infrastructure for machine learning with Open Source using Delta Lake, Amazon SageMaker, and EMR
By Vedant Jain, Denny Lee
In this blog, we’ll explore how connecting Delta Lake, Amazon SageMaker Studio, and Amazon EMR can simplify the end-to-end workflow required to support data engineering and data science projects.

Data Sharing across Government Agencies using Delta Sharing
By Li Yu, Mubashir Kazia, Jon D. Ceanfaglione, Prabha Rajendran, Purushotam Shrestha, Shawn A. Benjamin
This post shows how government agencies are sharing data with Delta Sharing.

How to Delete Rows from a Delta Lake Table
This post teaches you how to delete rows from a Delta Lake table and how the operation is implemented under the hood.

Delta Lake Constraints and Checks
This post shows how to add constraints to your Delta table to prevent certain types of values from being appended.

Delta Lake Schema Enforcement
This post teaches you about schema enforcement in Delta Lake and why it's better than what data lakes offer.

Why PySpark append and overwrite write operations are safer in Delta Lake than Parquet tables
This post shows you why PySpark overwrite operations are safer with Delta Lake and how the different save mode operations are implemented under the hood.

How to Create Delta Lake Tables
This post shows you how to create Delta Lake tables with Python, SQL, and PySpark.

How to Version Your Data with pandas and Delta Lake
This post shows you how to version your pandas datasets and the benefits you'll enjoy with versioned data.