Buoyant Data Blog

The buoyant_kernel distribution

April 09, 2026

deltalake

kernel

opensource

The next version of Delta Lake for Rust and Python will use a tailored distribution of delta_kernel. The new buoyant_kernel allows features, bug fixes, and optimizations to ship to users faster than before.

Investing in Delta Lake Security

March 25, 2026

deltalake

security

The recent supply-chain attacks in the Python ecosystem have shaken the confidence of a number of organizations that depend on Python to power their data ecosystem. In this post we detail how Buoyant Data is helping to ensure the security of the Delta Lake project.

Triggering small ETL workloads

March 16, 2026

deltalake

parquet

Processing less data is the best way to reduce data platform costs. The key is to use event-driven pipelines rather than scheduled pipelines to only process data when it is ready!

Going multimodal on Data Engineering Central

February 13, 2026

databricks

deltalake

podcast

In this episode of the Data Engineering Central podcast, I join Daniel Beach to talk about the present and future of the data platform. We discuss the "lakehouse architecture" as a stepping stone into what comes next for data engineering in an increasingly LLM-driven ecosystem.

The multimodal Delta Lake

February 03, 2026

rust

deltalake

parquet

The storage changes we need today must support "multimodal data", which is in many ways a dramatic departure from the traditional query and usage patterns our existing infrastructure supports. This post explores some research and development to deliver multimodal data for analysts and developers without changing the entire platform.

High-throughput data ingestion with the Buoyant Architecture

January 02, 2026

rust

deltalake

oxbow

Delta Lake allows for building high-throughput applications, especially for append-only workloads as part of a medallion architecture. In this post we review the high-throughput data ingestion architecture deployed by Buoyant Data using oxbow, which separates write and transaction management for efficiency when bringing data into the bronze layer.

Build more climate-friendly data applications with Rust

April 22, 2025

deltalake

oxbow

rust

Building more efficient data applications with Rust means lower cloud costs but also more climate-friendly software. Rust-based data pipelines can be an order of magnitude smaller than their JVM counterparts, leading to massive savings in power and compute consumption.

Lessons learned building delta-rs

March 09, 2025

deltalake

Reviewing some of the lessons learned building the delta-rs tooling and community.

Even more messages with serverless data ingestion!

February 24, 2025

deltalake

lambda

sqs-ingest

Serverless data ingestion can be extremely cost effective but limitations of AWS Lambda can result in transaction log bloat. In this post we'll discuss the "BUFFER_MORE" feature in sqs-ingest and how it helps get more bang for your Lambda buck.

Less is more: scaling streaming Delta Lake applications

December 31, 2024

deltalake

lambda

kafka-delta-ingest

Facing a large backlog of data, it is tempting to horizontally scale Delta writers as much as compute and budget will allow. In this post we'll dive into how this can be counterproductive and actually slow throughput rather than accelerate it!

Introducing: Delta Lake The Definitive Guide

November 25, 2024

deltalake

python

rust

Introducing the definitive guide to Delta Lake, the high-performance open table format for cloud and on-premise big data needs. The book is now available from O'Reilly, including the contributed chapter for using Delta Lake with Rust and Python by R. Tyler Croy.

Let's do data engineering in Rust!

October 17, 2024

databricks

aws

event

rust

The future of data engineering is becoming more and more Rust-powered. In this video session Tyler walks the audience through a starting point on using Rust for real-world data engineering tasks with the deltalake, datafusion, and arrow crates.

Fast, cheap, and easy data ingestion with AWS Lambda and Delta Lake

October 16, 2024

databricks

aws

event

In this session we will dive into examples of how to work with Delta tables from AWS Lambdas written in Python and Rust. For many ingestion or lightweight data processing workloads, AWS Lambda provides a fast, easy, and cheap execution environment.

Join us for two talks at Data and AI Summit

June 04, 2024

databricks

aws

event

Buoyant Data will be in San Francisco for Data and AI Summit this year for a number of sessions including a book signing, an open source summit, an AMA, and two conference track sessions! Come chat with us!

Scaling S3 Event Notifications for Delta Lake

December 30, 2023

aws

lambda

S3 Event Notifications are a highly useful way of orchestrating workflows around AWS S3-based Delta tables. This post details a pattern for ensuring highly concurrent Lambda execution with S3 Event Notifications.

Concurrency limitations for Delta Lake on AWS

November 27, 2023

rust

deltalake

At a protocol level, Delta Lake can in theory scale to an infinite number of concurrent readers and writers, so long as the underlying storage provider supports strong atomicity. On AWS, the Simple Storage Service lacks a necessary "put if absent" operation, which requires Delta writers to coordinate to ensure consistent writes to any given table.

Automating credentials for Delta Lake on AWS

July 08, 2023

rust

python

deltalake

aws

Remove those pesky hard-coded secret keys from your data applications and learn how to assume roles using built-in credential providers in AWS. This post includes examples that can be copied for both Rust and Python applications which need to access Delta tables.

5 tips for cheaper Databricks workloads

May 21, 2023

databricks

aws

Optimizing the cost of workloads running on Databricks can be daunting at first, but there is plenty of low-hanging fruit! These tips will help you save thousands of dollars annually on your big data's big bills!

Join us at Data and AI Summit 2023

May 17, 2023

databricks

aws

event

Buoyant Data will be in San Francisco for Data and AI Summit from June 26th to June 29th. We'll be talking about alternative data pipelines using Rust and Python, and cost optimization in AWS. Come find us!

Writing RecordBatches to Delta in Rust

February 09, 2023

deltalake

rust

developer

A developer-focused post explaining how to write to a Delta table in Rust using the Apache Arrow RecordBatch data structure.

The cheapest Databricks deployment is $33/month

January 03, 2023

aws

databricks

Discussing whether it is possible to have a Databricks deployment with a $0 idle cost in AWS. It is a nice idea, but not entirely possible in practice. This post discusses the minimum footprint possible with Databricks.

Initial commit

December 18, 2022

news

aws

deltalake

databricks

An introductory post outlining what Buoyant Data can do to help organizations save on their Databricks and AWS costs, along with our preferences for the most cost-effective data platform architecture.

Buoyant Data

News and tips from us to you

Buoyant Data Inc