<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Buoyant Data</title><link>https://www.buoyantdata.com</link><description>Buoyant Data</description><item><title>The buoyant_kernel distribution</title><link>https://www.buoyantdata.com/blog/2026-04-09-buoyant-kernel.html</link><description><![CDATA[The next version of Delta Lake for Rust and Python will use a tailored
distribution of delta_kernel. The new buoyant_kernel allows features, bug
fixes, and optimizations to ship to users faster than before.
]]></description><guid>https://www.buoyantdata.com/blog/2026-04-09-buoyant-kernel.html</guid><pubDate>Thu, 09 Apr 2026 00:00:00 +0000</pubDate></item><item><title>Investing in Delta Lake Security</title><link>https://www.buoyantdata.com/blog/2026-03-25-investing-deltalake-security.html</link><description><![CDATA[The recent supply-chain attacks in the Python ecosystem have shaken the
confidence of a number of organizations who depend on Python to power their
data ecosystem. In this post we detail how Buoyant Data is helping to
ensure the security of the Delta Lake project.
]]></description><guid>https://www.buoyantdata.com/blog/2026-03-25-investing-deltalake-security.html</guid><pubDate>Wed, 25 Mar 2026 00:00:00 +0000</pubDate></item><item><title>Triggering small ETL workloads</title><link>https://www.buoyantdata.com/blog/2026-03-16-triggering-small-batches.html</link><description><![CDATA[Processing less data is the best way to reduce data platform costs. The key
is to use event-driven pipelines rather than scheduled pipelines to only
process data when it is ready!
]]></description><guid>https://www.buoyantdata.com/blog/2026-03-16-triggering-small-batches.html</guid><pubDate>Mon, 16 Mar 2026 00:00:00 +0000</pubDate></item><item><title>Going multimodal on Data Engineering Central</title><link>https://www.buoyantdata.com/blog/2026-02-13-data-engineering-central.html</link><description><![CDATA[In this episode of the Data Engineering Central podcast, I join Daniel Beach to
talk about the present and future of the data platform. We discuss the "lakehouse
architecture" as a stepping stone into what comes next for data engineering
in an increasingly LLM-driven ecosystem.
]]></description><guid>https://www.buoyantdata.com/blog/2026-02-13-data-engineering-central.html</guid><pubDate>Fri, 13 Feb 2026 00:00:00 +0000</pubDate></item><item><title>The multimodal Delta Lake</title><link>https://www.buoyantdata.com/blog/2026-02-03-multimodal-delta-lake.html</link><description><![CDATA[The storage changes we need today must support "multimodal data" which is a
dramatic departure in many ways from the traditional query and usage
patterns our existing infrastructure supports. This post explores some
research and development to deliver multimodal data for analysts and
developers without changing the entire platform.
]]></description><guid>https://www.buoyantdata.com/blog/2026-02-03-multimodal-delta-lake.html</guid><pubDate>Tue, 03 Feb 2026 00:00:00 +0000</pubDate></item><item><title>High-throughput data ingestion with the Buoyant Architecture</title><link>https://www.buoyantdata.com/blog/2026-01-02-design-for-throughput.html</link><description><![CDATA[Delta Lake allows for building high-throughput applications, especially for
append-only workloads as part of a medallion architecture. In this post we
review the high-throughput data ingestion architecture deployed by Buoyant
Data using oxbow, separating write and transaction management for efficiency
when bringing data into the bronze layer.
]]></description><guid>https://www.buoyantdata.com/blog/2026-01-02-design-for-throughput.html</guid><pubDate>Fri, 02 Jan 2026 00:00:00 +0000</pubDate></item><item><title>Build more climate-friendly data applications with Rust</title><link>https://www.buoyantdata.com/blog/2025-04-22-rust-is-good-for-the-climate.html</link><description><![CDATA[Building more efficient data applications with Rust means lower cloud costs
but also more climate-friendly software.

Rust-based data pipelines can be an order of magnitude smaller than their
JVM counterparts, leading to massive savings in power and compute consumption.
]]></description><guid>https://www.buoyantdata.com/blog/2025-04-22-rust-is-good-for-the-climate.html</guid><pubDate>Tue, 22 Apr 2025 00:00:00 +0000</pubDate></item><item><title>Lessons learned building delta-rs</title><link>https://www.buoyantdata.com/blog/2025-03-09-lessons-learned-building-delta-rs.html</link><description><![CDATA[Reviewing some of the lessons learned building the delta-rs tooling and
community.
]]></description><guid>https://www.buoyantdata.com/blog/2025-03-09-lessons-learned-building-delta-rs.html</guid><pubDate>Sun, 09 Mar 2025 00:00:00 +0000</pubDate></item><item><title>Even more messages with serverless data ingestion!</title><link>https://www.buoyantdata.com/blog/2025-02-24-just-keep-buffering.html</link><description><![CDATA[Serverless data ingestion can be extremely cost effective but limitations
of AWS Lambda can result in transaction log bloat. In this post we'll
discuss the "BUFFER_MORE" feature in sqs-ingest and how it helps get more
bang for your Lambda buck.
]]></description><guid>https://www.buoyantdata.com/blog/2025-02-24-just-keep-buffering.html</guid><pubDate>Mon, 24 Feb 2025 00:00:00 +0000</pubDate></item><item><title>Less is more: scaling streaming Delta Lake applications</title><link>https://www.buoyantdata.com/blog/2024-12-31-high-concurrency-logstore.html</link><description><![CDATA[Facing a large backlog of data, it is tempting to horizontally scale Delta
writers as much as compute and budget will allow. In this post we'll dive
into how this can be counterproductive and actually slow throughput
rather than accelerate it!
]]></description><guid>https://www.buoyantdata.com/blog/2024-12-31-high-concurrency-logstore.html</guid><pubDate>Tue, 31 Dec 2024 00:00:00 +0000</pubDate></item><item><title>Introducing: Delta Lake The Definitive Guide</title><link>https://www.buoyantdata.com/blog/2024-11-25-delta-lake-the-definitive-guide.html</link><description><![CDATA[Introducing the definitive guide to Delta Lake, the high-performance
open table format for cloud and on-premise big data needs. The book is
now available from O'Reilly, including the contributed chapter for
using Delta Lake with Rust and Python by R. Tyler Croy.
]]></description><guid>https://www.buoyantdata.com/blog/2024-11-25-delta-lake-the-definitive-guide.html</guid><pubDate>Mon, 25 Nov 2024 00:00:00 +0000</pubDate></item><item><title>Let&apos;s do data engineering in Rust!</title><link>https://www.buoyantdata.com/blog/2024-10-17-data-ai-summit-2024-rust-data-eng.html</link><description><![CDATA[The future of data engineering is becoming more and more Rust-powered.
In this video session Tyler walks the audience through a starting point
on using Rust for real-world data engineering tasks with the deltalake,
datafusion, and arrow crates.
]]></description><guid>https://www.buoyantdata.com/blog/2024-10-17-data-ai-summit-2024-rust-data-eng.html</guid><pubDate>Thu, 17 Oct 2024 00:00:00 +0000</pubDate></item><item><title>Fast, cheap, and easy data ingestion with AWS Lambda and Delta Lake</title><link>https://www.buoyantdata.com/blog/2024-10-16-data-ai-summit-2024-videos.html</link><description><![CDATA[In this session we will dive into examples of how to work with Delta tables
from AWS Lambdas written in Python and Rust. For many ingestion or lightweight
data processing workloads, AWS Lambda provides a fast, easy, and cheap execution
environment.
]]></description><guid>https://www.buoyantdata.com/blog/2024-10-16-data-ai-summit-2024-videos.html</guid><pubDate>Wed, 16 Oct 2024 00:00:00 +0000</pubDate></item><item><title>Join us for two talks at Data and AI Summit</title><link>https://www.buoyantdata.com/blog/2024-06-04-data-and-ai-summit.html</link><description><![CDATA[Buoyant Data will be in San Francisco for Data and AI Summit this year for 
a number of sessions including a book signing, an open source summit, an
AMA, and two conference track sessions! Come chat with us!
]]></description><guid>https://www.buoyantdata.com/blog/2024-06-04-data-and-ai-summit.html</guid><pubDate>Tue, 04 Jun 2024 00:00:00 +0000</pubDate></item><item><title>Scaling S3 Event Notifications for Delta Lake</title><link>https://www.buoyantdata.com/blog/2023-12-30-serialized-s3-notifications.html</link><description><![CDATA[S3 Event Notifications are a highly useful way of orchestrating workflows
around AWS S3-based Delta tables. This post details a pattern for ensuring
highly concurrent Lambda execution with S3 Event Notifications.
]]></description><guid>https://www.buoyantdata.com/blog/2023-12-30-serialized-s3-notifications.html</guid><pubDate>Sat, 30 Dec 2023 00:00:00 +0000</pubDate></item><item><title>Concurrency limitations for Delta Lake on AWS</title><link>https://www.buoyantdata.com/blog/2023-11-27-concurrency-limitations-with-deltalake-on-aws.html</link><description><![CDATA[At a protocol level Delta Lake can scale to an infinite number of concurrent readers and writers, in theory, so long as the underlying storage provider supports strong atomicity. On AWS, the Simple Storage Service lacks a necessary "put if absent" operation, which requires Delta writers to coordinate to ensure consistent writes to any given table.
]]></description><guid>https://www.buoyantdata.com/blog/2023-11-27-concurrency-limitations-with-deltalake-on-aws.html</guid><pubDate>Mon, 27 Nov 2023 00:00:00 +0000</pubDate></item><item><title>Automating credentials for Delta Lake on AWS</title><link>https://www.buoyantdata.com/blog/2023-07-08-instance-authentication-delta-rust.html</link><description><![CDATA[Remove those pesky hard-coded secret keys from your data applications and
learn how to assume roles using built-in credential providers in AWS. This
post includes examples that can be copied for both Rust and Python
applications which need to access Delta tables.
]]></description><guid>https://www.buoyantdata.com/blog/2023-07-08-instance-authentication-delta-rust.html</guid><pubDate>Sat, 08 Jul 2023 00:00:00 +0000</pubDate></item><item><title>5 tips for cheaper Databricks workloads</title><link>https://www.buoyantdata.com/blog/2023-05-21-five-tips-for-cheaper-databricks.html</link><description><![CDATA[Optimizing cost of workloads running on Databricks can be daunting at
first, but there is plenty of low-hanging fruit! These tips will help
you save thousands of dollars annually on your big data's big bills!
]]></description><guid>https://www.buoyantdata.com/blog/2023-05-21-five-tips-for-cheaper-databricks.html</guid><pubDate>Sun, 21 May 2023 00:00:00 +0000</pubDate></item><item><title>Join us at Data and AI Summit 2023</title><link>https://www.buoyantdata.com/blog/2023-05-17-data-and-ai-summit.html</link><description><![CDATA[Buoyant Data will be in San Francisco for Data and AI Summit from June 26th
to June 29th. We'll be talking about alternative data pipelines using Rust
and Python, and cost optimization in AWS. Come find us!
]]></description><guid>https://www.buoyantdata.com/blog/2023-05-17-data-and-ai-summit.html</guid><pubDate>Wed, 17 May 2023 00:00:00 +0000</pubDate></item><item><title>Writing RecordBatches to Delta in Rust</title><link>https://www.buoyantdata.com/blog/2023-02-09-rust-recordbatchwriter-example.html</link><description><![CDATA[A developer-focused post explaining how to write to a Delta table in Rust
using the Apache Arrow RecordBatch data structure.
]]></description><guid>https://www.buoyantdata.com/blog/2023-02-09-rust-recordbatchwriter-example.html</guid><pubDate>Thu, 09 Feb 2023 00:00:00 +0000</pubDate></item><item><title>The cheapest Databricks deployment is $33/month</title><link>https://www.buoyantdata.com/blog/2023-01-03-cheapest-possible-databricks.html</link><description><![CDATA[Discussing whether it is possible to have a Databricks deployment with a $0
idle cost in AWS. It is a nice idea, but not entirely possible in practice. This
post discusses the minimum footprint possible with Databricks.
]]></description><guid>https://www.buoyantdata.com/blog/2023-01-03-cheapest-possible-databricks.html</guid><pubDate>Tue, 03 Jan 2023 00:00:00 +0000</pubDate></item><item><title>Initial commit</title><link>https://www.buoyantdata.com/blog/2022-12-18-initial-commit.html</link><description><![CDATA[An introductory post outlining what Buoyant Data can do to help save on
your Databricks and AWS costs, along with our preferences for the most
cost effective data platform architecture.
]]></description><guid>https://www.buoyantdata.com/blog/2022-12-18-initial-commit.html</guid><pubDate>Sun, 18 Dec 2022 00:00:00 +0000</pubDate></item><item><title>Buoyant Data Blog</title><link>https://www.buoyantdata.com/blog/index.html</link><description><![CDATA[]]></description><guid>https://www.buoyantdata.com/blog/index.html</guid></item></channel></rss>
