This week we released the initial versions of the buoyant_kernel crate to
unlock more features for Delta Lake Rust and Python users. After a frenzied
sprint, I thought it was important to share what we aim to accomplish with
buoyant_kernel, where we're going, and why it's important for the Delta
Lake ecosystem.
I believe the distribution model of the Linux kernel has been one of the great strengths of the Linux ecosystem. Different integrators with slightly different system requirements are able to curate customizations downstream of the vanilla Linux kernel. For example, Red Hat is able to publish customizations for their specific customers' needs, while Linus and others evaluate patches for their broad applicability. It is in everybody's best interest to get their patches upstream, but the presence of these distributions allows experiments to incubate closest to where they are used.
buoyant_kernel is a distribution of
delta_kernel which gives us a
similar capability.
The Delta Kernel is one of the most technically challenging and ambitious open source projects I have worked on. Kernel is fundamentally about unifying all of our needs and wants from a Delta Lake implementation into a single cohesive yet pluggable API surface.
I have written elsewhere on the challenges facing Delta Kernel and I believe distributions downstream help the entire community move forward by providing a path for more rapid releases, curated patches for experimentation, and integration into delta-rs.
delta-rs powered by buoyant_kernel
The next release of the deltalake crate and Python package will use
buoyant_kernel. This upcoming release will contain Arrow 58, DataFusion
53, object_store 0.13, and a number of other dependency upgrades that are
only possible because of the buoyant_kernel distribution.
We are also incorporating experimental nanosecond timestamp support, developed by the Open Source Team at G-Research, before that support officially lands upstream.
This is huge for the delta-rs project: it lets us identify fixes in
delta_kernel faster without waiting for the full upstream development cycle
to incorporate those fixes into a released delta_kernel crate.
Using Cargo's renaming support, delta-rs and other projects can easily
incorporate buoyant_kernel as a drop-in replacement for the upstream
delta_kernel package.
delta_kernel = { package = "buoyant_kernel", version = "0.21.101", features = [
"arrow-58",
"internal-api",
] }
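As a rough illustration of what the rename means for downstream source code, the sketch below uses a local stand-in module instead of the real crate so that it compiles on its own. The `distribution` function and the `kernel_in_use` helper are hypothetical names invented for this example and are not part of any kernel API. The `use ... as` line only mimics, inside a single file, what the `package` key does at the dependency level.

```rust
// Stand-in for the renamed dependency. In a real project this module would not
// exist: Cargo exposes the buoyant_kernel crate under the name `delta_kernel`
// because of the `package = "buoyant_kernel"` key in Cargo.toml.
mod buoyant_kernel {
    // Hypothetical function for illustration only; not a real kernel API.
    pub fn distribution() -> &'static str {
        "buoyant_kernel"
    }
}

// In-file analogue of the Cargo rename: everything below this line refers
// only to `delta_kernel`.
use self::buoyant_kernel as delta_kernel;

// Downstream code keeps its existing `delta_kernel::` paths unchanged.
fn kernel_in_use() -> &'static str {
    delta_kernel::distribution()
}

fn main() {
    println!("{}", kernel_in_use());
}
```

The point of the sketch is that call sites keep referring to `delta_kernel`, so switching between the upstream crate and the distribution is a change to the manifest, not to the source.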
How it works
Personally, I do not want to branch too far away from the
upstream Delta Kernel project. Too much drift becomes a bigger release
engineering burden than would be worthwhile. I have also mentioned a number of
times that the project is incredibly ambitious and important. There are numerous
talented folks working to help Delta Kernel realize the vision of unifying
support for Delta Lake connectors across the ecosystem. The distribution model
of buoyant_kernel is key to making this function efficiently. Providing
more rapid releases results in more frequent integration of delta_kernel
code with real production workloads.
The buoyant_kernel code is open source, but
the savvy reader may notice that pull requests are disabled on the repository.
All code should be contributed to the
upstream project where it can be
considered for incorporation.
buoyant_kernel tracks the upstream main branch with periodic refresh
merges. In some situations we will need to rebase our patches on top of the
upstream main and therefore the buoyant/main branch of the repository may
need to be force-pushed in the future. This is one of the many reasons we want
to discourage direct pull requests to buoyant_kernel.

Buoyant Data has integration and release testing infrastructure in place
to support Delta Lake LTS. The LTS product
offers backported fixes for Rust and Python enterprise customers. This same
infrastructure was easily adapted to support full-stack integration testing for
all the upstream changes coming into buoyant_kernel.
Patches that the delta-rs team identifies as bug fixes, security fixes, or
performance improvements are pulled forward into buoyant_kernel and
published to crates.io. This allows for rapid
adoption of improvements by delta-rs users while allowing improvements to land
upstream on a timeline that is comfortable for the upstream maintainers of
delta_kernel.
I'm looking forward to the next few releases of
delta-rs built around buoyant_kernel,
which will bring a number of improvements from upstream like Identity
Columns support, better v2 Checkpoints, and improved Column Mapping support.
If you are interested in building downstream of buoyant_kernel or have more
questions about how we're improving Delta Lake for Rust and Python, drop me an
email and we'll chat!!
