Big data adds significant value to your organization but can also add significant cost. Buoyant Data specializes in improving data infrastructure with high-performance, low-cost ingestion and transformation pipelines built with Rust, Databricks, and AWS.
With years of experience creating open source data infrastructure such as delta-rs, kafka-delta-ingest, and more, Buoyant Data can help your organization adopt and excel with high-performance, low-cost data services built on top of Rust.
Our expertise in Delta Lake, Databricks (including Unity Catalog), and AWS Glue or Athena can help you design and implement a scalable and efficient data platform.
We can also help review existing data infrastructure and analytics to squeeze faster queries and lower costs out of your current data platform.
At the protocol level, Delta Lake can in theory scale to an infinite number of concurrent readers and writers, so long as the underlying storage provider supports strong atomicity. On AWS, the Simple Storage Service (S3) lacks the necessary "put if absent" operation, so Delta writers must coordinate to ensure consistent writes to any given table.
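To make "put if absent" concrete, here is a minimal sketch in Python that uses a local directory as a stand-in for a table's transaction log. It is an illustration under stated assumptions, not how delta-rs or any Delta writer is implemented: the helper names `try_commit` and `commit_with_retry` are hypothetical, and exclusive file creation on a local filesystem plays the role of the atomic operation a storage provider would need to offer.

```python
import os
import tempfile


def try_commit(log_dir: str, version: int, payload: str) -> bool:
    """Create the commit file for `version` only if it does not exist yet.

    Hypothetical helper: returns True when this writer won the version,
    False when another writer had already created the file.
    """
    # Delta log entries are named with a zero-padded 20-digit version number.
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # Mode "x" is exclusive creation: it raises FileExistsError
        # instead of overwriting, which is the "put if absent" semantic.
        with open(path, "x", encoding="utf-8") as handle:
            handle.write(payload)
        return True
    except FileExistsError:
        return False


def commit_with_retry(log_dir: str, start_version: int, payload: str,
                      max_attempts: int = 10) -> int:
    """Try successive versions until one is claimed; return that version.

    Simplification: a real writer would re-read the log and check for
    conflicting changes before retrying at the next version.
    """
    version = start_version
    for _ in range(max_attempts):
        if try_commit(log_dir, version, payload):
            return version
        version += 1
    raise RuntimeError("could not commit within max_attempts")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        first = commit_with_retry(log_dir, 0, '{"writer": "a"}')
        second = commit_with_retry(log_dir, 0, '{"writer": "b"}')
        print(first, second)
```

When the storage layer cannot reject the second write to an existing key, two writers can both believe they own the same version, which is why writers need some external coordination in that case.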
Remove those pesky hard-coded secret keys from your data applications and learn how to assume roles using built-in credential providers in AWS. This post includes examples that can be copied into both Rust and Python applications that need to access Delta tables.
Optimizing the cost of workloads running on Databricks can be daunting at first, but there is plenty of low-hanging fruit! These tips will help you save thousands of dollars annually on your big data's big bills!
Buoyant Data will be in San Francisco for Data and AI Summit from June 26th to June 29th. We'll be talking about alternative data pipelines using Rust and Python, and cost optimization in AWS. Come find us!
A developer-focused post explaining how to write to a Delta table in Rust using the Apache Arrow RecordBatch data structure.
Discussing whether it is possible to have a Databricks deployment with a $0 idle cost in AWS. It is a nice idea, but not entirely possible in practice. This post discusses the minimum footprint possible with Databricks.
An introductory post outlining what Buoyant Data can do to help organizations save on their Databricks and AWS costs, along with our preferences for the most cost-effective data platform architecture.