Open Source @

We strongly believe that open source data technology is the right choice for most organizations. An open source solution lets you take something off the shelf and tailor it to the unique needs of your data platform. We can suggest and, in some cases, support your organization's adoption of open source tooling for your data and ML platform.

We Build

  • Delta Lake Lambdas for managing data platform workflows:
    • oxbow: Utility for converting a bunch of Apache Parquet files into a Delta Lake table.
    • s3-restructure: Lambda for restructuring objects created in an S3 bucket.
    • delta-optimize: Lambda for periodically running OPTIMIZE on Delta tables.
    • spark-connect-rust: Thin Rust bindings for Spark Connect.

We Customize

We Support

  • Apache Airflow, a platform created by the community to programmatically author, schedule and monitor workflows.
  • Apache Arrow (Rust), the Rust implementation of the Arrow in-memory data format.
  • Apache Kafka, a distributed event streaming platform.
  • DataFusion, an extensible query planning, optimization, and execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
  • terraform-provider-databricks, a provider for automating Databricks infrastructure with Terraform.

We Like

If you need help with open source data technology in your organization, let us know!