Let's do data engineering in Rust!

[Image: Delta Lake for Rust logo, with Ferris the crab in front of the Delta triangle logo]

This talk was given at the 2024 Data and AI Summit hosted by Databricks.

The future of data engineering is looking increasingly Rusty. By adopting the foundational deltalake, datafusion, and arrow crates, developers can write high-performance, low-cost ingestion pipelines, transformation jobs, and data query applications.

Attendees don't need to know Rust ahead of time; we will review some fundamental concepts of the language as they pertain to the data engineering domain. The main goal of this session is to provide attendees with a starting point to learn Rust by applying it to the real-world data problems they're already familiar with:

  • Ingesting semi-structured data (e.g. CSV, JSON) into Delta tables
  • Enriching data from multiple tables to create new silver/gold tables
  • Performing table management (e.g. OPTIMIZE, VACUUM)
  • Exporting data from Delta tables to external systems (e.g. Elastic/OpenSearch)
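
For a taste of what the first two items look like in Rust, here is a minimal sketch that uses only the standard library. It does not use the deltalake, datafusion, or arrow crates, and it is not code from the talk. The `Order` struct, the `orders` CSV layout, and the helper names are all hypothetical, chosen only to show a few language fundamentals that come up constantly in data work: typed records, `Result` for bad rows instead of panics, and a simple lookup join.

```rust
use std::collections::HashMap;

/// One parsed row of a hypothetical "orders" CSV feed (assumed layout).
#[derive(Debug, PartialEq)]
struct Order {
    id: u64,
    customer_id: u64,
    amount_cents: i64,
}

/// Parse a single `id,customer_id,amount_cents` line.
/// A malformed row becomes an `Err` describing the problem rather than a panic.
fn parse_order(line: &str) -> Result<Order, String> {
    let fields: Vec<&str> = line.split(',').map(str::trim).collect();
    if fields.len() != 3 {
        return Err(format!("expected 3 fields, got {}: {line:?}", fields.len()));
    }
    let id = fields[0]
        .parse::<u64>()
        .map_err(|e| format!("bad id in {line:?}: {e}"))?;
    let customer_id = fields[1]
        .parse::<u64>()
        .map_err(|e| format!("bad customer_id in {line:?}: {e}"))?;
    let amount_cents = fields[2]
        .parse::<i64>()
        .map_err(|e| format!("bad amount_cents in {line:?}: {e}"))?;
    Ok(Order { id, customer_id, amount_cents })
}

/// Ingest: skip the header, then split the input into parsed rows and rejected rows.
fn ingest(raw: &str) -> (Vec<Order>, Vec<String>) {
    let mut orders = Vec::new();
    let mut rejects = Vec::new();
    for line in raw.lines().skip(1).filter(|l| !l.trim().is_empty()) {
        match parse_order(line) {
            Ok(order) => orders.push(order),
            Err(reason) => rejects.push(reason),
        }
    }
    (orders, rejects)
}

/// Enrich: look up each order's customer name and total the spend per name.
/// Orders whose customer is missing from the lookup are grouped under "unknown".
fn spend_by_customer(orders: &[Order], customers: &HashMap<u64, &str>) -> HashMap<String, i64> {
    let mut totals = HashMap::new();
    for order in orders {
        let name = customers.get(&order.customer_id).copied().unwrap_or("unknown");
        *totals.entry(name.to_string()).or_insert(0) += order.amount_cents;
    }
    totals
}

fn main() {
    // Inline sample data; the second data row has a non-numeric amount.
    let raw = "id,customer_id,amount_cents\n1,10,2500\n2,11,oops\n3,10,700\n";
    let (orders, rejects) = ingest(raw);
    let customers = HashMap::from([(10, "acme")]);
    let totals = spend_by_customer(&orders, &customers);
    println!("{} rows ingested, {} rejected", orders.len(), rejects.len());
    println!("{totals:?}");
}
```

In a real pipeline the parsed rows would typically be converted to Arrow record batches and written to a Delta table rather than held in a `Vec`, but the shape stays the same: parse into typed values, route bad rows somewhere visible, then transform.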

Note: The presentation software used for this talk is the open source presenterm tool, which is delightful for creating development-focused presentations like this one!