Build more climate-friendly data applications with Rust

April 22, 2025

by R. Tyler Croy

Tags: deltalake, oxbow, rust

Time is money. In the cloud, time is measured and billed by the vCPU-hour, and the most efficient software is always the cheapest. This past week I deployed a new design for an existing data application built with Rust that runs at 1% of the cost of its predecessor, which means it requires 1% of the resources and produces 1% of the CO2.

The goal for the new application architecture was not to be "more green". For most businesses, doing more with less cost is an inherently valuable objective, which makes application design optimization, with or without Rust, a worthwhile pursuit. The cost improvements in this case were so substantial that I could not help but wonder about the change in CO2 impact and how big a role Rust played.

ferris loves ferris (by @aldeka)

There is academic research suggesting that Rust and other natively compiled languages (C/C++/Fortran) use substantially less energy than equivalent software on the JVM or in interpreted languages such as Python. Power efficiency can be used as a proxy for CO2 impact: more efficient applications require less power from fewer computers, which reduces emissions.


Using this calculator as a rough guideline, I was able to estimate the CO2 produced by the compute resources used, in the relevant region, for both the legacy and new versions of this application. The overall data flow is (sketched in Rust after the list):

  • High-frequency data received over a TCP/TLS connection
  • Deserialization against an expected Delta schema
  • Data written into a Delta table
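The last two steps look roughly like the following sketch using the deltalake (delta-rs) crate. The table location, schema, and values here are illustrative assumptions, not the production code:

```rust
use std::sync::Arc;

use deltalake::arrow::array::{Int64Array, StringArray};
use deltalake::arrow::datatypes::{DataType, Field, Schema};
use deltalake::arrow::record_batch::RecordBatch;
use deltalake::writer::{DeltaWriter, RecordBatchWriter};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Illustrative table location; an object store URI needs the matching
    // feature flags and handlers registered.
    let mut table = deltalake::open_table("./events_table").await?;

    // The schema the incoming data is expected to match.
    let schema = Arc::new(Schema::new(vec![
        Field::new("timestamp", DataType::Int64, false),
        Field::new("payload", DataType::Utf8, false),
    ]));

    // Deserialize the incoming bytes into an Arrow batch matching the
    // expected Delta schema (the actual deserialization is elided here).
    let batch = RecordBatch::try_new(
        schema,
        vec![
            Arc::new(Int64Array::from(vec![1_700_000_000])),
            Arc::new(StringArray::from(vec!["example"])),
        ],
    )?;

    // Write the batch and commit it to the Delta table.
    let mut writer = RecordBatchWriter::for_table(&table)?;
    writer.write(batch).await?;
    writer.flush_and_commit(&mut table).await?;
    Ok(())
}
```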

Both versions served identical data streams during the change-over period, which gives a brief glimpse into how different application architectures and platforms perform.

Legacy

```mermaid
flowchart LR
    P>Producer] -->|TLS| HD[Receiver]
    HD --> K{Kafka}
    K -->|topic:part1| KDIA[kafka-delta-ingest]
    K -->|topic:part2| KDIB[..]
    K -->|topic:partX| KDIX[..]
    KDIA -->|write| DT[(Delta Table)]
    KDIB --> DT
    KDIX --> DT
```

I should note that kafka-delta-ingest is already written in Rust and saved over 90% of the cost compared to the Apache Spark system which preceded it. The application architecture still required far more resources than are strictly necessary for the task at hand, which led to the new design below.

New

The new application is radically simpler: it cuts out the intermediate buffering and adopts the oxbow architecture for performing high-throughput Delta Lake writes.

```mermaid
flowchart LR
    P>Producer] -->|TLS| HD[Receiver]
    HD -->|parquet write| S[(S3)]
    S -->|bucket notifications| O[Oxbow]
    O -->|write| DT[(Delta Table)]
```
  • Receiver: 1 vCPU, ~36 kg/year of CO2
  • Oxbow: <1 vCPU and less than ~20 kg of CO2
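The calculator does the real work, but the shape of the estimate is simply vCPUs × power per vCPU × hours per year × grid carbon intensity. A back-of-envelope sketch with illustrative constants (my assumptions, not the calculator's actual factors, and heavily region-dependent):

```rust
// Illustrative constants only; real figures vary by instance type and region.
const WATTS_PER_VCPU: f64 = 10.0;      // assumed average draw per vCPU
const GRID_G_CO2_PER_KWH: f64 = 400.0; // assumed grid carbon intensity
const HOURS_PER_YEAR: f64 = 24.0 * 365.0;

/// Rough kg of CO2 per year for an always-on allocation of `vcpus`.
fn annual_kg_co2(vcpus: f64) -> f64 {
    let kwh_per_year = vcpus * WATTS_PER_VCPU * HOURS_PER_YEAR / 1000.0;
    kwh_per_year * GRID_G_CO2_PER_KWH / 1000.0
}

fn main() {
    // With these assumptions a single always-on vCPU lands in the same
    // ballpark as the ~36 kg/year figure above.
    println!("1 vCPU  ≈ {:.0} kg CO2/year", annual_kg_co2(1.0));
    println!("8 vCPUs ≈ {:.0} kg CO2/year", annual_kg_co2(8.0));
}
```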

Removing Apache Kafka from the architecture had a substantial cost impact, but I was surprised to see that the majority of the emissions reduction came from no longer needing to keep as many vCPUs and as much RAM online for the non-Kafka parts of the architecture. By deserializing data and landing it directly in storage, the equivalent of an entire 1U server could be decommissioned!
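On the receiver's side, that landing step is just a plain parquet write. A minimal sketch using the arrow and parquet crates (schema and file name are illustrative; the real receiver batches incoming records and uploads to S3, where a bucket notification hands the file to oxbow):

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::{Int64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let schema = Arc::new(Schema::new(vec![
        Field::new("timestamp", DataType::Int64, false),
        Field::new("payload", DataType::Utf8, false),
    ]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![
            Arc::new(Int64Array::from(vec![1_700_000_000])),
            Arc::new(StringArray::from(vec!["example"])),
        ],
    )?;

    // Write a parquet file; the real receiver would upload this object to
    // S3, where a bucket notification triggers oxbow to commit it to the
    // Delta table's transaction log.
    let file = File::create("part-00000.snappy.parquet")?;
    let mut writer = ArrowWriter::try_new(file, schema, None)?;
    writer.write(&batch)?;
    writer.close()?;
    Ok(())
}
```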

As one commenter pointed out on Mastodon:

Times and times again, efficiency gains always has resulted in greater usage, more than compensating the gains made with efficiency. It's so common that it has a name: Jevon's paradox. As a rust advocate, I always mention the efficiency gains, but it will always be in back of my mind.

(see Jevons paradox)

The following conditions are necessary for a Jevons paradox to occur:

  1. Technological change which increases efficiency or productivity
  2. The efficiency/productivity boost must result in a decreased consumer price for such goods or services
  3. That reduced price must drastically increase quantity demanded (demand curve must be highly elastic)

For better or worse, the cost of on-demand compute in cloud environments is not highly elastic, so I believe the probability of Jevons paradox occurring when optimizing systems with Rust to be low!


Converting frequently invoked AWS Lambda functions from an interpreted language to Rust can also lead to dramatically lower costs. Reducing the function's duration has a 1:1 impact on cost: a Rust function which is 90% faster is also 90% cheaper, and a function which requires 50% less memory is also 50% cheaper.
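For example, a minimal Rust handler with the lambda_runtime crate (the event shape here is just illustrative JSON):

```rust
use lambda_runtime::{service_fn, Error, LambdaEvent};
use serde_json::{json, Value};

// Handler body: echo back a field from the event. The point is less the
// logic than the startup and per-invocation cost compared to an
// interpreted runtime doing the same work.
async fn handler(event: LambdaEvent<Value>) -> Result<Value, Error> {
    let name = event.payload["name"].as_str().unwrap_or("world");
    Ok(json!({ "message": format!("hello, {name}") }))
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    lambda_runtime::run(service_fn(handler)).await
}
```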

The climate benefits of optimizing Lambdas are incredibly difficult to prove given the nature of AWS' capacity planning and procurement practices, but I believe that the fewer resources used, the lower the climate impact will be.

Cost-benefit of cost optimization

In the cloud there is a really compelling cost-benefit equation that can be used for optimizing any application: fewer resources means fewer vCPUs and lower costs. There is a simple causal relationship that can be observed within hours in the cost of doing business.

In an on-premise environment there is a looser correlation between performance and cost, since racked servers represent committed fixed overhead. Unlike the cloud, on-premise cost follows a step function, which makes climate-friendly applications a lot more meaningful. For example, an application which requires scaling from 10 to 15 servers in a data center has the obvious impact of those 5 additional servers, but in many facilities other costs step up as well, such as network transit requiring additional hardware, or powering one rack versus two.

The climate benefits of cost optimization for on-premise deployments go up dramatically if the hardware refresh cycle can be extended from 2 years to 3 or beyond.


Building more efficient data applications has myriad benefits, such as lower costs and faster feedback, which can be proxy indicators for lower power and resource consumption. The Rust ecosystem for data applications has matured at such an incredible rate with Apache DataFusion and Apache Arrow that it is now possible to solve most of our data problems with native code rather than heavier runtimes like Python, Scala, or even pure Java.
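As one illustration of that maturity, a few lines of DataFusion can stand in for what used to require a JVM cluster. A minimal sketch (the file path and query are illustrative):

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();
    // Register a local parquet file as a queryable table (path is illustrative).
    ctx.register_parquet("events", "./events.parquet", ParquetReadOptions::default())
        .await?;
    // Run SQL directly against the parquet data, all in native code.
    let df = ctx.sql("SELECT count(*) AS events FROM events").await?;
    df.show().await?;
    Ok(())
}
```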

Use the opportunity to shut machines off, save some power, and lower the footprint for your own data applications!


If your team is looking to improve performance of your streaming or batch infrastructure with Delta Lake, drop me an email and we'll chat!!