Project 2: The Box-less Async Trait (Zero-Cost Async)
Goal: Master Generic Associated Types (GATs) and understand why async in traits was historically problematic by building a zero-allocation async trait system that rivals async-trait in ergonomics while eliminating all heap allocations.
Project Metadata
- Main Programming Language: Rust
- Coolness Level: Level 3: Genuinely Clever
- Difficulty: Level 4: Expert
- Knowledge Area: Async / Metaprogramming
- Time Estimate: 1 week
- Prerequisites: Solid understanding of async/await, basic trait design, familiarity with lifetimes
Learning Objectives
By completing this project, you will:
- Understand the historical problem - Why async fn in traits was impossible before Rust 1.75 and what the compiler needs to make it work
- Master Generic Associated Types (GATs) - How to define associated types with their own lifetime and type parameters
- Deeply comprehend Higher-Ranked Trait Bounds (HRTBs) - The for<'a> syntax and when/why you need it
- Know the three approaches - Compare async-trait (boxing), GAT-based (manual), and RPITIT (Rust 1.75+)
- Visualize memory layouts - See the difference between stack-allocated and heap-allocated futures
- Measure real performance - Build benchmarks that prove the allocation difference matters
- Understand object safety - Why GAT-based traits sacrifice dyn Trait capability
- Apply to real-world patterns - See how Tower, Axum, and other async crates solve this problem
- Navigate lifetime complexity - Master the where Self: 'a pattern and why it's essential
- Debug async trait issues - Recognize common error patterns and their solutions
Deep Theoretical Foundation
The Historical Problem: Why Async in Traits Was Hard
Before we build the solution, we must deeply understand the problem. When Rust introduced async/await in version 1.39 (November 2019), it came with a significant limitation: you could not write async fn directly in trait definitions.
The core issue is that async functions return anonymous types:
// When you write this:
async fn hello() -> String {
"Hello".to_string()
}
// The compiler generates something like this:
fn hello() -> impl Future<Output = String> {
// An anonymous struct implementing Future
// Size known only to the compiler at this call site
}
The impl Future return type works in free functions because the compiler knows the exact type at the call site. But in traits, the implementor determines the concrete type:
trait Greeter {
async fn greet(&self) -> String; // ERROR before Rust 1.75!
}
Why the compiler can't handle this:
The Size Problem:
=================
When you call a trait method through a reference:
fn use_greeter(g: &impl Greeter) {
let future = g.greet(); // What is the SIZE of this future?
// ^^^^^^^^^^
// Compiler needs to know at compile time
// But the size depends on which impl is used!
}
For static dispatch (&impl Greeter), the compiler monomorphizes
and can figure it out. But for dynamic dispatch (dyn Greeter),
it's impossible because:
1. The future type is different for each implementor
2. Different futures have different sizes
3. dyn Trait requires uniform sizing for vtable dispatch
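To make the size problem concrete, here is a small standalone sketch (the function names are illustrative, not part of the project): two async fns standing in for two different trait implementations, whose anonymous futures end up with very different sizes.
use std::mem::size_of_val;

// Stand-in for implementation A: almost no state survives across an await.
async fn tiny_greet() -> String {
    "hi".to_string()
}

// Stand-in for implementation B: a large buffer lives across the await,
// so it must be stored inside the generated state machine.
async fn big_greet() -> String {
    let buffer = [0u8; 1024];
    std::future::ready(()).await;
    format!("buffered {} bytes", buffer.len())
}

fn main() {
    // Two different anonymous types, two different sizes: a single
    // trait-object return type cannot describe both without boxing.
    println!("tiny_greet future: {} bytes", size_of_val(&tiny_greet()));
    println!("big_greet future:  {} bytes", size_of_val(&big_greet()));
}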
What async-trait Actually Does: The Boxing Solution
The async-trait crate by David Tolnay solved this with a procedural macro that transforms your code:
// What you write:
#[async_trait]
trait Greeter {
async fn greet(&self) -> String;
}
#[async_trait]
impl Greeter for MyGreeter {
async fn greet(&self) -> String {
"Hello!".to_string()
}
}
// What the macro generates:
trait Greeter {
fn greet<'life0, 'async_trait>(
&'life0 self
) -> Pin<Box<dyn Future<Output = String> + Send + 'async_trait>>
where
'life0: 'async_trait,
Self: 'async_trait;
}
impl Greeter for MyGreeter {
fn greet<'life0, 'async_trait>(
&'life0 self
) -> Pin<Box<dyn Future<Output = String> + Send + 'async_trait>>
where
'life0: 'async_trait,
Self: 'async_trait,
{
Box::pin(async move {
"Hello!".to_string()
})
}
}
Visual representation of the boxing:
async-trait Desugaring:
=======================
Your async fn body:
    async move { "Hello!".to_string() }
    - This is a state machine (an anonymous struct).
    - Size: a few bytes (this trivial example) up to hundreds of bytes
      (complex async functions).
        |
        v
Box::pin( ... )
    - Moves the state machine to the HEAP (N bytes allocated).
    - What you get back is a 16-byte fat pointer: data pointer + vtable pointer.
        |
        v
Return type: Pin<Box<dyn Future<Output = T>>>
    - Always the same size (2 * usize = 16 bytes).
    - Works with dyn Trait (object safe!).
    - BUT: a heap allocation on EVERY call.
The cost of boxing:
- Heap allocation: every async method call requires a malloc and a free
- Indirection: two pointer dereferences (Box + vtable)
- No inlining: the optimizer cannot inline across dynamic dispatch
- Cache unfriendly: heap-allocated futures scatter across memory
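You can observe the uniform-size property (and the extra allocation) directly. The following is a standalone sketch, not part of the project's trait yet.
use std::future::Future;
use std::mem::size_of_val;
use std::pin::Pin;

async fn work() -> String {
    let scratch = [0u8; 256]; // kept across the await, so it becomes part of the state machine
    std::future::ready(()).await;
    format!("{} bytes of scratch", scratch.len())
}

fn main() {
    // The raw state machine: its size depends on what the async fn keeps alive.
    let plain = work();
    println!("plain future: {} bytes, on the stack", size_of_val(&plain));

    // The async-trait style handle: always two pointers, regardless of the
    // future's real size, because the state machine has moved to the heap.
    let boxed: Pin<Box<dyn Future<Output = String>>> = Box::pin(work());
    println!("boxed handle: {} bytes, plus one heap allocation", size_of_val(&boxed));
}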
Generic Associated Types (GATs): The Key to Zero-Cost
GATs, stabilized in Rust 1.65 (November 2022), allow associated types to have their own generic parameters:
// Before GATs (Rust < 1.65):
trait Iterator {
type Item; // No generics allowed
}
// With GATs (Rust >= 1.65):
trait LendingIterator {
type Item<'a> where Self: 'a; // Lifetime parameter!
fn next<'a>(&'a mut self) -> Option<Self::Item<'a>>;
}
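As a non-async warm-up, here is a minimal LendingIterator implementation (illustrative names) that lends mutable windows into its own buffer; without the lifetime parameter on Item, this borrow of self could not be expressed at all.
trait LendingIterator {
    type Item<'a> where Self: 'a;
    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>>;
}

// Yields overlapping mutable windows over an internal buffer.
struct WindowsMut {
    data: Vec<i32>,
    pos: usize,
    width: usize,
}

impl LendingIterator for WindowsMut {
    // The item borrows from self, which is exactly what the GAT expresses.
    type Item<'a> = &'a mut [i32] where Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>> {
        let end = self.pos + self.width;
        if end > self.data.len() {
            return None;
        }
        let window = &mut self.data[self.pos..end];
        self.pos += 1;
        Some(window)
    }
}

fn main() {
    let mut it = WindowsMut { data: vec![1, 2, 3, 4], pos: 0, width: 2 };
    while let Some(w) = it.next() {
        w[0] += 10; // mutate through the lent window, then release it
    }
    println!("{:?}", it.data); // [11, 12, 13, 4]
}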
Why GATs solve the async trait problem:
trait AsyncService {
// The future type can now vary with the lifetime of &self
type ProcessFut<'a>: Future<Output = String> + 'a
where
Self: 'a; // The future can't outlive self
fn process<'a>(&'a self, input: &'a str) -> Self::ProcessFut<'a>;
}
impl AsyncService for MyService {
// Each implementor specifies its own future type
type ProcessFut<'a> = impl Future<Output = String> + 'a;
fn process<'a>(&'a self, input: &'a str) -> Self::ProcessFut<'a> {
async move {
format!("{}: {}", self.prefix, input)
}
}
}
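The impl above relies on nightly's type_alias_impl_trait to name the async block's type (a stable alternative appears later in this guide); calling through the trait, however, is ordinary generic code. A minimal sketch of a caller:
// Monomorphized per concrete service: the future's exact type and size are
// known to the compiler, and it lives in this function's own state, not on the heap.
async fn run_twice<S: AsyncService>(service: &S) -> (String, String) {
    let a = service.process("first").await;
    let b = service.process("second").await;
    (a, b)
}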
Memory layout with GATs (stack allocation):
GAT-based Async Trait:
======================
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Your async fn body: โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ async move { format!("{}: {}", self.prefix, input) } โ
โ โ โ โ
โ โ State machine struct: โ โ
โ โ - Reference to self (&MyService) โ โ
โ โ - Reference to input (&str) โ โ
โ โ - State enum (Start, Waiting, Done) โ โ
โ โ Size: ~24-48 bytes (implementation specific) โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ โ
โ โผ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Return type: Self::ProcessFut<'a> โ โ
โ โ โ โ
โ โ Type is CONCRETE (known at compile time) โ โ
โ โ Lives on the STACK (caller's stack frame) โ โ
โ โ NO heap allocation! โ โ
โ โ CAN be inlined by optimizer! โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Stack Frame:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Local variables โ โ Caller's stack
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ ProcessFut<'a> { โ
โ self_ref: &MyService, โ
โ input_ref: &str, โ
โ state: State::Start, โ
โ } โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ More locals... โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
NO HEAP ALLOCATION!
Higher-Ranked Trait Bounds (HRTBs): The for<'a> Syntax
HRTBs are crucial when you need to express "works for any lifetime":
// Without HRTB - specific lifetime
fn call_once<'a>(service: &'a impl AsyncService) -> impl Future<Output = String> + 'a {
service.process("hello")
}
// With HRTB - works for ANY lifetime
fn accepts_any_service<S>(service: S)
where
S: AsyncService,
for<'a> S::ProcessFut<'a>: Send, // The future must be Send for ANY lifetime 'a
{
// ...
}
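If the GAT-flavored bounds above feel abstract, the same for<'a> quantification shows up with plain functions and closures. This small sketch compiles on stable and shows why the bound must hold for every lifetime, not one chosen in advance.
// `caller` creates a fresh, short-lived String each time, so `f` must accept
// a borrow of ANY lifetime -- hence the higher-ranked bound.
fn caller<F>(f: F) -> usize
where
    F: for<'a> Fn(&'a str) -> &'a str,
{
    let owned = String::from("hello world");
    f(&owned).len() // the borrow of `owned` lives only inside this call
}

fn first_word(s: &str) -> &str {
    // Elided lifetimes make this fn(&'a str) -> &'a str for every 'a.
    s.split_whitespace().next().unwrap_or(s)
}

fn main() {
    println!("{}", caller(first_word)); // prints 5 ("hello")
}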
Understanding for<'a> visually:
HRTB: for<'a> Trait<'a>
=======================
Without HRTB (specific lifetime):
    fn foo<'x>(t: &impl Trait<'x>)
    The caller picks 'x. The function must work with whatever specific
    lifetime is chosen.
    Lifetime 'x:  |------------------------>
                  (one specific duration)
With HRTB (any lifetime):
    fn bar(t: impl for<'a> Trait<'a>)
    The function works for ALL possible lifetimes:
    - the function can use 'a however it wants
    - more flexible for the function, more restrictive on T
    'a could be:
      |--->                            (short)
      |--------->                      (medium)
      |---------------------------->   (long)
      (any duration works!)
Common use case in async traits:
    where
        S: for<'a> Service<'a>,
        for<'a> <S as Service<'a>>::Fut: Send
    "S implements Service for any lifetime 'a,
     AND its future type is Send for any 'a"
The where Self: 'a Bound Explained
This bound is essential and often confusing. Let's break it down:
trait AsyncService {
type ProcessFut<'a>: Future<Output = String> + 'a
where
Self: 'a; // What does this mean?
fn process<'a>(&'a self, input: &'a str) -> Self::ProcessFut<'a>;
}
The intuition:
where Self: 'a
==============
This means: "Self must live at least as long as 'a."
Why do we need this?
The future ProcessFut<'a> has lifetime 'a and contains a reference &'a self.
For this to be valid:
- self must be valid for at least 'a
- therefore Self: 'a (Self outlives 'a)
Without this bound, the following would be expressible:
    let service = MyService::new();
    let future = service.process("hello");
    drop(service);  // Service is gone!
    future.await;   // DANGER: future holds &service
                    // This would be a use-after-free!
With the bound, the compiler ensures:
    let service = MyService::new();
    let future = service.process("hello");
    // service cannot be dropped while future exists,
    // because future: ProcessFut<'a> and service: Self must outlive 'a
    future.await;   // Safe!
RPITIT: Return Position Impl Trait in Traits (Rust 1.75+)
Rust 1.75 (December 2023) introduced native support for -> impl Trait in traits:
// Now legal in Rust 1.75+!
trait Greeter {
fn greet(&self) -> impl Future<Output = String>;
}
impl Greeter for MyGreeter {
fn greet(&self) -> impl Future<Output = String> {
async { "Hello!".to_string() }
}
}
How RPITIT works under the hood:
// What you write:
trait Greeter {
fn greet(&self) -> impl Future<Output = String>;
}
// What the compiler understands (conceptually):
trait Greeter {
type __greet_return_type: Future<Output = String>;
fn greet(&self) -> Self::__greet_return_type;
}
// Each impl provides a concrete hidden type:
impl Greeter for MyGreeter {
type __greet_return_type = /* compiler-generated type */;
fn greet(&self) -> Self::__greet_return_type {
async { "Hello!".to_string() }
}
}
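A hedged sketch of consuming an RPITIT trait generically. SpawnableGreeter and spawn_greeting are illustrative names (not from the project), and the example assumes tokio is available; it shows why you may still want an explicit Send bound on the returned future.
use std::future::Future;

trait SpawnableGreeter {
    // RPITIT with an explicit Send bound: each impl's concrete future type
    // becomes a hidden, Send-constrained associated type.
    fn greet(&self) -> impl Future<Output = String> + Send;
}

struct EnglishGreeter;

impl SpawnableGreeter for EnglishGreeter {
    fn greet(&self) -> impl Future<Output = String> + Send {
        async { "Hello!".to_string() }
    }
}

// tokio::spawn needs a Send + 'static future, which is only provable in
// generic code because the trait promises `+ Send` on the return type.
fn spawn_greeting<G>(greeter: G) -> tokio::task::JoinHandle<String>
where
    G: SpawnableGreeter + Send + 'static,
{
    tokio::spawn(async move { greeter.greet().await })
}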
Comparison: Three Approaches
ASYNC TRAIT APPROACHES COMPARED
===============================
1. async-trait crate (Boxing)
   Pros:
   + Works on stable Rust (any version)
   + Object safe (dyn Trait works)
   + Simple to use
   + Automatic Send/Sync handling
   Cons:
   - Heap allocation per call
   - Cannot inline futures
   - Vtable indirection overhead
   - Proc macro dependency
2. GAT-based (Manual)
   Pros:
   + Zero allocations
   + Full inlining possible
   + No macro magic
   + Maximum performance
   Cons:
   - NOT object safe (no dyn Trait)
   - Requires Rust 1.65+
   - More verbose trait definitions
   - Complex lifetime annotations
   - May need type_alias_impl_trait (unstable)
3. RPITIT (Rust 1.75+)
   Pros:
   + Zero allocations
   + Native language support
   + Cleaner syntax than GATs
   + async fn in traits (stabilized alongside RPITIT in 1.75)
   Cons:
   - Requires Rust 1.75+
   - NOT object safe
   - Some limitations vs full GATs
Memory Layout Comparison: Visual Deep Dive
MEMORY LAYOUT: async-trait (Boxing)
===================================
  Heap (0x00001000):                 Stack (0x7fff0000):
  +---------------------------+      +---------------------------+
  | Future state machine      |      | local_var_1: u64          |
  |  - captured &self         |      +---------------------------+
  |  - captured input         |      | local_var_2: bool         |
  |  - state enum             |      +---------------------------+
  |  - locals from async body |      | future_box:               |
  |  - padding                |<-----|   .ptr:    0x00001000     |
  | Size: 64-200 bytes        |      |   .vtable: 0x00002000     |
  +---------------------------+      | Size: 16 bytes            |
                                     +---------------------------+
  Cost per call:
  - malloc: ~10-50 ns
  - free:   ~10-50 ns
  - Likely cache miss: ~100 ns
  Total overhead: 30-200 ns per async call

MEMORY LAYOUT: GAT-based (Stack)
================================
  Heap:                              Stack (0x7fff0000):
  +---------------------------+      +---------------------------+
  |                           |      | local_var_1: u64          |
  |      (Nothing here!)      |      +---------------------------+
  |                           |      | local_var_2: bool         |
  |                           |      +---------------------------+
  |                           |      | future (inline):          |
  |                           |      |   .self_ref:  &Svc        |
  |                           |      |   .input_ref: &str        |
  |                           |      |   .state:     Start       |
  |                           |      | Size: 32-64 bytes         |
  +---------------------------+      +---------------------------+
  Cost per call:
  - malloc: 0 ns (no allocation!)
  - free:   0 ns
  - Cache: already in L1/L2 (the stack is hot)
  Total overhead: ~0 ns per async call
Performance Implications: Deep Analysis
Why stack allocation wins:
- No allocator overhead: malloc and free are surprisingly expensive
  - Thread-local caches must be checked
  - Locks may be needed for thread safety
  - Fragmentation bookkeeping
- Cache locality: the stack is always "hot" in cache
  - L1 cache hit: ~1 ns
  - L2 cache hit: ~4 ns
  - L3 cache hit: ~12 ns
  - RAM access: ~100 ns
  - Heap-allocated futures often cause L3 misses
- Inlining: the compiler can see through the abstraction
  - With boxing: vtable dispatch prevents inlining
  - With GATs: full inlining is possible; the optimizer can combine operations
- Branch prediction: concrete types enable better optimization
  - Boxing requires an indirect call (vtable lookup)
  - GATs enable a direct call (address known at compile time)
When boxing is acceptable:
- Low-frequency calls (setup, configuration)
- When you need dyn Trait (runtime polymorphism)
- When the future itself does I/O (network latency >> allocation cost)
- Rapid prototyping
The Core Question You're Answering
"Why does async fn in a trait usually require a Box?"
Async functions return a hidden type (the state machine). In a trait, the compiler doesn't know the size of this state machine for every possible implementation. async-trait solves this by putting that state machine in a Box (pointer-sized). Your goal is to tell the compiler exactly where to find that type without the Box.
Concepts You Must Understand First
Stop and research these before coding:
- Generic Associated Types (GATs)
  - How can an associated type have its own lifetime parameters?
  - What is the difference between type Item; and type Item<'a>;?
  - Book Reference: "Idiomatic Rust" Ch. 5 - "Advanced Traits"
- Async Desugaring
  - What does an async fn look like to the compiler?
  - How is the state machine struct generated?
  - Book Reference: "Rust for Rustaceans" Ch. 8 - "Asynchronous Programming"
- Higher-Ranked Trait Bounds (HRTBs)
  - What does for<'a> mean in a trait bound?
  - When do you need it vs a regular lifetime parameter?
  - Book Reference: "Programming Rust" Ch. 11 - "Traits and Generics"
- Object Safety
  - Why can some traits be used with dyn Trait and others cannot?
  - What makes a trait "object safe"?
  - Book Reference: "Rust for Rustaceans" Ch. 3 - "Designing Interfaces"
Solution Architecture
The Trait Definition with GAT
Your trait should look something like this:
pub trait AsyncService {
/// The future type returned by process().
///
/// Key points:
/// - It has its own lifetime parameter 'a
/// - It must implement Future with the correct Output
/// - The + 'a bound means it must be valid for the lifetime 'a (it may borrow from &'a self)
/// - The `where Self: 'a` ensures the service outlives the future
type ProcessFut<'a>: Future<Output = String> + 'a
where
Self: 'a;
/// Process input asynchronously.
///
/// The lifetime 'a ties together:
/// - The borrow of self (&'a self)
/// - The borrow of input (&'a str)
/// - The returned future (ProcessFut<'a>)
fn process<'a>(&'a self, input: &'a str) -> Self::ProcessFut<'a>;
}
The Lifetime Bounds Pattern
Understanding why we need where Self: 'a:
// This is the key insight:
type ProcessFut<'a>: Future<Output = String> + 'a
where
Self: 'a; // Self must outlive 'a
// Without this bound, you could write:
let service = MyService::new();
let future = service.process("hello");
drop(service); // Oops! service is gone
future.await; // But future still holds &service -> UB!
// The bound prevents this at compile time
How Impl Blocks Look for Concrete Types
Using impl Future (requires nightly with type_alias_impl_trait):
#![feature(type_alias_impl_trait)]
impl AsyncService for MyService {
type ProcessFut<'a> = impl Future<Output = String> + 'a;
fn process<'a>(&'a self, input: &'a str) -> Self::ProcessFut<'a> {
async move {
format!("{}: {}", self.prefix, input)
}
}
}
Using a concrete named future (stable Rust):
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
pub struct ProcessFuture<'a> {
prefix: &'a str,
input: &'a str,
done: bool,
}
impl<'a> Future for ProcessFuture<'a> {
type Output = String;
fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
if self.done {
panic!("ProcessFuture polled after completion")
} else {
self.done = true;
Poll::Ready(format!("{}: {}", self.prefix, self.input))
}
}
}
impl AsyncService for MyService {
type ProcessFut<'a> = ProcessFuture<'a>;
fn process<'a>(&'a self, input: &'a str) -> Self::ProcessFut<'a> {
ProcessFuture {
prefix: &self.prefix,
input,
done: false,
}
}
}
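A quick sanity check for the stable version (a sketch: it assumes MyService has the prefix: String field used above and that the futures crate is available for block_on). The returned future is an ordinary struct sitting in the caller's stack frame.
fn main() {
    let service = MyService { prefix: "PREFIX".to_string() };
    let fut = service.process("hello");
    // Two references plus a bool: no Box, no heap allocation, just a local value.
    println!("future size: {} bytes", std::mem::size_of_val(&fut));
    let result = futures::executor::block_on(fut);
    assert_eq!(result, "PREFIX: hello");
}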
The Trade-off: Object Safety vs Zero-Cost
THE FUNDAMENTAL TRADE-OFF
=========================
Object Safety (dyn Trait)  vs  Zero-Cost Abstraction

With async-trait:                With GATs:
- Can use dyn Trait              - Cannot use dyn Trait
- Runtime polymorphism           - Static dispatch only
- Heap allocation                - Stack allocation
- Vtable overhead                - Zero overhead

Choose async-trait when:         Choose GATs when:
- You need trait objects         - Maximum performance
- Plugin systems                 - Embedded / no_std
- Heterogeneous collections      - Hot paths
- Rapid development              - Known concrete types
Why GAT-based traits are NOT object safe:
-----------------------------------------
1. Associated types with generics violate object safety rules
2. The compiler cannot create a vtable for generic associated types
3. Each implementor has a different future size/type
trait AsyncService {
type ProcessFut<'a>: Future + 'a where Self: 'a;
// ^^^
// This generic makes the trait NOT object safe
fn process<'a>(&'a self) -> Self::ProcessFut<'a>;
// ^^^
// Generic lifetime on method also affects object safety
}
// This will NOT compile:
fn use_dyn(service: &dyn AsyncService) { }
// ^^^^^^^^^^^^^^^^
// Error: the trait `AsyncService` is not object safe
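When you do need runtime polymorphism, a common workaround is to keep the zero-cost GAT trait for the hot path and pair it with a separate, object-safe trait that boxes, plus a blanket impl. A sketch with illustrative names, building on the AsyncService trait above:
use std::future::Future;
use std::pin::Pin;

// Object-safe companion trait: it returns a boxed future, so `dyn DynAsyncService` works.
pub trait DynAsyncService {
    fn process_boxed<'a>(
        &'a self,
        input: &'a str,
    ) -> Pin<Box<dyn Future<Output = String> + 'a>>;
}

// Blanket impl: every zero-cost AsyncService automatically gets the boxed,
// dynamically dispatchable version. The Box::pin cost is only paid by callers
// that actually go through `dyn DynAsyncService`.
impl<S: AsyncService> DynAsyncService for S {
    fn process_boxed<'a>(
        &'a self,
        input: &'a str,
    ) -> Pin<Box<dyn Future<Output = String> + 'a>> {
        Box::pin(self.process(input))
    }
}
Note that generic lifetime parameters on methods do not break object safety; only generic type parameters and GATs do, which is why the boxed companion trait can be a trait object while AsyncService cannot.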
Phased Implementation Guide
Phase 1: Understand the Problem with Regular Async Trait
Goal: See the compiler errors firsthand
Create a new project and try to write async in a trait:
// Try this and observe the error:
trait Greeter {
async fn greet(&self) -> String; // Won't work!
}
Then try with async-trait:
use async_trait::async_trait;
#[async_trait]
trait Greeter {
async fn greet(&self) -> String;
}
#[async_trait]
impl Greeter for MyGreeter {
async fn greet(&self) -> String {
"Hello!".to_string()
}
}
Use cargo expand to see what async-trait generates:
cargo install cargo-expand
cargo expand
Deliverable: A clear understanding of the generated code with Box::pin.
Phase 2: Define Trait with GAT
Goal: Create the zero-allocation trait definition
use std::future::Future;
pub trait AsyncProcessor {
/// The future type, parameterized by the borrow lifetime
type ProcessFut<'a>: Future<Output = String> + 'a
where
Self: 'a;
/// Optionally: a Send-bound version for use with tokio::spawn
type ProcessFutSend<'a>: Future<Output = String> + Send + 'a
where
Self: 'a + Send;
fn process<'a>(&'a self, input: &'a str) -> Self::ProcessFut<'a>;
fn process_send<'a>(&'a self, input: &'a str) -> Self::ProcessFutSend<'a>
where
Self: Send;
}
Key decisions to make:
- Do you need Send bounds?
- Do you need Sync bounds?
- Multiple methods or just one?
Phase 3: Implement for Concrete Types
Goal: Create implementations without any boxing
Option A: Using type_alias_impl_trait (nightly)
#![feature(type_alias_impl_trait)]
use std::future::Future;
use std::time::Duration;
struct DataProcessor {
prefix: String,
}
impl AsyncProcessor for DataProcessor {
type ProcessFut<'a> = impl Future<Output = String> + 'a;
fn process<'a>(&'a self, input: &'a str) -> Self::ProcessFut<'a> {
async move {
// Simulate some async work
tokio::time::sleep(Duration::from_millis(1)).await;
format!("{}: {}", self.prefix, input)
}
}
}
Option B: Manual future implementation (stable)
use std::pin::Pin;
use std::task::{Context, Poll};
use std::future::Future;
struct DataProcessor {
prefix: String,
}
// Manual future struct
pub struct DataProcessFuture<'a> {
prefix: &'a str,
input: &'a str,
state: ProcessState,
}
enum ProcessState {
Start,
Done,
}
impl<'a> Future for DataProcessFuture<'a> {
type Output = String;
fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
match self.state {
ProcessState::Start => {
self.state = ProcessState::Done;
Poll::Ready(format!("{}: {}", self.prefix, self.input))
}
ProcessState::Done => {
panic!("Future polled after completion")
}
}
}
}
impl AsyncProcessor for DataProcessor {
type ProcessFut<'a> = DataProcessFuture<'a>;
fn process<'a>(&'a self, input: &'a str) -> Self::ProcessFut<'a> {
DataProcessFuture {
prefix: &self.prefix,
input,
state: ProcessState::Start,
}
}
}
Phase 4: Create Benchmark Comparing to async-trait
Goal: Prove the allocation difference with numbers
// benches/allocation_comparison.rs
use criterion::{criterion_group, criterion_main, Criterion, BenchmarkId};
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};
// Allocation-counting allocator
struct CountingAllocator;
static ALLOCATION_COUNT: AtomicUsize = AtomicUsize::new(0);
static BYTES_ALLOCATED: AtomicUsize = AtomicUsize::new(0);
unsafe impl GlobalAlloc for CountingAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
ALLOCATION_COUNT.fetch_add(1, Ordering::SeqCst);
BYTES_ALLOCATED.fetch_add(layout.size(), Ordering::SeqCst);
System.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
System.dealloc(ptr, layout)
}
}
#[global_allocator]
static ALLOCATOR: CountingAllocator = CountingAllocator;
fn reset_counts() {
ALLOCATION_COUNT.store(0, Ordering::SeqCst);
BYTES_ALLOCATED.store(0, Ordering::SeqCst);
}
fn get_counts() -> (usize, usize) {
(
ALLOCATION_COUNT.load(Ordering::SeqCst),
BYTES_ALLOCATED.load(Ordering::SeqCst),
)
}
fn bench_comparison(c: &mut Criterion) {
let mut group = c.benchmark_group("async_trait_comparison");
for n in [100, 1000, 10000].iter() {
group.bench_with_input(BenchmarkId::new("async-trait", n), n, |b, &n| {
let rt = tokio::runtime::Runtime::new().unwrap();
let service = AsyncTraitService::new();
b.iter(|| {
rt.block_on(async {
reset_counts();
for _ in 0..n {
service.process("test").await;
}
get_counts()
})
});
});
group.bench_with_input(BenchmarkId::new("GAT-based", n), n, |b, &n| {
let rt = tokio::runtime::Runtime::new().unwrap();
let service = GatService::new();
b.iter(|| {
rt.block_on(async {
reset_counts();
for _ in 0..n {
service.process("test").await;
}
get_counts()
})
});
});
}
group.finish();
}
criterion_group!(benches, bench_comparison);
criterion_main!(benches);
Phase 5: Explore RPITIT Alternative
Goal: Understand the newest solution (Rust 1.75+)
// Requires Rust 1.75+
trait ModernAsyncService {
fn process(&self, input: &str) -> impl Future<Output = String> + '_;
// With Send bound:
fn process_send(&self, input: &str) -> impl Future<Output = String> + Send + '_
where
Self: Sync;
}
impl ModernAsyncService for MyService {
fn process(&self, input: &str) -> impl Future<Output = String> + '_ {
async move {
format!("{}: {}", self.prefix, input)
}
}
fn process_send(&self, input: &str) -> impl Future<Output = String> + Send + '_
where
Self: Sync,
{
async move {
format!("{}: {}", self.prefix, input)
}
}
}
Compare all three approaches in your benchmark to see which is fastest.
Real World Outcome
A library that allows defining high-performance, zero-allocation async interfaces. You'll benchmark this against async-trait and show a 0-byte allocation count in the hot path. This demonstrates that GATs enable true zero-cost async abstractions without heap allocation.
Example Build and Benchmark:
$ cargo new --lib boxless-async-trait
Created library `boxless-async-trait` package
$ cd boxless-async-trait
$ cargo add async-trait tokio --features tokio/full
Updating crates.io index
Adding async-trait v0.1.77 to dependencies
Adding tokio v1.35.1 to dependencies.features
$ cargo add criterion --dev --features criterion/async_tokio
Adding criterion v0.5.1 to dev-dependencies
$ cargo bench --bench allocation_comparison
Compiling boxless-async-trait v0.1.0
Finished bench [optimized] target(s) in 4.72s
Running benches/allocation_comparison.rs
====================================================================
Async Trait Performance Comparison
====================================================================
Testing: Process 10,000 async calls
====================================================================
Benchmarking async_trait (boxed)
Warming up for 3.0000 s
Collecting 100 samples in estimated 5.2340 s (2.5M iterations)
async_trait/10k_calls time: [2.0845 us 2.0912 us 2.0987 us]
thrpt: [476.48K elem/s 478.19K elem/s 479.73K elem/s]
Memory Analysis:
Total allocations: 10,000
Bytes allocated: 160,000 (16 bytes per Box)
Allocation rate: 76.66 MB/s
Benchmarking GAT-based (zero-alloc)
Warming up for 3.0000 s
Collecting 100 samples in estimated 5.0123 s (5.1M iterations)
GAT-based/10k_calls time: [982.34 ns 985.67 ns 989.45 ns]
thrpt: [1.0107M elem/s 1.0145M elem/s 1.0179M elem/s]
Memory Analysis:
Total allocations: 0
Bytes allocated: 0
Allocation rate: 0 MB/s
===================================================================
PERFORMANCE SUMMARY:
===================================================================
Metric | async_trait | GAT-based | Improvement
===================================================================
Time per 10k calls | 2.09 us | 985 ns | 2.12x
Throughput | 478K ops/s | 1.01M/s | 2.12x
Allocations | 10,000 | 0 | infinity
Memory allocated | 160 KB | 0 bytes | infinity
CPU cache pressure | HIGH | LOW | Better
===================================================================
$ cargo run --example real_world_usage
Compiling boxless-async-trait v0.1.0
Finished dev [unoptimized + debuginfo] target(s) in 1.82s
Running `target/debug/examples/real_world_usage`
=== Real-World Usage Example ===
[1] Defining trait with GAT-based async method...
trait AsyncService {
type ProcessFut<'a>: Future<Output = String> + 'a
where Self: 'a;
fn process<'a>(&'a self, data: &'a str) -> Self::ProcessFut<'a>;
}
[2] Implementing for concrete type...
struct DataProcessor {
prefix: String,
}
impl AsyncService for DataProcessor {
type ProcessFut<'a> = impl Future<Output = String> + 'a;
fn process<'a>(&'a self, data: &'a str) -> Self::ProcessFut<'a> {
async move { format!("{}: {}", self.prefix, data) }
}
}
[3] Running async operations...
Processing "hello" -> Result: "PREFIX: hello"
Stack allocation at: 0x7ffee4b2c890
Future size: 64 bytes (on stack)
[check] Zero heap allocations
Processing "world" -> Result: "PREFIX: world"
Stack allocation at: 0x7ffee4b2c8d0
[check] Zero heap allocations
[4] Comparison with async-trait...
Processing "hello" with async-trait
Heap allocation at: 0x600002504020
Box size: 16 bytes + Future size: 96 bytes
[warning] 1 heap allocation required
[Summary]
[check] GAT-based async traits enable zero-allocation async
[check] 2.12x faster than async-trait in benchmarks
[check] 100% reduction in heap allocations
[check] Type-safe lifetime management
[check] No vtable indirection (static dispatch)
$ cargo test
Compiling boxless-async-trait v0.1.0
Finished test [unoptimized + debuginfo] target(s) in 1.24s
Running unittests src/lib.rs
running 4 tests
test tests::test_gat_zero_alloc ... ok
test tests::test_lifetime_bounds ... ok
test tests::test_static_dispatch ... ok
test tests::test_vs_async_trait ... ok
test result: ok. 4 passed; 0 failed
Testing Strategy
Allocation Counting Tests
#[cfg(test)]
mod allocation_tests {
use super::*;
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};
// Assumption: this counter is incremented by a counting #[global_allocator]
// like the one registered in the benchmark above; nothing in this module
// bumps it directly. A plain static atomic is enough here (a thread_local
// would hide allocations made on other threads).
static ALLOC_COUNT: AtomicUsize = AtomicUsize::new(0);
fn with_alloc_count<F, R>(f: F) -> (R, usize)
where
F: FnOnce() -> R,
{
ALLOC_COUNT.store(0, Ordering::SeqCst);
let result = f();
let count = ALLOC_COUNT.load(Ordering::SeqCst);
(result, count)
}
#[test]
fn test_zero_allocations() {
let service = GatProcessor::new("test".to_string());
let rt = tokio::runtime::Runtime::new().unwrap();
let (_, allocs) = rt.block_on(async {
with_alloc_count(|| {
// This should not allocate
let future = service.process("input");
futures::executor::block_on(future)
})
});
assert_eq!(allocs, 0, "GAT-based async should not allocate");
}
#[test]
fn test_async_trait_does_allocate() {
let service = BoxedProcessor::new("test".to_string());
let rt = tokio::runtime::Runtime::new().unwrap();
let (_, allocs) = rt.block_on(async {
with_alloc_count(|| {
let future = service.process("input");
futures::executor::block_on(future)
})
});
assert!(allocs > 0, "async-trait should allocate");
}
}
Benchmark Tests
#[cfg(test)]
mod benchmark_tests {
use super::*;
use std::time::Instant;
#[tokio::test]
async fn test_gat_performance() {
let service = GatProcessor::new("prefix".to_string());
let iterations = 100_000;
let start = Instant::now();
for _ in 0..iterations {
let _ = service.process("test").await;
}
let gat_duration = start.elapsed();
let boxed_service = BoxedProcessor::new("prefix".to_string());
let start = Instant::now();
for _ in 0..iterations {
let _ = boxed_service.process("test").await;
}
let boxed_duration = start.elapsed();
println!("GAT: {:?}", gat_duration);
println!("Boxed: {:?}", boxed_duration);
// GAT should be at least 1.5x faster
assert!(gat_duration.as_nanos() * 3 < boxed_duration.as_nanos() * 2);
}
}
Lifetime Verification Tests
#[cfg(test)]
mod lifetime_tests {
use super::*;
#[test]
fn test_future_lifetime_tied_to_service() {
let service = GatProcessor::new("test".to_string());
// This should compile: future borrows service
let future = service.process("hello");
// The future holds a reference to service, so service
// cannot be dropped while future exists
// (This is enforced by the compiler, not a runtime check)
let result = futures::executor::block_on(future);
assert_eq!(result, "test: hello");
}
#[test]
fn test_multiple_concurrent_futures() {
let service = GatProcessor::new("test".to_string());
let rt = tokio::runtime::Runtime::new().unwrap();
rt.block_on(async {
let f1 = service.process("one");
let f2 = service.process("two");
let f3 = service.process("three");
let (r1, r2, r3) = futures::join!(f1, f2, f3);
assert_eq!(r1, "test: one");
assert_eq!(r2, "test: two");
assert_eq!(r3, "test: three");
});
}
}
Common Pitfalls and Debugging
Pitfall 1: Forgetting the where Self: 'a Bound
// WRONG: Missing the crucial bound
trait BadTrait {
type Fut<'a>: Future<Output = ()> + 'a;
// ^^^
// Without `where Self: 'a`, the compiler cannot prove safety
fn go<'a>(&'a self) -> Self::Fut<'a>;
}
// CORRECT:
trait GoodTrait {
type Fut<'a>: Future<Output = ()> + 'a
where
Self: 'a; // Essential!
fn go<'a>(&'a self) -> Self::Fut<'a>;
}
Pitfall 2: Trying to Use dyn Trait
// This will NOT compile:
fn accept_any(service: &dyn AsyncService) {
// ^^^^^^^^^^^^^^^^
// Error: the trait `AsyncService` is not object safe
}
// Solution: Use generics instead
fn accept_any<S: AsyncService>(service: &S) {
// This works with static dispatch
}
Pitfall 3: Incorrect Lifetime Annotations
// WRONG: Lifetimes don't match
impl AsyncService for MyService {
type ProcessFut<'a> = impl Future<Output = String> + 'a;
fn process<'a>(&'a self, input: &str) -> Self::ProcessFut<'a> {
// ^^^^
// Error: input has different lifetime than 'a
async move {
format!("{}", input) // input might not live long enough
}
}
}
// CORRECT: All lifetimes aligned
impl AsyncService for MyService {
type ProcessFut<'a> = impl Future<Output = String> + 'a;
fn process<'a>(&'a self, input: &'a str) -> Self::ProcessFut<'a> {
// ^^^^
// Now input shares lifetime 'a with self
async move {
format!("{}", input) // Safe!
}
}
}
Pitfall 4: Send Bounds Complications
// Problem: Your future needs to be Send for tokio::spawn
trait NeedsSend {
type Fut<'a>: Future<Output = ()> + Send + 'a
// ^^^^
// This requires ALL captured data to be Send
where
Self: 'a;
fn go<'a>(&'a self) -> Self::Fut<'a>;
}
// But if your service contains non-Send data:
struct MyService {
data: Rc<String>, // Rc is NOT Send!
}
// This won't compile because &Rc<String> is not Send
// Solution 1: Use Arc instead of Rc
struct MyServiceSend {
data: Arc<String>, // Arc IS Send
}
// Solution 2: Have separate Send and non-Send variants
trait FlexibleService {
type Fut<'a>: Future<Output = ()> + 'a
where Self: 'a;
type FutSend<'a>: Future<Output = ()> + Send + 'a
where Self: 'a + Sync; // Require Sync for Send future
fn go<'a>(&'a self) -> Self::Fut<'a>;
fn go_send<'a>(&'a self) -> Self::FutSend<'a> where Self: Sync;
}
Pitfall 5: type_alias_impl_trait Confusion
// This requires nightly with #![feature(type_alias_impl_trait)]
type ProcessFut<'a> = impl Future<Output = String> + 'a;
// Common error: Using it in the wrong place
// WRONG: Defining at module level without proper context
type MyFut = impl Future<Output = ()>; // Error!
// CORRECT: Define in impl block context
impl AsyncService for MyService {
type ProcessFut<'a> = impl Future<Output = String> + 'a;
// The compiler infers the concrete type from the method body
fn process<'a>(&'a self) -> Self::ProcessFut<'a> {
async move { /* ... */ } // This defines the concrete type
}
}
Pitfall 6: Borrowing Across Await Points
// Problem: Borrowing something that doesn't live long enough
impl AsyncService for MyService {
type Fut<'a> = impl Future<Output = String> + 'a;
fn process<'a>(&'a self) -> Self::Fut<'a> {
async move {
let temp = self.create_temp(); // Returns a String
some_async_operation().await; // await point!
// ^^^^^
// If temp is borrowed across this await, it must live long enough
temp // This is fine because we own temp
}
}
}
// Problematic case: holding a !Send value across an await point.
// (The borrow itself is fine -- self-referential state is exactly what
// Pin exists for -- but the resulting future stops being Send.)
// Assumes a hypothetical field `self.state: std::sync::Mutex<String>`.
fn bad_process<'a>(&'a self) -> impl Future<Output = String> + 'a {
async move {
let guard = self.state.lock().unwrap(); // std MutexGuard is !Send
some_async_operation().await;           // guard is held across the await...
guard.to_string()                       // ...so the whole future is !Send
// This compiles, but the future can no longer be passed to
// tokio::spawn or anything else that requires Send.
}
}
Questions to Guide Your Design
- Lifetime Elision
  - How do you capture the lifetime of &self in the returned future?
  - What happens when the method takes multiple references with different lifetimes?
- Trait Objects
  - Why does this approach make the trait no longer "object safe"?
  - Can you use dyn MyAsyncService with this GAT approach?
  - What alternatives exist when you need runtime polymorphism?
- Send and Sync
  - When does your future need to be Send?
  - How do you handle !Send types captured in the async block?
Thinking Exercise
Desugaring the Sugar
Take a standard async fn:
async fn hello(s: &str) -> usize { s.len() }
Now, try to write the same thing without the async keyword:
fn hello(s: &str) -> impl Future<Output = usize> + '_ {
// How do you implement this?
}
Notice the lifetime issues when the input s is used inside the future. The '_ lifetime (elided) is tied to s. Now consider: how does a GAT solve the "named return type" problem?
Key insight: GATs allow the trait to express "the future type varies based on the lifetime of the borrow" - something that was impossible to express before.
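One possible answer to the exercise, written as a hand-named future type (a sketch; the compiler-generated state machine looks different but plays the same role):
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// The "named return type": a struct that owns the borrow of the input.
struct HelloFut<'a> {
    s: &'a str,
}

impl<'a> Future for HelloFut<'a> {
    type Output = usize;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<usize> {
        // No await points inside, so the future completes on the first poll.
        Poll::Ready(self.s.len())
    }
}

// Same signature as the async fn, but now the future type has a name
// that a GAT (or RPITIT) could refer to.
fn hello(s: &str) -> impl Future<Output = usize> + '_ {
    HelloFut { s }
}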
The Interview Questions They'll Ask
- "Why can't we have async fn in traits natively (before Rust 1.75)?"
  Answer: Because async functions return anonymous types whose size varies per implementation. Traits need to specify return types that work for all implementors, but you can't name the anonymous future type. Boxing (as async-trait does) provides a uniform size (one pointer), while GATs allow each impl to specify its own concrete type.
- "What is a GAT and how does it solve the lifetime-in-traits problem?"
  Answer: A Generic Associated Type is an associated type that has its own generic parameters (lifetimes or types). For async traits, it solves the problem by allowing the future type to be parameterized by the lifetime of the borrow: type Fut<'a>: Future + 'a where Self: 'a. This lets each implementor provide a different concrete future type while maintaining lifetime safety.
- "What are the performance implications of using #[async_trait]?"
  Answer: async-trait causes a heap allocation (via Box) for every async method call. This has several costs: the allocation itself (~20-50 ns), potential cache misses when accessing the boxed future, inability to inline across the vtable dispatch, and increased memory fragmentation. For hot paths called millions of times, this overhead is significant.
- "How does the compiler determine the size of an async future?"
  Answer: The compiler analyzes all variables that live across await points (variables created before an await and used after it). These become fields in the generated state machine struct, along with an enum discriminant for the current state. The total size is the sum of these fields plus alignment padding, so different async functions produce different-sized futures.
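A small experiment (illustrative; exact numbers vary by compiler version) that backs up the last answer: only values whose scope crosses an await point need to be stored in the state machine, and size_of_val makes the difference visible.
use std::mem::size_of_val;

async fn pause() {} // trivial await point

async fn scoped_before_await() -> usize {
    let len = {
        let big = [0u8; 4096];
        big.len()
    }; // `big` goes out of scope here, before the await, so it is not stored
    pause().await;
    len
}

async fn lives_across_await() -> usize {
    let big = [0u8; 4096];
    pause().await;
    big.len() // `big` is used after the await, so it becomes a state-machine field
}

fn main() {
    println!("scoped_before_await: {} bytes", size_of_val(&scoped_before_await()));
    println!("lives_across_await:  {} bytes", size_of_val(&lives_across_await()));
}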
Extensions and Challenges
Extension 1: Multi-Method Trait with Different Future Types
trait ComplexService {
type FetchFut<'a>: Future<Output = Vec<u8>> + 'a where Self: 'a;
type ProcessFut<'a>: Future<Output = String> + 'a where Self: 'a;
type StoreFut<'a>: Future<Output = ()> + 'a where Self: 'a;
fn fetch<'a>(&'a self, url: &'a str) -> Self::FetchFut<'a>;
fn process<'a>(&'a self, data: &'a [u8]) -> Self::ProcessFut<'a>;
fn store<'a>(&'a self, key: &'a str, value: &'a str) -> Self::StoreFut<'a>;
}
Extension 2: Combine with Tower's Service Trait Pattern
// Tower-style service with GAT
trait GatService<Request> {
type Response;
type Error;
type Future<'a>: Future<Output = Result<Self::Response, Self::Error>> + 'a
where
Self: 'a;
fn call<'a>(&'a self, req: Request) -> Self::Future<'a>;
}
// Implement middleware that wraps any GatService
struct Logging<S> {
inner: S,
}
impl<S, R> GatService<R> for Logging<S>
where
S: GatService<R>,
R: std::fmt::Debug,
{
type Response = S::Response;
type Error = S::Error;
type Future<'a> = impl Future<Output = Result<S::Response, S::Error>> + 'a
where
Self: 'a;
fn call<'a>(&'a self, req: R) -> Self::Future<'a> {
async move {
println!("Request: {:?}", req);
let result = self.inner.call(req).await;
println!("Response received");
result
}
}
}
Extension 3: Stream-based GAT Trait
use futures::Stream;
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
trait AsyncIterator {
type Item;
type NextFut<'a>: Future<Output = Option<Self::Item>> + 'a
where
Self: 'a;
fn next<'a>(&'a mut self) -> Self::NextFut<'a>;
}
// Convert to a Stream
struct GatStream<I: AsyncIterator> {
iter: I,
}
impl<I: AsyncIterator> Stream for GatStream<I> {
type Item = I::Item;
fn poll_next(
self: Pin<&mut Self>,
cx: &mut Context<'_>
) -> Poll<Option<Self::Item>> {
// Implementation would require pinning the future
todo!()
}
}
Real-World Connections
Tower Service Trait
The Tower crate (used by Axum, Tonic, etc.) uses a polling-based approach:
// Tower's approach (simplified)
pub trait Service<Request> {
type Response;
type Error;
type Future: Future<Output = Result<Self::Response, Self::Error>>;
fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>>;
fn call(&mut self, req: Request) -> Self::Future;
}
Tower uses &mut self (not &self) to manage backpressure, which changes the lifetime requirements. This is a design trade-off worth studying.
Axum Handlers
Axum's handler traits internally use complex type machinery to avoid boxing where possible:
// Axum uses FromRequest extractors with associated futures
pub trait FromRequest<S>: Sized {
type Rejection: IntoResponse;
fn from_request(
req: Request,
state: &S
) -> impl Future<Output = Result<Self, Self::Rejection>> + Send;
}
Database Drivers
Many async database drivers (like sqlx) must balance ergonomics with performance:
// sqlx uses RPITIT in newer versions
pub trait Executor<'c>: Send {
fn execute<'e, 'q: 'e, E>(
self,
query: E
) -> impl Future<Output = Result<QueryResult, Error>> + Send + 'e
where
'c: 'e,
E: Execute<'q, Self::Database>;
}
Hints in Layers
Hint 1: The GAT definition
Start by defining the associated type with a lifetime:
type Fut<'a>: Future<Output = ()> + 'a where Self: 'a;
Hint 2: Implementation
In the implementation, youโll need to use impl Future or a concrete type. Since you canโt use impl Future in associated types easily yet, you might need to use a crate like real-async-trait for inspiration or use Box only during the development phase to see where it hurts.
Hint 3: Capturing Lifetimes
The where Self: 'a bound is crucial. It tells the compiler that the future canโt outlive the service itself.
Hint 4: Testing Zero-Cost
Use a custom allocator that counts allocations to verify your implementation truly allocates nothing in the async path.
Hint 5: Stable Rust Alternative
If you canโt use nightly, implement the Future trait manually for a named struct instead of relying on impl Future.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| GAT Mastery | โIdiomatic Rustโ | Ch. 5 - Advanced Traits |
| Async Internals | โRust for Rustaceansโ | Ch. 8 - Asynchronous Programming |
| Dispatch Performance | โEffective Rustโ | Item 12 - Prefer Generics to Trait Objects |
| Trait Design | โProgramming Rustโ | Ch. 11 - Traits and Generics |
| Future Implementation | โRust for Rustaceansโ | Ch. 8 - Building Your Own Futures |
Summary
This project teaches you one of the most advanced patterns in the Rust async ecosystem. By building a box-less async trait system, you will:
- Understand why async in traits was historically problematic
- Master GATs and their role in expressing complex lifetime relationships
- Learn to reason about memory allocation in async code
- Build benchmarks that prove performance differences
- Understand the trade-off between object safety and zero-cost abstractions
The skills learned here directly apply to understanding and contributing to major async crates like Tower, Axum, and Tokio.
What's Next?
After completing this project, consider:
- Project 8: Building a Custom Runtime - Understand the executor side of async
- Project 1: Manual Pin Projector - Deep dive into why futures need pinning
- Exploring the Tower crate's service pattern in depth
- Reading the async-trait source code to understand its macro implementation