In brief
The same async service in Rust holds hundreds of thousands of connections with predictable RSS; in Python you shard workers because memory grows faster than request count. A Habr deep-dive compares where suspended task state lives, on the stack or on the heap, across Tokio, asyncio, .NET, and V8.
What happened
Take a typical handler: accept, read, query the DB, respond. Conceptually, in every language an async function becomes a state machine: each await is a state, and locals that survive a suspension must be stored somewhere, either on the caller’s stack or in a heap allocation.
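Rust compiles the handler into an anonymous enum-like type, conceptually a HandleState with variants Start, AwaitingDb, and AwaitingFetch. The compiler generates this type after macro expansion, so it does not appear in cargo expand output. Its size is known at compile time, and the future lives inline wherever its owner puts it, on the stack or inside a parent future, unless you Box::pin it. An empty timer coroutine is tens of bytes; a handler with buffers is kilobytes.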
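To make the idea concrete, here is a hand-written Python stand-in for such a state machine. It is a sketch under stated assumptions, not what any compiler emits: the `Handler` class, its `resume` method, and the `row` field are hypothetical, and only the variant names come from the text above. It shows that a local needed after a suspension (`row`) has to become a field of the paused object.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class HandleState(Enum):
    START = auto()
    AWAITING_DB = auto()
    AWAITING_FETCH = auto()
    DONE = auto()


@dataclass
class Handler:
    """Hypothetical hand-rolled equivalent of an async handler."""
    request: bytes
    state: HandleState = HandleState.START
    row: Optional[str] = None  # survives the second suspension, so it is stored

    def resume(self, value: Optional[str] = None) -> Optional[str]:
        """Run to the next suspension; `value` is the awaited operation's result."""
        if self.state is HandleState.START:
            self.state = HandleState.AWAITING_DB
            return None  # suspended: waiting for the DB
        if self.state is HandleState.AWAITING_DB:
            self.row = value  # DB result must outlive this step
            self.state = HandleState.AWAITING_FETCH
            return None  # suspended: waiting for the fetch
        if self.state is HandleState.AWAITING_FETCH:
            self.state = HandleState.DONE
            return f"{self.row}:{value}"  # final response
        raise RuntimeError("resumed after completion")
```

The paused object is as large as its fields: anything kept across a suspension adds to it, and anything used only between two suspensions does not need a field at all.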
CPython represents each async def call as a coroutine object plus a frame, both allocated on the heap and managed by the garbage collector. A benchmark of 100,000 empty coroutines suspended in asyncio.sleep uses around 45–50 MB before any real payload.
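A measurement of this kind can be approximated with the standard library. The sketch below is an assumption about how such a benchmark might look, not the article's actual script: `idle` and `bytes_per_idle_task` are hypothetical names, and `tracemalloc` counts Python-level allocations rather than process RSS, so the figures will differ from RSS-based numbers and vary by Python version.

```python
import asyncio
import tracemalloc


async def idle() -> None:
    # An "empty" coroutine: no locals of its own, parked on a long timer.
    await asyncio.sleep(3600)


async def bytes_per_idle_task(n: int) -> float:
    """Return Python-level bytes allocated per suspended task (not RSS)."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    tasks = [asyncio.create_task(idle()) for _ in range(n)]
    await asyncio.sleep(0)  # one loop pass: every task runs to its first await
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # Clean up so the event loop can exit without waiting an hour.
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return (after - before) / n
```

Calling `asyncio.run(bytes_per_idle_task(100_000))` reproduces the scale discussed above; a smaller `n` gives a quicker estimate of the same per-task figure.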
C# keeps IAsyncStateMachine on the stack until the first real suspension, then boxes to the heap — one allocation per task. JavaScript (V8) stopped spawning extra promises per await after 2018, but there is still no compact stack state machine: in-flight work is promises and reactions on the heap.
| Runtime | Memory for 100k empty coroutines | Memory per task |
|---|---|---|
| Rust / Tokio | hundreds of KB – few MB | hundreds of bytes |
| Python / asyncio | ~45–50 MB | ~0.5–1 KB |
Why it matters
The myth “a sleeping coroutine weighs nothing” breaks at scale. A paused task costs its fattest state frame plus runtime wrapper overhead you cannot drop. Ten thousand idle connections is measurable megabytes, not noise.
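The arithmetic behind that claim is a single multiplication. The helper below is a hypothetical illustration that uses the lower bound of the Python / asyncio per-task figure from the table (about 0.5 KB, taken here as 500 bytes) and decimal megabytes.

```python
def idle_memory_mb(tasks: int, bytes_per_task: int) -> float:
    """Memory held by suspended tasks, in decimal megabytes."""
    return tasks * bytes_per_task / 1_000_000


# Python / asyncio at roughly 500 bytes per empty task:
print(idle_memory_mb(10_000, 500))   # 5.0
print(idle_memory_mb(100_000, 500))  # 50.0
```

Real handlers hold buffers and parsed requests across awaits, so the per-task figure to plug in is usually the measured one, not the empty-coroutine floor.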
For architecture, high-concurrency I/O is not only “Rust is faster on CPU.” It is also RSS that grows linearly with the number of concurrent awaits. Interpreted and managed runtimes run into memory and allocator limits orders of magnitude sooner.
In Rust, “drop heavy values before await” shrinks the enum. In Python you pay for both frame data and the coroutine object.
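The same discipline can still help in Python for the frame-data part of the bill. The sketch below compares two hypothetical handlers, `fat` and `lean`, that differ only in whether a buffer is still referenced at the await. `PAYLOAD`, `held_bytes`, and the use of an `asyncio.Event` as the suspension point are assumptions for illustration, and `tracemalloc` reports Python-level allocations, not RSS.

```python
import asyncio
import tracemalloc

PAYLOAD = 64 * 1024  # hypothetical per-request buffer size in bytes


async def fat(gate: asyncio.Event) -> int:
    buf = bytearray(PAYLOAD)
    await gate.wait()  # buf is still referenced by the suspended frame
    return len(buf)


async def lean(gate: asyncio.Event) -> int:
    buf = bytearray(PAYLOAD)
    size = len(buf)
    del buf  # release the buffer before suspending
    await gate.wait()
    return size


async def held_bytes(handler, n: int = 50) -> int:
    """Python-level bytes held per task while all n tasks are suspended."""
    gate = asyncio.Event()
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    tasks = [asyncio.create_task(handler(gate)) for _ in range(n)]
    await asyncio.sleep(0)  # let every task reach its await
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    gate.set()  # wake everything up and finish cleanly
    await asyncio.gather(*tasks)
    return (after - before) // n
```

Running `asyncio.run(held_bytes(fat))` and `asyncio.run(held_bytes(lean))` should show the first holding the full buffer per task and the second only the coroutine, frame, and task overhead, which is the part Python cannot shed.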
In practice
- Measure suspension cost, not call cost: the bytes alive across each await.
- In Rust, inspect async block sizes before release; treat Box::pin as explicit.
- In Python, cap concurrency with a semaphore or pool; avoid millions of idle coroutines.
- In .NET, use ValueTask on often-synchronous paths to reduce state-machine boxing.
- In Node.js, under load you often hit GC and in-flight promises before CPU — use heap snapshots.
- When sizing connection limits, budget memory per task for your runtime, not only FDs.
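For the Python item, one way to cap the number of live coroutines is a fixed-size pool that pulls work from a shared iterator. This is a minimal sketch, assuming the work arrives as an iterable; `run_pool`, `worker`, and `limit` are hypothetical names. A semaphore acquired inside each coroutine would limit active work too, but every waiting coroutine would still exist in memory, which is why the sketch uses a pool.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_pool(
    worker: Callable[[T], Awaitable[R]],
    jobs: Iterable[T],
    limit: int,
) -> List[R]:
    """Process jobs with exactly `limit` coroutines alive, not one per job."""
    it = iter(jobs)
    results: List[R] = []

    async def drain() -> None:
        # next() on the shared iterator is synchronous, so each job is
        # taken by exactly one drainer on the single-threaded event loop.
        for job in it:
            results.append(await worker(job))

    await asyncio.gather(*(drain() for _ in range(limit)))
    return results  # completion order, not input order
```

With this shape, peak memory for suspended state scales with `limit` rather than with the length of `jobs`, which is the number to feed into a per-task memory budget.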
Takeaway
Async is sold as “write sync, get concurrency free.” Only the syntax is free — pauses bill bytes per sleeping task. Rust charges at compile time; Python and managed runtimes hide the bill in the heap until RSS graphs spike. When an async service “eats memory for no reason,” look at what survives each await first.