PetraCache: Building a Memcached-Compatible Cache with RocksDB
The Problem
Memcached is fast. Really fast. But when it restarts, your cache is gone. Cold cache means every request hits your database until the cache warms up again. At scale, this can take down your entire system.
I wanted to explore a different approach: what if we could add persistence to memcached without changing the protocol?
Why memcached Protocol?
mcrouter is Meta’s memcached router—5 billion requests per second in production. Consistent hashing, failover, replication, connection pooling. All battle-tested at massive scale.
But mcrouter only speaks memcached protocol. To leverage it, your backend needs to be memcached-compatible.
That’s the gap PetraCache fills: memcached protocol + persistent storage. Drop it behind mcrouter and you get distributed caching with durability—without reinventing the routing layer.
Enter PetraCache
PetraCache is a memcached-compatible server backed by RocksDB. That’s it.
┌──────────────┐     ┌───────────┐     ┌─────────────────────────┐
│   Your App   │────▶│ mcrouter  │────▶│ PetraCache              │
│  (memcache   │     │ (routing, │     │  ├─ memcached protocol  │
│   client)    │     │  failover)│     │  ├─ RocksDB storage     │
└──────────────┘     └───────────┘     │  └─ Data survives       │
                                       │     restarts            │
                                       └─────────────────────────┘
Your app thinks it’s talking to memcached. mcrouter handles routing. PetraCache handles storage. Everyone does one job well.
Technical Decisions
Why RocksDB?
RocksDB is an LSM-tree storage engine, optimized for write-heavy workloads. It’s battle-tested at Meta, Netflix, and countless other companies.
For a cache workload:
- Block cache keeps hot data in memory (as fast as memcached)
- SST files persist everything to disk (survives restarts)
- Compaction cleans up deleted/expired keys in background
- Compression (LZ4) reduces disk usage with minimal CPU overhead
This gives in-memory speed for hot data and persistence for everything else.
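Here's roughly what that tuning looks like with the rust-rocksdb crate. The cache size and path are illustrative placeholders, not PetraCache's actual settings:
use rocksdb::{BlockBasedOptions, Cache, DBCompressionType, Options, DB};

let mut opts = Options::default();
opts.create_if_missing(true);
opts.set_compression_type(DBCompressionType::Lz4); // cheap on-disk compression

// Keep hot blocks in RAM; misses fall through to the SST files on disk.
let mut block_opts = BlockBasedOptions::default();
let cache = Cache::new_lru_cache(512 * 1024 * 1024); // 512 MiB block cache (illustrative)
block_opts.set_block_cache(&cache);
opts.set_block_based_table_factory(&block_opts);

let db = DB::open(&opts, "/var/lib/petracache")?; // path is a placeholder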
The WAL Trade-off
RocksDB’s Write-Ahead Log (WAL) ensures durability: writes go to the WAL first, then to the memtable. If the process crashes, the WAL replays writes that hadn’t been flushed to disk yet.
I disabled it.
let mut write_opts = WriteOptions::default();
write_opts.disable_wal(true);
// Every put then skips the WAL:
db.put_opt(key, value, &write_opts)?;
Why? This is a cache. If we lose the last second of writes during a crash, the app will re-populate from the source of truth. The durability guarantee isn’t worth the write latency cost.
Result: writes go directly to memtable (RAM), flushed to disk asynchronously. Much faster.
TTL Expiration: Two Strategies
memcached supports TTL (time-to-live) on keys. PetraCache implements expiration two ways:
1. Lazy expiration (on read)
pub fn get(&self, key: &[u8]) -> Result<Option<StoredValue>> {
    match self.db.get(key)? {
        Some(bytes) => {
            let value = StoredValue::decode(&bytes)?;
            if value.is_expired() {
                self.db.delete(key)?;
                Ok(None) // Pretend it doesn't exist
            } else {
                Ok(Some(value))
            }
        }
        None => Ok(None),
    }
}
When you GET an expired key, we delete it and return nothing. Simple.
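For completeness, is_expired can be a one-line check over the value header (see Value Format below). A sketch with assumed field names, mirroring the logic the compaction filter uses:
impl StoredValue {
    fn is_expired(&self) -> bool {
        // expire_at == 0 means the key never expires
        self.expire_at != 0 && current_timestamp() >= self.expire_at
    }
}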
2. Compaction filter (background cleanup)
fn ttl_compaction_filter(_level: u32, _key: &[u8], value: &[u8]) -> CompactionDecision {
    if value.len() >= 8 {
        let expire_at = u64::from_le_bytes(value[0..8].try_into().unwrap());
        if expire_at != 0 && current_timestamp() >= expire_at {
            return CompactionDecision::Remove;
        }
    }
    CompactionDecision::Keep
}
During RocksDB compaction, we check each key’s expiration. Expired keys are dropped, reclaiming disk space without explicit deletes.
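The filter gets registered once, when the database is opened. A minimal sketch assuming the rust-rocksdb crate, whose Options::set_compaction_filter takes a name plus a plain function (the CompactionDecision type above corresponds to its Decision type):
let mut opts = Options::default();
opts.create_if_missing(true);
opts.set_compaction_filter("ttl", ttl_compaction_filter);
let db = DB::open(&opts, data_dir)?; // data_dir: wherever the SST files live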
Value Format
Each value stored in RocksDB:
[8 bytes: expire_at][4 bytes: flags][N bytes: data]
- expire_at first: compaction filter can check TTL without decoding data
- Little-endian: matches the byte order of x86-64 and ARM (Apple Silicon), so no byte swapping on read
- Fixed header: O(1) access to metadata
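A minimal encoder for this layout (function and field names are mine, not necessarily PetraCache's):
fn encode(expire_at: u64, flags: u32, data: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(12 + data.len());
    buf.extend_from_slice(&expire_at.to_le_bytes()); // [8 bytes: expire_at]
    buf.extend_from_slice(&flags.to_le_bytes());     // [4 bytes: flags]
    buf.extend_from_slice(data);                     // [N bytes: data]
    buf
}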
Zero-Copy Parsing
The memcached protocol is text-based:
set mykey 0 3600 5\r\n
hello\r\n
That’s key, flags, TTL in seconds, and byte count, followed by the payload. Parsing this without allocations:
pub enum Command<'a> {
    Get { keys: Vec<Cow<'a, [u8]>> },
    Set { key: Cow<'a, [u8]>, data: Cow<'a, [u8]>, ... },
    // ...
}
Cow<'a, [u8]> (clone-on-write) borrows from the input buffer when possible, only allocating when necessary. For most requests: zero heap allocations during parsing.
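As an illustration (not PetraCache's actual parser), a borrowed GET parse might look like this, reusing the Command enum above and skipping error handling:
use std::borrow::Cow;

fn parse_get(line: &[u8]) -> Option<Command<'_>> {
    let rest = line.strip_prefix(b"get ")?;
    let keys = rest
        .split(|&b| b == b' ')
        .map(Cow::Borrowed) // borrows straight out of the input buffer
        .collect();
    Some(Command::Get { keys })
}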
Why Not spawn_blocking for RocksDB?
RocksDB operations can block (disk I/O). Tokio’s golden rule: don’t block the async runtime.
I broke the rule intentionally.
// We do this (direct call)
let value = storage.get(key)?;

// Instead of this (note the extra ? to unwrap the JoinError)
let value = tokio::task::spawn_blocking(move || storage.get(key)).await??;
Why? Block cache hits are ~100 nanoseconds. spawn_blocking overhead is ~5-10 microseconds. For a cache with 95%+ hit ratio, the overhead exceeds the benefit.
If your working set exceeds block cache (lots of disk reads), reconsider this.
Performance
Single instance, Apple Silicon, 1KB values, 50% GET / 50% SET:
$ memtier_benchmark -s 127.0.0.1 -p 11211 --protocol=memcache_text \
--clients=10 --threads=2 --test-time=30 --ratio=1:1 --data-size=1000
Type          Ops/sec   p50 Latency   p99 Latency   p99.9 Latency
-----------------------------------------------------------------
Sets         68504.04        0.14ms        0.37ms          0.49ms
Gets         68503.77        0.14ms        0.33ms          0.44ms
Totals      137007.81        0.14ms        0.35ms          0.47ms
137K ops/sec with sub-millisecond latency. Good enough for most use cases.
Scale horizontally with mcrouter: add more PetraCache instances, mcrouter distributes keys via consistent hashing.
mcrouter Configuration Example
Here’s an example setup: Istanbul as primary, Ankara as async replica. All reads go to Istanbul. Writes go to Istanbul first (sync), then replicate to Ankara (async).
{
  "pools": {
    "istanbul": {
      "servers": [
        "istanbul-petracache-1:11211",
        "istanbul-petracache-2:11211"
      ]
    },
    "ankara": {
      "servers": [
        "ankara-petracache-1:11211",
        "ankara-petracache-2:11211"
      ]
    }
  },
  "route": {
    "type": "OperationSelectorRoute",
    "default_policy": {
      "type": "FailoverRoute",
      "children": [
        { "type": "PoolRoute", "pool": "istanbul" },
        { "type": "PoolRoute", "pool": "ankara" }
      ]
    },
    "operation_policies": {
      "set": {
        "type": "AllInitialRoute",
        "children": [
          { "type": "PoolRoute", "pool": "istanbul" },
          { "type": "PoolRoute", "pool": "ankara" }
        ]
      },
      "delete": {
        "type": "AllInitialRoute",
        "children": [
          { "type": "PoolRoute", "pool": "istanbul" },
          { "type": "PoolRoute", "pool": "ankara" }
        ]
      }
    }
  }
}
What this does:
- GET: Routes to Istanbul, fails over to Ankara if Istanbul is down
- SET: Writes to Istanbul (sync), then Ankara (async)
- DELETE: Same as SET—Istanbul first, Ankara async
Two key route types here:
- FailoverRoute: Tries Istanbul first. If it fails (timeout, connection refused), automatically retries on Ankara. No manual intervention needed.
- AllInitialRoute: Waits for the first child (Istanbul) to respond, then fires off the rest (Ankara) without waiting. Your app sees Istanbul latency; Ankara gets eventual consistency.
Istanbul down? GETs automatically fail over to Ankara. When Istanbul recovers, it starts serving again. Zero config changes needed.
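To try this topology, point mcrouter at the config file. Something like the following, assuming a standard mcrouter install (check its docs for your version's exact flags):
mcrouter --config file:/etc/mcrouter/petracache.json -p 5000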
What’s Missing
PetraCache is alpha software. Not implemented yet:
- add, replace (conditional writes)
- incr, decr (atomic counters)
- cas (compare-and-swap)
- stats (server statistics)
- flush_all (clear all keys)
For a cache that just needs GET/SET/DELETE, it works today.
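For example, a raw SET/GET round trip over the text protocol looks like this (client lines flush left, server replies indented for readability):
set greet 0 60 5\r\n
hello\r\n
  STORED\r\n
get greet\r\n
  VALUE greet 0 5\r\n
  hello\r\n
  END\r\n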
What I Learned
Building PetraCache was an exercise in learning by integration. Instead of building everything from scratch, I focused on understanding how proven components work and how to connect them.
- mcrouter: Learned how Meta handles distributed caching at 5B req/sec—consistent hashing, failover strategies, connection pooling
- RocksDB: Dove deep into LSM-trees, compaction filters, write amplification, block cache tuning
- memcached protocol: Implemented the text protocol from scratch, learned zero-copy parsing in Rust
The result is ~2,000 lines of Rust that glue these components together. Not production-hardened yet, but a working prototype that taught me more than any tutorial could.
Sometimes the best way to learn a technology is to build something that depends on it.
Lessons Learned
1. Solve one problem
I needed persistence. That’s it. mcrouter already solved distribution. I didn’t build a distributed cache—I built a storage backend.
2. Measure before optimizing
I assumed RocksDB blocking calls would be a problem. Benchmarks showed they weren’t. spawn_blocking would have added latency for no benefit.
3. Simple beats clever
- No custom storage format—RocksDB handles it
- No custom network protocol—memcached ASCII works
- No custom distribution—mcrouter handles it
The best code is code you don’t write.
4. Trade-offs are features
Disabling WAL isn’t a bug. It’s a deliberate choice: cache semantics don’t require durability. Document the trade-off and move on.
5. Existing solutions are underrated
Before writing code, ask: “Has someone already solved this?” Usually, yes. Your job is to find it and integrate it well.
Try It
git clone https://github.com/umit/petracache
cd petracache
cargo build --release
./target/release/petracache config.toml
Or with Docker (coming soon).
PetraCache is open source under MIT license. Contributions welcome.
Petra (πέτρα) means “rock” in Greek—a nod to RocksDB.