PetraCache: Building a Memcached-Compatible Cache with RocksDB
The Problem
Memcached is fast. Really fast. But when it restarts, your cache is gone. Cold cache means every request hits your database until the cache warms up again. At scale, this can take down your entire system.
I wanted to explore a different approach: what if we could add persistence to memcached without changing the protocol?
Why memcached Protocol?
mcrouter is Meta’s memcached router—5 billion requests per second in production. Consistent hashing, failover, replication, connection pooling. All battle-tested at massive scale.
But mcrouter only speaks memcached protocol. To leverage it, your backend needs to be memcached-compatible.
That’s the gap PetraCache fills: memcached protocol + persistent storage. Drop it behind mcrouter and you get distributed caching with durability—without reinventing the routing layer.
Enter PetraCache
PetraCache is a memcached-compatible server backed by RocksDB. That’s it.
┌──────────────┐     ┌───────────┐     ┌─────────────────────────┐
│   Your App   │────▶│ mcrouter  │────▶│      PetraCache         │
│  (memcache   │     │ (routing, │     │  ├─ memcached protocol  │
│   client)    │     │ failover) │     │  ├─ RocksDB storage     │
└──────────────┘     └───────────┘     │  └─ Data survives       │
                                       │     restarts            │
                                       └─────────────────────────┘
Your app thinks it’s talking to memcached. mcrouter handles routing. PetraCache handles storage. Everyone does one job well.
Technical Decisions
Why RocksDB?
RocksDB is purpose-built for write-heavy workloads like caching:
- LSM-tree architecture: Sequential writes to disk. Writes go to memory first (memtable), flush to disk in batches. Perfect for cache SET operations.
- Block cache: Configurable LRU cache keeps hot data in memory. Cache hit? Sub-microsecond response, competitive with pure in-memory stores.
- Compaction filters: TTL expiration happens during background compaction—no separate cleanup jobs needed.
- Compression per level: LZ4 for upper levels (speed), ZSTD for bottom levels (ratio). Tunable per use case.
- Battle-tested at scale: Powers Meta’s distributed systems, Netflix’s data infrastructure, CockroachDB, TiKV. Billions of operations per second across the industry.
- Tunable for any workload: 100+ configuration options. Prioritize write throughput, read latency, or memory efficiency—it’s all configurable.
RocksDB gives you in-memory speed for hot data and persistence for everything else—with 10+ years of production hardening behind it.
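As a sketch of what this tuning can look like with the rust-rocksdb crate: the option names below are real, but the level count, cache size, and path are illustrative assumptions rather than PetraCache's actual configuration, and the exact API varies by crate version.

```rust
// Sketch of a cache-oriented RocksDB setup using the rust-rocksdb crate.
// Sizes and level layout are illustrative, not PetraCache's real values.
use rocksdb::{BlockBasedOptions, Cache, DBCompressionType, Options, DB};

fn open_cache_db(path: &str) -> Result<DB, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);

    // LZ4 on upper levels for speed, ZSTD on the bottom levels for ratio.
    opts.set_compression_per_level(&[
        DBCompressionType::None,
        DBCompressionType::None,
        DBCompressionType::Lz4,
        DBCompressionType::Lz4,
        DBCompressionType::Lz4,
        DBCompressionType::Zstd,
        DBCompressionType::Zstd,
    ]);

    // LRU block cache keeps hot blocks in memory (512 MiB here, purely
    // illustrative; Cache::new_lru_cache's signature differs across versions).
    let cache = Cache::new_lru_cache(512 * 1024 * 1024);
    let mut block_opts = BlockBasedOptions::default();
    block_opts.set_block_cache(&cache);
    opts.set_block_based_table_factory(&block_opts);

    DB::open(&opts, path)
}
```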
TTL Expiration: Two Strategies
memcached supports TTL (time-to-live) on keys. PetraCache implements expiration two ways:
1. Lazy expiration (on read)
pub fn get(&self, key: &[u8]) -> Result<Option<StoredValue>> {
    match self.db.get(key)? {
        Some(bytes) => {
            let value = StoredValue::decode(&bytes)?;
            if value.is_expired() {
                self.db.delete(key)?;
                Ok(None) // Pretend it doesn't exist
            } else {
                Ok(Some(value))
            }
        }
        None => Ok(None),
    }
}
When you GET an expired key, we delete it and return nothing. Simple.
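The is_expired check itself can be a plain timestamp comparison. A minimal sketch, assuming a StoredValue whose expire_at is seconds since the Unix epoch with 0 meaning "never expires" (the field names mirror the post's value format, but this is illustrative, not PetraCache's exact code):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Illustrative mirror of the stored value: expiry timestamp, memcached
// flags, and the payload.
pub struct StoredValue {
    pub expire_at: u64, // Unix seconds; 0 = no TTL
    pub flags: u32,
    pub data: Vec<u8>,
}

impl StoredValue {
    pub fn is_expired(&self) -> bool {
        if self.expire_at == 0 {
            return false; // no TTL set
        }
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("clock before Unix epoch")
            .as_secs();
        now >= self.expire_at
    }
}
```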
2. Compaction filter (background cleanup)
fn ttl_compaction_filter(_level: u32, _key: &[u8], value: &[u8]) -> CompactionDecision {
    if value.len() >= 8 {
        let expire_at = u64::from_le_bytes(value[0..8].try_into().unwrap());
        if expire_at != 0 && current_timestamp() >= expire_at {
            return CompactionDecision::Remove;
        }
    }
    CompactionDecision::Keep
}
During RocksDB compaction, we check each key’s expiration. Expired keys are dropped, reclaiming disk space without explicit deletes.
Value Format
Each value stored in RocksDB:
[8 bytes: expire_at][4 bytes: flags][N bytes: data]
- expire_at first: compaction filter can check TTL without decoding data
- Little-endian: native byte order on x86-64 and ARM, so no byte swapping on decode
- Fixed header: O(1) access to metadata
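A sketch of encoding and decoding this layout; the helper names are illustrative, not PetraCache's exact code:

```rust
// Illustrative helpers for the [expire_at][flags][data] layout.
// The header is a fixed 12 bytes, little-endian.
pub fn encode(expire_at: u64, flags: u32, data: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(12 + data.len());
    buf.extend_from_slice(&expire_at.to_le_bytes()); // bytes 0..8
    buf.extend_from_slice(&flags.to_le_bytes());     // bytes 8..12
    buf.extend_from_slice(data);                     // bytes 12..
    buf
}

// Returns (expire_at, flags, data), or None if the header is truncated.
pub fn decode(bytes: &[u8]) -> Option<(u64, u32, &[u8])> {
    if bytes.len() < 12 {
        return None;
    }
    let expire_at = u64::from_le_bytes(bytes[0..8].try_into().ok()?);
    let flags = u32::from_le_bytes(bytes[8..12].try_into().ok()?);
    Some((expire_at, flags, &bytes[12..]))
}
```

Note that decode hands back a borrowed slice for the data, so reading metadata never copies the payload.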
Zero-Copy Parsing
The memcached protocol is text-based:
set mykey 0 3600 5\r\n
hello\r\n
Parsing this without allocations:
pub enum Command<'a> {
    Get { keys: Vec<Cow<'a, [u8]>> },
    Set { key: Cow<'a, [u8]>, data: Cow<'a, [u8]>, ... },
    // ...
}
Cow<'a, [u8]> (clone-on-write, in Rust's terminology) borrows from the input buffer when possible, only allocating when necessary. For most requests, that means zero heap allocations for the key and data themselves during parsing.
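To make the borrowing concrete, here is a sketch of a parser for just the get command. The single-variant enum and the parse_get name are simplifications for illustration, not PetraCache's actual parser:

```rust
use std::borrow::Cow;

// Single-variant sketch of the post's Command enum; the real parser
// also handles set, delete, and friends.
pub enum Command<'a> {
    Get { keys: Vec<Cow<'a, [u8]>> },
}

// Parse a "get key1 key2 ..." line (without the trailing \r\n),
// borrowing each key straight from the input buffer: no copies.
pub fn parse_get(line: &[u8]) -> Option<Command<'_>> {
    let mut parts = line.split(|&b| b == b' ').filter(|p| !p.is_empty());
    if parts.next()? != b"get" {
        return None;
    }
    let keys: Vec<Cow<[u8]>> = parts.map(Cow::Borrowed).collect();
    if keys.is_empty() {
        return None; // get with no keys is a protocol error
    }
    Some(Command::Get { keys })
}
```

Every key in the result points into the original buffer; an owned copy (Cow::Owned) is only needed if a value must outlive the connection's read buffer.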
Why Not spawn_blocking for RocksDB?
RocksDB operations can block (disk I/O). Tokio’s golden rule: don’t block the async runtime.
I broke the rule intentionally.
// We do this (direct call)
let value = storage.get(key)?;
// Instead of this
let value = tokio::task::spawn_blocking(move || storage.get(key)).await?;
Why? Block cache hits are ~100 nanoseconds. spawn_blocking overhead is ~5-10 microseconds. For a cache with 95%+ hit ratio, the overhead exceeds the benefit.
If your working set exceeds block cache (lots of disk reads), reconsider this.
Performance
Single instance, Apple Silicon, 1KB values, 50% GET / 50% SET:
$ memtier_benchmark -s 127.0.0.1 -p 11211 --protocol=memcache_text \
--clients=10 --threads=2 --test-time=30 --ratio=1:1 --data-size=1000
Type       Ops/sec     p50 Latency   p99 Latency   p99.9 Latency
----------------------------------------------------------------
Sets       68504.04    0.14ms        0.37ms        0.49ms
Gets       68503.77    0.14ms        0.33ms        0.44ms
Totals     137007.81   0.14ms        0.35ms        0.47ms
137K ops/sec with sub-millisecond latency. Good enough for most use cases.
Scale horizontally with mcrouter: add more PetraCache instances, mcrouter distributes keys via consistent hashing.
mcrouter Configuration Example
Here’s an example setup: Istanbul as primary, Ankara as async replica. All reads go to Istanbul. Writes go to Istanbul first (sync), then replicate to Ankara (async).
{
"pools": {
"istanbul": {
"servers": [
"istanbul-petracache-1:11211",
"istanbul-petracache-2:11211"
]
},
"ankara": {
"servers": [
"ankara-petracache-1:11211",
"ankara-petracache-2:11211"
]
}
},
"route": {
"type": "OperationSelectorRoute",
"default_policy": {
"type": "FailoverRoute",
"children": [
{ "type": "PoolRoute", "pool": "istanbul" },
{ "type": "PoolRoute", "pool": "ankara" }
]
},
"operation_policies": {
"set": {
"type": "AllInitialRoute",
"children": [
{ "type": "PoolRoute", "pool": "istanbul" },
{ "type": "PoolRoute", "pool": "ankara" }
]
},
"delete": {
"type": "AllInitialRoute",
"children": [
{ "type": "PoolRoute", "pool": "istanbul" },
{ "type": "PoolRoute", "pool": "ankara" }
]
}
}
}
}
What this does:
- GET: Routes to Istanbul, fails over to Ankara if Istanbul is down
- SET: Writes to Istanbul (sync), then Ankara (async)
- DELETE: Same as SET—Istanbul first, Ankara async
Two key route types here:
- FailoverRoute: Tries Istanbul first. If it fails (timeout, connection refused), automatically retries on Ankara. No manual intervention needed.
- AllInitialRoute: Waits for the first child (Istanbul) to respond, then fires off the rest (Ankara) without waiting. Your app sees Istanbul latency; Ankara gets eventual consistency.
Istanbul down? GETs automatically fail over to Ankara. When Istanbul recovers, it starts serving again. Zero config changes needed.
Roadmap
PetraCache covers the core operations. Next up:
- add, replace (conditional writes)
- incr, decr (atomic counters)
- cas (compare-and-swap)
- stats (server statistics)
- flush_all (clear all keys)
For most cache workloads—GET/SET/DELETE—it’s ready to go.
The Philosophy
Building PetraCache reinforced a core belief: the best engineering is often assembly, not invention.
- mcrouter handles distribution—Meta’s 10 years of battle-testing at 5B req/sec
- RocksDB handles storage—proven across thousands of deployments at Netflix, Meta, and beyond
- memcached protocol handles compatibility—20 years of client library ecosystem
The result is ~2,000 lines of Rust that connect these proven components into something new. No need to reinvent consistent hashing. No need to build a custom storage engine. No need to design a new protocol.
You don’t need to reinvent the wheel to build a great car. Sometimes the engineering is in knowing which wheels to pick and how to connect them.
Lessons Learned
1. Solve one problem
I needed persistence. That’s it. mcrouter already solved distribution. I didn’t build a distributed cache—I built a storage backend.
2. Measure before optimizing
I assumed RocksDB blocking calls would be a problem. Benchmarks showed they weren’t. spawn_blocking would have added latency for no benefit.
3. Simple beats clever
- No custom storage format—RocksDB handles it
- No custom network protocol—memcached ASCII works
- No custom distribution—mcrouter handles it
The best code is code you don’t write.
4. Trade-offs are features
Every architectural decision is a trade-off. Document them clearly and move on.
5. Existing solutions are underrated
Before writing code, ask: “Has someone already solved this?” Usually, yes. Your job is to find it and integrate it well.
Try It
git clone https://github.com/umit/petracache
cd petracache
cargo build --release
./target/release/petracache config.toml
Or with Docker (coming soon).
PetraCache is open source under MIT license. Contributions welcome.
Petra (πέτρα) means “rock” in Greek—a nod to RocksDB.