PetraCache: Building a Memcached-Compatible Cache with RocksDB

Posted on Feb 3, 2026

The Problem

Memcached is fast. Really fast. But when it restarts, your cache is gone. Cold cache means every request hits your database until the cache warms up again. At scale, this can take down your entire system.

I wanted to explore a different approach: what if we could add persistence to memcached without changing the protocol?

Why the memcached Protocol?

mcrouter is Meta’s memcached router—5 billion requests per second in production. Consistent hashing, failover, replication, connection pooling. All battle-tested at massive scale.

But mcrouter only speaks memcached protocol. To leverage it, your backend needs to be memcached-compatible.

That’s the gap PetraCache fills: memcached protocol + persistent storage. Drop it behind mcrouter and you get distributed caching with durability—without reinventing the routing layer.

Enter PetraCache

PetraCache is a memcached-compatible server backed by RocksDB. That’s it.

┌──────────────┐     ┌───────────┐     ┌─────────────────────────┐
│ Your App     │────▶│ mcrouter  │────▶│ PetraCache              │
│ (memcache    │     │ (routing, │     │  ├─ memcached protocol  │
│  client)     │     │  failover)│     │  ├─ RocksDB storage     │
└──────────────┘     └───────────┘     │  └─ Data survives       │
                                       │     restarts            │
                                       └─────────────────────────┘

Your app thinks it’s talking to memcached. mcrouter handles routing. PetraCache handles storage. Everyone does one job well.

Technical Decisions

Why RocksDB?

RocksDB is an LSM-tree storage engine, optimized for write-heavy workloads. It’s battle-tested at Meta, Netflix, and countless other companies.

For a cache workload:

  • Block cache keeps hot data in memory (as fast as memcached)
  • SST files persist everything to disk (survives restarts)
  • Compaction cleans up deleted/expired keys in background
  • Compression (LZ4) reduces disk usage with minimal CPU overhead

This gives in-memory speed for hot data and persistence for everything else.
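
A hedged sketch of how this tuning looks with the rust-rocksdb crate (option names follow recent crate versions; the cache size and path are illustrative, not PetraCache's actual settings):

use rocksdb::{BlockBasedOptions, Cache, DBCompressionType, Options, DB};

fn open_cache_db(path: &str) -> Result<DB, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);

    // Keep hot blocks in an in-memory LRU cache (512 MB here, purely illustrative).
    let cache = Cache::new_lru_cache(512 * 1024 * 1024);
    let mut block_opts = BlockBasedOptions::default();
    block_opts.set_block_cache(&cache);
    opts.set_block_based_table_factory(&block_opts);

    // LZ4: minimal CPU cost, smaller SST files on disk.
    opts.set_compression_type(DBCompressionType::Lz4);

    DB::open(&opts, path)
}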

The WAL Trade-off

RocksDB’s Write-Ahead Log (WAL) ensures durability: writes go to the WAL first, then to the memtable. If the process crashes, RocksDB replays the WAL on restart to recover writes that hadn’t yet been flushed to SST files.

I disabled it.

let mut write_opts = WriteOptions::default();
write_opts.disable_wal(true);

Why? This is a cache. If we lose the last second of writes during a crash, the app will re-populate from the source of truth. The durability guarantee isn’t worth the write latency cost.

Result: writes go directly to memtable (RAM), flushed to disk asynchronously. Much faster.
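
The option then rides along on each write. A minimal sketch (put_no_wal is my illustration, not PetraCache's actual write path):

use rocksdb::{WriteOptions, DB};

fn put_no_wal(db: &DB, key: &[u8], value: &[u8]) -> Result<(), rocksdb::Error> {
    let mut write_opts = WriteOptions::default();
    write_opts.disable_wal(true); // memtable only; no WAL write on the hot path
    db.put_opt(key, value, &write_opts)
}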

TTL Expiration: Two Strategies

memcached supports TTL (time-to-live) on keys. PetraCache implements expiration two ways:

1. Lazy expiration (on read)

pub fn get(&self, key: &[u8]) -> Result<Option<StoredValue>> {
    match self.db.get(key)? {
        Some(bytes) => {
            let value = StoredValue::decode(&bytes)?;
            if value.is_expired() {
                self.db.delete(key)?;
                Ok(None)  // Pretend it doesn't exist
            } else {
                Ok(Some(value))
            }
        }
        None => Ok(None),
    }
}

When you GET an expired key, we delete it and return nothing. Simple.
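
The is_expired check is just a comparison against the header timestamp. A minimal sketch (this StoredValue is a simplified stand-in for the real type; expire_at == 0 means "no TTL", matching the compaction filter below):

use std::time::{SystemTime, UNIX_EPOCH};

struct StoredValue {
    expire_at: u64, // Unix seconds; 0 encodes "no TTL"
    flags: u32,
    data: Vec<u8>,
}

impl StoredValue {
    fn is_expired(&self) -> bool {
        self.expire_at != 0 && current_timestamp() >= self.expire_at
    }
}

fn current_timestamp() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_secs()
}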

2. Compaction filter (background cleanup)

fn ttl_compaction_filter(_level: u32, _key: &[u8], value: &[u8]) -> CompactionDecision {
    if value.len() >= 8 {
        let expire_at = u64::from_le_bytes(value[0..8].try_into().unwrap());
        if expire_at != 0 && current_timestamp() >= expire_at {
            return CompactionDecision::Remove;
        }
    }
    CompactionDecision::Keep
}

During RocksDB compaction, we check each key’s expiration. Expired keys are dropped, reclaiming disk space without explicit deletes.
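
The filter is registered when the database is opened. A hedged sketch, assuming CompactionDecision above is an alias for rust-rocksdb's compaction_filter::Decision (the path is illustrative):

use rocksdb::{Options, DB};

fn open_with_ttl_filter(path: &str) -> Result<DB, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    // Run the TTL check on every key/value pair visited during compaction.
    opts.set_compaction_filter("ttl", ttl_compaction_filter);
    DB::open(&opts, path)
}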

Value Format

Each value stored in RocksDB:

[8 bytes: expire_at][4 bytes: flags][N bytes: data]
  • expire_at first: compaction filter can check TTL without decoding data
  • Little-endian: matches x86 and ARM byte order, no byte swapping needed
  • Fixed header: O(1) access to metadata
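
A minimal encode/decode sketch of this layout (function names are mine; the real StoredValue::decode may differ):

fn encode(expire_at: u64, flags: u32, data: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(12 + data.len());
    buf.extend_from_slice(&expire_at.to_le_bytes()); // 8-byte little-endian expiry first
    buf.extend_from_slice(&flags.to_le_bytes());     // 4 bytes of memcached flags
    buf.extend_from_slice(data);
    buf
}

fn decode(bytes: &[u8]) -> Option<(u64, u32, &[u8])> {
    if bytes.len() < 12 {
        return None; // malformed: the fixed header is always 12 bytes
    }
    let expire_at = u64::from_le_bytes(bytes[0..8].try_into().ok()?);
    let flags = u32::from_le_bytes(bytes[8..12].try_into().ok()?);
    Some((expire_at, flags, &bytes[12..]))
}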

Zero-Copy Parsing

The memcached protocol is text-based:

set mykey 0 3600 5\r\n
hello\r\n

Parsing this without allocations:

use std::borrow::Cow;

pub enum Command<'a> {
    Get { keys: Vec<Cow<'a, [u8]>> },
    Set { key: Cow<'a, [u8]>, data: Cow<'a, [u8]>, /* ... */ },
    // ...
}

Cow<'a, [u8]> (Rust’s clone-on-write type) borrows from the input buffer when possible and allocates only when the bytes must be owned. For most requests, the key and data bytes are never copied during parsing.
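
For example, a multi-key GET line can be split into borrowed slices. A hedged sketch (parse_get is my illustration, not PetraCache's actual parser):

use std::borrow::Cow;

// Parse "get key1 key2\r\n" into borrowed key slices: the key bytes are never copied.
fn parse_get(line: &[u8]) -> Option<Vec<Cow<'_, [u8]>>> {
    let line = line.strip_suffix(b"\r\n")?;
    let mut parts = line.split(|&b| b == b' ');
    if parts.next()? != b"get" {
        return None;
    }
    // Each Cow::Borrowed points into the caller's buffer; only the Vec itself allocates.
    Some(parts.map(Cow::Borrowed).collect())
}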

Why Not spawn_blocking for RocksDB?

RocksDB operations can block (disk I/O). Tokio’s golden rule: don’t block the async runtime.

I broke the rule intentionally.

// We do this (direct call)
let value = storage.get(key)?;

// Instead of this (note the double `?`: spawn_blocking's JoinHandle adds its own error layer)
let value = tokio::task::spawn_blocking(move || storage.get(key)).await??;

Why? Block cache hits are ~100 nanoseconds. spawn_blocking overhead is ~5-10 microseconds. For a cache with 95%+ hit ratio, the overhead exceeds the benefit.

If your working set exceeds block cache (lots of disk reads), reconsider this.

Performance

Single instance, Apple Silicon, 1KB values, 50% GET / 50% SET:

$ memtier_benchmark -s 127.0.0.1 -p 11211 --protocol=memcache_text \
    --clients=10 --threads=2 --test-time=30 --ratio=1:1 --data-size=1000

Type         Ops/sec     p50 Latency     p99 Latency   p99.9 Latency
--------------------------------------------------------------------
Sets        68504.04         0.14ms          0.37ms          0.49ms
Gets        68503.77         0.14ms          0.33ms          0.44ms
Totals     137007.81         0.14ms          0.35ms          0.47ms

137K ops/sec with sub-millisecond latency. Good enough for most use cases.

Scale horizontally with mcrouter: add more PetraCache instances, mcrouter distributes keys via consistent hashing.

mcrouter Configuration Example

Here’s an example setup: Istanbul as primary, Ankara as async replica. All reads go to Istanbul. Writes go to Istanbul first (sync), then replicate to Ankara (async).

{
  "pools": {
    "istanbul": {
      "servers": [
        "istanbul-petracache-1:11211",
        "istanbul-petracache-2:11211"
      ]
    },
    "ankara": {
      "servers": [
        "ankara-petracache-1:11211",
        "ankara-petracache-2:11211"
      ]
    }
  },
  "route": {
    "type": "OperationSelectorRoute",
    "default_policy": {
      "type": "FailoverRoute",
      "children": [
        { "type": "PoolRoute", "pool": "istanbul" },
        { "type": "PoolRoute", "pool": "ankara" }
      ]
    },
    "operation_policies": {
      "set": {
        "type": "AllInitialRoute",
        "children": [
          { "type": "PoolRoute", "pool": "istanbul" },
          { "type": "PoolRoute", "pool": "ankara" }
        ]
      },
      "delete": {
        "type": "AllInitialRoute",
        "children": [
          { "type": "PoolRoute", "pool": "istanbul" },
          { "type": "PoolRoute", "pool": "ankara" }
        ]
      }
    }
  }
}

What this does:

  • GET: Routes to Istanbul, fails over to Ankara if Istanbul is down
  • SET: Writes to Istanbul (sync), then Ankara (async)
  • DELETE: Same as SET—Istanbul first, Ankara async

Two key route types here:

  • FailoverRoute: Tries Istanbul first. If it fails (timeout, connection refused), automatically retries on Ankara. No manual intervention needed.
  • AllInitialRoute: Sends the request to all children at once, but replies as soon as the first child (Istanbul) answers; the remaining children (Ankara) complete in the background. Your app sees Istanbul latency, Ankara gets eventual consistency.

Istanbul down? GETs automatically fail over to Ankara. When Istanbul recovers, it starts serving again. Zero config changes needed.
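
To try this locally, point mcrouter at the config and aim your memcache client at mcrouter's port (flag spellings vary slightly across mcrouter versions; the path is illustrative):

mcrouter --config-file=/etc/mcrouter/petracache.json -p 5000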

What’s Missing

PetraCache is alpha software. Not implemented yet:

  • add, replace (conditional writes)
  • incr, decr (atomic counters)
  • cas (compare-and-swap)
  • stats (server statistics)
  • flush_all (clear all keys)

For a cache that just needs GET/SET/DELETE, it works today.

What I Learned

Building PetraCache was an exercise in learning by integration. Instead of building everything from scratch, I focused on understanding how proven components work and how to connect them.

  • mcrouter: Learned how Meta handles distributed caching at 5B req/sec—consistent hashing, failover strategies, connection pooling
  • RocksDB: Dove deep into LSM-trees, compaction filters, write amplification, block cache tuning
  • memcached protocol: Implemented the text protocol from scratch, learned zero-copy parsing in Rust

The result is ~2,000 lines of Rust that glue these components together. Not production-hardened yet, but a working prototype that taught me more than any tutorial could.

Sometimes the best way to learn a technology is to build something that depends on it.

Lessons Learned

1. Solve one problem

I needed persistence. That’s it. mcrouter already solved distribution. I didn’t build a distributed cache—I built a storage backend.

2. Measure before optimizing

I assumed RocksDB blocking calls would be a problem. Benchmarks showed they weren’t. spawn_blocking would have added latency for no benefit.

3. Simple beats clever

  • No custom storage format—RocksDB handles it
  • No custom network protocol—memcached ASCII works
  • No custom distribution—mcrouter handles it

The best code is code you don’t write.

4. Trade-offs are features

Disabling WAL isn’t a bug. It’s a deliberate choice: cache semantics don’t require durability. Document the trade-off and move on.

5. Existing solutions are underrated

Before writing code, ask: “Has someone already solved this?” Usually, yes. Your job is to find it and integrate it well.

Try It

git clone https://github.com/umit/petracache
cd petracache
cargo build --release
./target/release/petracache config.toml

Or with Docker (coming soon).


PetraCache is open source under MIT license. Contributions welcome.

Petra (πέτρα) means “rock” in Greek—a nod to RocksDB.