Caching
BatleHub sits between your build tools and upstream registries. This page explains exactly how caching works: from the moment a client sends a request, through the cache lookup, to the response — and how to tune every part of that path.
How the cache works
Request lifecycle
Every request to /proxy/{registry}/... passes through this pipeline:
Client
│
▼
AuthMiddleware ← validates bearer token / OIDC / Kubernetes SA
│ sets Identity (user_id, role, groups)
▼
RateLimitMiddleware ← checks + increments counters in the cache backend
│ 429 if any bucket exhausted (block mode)
▼
RBAC check ← validates registry permissions for the caller's role
│ 403 if permission denied
▼
Rules check ← release age gate, deny_latest, require_signed_release
│ 403 if a rule fires
▼
Metadata cache lookup ← checked in the cache backend (memory / postgres / redis)
│
├─ HIT: proceed with cached metadata
│
└─ MISS: fetch from upstream, store in cache backend with TTL
│
▼
Artifact storage lookup ← checked in blob storage (filesystem / S3)
│
├─ HIT: stream artifact from storage to client
│
└─ MISS: fetch from upstream, store in blob storage, stream to clientTwo layers of state
BatleHub separates two kinds of cached state with different lifetimes and backends:
| Layer | What is stored | Where | Lifetime |
|---|---|---|---|
| Metadata cache | Version lists, release info, package metadata | Cache backend ([cache]) | metadata_ttl_secs (default 5 min), then re-fetched |
| Artifact storage | Tarballs, .crate files, VSIX packages, Go module zips | Blob storage ([storage]) | Permanent by default; controlled by eviction policy |
| Rate-limit counters | Per-user / per-group request counts | Cache backend ([cache]) | One fixed window (window_secs), then auto-reset |
Metadata is intentionally short-lived: version lists change as packages are published upstream. Artifacts are stored permanently because a .crate or tarball at a given version never changes.
Cache backend — [cache]
The [cache] section selects the storage engine for metadata and rate-limit counters. Three backends are available:
# In-process memory (default — no extra infrastructure needed)
[cache]
type = "memory"
# PostgreSQL — persistent, shared across all server replicas
[cache]
type = "postgres"
# Uses the same database URL as [database]; no extra config needed.
# Redis — persistent, shared, TTL-based eviction, lower latency than Postgres
[cache]
type = "redis"
url = "redis://localhost:6379"Choosing a backend
| Backend | Persistence | Multi-instance | Extra infra | Best for |
|---|---|---|---|---|
memory | No — resets on restart | No — each instance has its own counters | None | Local dev, single-node |
postgres | Yes | Yes — all instances share one DB | PostgreSQL (already required) | Production, multi-replica |
redis | Yes | Yes — all instances share one cluster | Redis | High-throughput production |
Single-node deployments
memory is the default and requires no extra config. Switch to postgres or redis when running multiple server replicas or when you need rate-limit counters and metadata cache to survive server restarts.
Redis feature flag
The Redis backend is compiled only when the cache-redis Cargo feature is enabled. The official Docker image includes it. When building from source, add --features cache-redis to the cargo build command.
What changes between backends
Metadata cache: When a client requests a version list and the result is in the cache, the backend is queried (hash map lookup, DB row read, or Redis GET). On a miss, the upstream is contacted and the result is stored with a TTL.
Rate-limit counters: Each increment call atomically bumps a counter keyed by rl:{registry}:user:{user_id} (or rl:{registry}:group:{group}) and returns the new count plus the window-reset timestamp. With memory, this is a Mutex<HashMap> operation. With postgres, it is an INSERT … ON CONFLICT DO UPDATE … RETURNING count that is serialisable under concurrent load. With redis, it is an atomic INCR with a conditional EXPIRE on first write.
Per-registry cache policy — [registries.cache]
Each registry has its own [registries.cache] block that controls:
- How long metadata is considered fresh
- Whether to serve stale metadata when the upstream is down
- When artifacts are evicted from blob storage
- How to pre-fill the cache before the first client request
[registries.cache]
metadata_ttl_secs = 300 # re-check version lists every 5 min (default)
serve_stale = true # serve cached metadata on upstream 5xx (default)
# Artifact TTL (optional) — re-fetch artifacts older than this many seconds
artifact_ttl_secs = 2592000 # delete/re-fetch artifacts older than 30 days
# Additional eviction strategies — all optional, compose with each other
idle_days = 14 # delete artifacts not accessed for 14 days
max_size_bytes = 10737418240 # 10 GiB storage cap — evicts LRU when exceeded
keep_latest_n = 5 # keep only the 5 most recent versions per package
# Warming
warm_packages = ["lodash", "react", "typescript@5.4.5"]
warm_latest_n = 3 # warm the 3 most recent versions of bare-name entries
warm_concurrency = 4 # up to 4 parallel downloadsMetadata TTL and serve_stale
Metadata (version lists, release info) is cached for metadata_ttl_secs seconds. After expiry:
- If the upstream responds: the fresh data is stored and the TTL is reset.
- If the upstream returns 5xx and
serve_stale = true(the default): the stale cached data is returned so clients continue to work during upstream outages. - If the upstream returns 5xx and
serve_stale = false: the error is propagated to the client as502 Bad Gateway.
Set metadata_ttl_secs = 0 to always re-check upstream on every request (useful in local and hybrid modes where you want the index to be always fresh).
Artifact TTL
By default, once an artifact is downloaded it is kept until an eviction policy removes it. Set artifact_ttl_secs to make artifacts expire by age — the next request after expiry re-fetches from upstream and resets the clock:
[registries.cache]
artifact_ttl_secs = 86400 # re-fetch artifacts older than 24 hoursThis is useful for registries where packages may be updated in-place (uncommon for public registries, but possible with some private mirrors).
Eviction policies
Eviction is checked lazily — an artifact is evicted when it would be served and a policy says it has expired. Strategies compose: the first policy to fire triggers the eviction.
Artifacts without cache metadata
If an artifact was stored before the artifact_cache_meta migration was applied (added in a recent release), it has no cached_at timestamp. When artifact_ttl_secs is configured, such artifacts are conservatively treated as expired and re-fetched from upstream on next access — the correct behavior is to get a fresh copy rather than serve an artifact of unknown age forever.
| Policy | Config key | Effect |
|---|---|---|
| Age | artifact_ttl_secs | Remove artifacts older than N seconds |
| Idle | idle_days | Remove artifacts not accessed for N days |
| Size cap | max_size_bytes | When storage exceeds the cap, the least-recently-used artifacts are removed until usage is back under the cap |
| Version count | keep_latest_n | Keep only the N most recently cached versions per package; older versions are removed when a new one is stored |
Omitting all four fields disables eviction — artifacts are kept indefinitely.
Cache warming
Cache warming pre-fetches packages so they are available with zero latency before any client requests them. Configure it in [registries.cache]:
[registries.cache]
warm_packages = ["lodash", "react@18.2.0", "serde"]
warm_latest_n = 3 # warm the 3 most recent versions of bare-name entries
warm_concurrency = 4 # up to 4 parallel downloads during warming- Bare name (
"lodash"): warms thewarm_latest_nmost recent versions. - Pinned version (
"react@18.2.0"): warms exactly one version.
Warming runs at startup in the background — the HTTP server is immediately available while warming proceeds. You can also trigger warming on-demand via the admin API:
# Warm using the registry's configured warm_latest_n
curl -X POST http://localhost:8080/api/v1/admin/registries/npm/warm \
-H "Authorization: Bearer <admin-token>" \
-H "Content-Type: application/json" \
-d '{"package": "lodash"}'
# Warm a specific version
curl -X POST http://localhost:8080/api/v1/admin/registries/npm/warm \
-H "Authorization: Bearer <admin-token>" \
-H "Content-Type: application/json" \
-d '{"package": "lodash@4.17.21"}'Registry support
Version enumeration (needed to warm bare package names) is implemented for npm, Cargo, OpenVSX, and Go modules. For GitHub and VS Code Marketplace, pass a pinned version string to warm a specific version.
Content-addressable deduplication
Artifact bytes are stored at a content-addressed key (blob/{sha256}) in blob storage. Each logical artifact path (e.g. artifact:npm/lodash:4.17.21) holds a reference to the blob rather than the bytes themselves. A reference-count table tracks how many logical keys point to each blob.
This means:
- The same package published to two registries stores only one copy.
- A yanked and re-released version at the same content is stored once.
- Deduplication is transparent — no configuration required.
The deduplication tables (artifact_dedup_index, artifact_dedup_refs) are created by the database migration and maintained automatically.
Rate limiting and the cache backend
Rate-limit counters are stored in the same backend as the metadata cache. This means:
- With
[cache] type = "memory": counters are per-process. Restarting the server or running multiple replicas gives each process its own independent counters. - With
[cache] type = "postgres"ortype = "redis": counters are shared across all server replicas and survive restarts. A user who hits the limit on one replica cannot bypass it by routing their next request to a different replica.
Configure rate limiting per registry:
[registries.rate_limit]
requests_per_window = 200 # per authenticated user, or per client IP for anonymous
window_secs = 60
enforcement = "block" # "block" returns 429; "warn" lets request through
# Shared pool for all CI bot group members combined:
[[registries.rate_limit.groups]]
name = "oidc:ci-bots"
requests_per_window = 5000
window_secs = 60Response headers
| Header | Condition | Value |
|---|---|---|
X-RateLimit-Limit | Every proxied response (when rate limiting is configured) | The most-restrictive limit that applied to this request |
Retry-After | 429 response (block mode) | Seconds until the current window resets |
X-RateLimit-Reset | 429 response (block mode) | Unix timestamp when the current window resets |
X-RateLimit-Warning: rate-limit-exceeded | Over-limit response (warn mode) | Present when the request was allowed despite exceeding the limit |
Fixed-window semantics
BatleHub uses a fixed window counter (not a sliding window or token bucket). Each window is aligned to the Unix epoch:
window_start = floor(now_unix / window_secs) * window_secsFor example, with window_secs = 60, windows run from :00 to :59 of each minute, then reset. The X-RateLimit-Reset header gives the exact Unix timestamp of the next window boundary.
Fail-open behaviour
If the cache backend is unavailable when a rate-limit counter needs to be incremented, the request is allowed rather than rejected. This prevents the cache backend from becoming a single point of failure for the entire proxy.
When a bucket cannot be incremented due to a store error, BatleHub emits a WARN log entry (rate-limit store unavailable … failing open) so the outage is visible in your observability tooling even though individual requests are not blocked. Monitor for these warnings and check the health endpoint if you suspect rate limiting is not being enforced.
Worked examples
Single-node with memory cache (default)
No extra config needed — works out of the box:
[database]
type = "postgresql"
url = "postgresql://batlehub:changeme@localhost:5432/batlehub"
# [cache] defaults to type = "memory"
[[registries]]
type = "npm"
name = "npm"
[registries.cache]
metadata_ttl_secs = 300
keep_latest_n = 10Multi-replica with PostgreSQL cache
All replicas share the same database, so metadata cache and rate-limit counters are consistent:
[database]
type = "postgresql"
url = "postgresql://batlehub:changeme@db:5432/batlehub"
[cache]
type = "postgres"
[[registries]]
type = "npm"
name = "npm"
[registries.cache]
metadata_ttl_secs = 60
serve_stale = true
[registries.rate_limit]
requests_per_window = 1000
window_secs = 60
enforcement = "block"High-throughput with Redis cache
Redis provides lower per-operation latency than PostgreSQL for hot paths:
[cache]
type = "redis"
url = "redis://redis:6379"
[[registries]]
type = "npm"
name = "npm"
[registries.cache]
metadata_ttl_secs = 120
[registries.rate_limit]
requests_per_window = 5000
window_secs = 60
enforcement = "block"
[[registries.rate_limit.groups]]
name = "oidc:ci-bots"
requests_per_window = 50000
window_secs = 60Aggressive eviction (space-constrained)
[registries.cache]
metadata_ttl_secs = 600
artifact_ttl_secs = 604800 # 7 days
idle_days = 3
max_size_bytes = 5368709120 # 5 GiB
keep_latest_n = 3
warm_packages = ["lodash", "react", "axios"]
warm_latest_n = 1