Data Plane

Object Storage Migration: MinIO → RustFS on TrueNAS SCALE

Date: 2026-06-05
Status: Runbook
Applies to: TrueNAS SCALE 25.04.2 (Fangtooth)


Why this migration

The MinIO Docker image is now effectively unmaintained (and AIStor is MinIO's rebrand — same on-disk format), so the lakehouse object-storage layer moves to a two-tier S3 design:

  • Local, on-prem (authoritative working set) on TrueNAS ZFS.
  • Cloudflare R2 — cloud (exposure tier only). Holds only buckets that must reach the rest of the Cloudflare infrastructure: public/shared artifacts (e.g. Rerun recordings) and buckets backing aegean.ai's websites. Anything that does not need public/Cloudflare/website exposure stays local and is not mirrored.
  • Sync: rclone replicates the R2-destined buckets between local and R2 on a schedule (R2 has no native inbound replication; consistency is eventual).

This is the platform standard tracked in Jira AURA-667.

Chosen engine: RustFS (in-place binary replacement)

The local engine is RustFS — an open-source, S3-compatible, Rust object store positioned as a MinIO drop-in. We use its binary-replacement path: RustFS reads MinIO's existing data directory in place, with no data copy.

Why in-place works here (and why a copy would otherwise be needed)

MinIO does not store one object as one file. For every object it writes a directory bucket/objectname/ containing an xl.meta file (metadata + inlined small data) and, for larger objects, erasure-coded part.N files, plus a .minio.sys metadata tree. A generic POSIX-backed S3 gateway (e.g. versitygw — see the appendix) expects the opposite layout (bucket = directory, object = a single file), so it cannot read MinIO's data directly and requires an S3-API copy to transcode every object.
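Because every object key is a directory holding one xl.meta file, counting xl.meta files gives a rough object-key count that can be compared before and after the switch. The sketch below is an illustration only: the function name count_object_keys is hypothetical, and the count also includes keys whose only remaining version is a delete marker.

```shell
# count_object_keys DIR
# Counts xl.meta files under DIR, skipping the .minio.sys metadata tree.
# Each object key is stored as a directory with exactly one xl.meta,
# so the result approximates the number of object keys on disk.
count_object_keys() {
    root=${1%/}   # drop a trailing slash so the -path test below matches
    find "$root" -path "$root/.minio.sys" -prune -o \
        -type f -name xl.meta -print | wc -l
}
```

Run it against the host path once before stopping MinIO and again after RustFS is up; a large difference is a reason to stop and investigate.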

RustFS is different: it implements MinIO's on-disk format well enough to serve the existing data directory in place. Stop MinIO, point RustFS at the same path, start RustFS — bucket metadata, versioning, object-locks, IAM, and lifecycle rules carry over, with downtime typically under five minutes.

Trade-offs you are accepting

  • Pre-GA software. RustFS is in beta (latest 1.0.0-beta.6, May 2026; Apache-2.0; GA targeted ~July 2026). The pre-migration ZFS snapshot (below) is what makes running pre-GA code on authoritative data acceptable.
  • Opaque on disk. Objects remain in MinIO-style erasure layout, so a ZFS snapshot is a disaster-recovery backup but not browsable individual files (same as MinIO today; unlike the versitygw POSIX alternative).
  • Single-node, single/multi-drive only — not distributed clusters.
  • Not migrated: event notifications, site replication, LDAP/OIDC.

Source configuration (reference)

Values read from the existing minio app:

Setting                      Value
Pool                         marathon
MinIO data (host path)       /mnt/marathon/minio-dataset (mounted at /export)
MinIO API port               9000 (console 9002)
Access key                   IJDY6D4LMF3Z34F1BY7S
Secret key                   (keep out of source control; rotate after migration)
MinIO data owner (UID:GID)   473:473

Mount the host path, not MinIO's /export

In the MinIO app, Mount Path /export is just the in-container mount point; Host Path /mnt/marathon/minio-dataset is the real on-disk location holding .minio.sys + the bucket dirs. RustFS must mount the host path. Sanity-check it before launch — ls -la /mnt/marathon/minio-dataset should show .minio.sys and your buckets at the top level; if they're nested a level down, mount that subdirectory instead.
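The sanity check above can be scripted so it also reports the nested case. This is a minimal sketch; the function name check_minio_layout is hypothetical, and it only looks one directory level down.

```shell
# check_minio_layout DIR
# Succeeds if DIR holds .minio.sys at its top level. If .minio.sys is found
# one level down instead, prints the subdirectory that should be mounted.
check_minio_layout() {
    dir=${1%/}
    if [ -d "$dir/.minio.sys" ]; then
        echo "ok: $dir contains .minio.sys at the top level"
        return 0
    fi
    # Look one level down for a nested data directory.
    for sub in "$dir"/*/; do
        if [ -d "${sub}.minio.sys" ]; then
            echo "nested: mount ${sub%/} instead of $dir"
            return 1
        fi
    done
    echo "missing: no .minio.sys found under $dir"
    return 1
}
```

On the NAS this would be called as check_minio_layout /mnt/marathon/minio-dataset before deploying the Custom App.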

We reuse the MinIO root keys for RustFS so clients change nothing (same endpoint, same keys).

The UID is the gotcha, and it rules out the catalog app. MinIO's data is owned by 473:473, and RustFS must run as a UID that can read it. The TrueNAS catalog app can't do this: it forces UID 568 and points RustFS at its own /rustfs/data0 volume rather than your dataset, so it crash-loops with Permission denied. Deploy a Custom App instead (Step 3) — it lets you set both the volume path and the UID. Run the container as 473 (matching the existing owner, no chown — cleanest and rollback-safe), or as 568 if you deliberately chowned the dataset. Do not use RustFS's image default of 10001 — it just diverges from the platform apps user and complicates rollback.

Step 1 — Snapshot the MinIO dataset (your only rollback)

RustFS's docs provide no rollback guidance and it may rewrite metadata in place, so snapshot first (System Settings → Shell):

zfs snapshot marathon/minio-dataset@pre-rustfs
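Since this snapshot is the only rollback, it is worth confirming it exists before touching anything else. A small sketch, assuming the dataset and snapshot names used above; the helper has_snapshot is hypothetical, and the zfs call is guarded so the snippet is harmless on a machine without ZFS.

```shell
# has_snapshot NAME
# Reads snapshot names on stdin (one per line) and succeeds only if NAME
# appears as an exact, whole-line match.
has_snapshot() {
    grep -Fxq -- "$1"
}

# On the NAS: list the dataset's snapshots and look for pre-rustfs.
if command -v zfs >/dev/null 2>&1; then
    if zfs list -H -r -t snapshot -o name marathon/minio-dataset \
        | has_snapshot marathon/minio-dataset@pre-rustfs; then
        echo "snapshot present"
    else
        echo "snapshot MISSING: do not proceed"
    fi
fi
```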

Step 2 — Stop the old apps

  • Apps → minio → Stop. RustFS and MinIO cannot both own the data directory.
  • If you trialled versitygw earlier, stop and delete that app. Its empty marathon/objects dataset can be deleted too; it is not part of the RustFS plan.

Step 3 — Deploy RustFS as a Custom App (not the catalog app)

The catalog app does NOT work for binary replacement

The TrueNAS RustFS catalog app runs a fresh, self-managed store — it points RustFS at its own internal path /rustfs/data0 (often a root-owned ixVolume), not at your MinIO dataset, and it enforces a minimum UID/GID of 568. Pointed at MinIO's 473-owned data it crash-loops with Io error: Permission denied (os error 13) and never adopts the data. Use a Custom App instead — it lets you override the volume path and the UID, which is exactly what in-place replacement requires.

Apps → Discover Apps → Custom App → Install via YAML. The two settings that make binary replacement work are RUSTFS_VOLUMES=/data (overrides the /rustfs/data0 default so RustFS reads your MinIO data) and user: matching the data owner:

services:
  rustfs:
    image: rustfs/rustfs:latest          # pin a release tag (e.g. :1.0.0-beta.6) for reproducibility
    restart: unless-stopped
    user: "473:473"                       # MUST match the MinIO data owner (see note)
    environment:
      RUSTFS_VOLUMES: "/data"             # point RustFS at the MinIO data, NOT /rustfs/data0
      RUSTFS_ADDRESS: "0.0.0.0:9000"      # take over MinIO's old API port (MinIO is stopped)
      RUSTFS_CONSOLE_ENABLE: "true"
      RUSTFS_CONSOLE_ADDRESS: "0.0.0.0:9001"          # bind all interfaces explicitly
      RUSTFS_CONSOLE_CORS_ALLOWED_ORIGINS: "*"        # or "https://<your-console-host>"
      RUSTFS_CORS_ALLOWED_ORIGINS: "*"                # main S3 API CORS
      RUSTFS_ACCESS_KEY: "IJDY6D4LMF3Z34F1BY7S"   # reuse MinIO root key → clients unchanged
      RUSTFS_SECRET_KEY: "<minio-secret>"
    ports:
      - "9000:9000"    # S3 API
      - "9001:9001"    # web console
    volumes:
      - /mnt/marathon/minio-dataset:/data   # the existing MinIO dataset, mounted at /data

Console behind Cloudflare is buggy on the current beta

The S3 API port (9000) and console port (9001) above are both published, so the console works on the LAN (http://<nas-ip>:9001). But RustFS's console currently embeds its internal port in API calls, so when it is reached through a Cloudflare hostname (HTTPS on 443, no explicit port in the URL) those calls fail (rustfs#966, #3062). Recommended split:

  • S3 API (:9000) → Cloudflare — a plain S3 endpoint proxies fine (and R2 is the real cloud exposure tier anyway).
  • Console (:9001) → reach over LAN / Tailscale, not Cloudflare. It's an admin UI; it doesn't need public ingress, and this sidesteps the proxy bug.

If you must expose the console via Cloudflare, keep API + console on the same hostname (split domains are the worst case in the issues) and set the two *_CORS_ALLOWED_ORIGINS vars above; expect flakiness until those issues are fixed.

Match user: to the data owner — check it first:

stat -c '%U:%G (%u:%g)' /mnt/marathon/minio-dataset

If it's still 473, use user: "473:473" (no chown — cleanest, rollback-safe). If you already chowned it to 568, use user: "568:568". The only rule is container user == data owner; otherwise RustFS hits Permission denied on /data and exits. Avoid the image default 10001 — it just diverges from the platform apps user and complicates rollback.
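To avoid copying the IDs by hand, the user: line can be derived from the directory itself. A minimal sketch, assuming GNU stat (which the stat -c command above already relies on); the function name compose_user_line is hypothetical.

```shell
# compose_user_line DIR
# Prints a compose "user:" line that matches DIR's numeric owner and group.
compose_user_line() {
    owner=$(stat -c '%u:%g' "$1") || return 1
    printf 'user: "%s"\n' "$owner"
}
```

Called as compose_user_line /mnt/marathon/minio-dataset, it prints the line to paste into the Custom App YAML for whatever owner the dataset currently has.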

Step 4 — Verify

RustFS ships a web console (unlike versitygw):

  • Console: http://<nas-ip>:9001 → log in → confirm your buckets and objects list.
  • S3 API round-trip (same keys, port 9000):
docker run --rm --network host \
  -e RCLONE_CONFIG_S3_TYPE=s3 -e RCLONE_CONFIG_S3_PROVIDER=Other \
  -e RCLONE_CONFIG_S3_ENDPOINT=http://127.0.0.1:9000 \
  -e RCLONE_CONFIG_S3_ACCESS_KEY_ID=IJDY6D4LMF3Z34F1BY7S \
  -e RCLONE_CONFIG_S3_SECRET_ACCESS_KEY='<minio-secret>' \
  rclone/rclone lsd S3:        # should list your MinIO buckets

Cross-check the bucket list against data-plane/datasets.yml (the dataset manifest) to confirm nothing is missing.
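The cross-check can be reduced to a set difference between two plain lists of bucket names. The sketch below assumes you already have both lists as files with one name per line; extracting the expected names from data-plane/datasets.yml is left out because its schema is not shown here, and the function name missing_buckets is hypothetical.

```shell
# missing_buckets EXPECTED ACTUAL
# Prints every bucket name listed in file EXPECTED that is absent from
# file ACTUAL. Both files hold one bucket name per line, in any order.
missing_buckets() {
    exp_sorted=$(mktemp) && act_sorted=$(mktemp) || return 1
    sort -u "$1" > "$exp_sorted"
    sort -u "$2" > "$act_sorted"
    # comm -23 keeps lines that appear only in the first (expected) file.
    comm -23 "$exp_sorted" "$act_sorted"
    rm -f "$exp_sorted" "$act_sorted"
}
```

One way to produce the ACTUAL file is rclone lsf --dirs-only S3: with the trailing slash stripped from each line (for example via sed 's:/$::'), using the same environment variables as the round-trip command above. Empty output from missing_buckets means every expected bucket is being served.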

Step 5 — Roll back if needed

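If buckets or objects don't list correctly, stop the RustFS app first so nothing is writing to the dataset, then roll the dataset back and start MinIO again. The rollback discards everything written after the snapshot. If any snapshot newer than pre-rustfs exists (for example from a periodic snapshot task), zfs rollback refuses to run unless you add -r, which destroys those newer snapshots: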

zfs rollback marathon/minio-dataset@pre-rustfs

Step 6 — Wire the R2 exposure-tier sync

R2 stays the cloud/exposure tier; only the local engine changed. Schedule an rclone → R2 sync for the public/website buckets via System → Advanced → Cron Jobs (R2 has no native inbound replication; consistency is eventual).
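A cron job for this can loop over an explicit list of exposure-tier buckets so nothing else is ever pushed to R2. This is a sketch under assumptions: the rclone remotes are named LOCAL and R2 (as in the detached-job example in this runbook), the bucket list file is hypothetical, and the function name sync_exposed_buckets is made up for illustration.

```shell
# sync_exposed_buckets FILE
# Runs one "rclone sync" per bucket named in FILE (one name per line; blank
# lines and lines starting with # are skipped). Returns non-zero if any
# bucket failed. Set RCLONE=echo to print the commands instead of running them.
sync_exposed_buckets() {
    rc=0
    while IFS= read -r bucket || [ -n "$bucket" ]; do
        case "$bucket" in
            ''|'#'*) continue ;;
        esac
        # </dev/null keeps the command from consuming the bucket list on stdin.
        "${RCLONE:-rclone}" sync "LOCAL:$bucket" "R2:$bucket" --checksum </dev/null || rc=1
    done < "$1"
    return "$rc"
}
```

A dry run with RCLONE=echo shows exactly which buckets would be synced before the job is scheduled.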

Step 7 — Rotate the root key

Rotate the secret once the migration is verified, and immediately if the MinIO secret was ever exposed: change RUSTFS_SECRET_KEY in the Custom App's YAML, redeploy, and update clients.

Step 8 — SSO via JumpCloud OIDC (optional)

To let people sign in to the RustFS console (and obtain temporary S3 credentials) with their org identity, federate RustFS to an external IdP.

Why OIDC, and why JumpCloud (not Google directly)

  • RustFS supports OIDC only — no LDAP. Its external-auth implementation is OIDC (OidcSys + STS AssumeRoleWithWebIdentity, with JWT claims mapped to IAM policies). So "JumpCloud LDAP" / "Google Secure LDAP" are not options regardless of what the IdPs offer.
  • Use JumpCloud, not raw Google. The value of an IdP here is mapping group membership → RustFS IAM policy, which needs group info in the JWT. JumpCloud emits group claims in its OIDC token (and already federates your Google Workspace users). Google's OIDC tokens do not contain Workspace group membership (groups live in the Admin SDK, not the ID token), so Google-direct authenticates users but can't drive group-based authorization. JumpCloud sits in front of Google and fills that gap.

RustFS OIDC config (OidcProviderConfig)

Key                            Value
config_url                     JumpCloud discovery URL (.../.well-known/openid-configuration)
client_id / client_secret      from the JumpCloud OIDC app
scopes                         openid,profile,email + whatever scope surfaces groups
groups_claim / roles_claim     the JWT claim carrying group/role names (e.g. groups)
email_claim / username_claim   identity (RustFS resolves preferred_username → email → sub)

Setup

  1. JumpCloud → SSO → add a Custom OIDC app:
    • Grant type Authorization Code + PKCE.
    • Redirect URI = the RustFS console OIDC callback (exact path from your console, under /rustfs/console/...) — it must match exactly.
    • Add a groups attribute/claim to the app and attach the user groups RustFS should see. Record client_id, client_secret, and the discovery URL.
  2. RustFS: configure the OIDC provider with the keys above (config_url, client_id, client_secret, groups_claim).
  3. RustFS IAM: define policies and map claim values → policies (e.g. group data-admins → admin policy, data-readers → read-only).
  4. Test the console Login with SSO flow; programmatic clients can use AssumeRoleWithWebIdentity with a JumpCloud JWT, or keep static keys.

Caveats (beta)

  • RustFS OIDC is beta with active bugs — notably issuer trailing-slash handling that breaks some IdPs (rustfs#2349, #2049). Match JumpCloud's issuer URL exactly (with/without the trailing /).
  • Binary replacement dropped MinIO's old OIDC/LDAP config, so you configure this fresh.
  • Keep the static root access key as a break-glass admin in case SSO misbehaves.
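The trailing-slash caveat can be checked before the first login attempt by comparing the issuer you configured with the issuer field of the IdP's discovery document. A minimal sketch; the function name check_issuer and the example URLs in the test are hypothetical, and fetching the discovered value (for instance with curl and jq -r .issuer against the discovery URL) is left to the operator.

```shell
# check_issuer CONFIGURED DISCOVERED
# Compares two issuer URLs byte for byte and reports whether they match,
# differ only by a trailing slash, or are different issuers altogether.
check_issuer() {
    if [ "$1" = "$2" ]; then
        echo "match"
    elif [ "${1%/}" = "${2%/}" ]; then
        echo "trailing-slash mismatch"
        return 1
    else
        echo "different issuer"
        return 1
    fi
}
```

A "trailing-slash mismatch" result is the case the linked issues describe: fix the configured value so it equals the discovered issuer exactly.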

Running long S3 jobs detached

The TrueNAS web UI Shell has an idle timeout and a fragile websocket; a foreground command tied to it gets SIGHUP'd on disconnect. For any long-running S3 job (e.g. an R2 sync), run the container detached so the Docker daemon owns it:

docker run -d --name r2-sync --network host <env...> \
  rclone/rclone sync LOCAL:bucket R2:bucket --checksum --transfers 16 --stats 30s --stats-one-line -v
 
docker logs -f r2-sync       # follow; Ctrl-C only stops following, the job keeps running
docker wait r2-sync          # blocks until done, prints exit code

SSH into the NAS is also steadier than the web Shell, but with -d it doesn't matter if the connection drops.

Appendix — versitygw (copy-based) alternative

If you ever want browsable objects as plain files on ZFS (so zfs snapshot yields individually restorable files) and production-grade maturity, the alternative is versitygw with its posix backend. It cannot read MinIO's directory in place, so it requires an S3-API copy:

  1. Create a fresh dataset (marathon/objects), acltype=posix. (xattr=sa is an optional perf tweak; TrueNAS SCALE often reverts it to on, which is fine.)
  2. Install the versitygw catalog app (posix backend, host path the new dataset, UID/GID 568 — the catalog app enforces a 568 minimum), on a spare port.
  3. rclone sync from MinIO's S3 endpoint to versitygw's, then rclone check.
  4. Cut clients over and decommission MinIO.

Trade-off vs RustFS: versitygw needs the copy (time + temporary double space) and a plain copy does not preserve versioning/object-locks/IAM, but you gain browsable files, snapshot-as-real-backup, and mature code.