Operations Runbook
This runbook covers practical operating procedures for FLUX, with checks driven through the TypeScript SDK.
Start and Health Check
- start broker
- run TypeScript SDK smoke test
Smoke test:
import { FLUXClient } from '@flux/typescript-sdk';
// Connect to the local broker.
const client = new FLUXClient({ host: '127.0.0.1', port: 9092 });
await client.connect();
// Produce one health-check record.
await client.produce('orders', 'ops', 'health-check', '1');
// Join the group, then sync to pick up the current generation.
const join = await client.join('ops', 'orders', 'checker', 'round_robin');
const sync = await client.sync('ops', 'orders', 'checker', join.generation);
// Heartbeat and leave with the synced generation.
await client.heartbeat('ops', 'orders', 'checker', sync.generation);
await client.leave('ops', 'orders', 'checker', sync.generation);
// Close the connection when done.
await client.close();
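If any step of the smoke test throws, the connection is left open. One way to guard against that is sketched below. It assumes the client methods have the call shapes used in the smoke test; the `SmokeClient` interface and the `runSmokeTest` helper are hypothetical names, not part of the SDK, and the parameter names are guesses from the call sites.

```typescript
// Hypothetical structural type covering only the calls used in the smoke test.
// The real FLUXClient may expose different or richer signatures.
interface SmokeClient<G> {
  connect(): Promise<unknown>;
  produce(...args: string[]): Promise<unknown>;
  join(group: string, topic: string, member: string, strategy: string): Promise<{ generation: G }>;
  sync(group: string, topic: string, member: string, generation: G): Promise<{ generation: G }>;
  heartbeat(group: string, topic: string, member: string, generation: G): Promise<unknown>;
  leave(group: string, topic: string, member: string, generation: G): Promise<unknown>;
  close(): Promise<unknown>;
}

// Runs the same steps as the smoke test, but always closes the connection.
// Rejects with the first error encountered, so callers can treat a rejection
// as a failed health check.
async function runSmokeTest<G>(client: SmokeClient<G>): Promise<void> {
  await client.connect();
  try {
    await client.produce('orders', 'ops', 'health-check', '1');
    const join = await client.join('ops', 'orders', 'checker', 'round_robin');
    const sync = await client.sync('ops', 'orders', 'checker', join.generation);
    await client.heartbeat('ops', 'orders', 'checker', sync.generation);
    await client.leave('ops', 'orders', 'checker', sync.generation);
  } finally {
    // Runs on success and on failure of any step after connect().
    await client.close();
  }
}
```

If the real client matches this shape, it could be passed in directly, for example `await runSmokeTest(new FLUXClient({ host: '127.0.0.1', port: 9092 }))`.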
Incident: REPLICATION_TIMEOUT
Possible causes:
- ISR below configured minimum
- follower progress stale
- replica lag too high
Immediate checks:
- verify FLUX_MIN_ISR and lag timeout settings
- inspect logs for under-replicated partition warnings
- validate producer ack mode (acks=all is stricter)
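The checks above can be sketched as a small pure function that turns hand-collected values into findings. This is an illustration under stated assumptions, not FLUX's actual logic: the `ReplicationSnapshot` shape and the `diagnoseReplication` helper are hypothetical, and the sketch assumes that an acks=all write can only complete while the ISR size is at least FLUX_MIN_ISR.

```typescript
// Hypothetical snapshot of one partition's replication state, gathered by hand
// from broker config and logs. None of these field names come from FLUX itself.
interface ReplicationSnapshot {
  minIsr: number;           // configured FLUX_MIN_ISR
  isrSize: number;          // replicas currently in sync
  acks: string;             // producer ack mode, e.g. 'all'
  maxFollowerLagMs: number; // worst observed follower lag
  lagTimeoutMs: number;     // configured lag timeout
}

// Returns one finding per failed check; an empty array means nothing stood out.
function diagnoseReplication(s: ReplicationSnapshot): string[] {
  const findings: string[] = [];
  if (s.isrSize < s.minIsr) {
    findings.push(`ISR size ${s.isrSize} is below the configured minimum ${s.minIsr}`);
  }
  if (s.maxFollowerLagMs > s.lagTimeoutMs) {
    findings.push(`follower lag ${s.maxFollowerLagMs}ms exceeds the lag timeout ${s.lagTimeoutMs}ms`);
  }
  // Assumption: acks=all is the mode that depends on the ISR being healthy.
  if (s.acks === 'all' && findings.length > 0) {
    findings.push('producer uses acks=all, the stricter mode; timeouts are likely until the ISR recovers');
  }
  return findings;
}
```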
Incident: GENERATION_MISMATCH
Meaning:
- consumer is stale or reassigned after rebalance
Actions:
- if using high-level runtime API, rely on automatic rejoin behavior
- if using low-level API, run join/sync again and refresh generation
- retry commit only after assignment is current
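For the low-level path, the rejoin-then-retry sequence might look like the sketch below. It reuses the join/sync call shapes from the smoke test. Everything else is an assumption: the `GroupClient` interface, the `commitWithRejoin` helper, and the idea that the SDK reports the incident as an error with `code === 'GENERATION_MISMATCH'`. The commit call itself is not shown in this runbook, so it is passed in by the caller.

```typescript
// Hypothetical structural type covering only join/sync as used in the smoke test.
interface GroupClient<G> {
  join(group: string, topic: string, member: string, strategy: string): Promise<{ generation: G }>;
  sync(group: string, topic: string, member: string, generation: G): Promise<{ generation: G }>;
}

interface Membership {
  group: string;
  topic: string;
  member: string;
  strategy: string;
}

// Assumption: the error object carries the incident name in a `code` field.
function isGenerationMismatch(err: unknown): boolean {
  return typeof err === 'object' && err !== null &&
    (err as { code?: unknown }).code === 'GENERATION_MISMATCH';
}

// Tries the commit once; on GENERATION_MISMATCH, runs join/sync again and
// retries once with the refreshed generation. Returns the generation that
// the successful commit used.
async function commitWithRejoin<G>(
  client: GroupClient<G>,
  m: Membership,
  generation: G,
  commit: (generation: G) => Promise<void>
): Promise<G> {
  try {
    await commit(generation);
    return generation;
  } catch (err) {
    if (!isGenerationMismatch(err)) throw err;
    const join = await client.join(m.group, m.topic, m.member, m.strategy);
    const sync = await client.sync(m.group, m.topic, m.member, join.generation);
    await commit(sync.generation); // retried only after assignment is current
    return sync.generation;
  }
}
```

The single retry is deliberate: if the second commit also fails, the error propagates so an operator can look at it rather than looping through repeated rebalances.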
Incident: NOT_LEADER
Meaning:
- partition role is follower on this broker
Actions:
- verify role transition workflows
- in low-level admin/debug flows, restore role to leader before local writes
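For admin/debug flows only, the "restore role before local writes" step could be wrapped in a guard like the one below. The runbook does not name the real admin calls, so `RoleAdmin`, `getRole`, `setRole`, and `writeAsLeader` are all hypothetical placeholders to be mapped onto whatever the deployment actually exposes.

```typescript
type PartitionRole = 'leader' | 'follower';

// Hypothetical admin hooks for one partition on the local broker.
interface RoleAdmin {
  getRole(): Promise<PartitionRole>;
  setRole(role: PartitionRole): Promise<void>;
}

// Checks the partition role and restores it to leader if needed, then runs
// the caller-supplied local write. Intended for debug sessions, not for
// production traffic, where forcing a role change may conflict with the
// normal role transition workflow.
async function writeAsLeader<T>(admin: RoleAdmin, write: () => Promise<T>): Promise<T> {
  if ((await admin.getRole()) !== 'leader') {
    await admin.setRole('leader');
  }
  return write();
}
```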
Data Inspection
Inspect data directory for:
- segment logs
- index files
- offsets.json
- groups.json
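A quick inventory of the data directory can be sketched as a classifier over file names. Only offsets.json and groups.json are named in this runbook; the `.log` and `.index` suffixes used for segment logs and index files are assumptions, and the helper names are hypothetical.

```typescript
type DataFileKind = 'segment-log' | 'index' | 'offsets' | 'groups' | 'unknown';

// Assumption: segment logs end in ".log" and index files end in ".index".
function classifyDataFile(name: string): DataFileKind {
  if (name === 'offsets.json') return 'offsets';
  if (name === 'groups.json') return 'groups';
  if (name.endsWith('.log')) return 'segment-log';
  if (name.endsWith('.index')) return 'index';
  return 'unknown';
}

// Counts how many files of each kind appear in a directory listing.
function summarizeDataDir(names: string[]): Record<DataFileKind, number> {
  const counts: Record<DataFileKind, number> = {
    'segment-log': 0,
    index: 0,
    offsets: 0,
    groups: 0,
    unknown: 0,
  };
  for (const name of names) {
    counts[classifyDataFile(name)] += 1;
  }
  return counts;
}
```

In Node, the input could come from a directory listing of FLUX_DATA_DIR, for example via `fs.readdirSync`. A missing offsets.json or groups.json, or a large `unknown` count, is a hint to look closer.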
Backup Guidance (Current)
For single-node setups:
- stop broker cleanly
- snapshot entire FLUX_DATA_DIR
- restore to same or compatible runtime config
Upgrade Guidance (Current)
- run full test suite before rollout
- replace one broker process with the new version
- run TypeScript smoke test after restart