Architecture
The service uses a layered architecture to keep runtime behavior explicit and testable.
Component Topology
- HTTP API Layer
  - validates input paths and query params
  - handles request/response encoding
  - applies presign verification on download
- Service Layer
  - orchestrates upload use-cases
  - binds storage writes with metadata persistence operations
  - computes object checksum/version metadata
- Repository Layer
  - executes transactional DB logic
  - manages object version updates (is_latest)
  - manages replication job claim/update transitions
- Storage Adapter
  - writes primary object content
  - reads source content for worker replication
  - writes secondary replicas
- Replication Worker
  - continuously polls claimable jobs
  - executes replication attempts
  - marks completion/failure with retry scheduling
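The layer boundaries above can be sketched as small interfaces. This is an illustrative sketch only: the names StorageAdapter, Repository, UploadService, write_primary, and record_upload are hypothetical, and SHA-256 checksums and UUID version ids are assumptions, not confirmed details of the service.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from typing import Protocol


class StorageAdapter(Protocol):
    """Storage boundary (hypothetical interface): owns all object I/O."""

    def write_primary(self, bucket: str, key: str, version_id: str, data: bytes) -> None: ...


class Repository(Protocol):
    """Persistence boundary (hypothetical interface): owns transactional DB logic."""

    def record_upload(self, bucket: str, key: str, version_id: str, checksum: str) -> None: ...


@dataclass
class UploadService:
    """Service layer: orchestrates the two boundaries and contains no I/O itself."""

    storage: StorageAdapter
    repo: Repository

    def upload(self, bucket: str, key: str, data: bytes) -> str:
        version_id = uuid.uuid4().hex                 # assumed id scheme
        checksum = hashlib.sha256(data).hexdigest()   # assumed checksum algorithm
        self.storage.write_primary(bucket, key, version_id, data)
        self.repo.record_upload(bucket, key, version_id, checksum)
        return version_id


@dataclass
class InMemoryStorage:
    """Test double that satisfies StorageAdapter structurally."""

    blobs: dict = field(default_factory=dict)

    def write_primary(self, bucket, key, version_id, data):
        self.blobs[(bucket, key, version_id)] = data


@dataclass
class InMemoryRepository:
    """Test double that satisfies Repository structurally."""

    rows: list = field(default_factory=list)

    def record_upload(self, bucket, key, version_id, checksum):
        self.rows.append((bucket, key, version_id, checksum))
```

Because the service layer depends only on the two interfaces, it can be exercised with in-memory doubles, which is the testability benefit the layering aims for.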
Write Path (Upload) Sequence
- Parse and validate {bucket}/{objectKey}.
- Generate version_id.
- Persist object bytes to primary storage path.
- Open DB transaction.
- Mark previous versions for same (bucket, object_key) as non-latest.
- Insert new object metadata row.
- Insert replication job row.
- Commit transaction.
Critical properties:
- metadata and replication-job insertion are atomic with each other.
- worker processing is decoupled and asynchronous by design.
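The transactional part of the write path can be sketched as follows. This is a minimal sketch using an in-memory SQLite database so it runs anywhere; the table and column names (objects, replication_jobs, checksum, attempt) and the record_upload helper are assumptions for illustration, not the service's actual schema.

```python
import sqlite3

# Hypothetical schema, reduced to the columns the write path needs.
SCHEMA = """
CREATE TABLE objects (
    bucket     TEXT NOT NULL,
    object_key TEXT NOT NULL,
    version_id TEXT NOT NULL,
    checksum   TEXT NOT NULL,
    is_latest  INTEGER NOT NULL,
    PRIMARY KEY (bucket, object_key, version_id)
);
CREATE TABLE replication_jobs (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    bucket     TEXT NOT NULL,
    object_key TEXT NOT NULL,
    version_id TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'pending',
    attempt    INTEGER NOT NULL DEFAULT 0
);
"""


def record_upload(conn, bucket, object_key, version_id, checksum):
    """Run the metadata steps of the write path in one transaction.

    `with conn` commits if the block succeeds and rolls back on any
    exception, so the metadata row and the job row land together or not at all.
    """
    with conn:
        # Mark previous versions for the same (bucket, object_key) as non-latest.
        conn.execute(
            "UPDATE objects SET is_latest = 0 "
            "WHERE bucket = ? AND object_key = ? AND is_latest = 1",
            (bucket, object_key),
        )
        # Insert the new object metadata row.
        conn.execute(
            "INSERT INTO objects (bucket, object_key, version_id, checksum, is_latest) "
            "VALUES (?, ?, ?, ?, 1)",
            (bucket, object_key, version_id, checksum),
        )
        # Insert the replication job row; the worker picks it up later.
        conn.execute(
            "INSERT INTO replication_jobs (bucket, object_key, version_id) "
            "VALUES (?, ?, ?)",
            (bucket, object_key, version_id),
        )
```

If any statement fails, the rollback also undoes the is_latest update, so the previously latest version stays latest.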
Replication Path Sequence
- Worker atomically claims next due pending job (FOR UPDATE SKIP LOCKED pattern).
- Job state becomes running.
- Worker reads primary object bytes and writes secondary replicas.
- Worker records per-node replica status rows.
- On success, job transitions to completed.
- On failure, job transitions back to pending with incremented attempt and delayed next_run_at, or to failed at retry limit.
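The claim step and the retry decision can be sketched as below. The claim query is a common PostgreSQL form of the FOR UPDATE SKIP LOCKED pattern, written against a hypothetical replication_jobs table. The retry limit, base delay, and exponential backoff are assumptions; the source only states that next_run_at is delayed and that a retry limit exists.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# PostgreSQL claim query (assumed table and column names). Rows locked by
# another worker are skipped, so concurrent workers never claim the same job.
CLAIM_SQL = """
UPDATE replication_jobs
SET status = 'running'
WHERE id = (
    SELECT id
    FROM replication_jobs
    WHERE status = 'pending' AND next_run_at <= now()
    ORDER BY next_run_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, bucket, object_key, version_id, attempt
"""

MAX_ATTEMPTS = 5          # assumed retry limit
BASE_DELAY_SECONDS = 10   # assumed base delay for backoff


@dataclass(frozen=True)
class JobOutcome:
    status: str
    attempt: int
    next_run_at: Optional[datetime]


def outcome_after_attempt(succeeded: bool, attempt: int, now: datetime) -> JobOutcome:
    """Decide the job's next state after one replication attempt."""
    if succeeded:
        return JobOutcome("completed", attempt, None)
    attempt += 1
    if attempt >= MAX_ATTEMPTS:
        # Retry limit reached: terminal failure.
        return JobOutcome("failed", attempt, None)
    # Exponential backoff (an assumption): 10s, 20s, 40s, ...
    delay = timedelta(seconds=BASE_DELAY_SECONDS * 2 ** (attempt - 1))
    return JobOutcome("pending", attempt, now + delay)
```

Keeping the retry decision a pure function of (succeeded, attempt, now) makes the scheduling rules testable without a database.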
Job State Machine
- pending -> running -> completed
- pending -> running -> pending (retry path)
- pending -> running -> failed (terminal path)
State changes are guarded at SQL update time (update requires expected current status) to reduce stale-worker race effects.
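A guarded transition might look like the following sketch, shown with SQLite so it is runnable. The table layout and the transition helper are hypothetical; the point is the `AND status = ?` predicate, which makes the update a no-op when the job is no longer in the state the caller expects.

```python
import sqlite3

# Hypothetical minimal jobs table for the demonstration.
SCHEMA = """
CREATE TABLE replication_jobs (
    id     INTEGER PRIMARY KEY,
    status TEXT NOT NULL
);
"""

# Transitions permitted by the job state machine.
ALLOWED = {
    ("pending", "running"),
    ("running", "completed"),
    ("running", "pending"),   # retry path
    ("running", "failed"),    # terminal path
}


def transition(conn, job_id, expected, new):
    """Move a job from `expected` to `new`; return False if the guard did not match."""
    if (expected, new) not in ALLOWED:
        raise ValueError(f"illegal transition {expected} -> {new}")
    with conn:
        cur = conn.execute(
            "UPDATE replication_jobs SET status = ? WHERE id = ? AND status = ?",
            (new, job_id, expected),
        )
    # rowcount is 0 when another worker already changed the status.
    return cur.rowcount == 1
```

A worker that receives False knows its view of the job is stale and can drop the result instead of overwriting a newer state.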
Idempotency Strategy
- Worker model is at-least-once; duplicate execution is expected.
- Replica records use conflict-safe insertion plus unique object/node semantics.
- Replayed work should not create duplicate logical replica rows.
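One way to get this behavior is a unique constraint plus an upsert, sketched here with SQLite (PostgreSQL accepts the same ON CONFLICT form). The replicas table, its columns, and the choice to overwrite status on conflict are assumptions; a DO NOTHING clause would also satisfy the no-duplicates requirement.

```python
import sqlite3

# Hypothetical replica status table: one logical row per (object version, node).
SCHEMA = """
CREATE TABLE replicas (
    version_id TEXT NOT NULL,
    node_id    TEXT NOT NULL,
    status     TEXT NOT NULL,
    UNIQUE (version_id, node_id)
);
"""


def record_replica(conn, version_id, node_id, status):
    """Insert a replica status row, or update it if the pair already exists."""
    with conn:
        conn.execute(
            """
            INSERT INTO replicas (version_id, node_id, status)
            VALUES (?, ?, ?)
            ON CONFLICT (version_id, node_id)
            DO UPDATE SET status = excluded.status
            """,
            (version_id, node_id, status),
        )
```

Replaying the same job therefore converges on a single row per node, regardless of how many times the worker runs it.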
Failure Boundaries
- If primary file write fails: request fails, no metadata commit.
- If metadata transaction fails: request fails; job not enqueued.
- If replication fails: upload remains valid (primary + metadata), job retries asynchronously.
Current Architectural Tradeoffs
- Local filesystem nodes are used for simplicity in MVP.
- Upload path currently needs streaming optimization to bound memory usage.
- Schema lifecycle still needs migration-driven management.