# Nomad vs nomad-lite

nomad-lite draws direct inspiration from HashiCorp Nomad, a production-grade workload orchestrator. This page documents where the two systems overlap and where they diverge.

nomad-lite is not a drop-in replacement for Nomad. It is a purpose-built implementation of the core distributed scheduling concepts — Raft consensus, distributed job assignment, worker liveness, graceful failover — written from scratch in Rust as a learning and experimentation platform. Understanding where Nomad sets the bar makes it easier to reason about which parts of that bar nomad-lite has already cleared and which remain ahead.


## Scheduler Types

Nomad ships three built-in schedulers, each with a fundamentally different execution model:

| Scheduler | Nomad | nomad-lite |
| --- | --- | --- |
| `batch` | Run-to-completion jobs; retries on failure | ✅ Core model |
| `service` | Long-running daemons; keeps N instances alive, restarts on crash | Not supported |
| `system` | Runs exactly one instance on every node (log shippers, exporters) | Not supported |

nomad-lite is a pure batch scheduler. Every job is expected to start, do work, and exit. Long-running services and system-wide daemons are outside its current scope.
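The batch-only model implies a simple job lifecycle: once a job reaches a terminal state, nothing restarts it. A minimal sketch in Rust (the state names are illustrative, not nomad-lite's actual types):

```rust
// Illustrative batch-job lifecycle: every job is expected to start,
// do work, and exit. There is no "keep alive" state to return to.
#[derive(Debug, PartialEq)]
enum JobState {
    Pending,   // submitted, waiting for a worker
    Running,   // assigned and executing
    Completed, // exited successfully
    Failed,    // non-zero exit, timeout, or worker loss
}

// A batch job is done once it reaches a terminal state.
fn is_terminal(state: &JobState) -> bool {
    matches!(state, JobState::Completed | JobState::Failed)
}
```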


## Task Drivers (How Jobs Run)

Nomad abstracts the execution runtime through pluggable task drivers:

| Driver | Nomad | nomad-lite |
| --- | --- | --- |
| `docker` | Full image + entrypoint + env control | Partial — fixed Alpine image, `sh -c` only |
| `exec` / `raw_exec` | Direct process execution, no container | Not supported |
| `java` | JVM workloads with classpath management | Not supported |
| `qemu` | Full virtual machine execution | Not supported |
| `podman`, `containerd` | OCI-compatible runtimes via plugins | Not supported |

nomad-lite runs all jobs as `docker run --rm alpine:latest sh -c <command>`. The image is configurable at the node level but uniform across all jobs — submitters cannot choose a per-job image or entrypoint.
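The fixed invocation means only the command string varies per job. A sketch of how the argument list might be assembled (`docker_args` is a hypothetical helper, not nomad-lite's actual API):

```rust
// Sketch: assembling the fixed docker invocation nomad-lite uses for
// every job. The image comes from node-level configuration; only the
// command string is supplied by the job submitter.
fn docker_args(image: &str, command: &str) -> Vec<String> {
    ["run", "--rm", image, "sh", "-c", command]
        .iter()
        .map(|s| s.to_string())
        .collect()
}
```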


## Scheduling Features

| Feature | Nomad | nomad-lite |
| --- | --- | --- |
| Placement algorithm | Bin-packing (CPU + memory aware) | Least-loaded (running job count) |
| Resource constraints | CPU, memory, GPU, disk declared per job | Not supported |
| Node constraints | Attribute expressions (`kernel.name == linux`) | Not supported |
| Affinities | Soft preferences for placement | Not supported |
| Spread | Distribute across failure domains (AZ, rack) | Not supported |
| Job priorities | Integer 1–100; high priority preempts low | Not supported |
| Preemption | Evict lower-priority running jobs to place high-priority ones | Not supported |

nomad-lite’s assigner selects the worker with the fewest running jobs. This is a reasonable approximation of load balancing but ignores actual resource consumption — a node running one heavy job looks the same as one running one trivial job.
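The least-loaded rule reduces to one comparison. A minimal sketch, assuming a `Worker` type with a running-job count (field and function names are illustrative):

```rust
// Sketch of least-loaded placement: pick the worker with the fewest
// running jobs. Ties go to the first worker in iteration order.
struct Worker {
    id: String,
    running_jobs: usize,
}

fn least_loaded(workers: &[Worker]) -> Option<&Worker> {
    workers.iter().min_by_key(|w| w.running_jobs)
}
```

Note that this is exactly the blind spot described above: `running_jobs` counts jobs, not CPU or memory, so one heavy job and one trivial job weigh the same.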


## Job Lifecycle

| Feature | Nomad | nomad-lite |
| --- | --- | --- |
| Job submission | HCL job spec file | Single string command (gRPC / CLI) |
| Job timeouts | `kill_timeout`, `max_kill_timeout` per task | Node-level 30 s wall-clock timeout (kills container + marks `Failed`); per-job configurable timeout not supported |
| Retry policy | `restart` stanza: attempts, delay, mode | Not supported |
| Reschedule on node loss | `reschedule` stanza with backoff | Not supported |
| Job cancellation | `nomad job stop` | `nomad-lite job cancel <job-id>` |
| Job priorities | 1–100 integer field | Not supported |
| Parameterized jobs | `parameterized` stanza; dispatch via CLI/API | Not supported |
| Periodic jobs | Cron expression in job spec | Not supported |
| Job dependencies (DAG) | Not native; requires external tooling | Not supported |
| Rolling updates | `update` stanza with canary, `max_parallel` | Not supported |
| Job versioning | Tracks job spec history, auto-reverts on failure | Not supported |

## Consensus and State

| Feature | Nomad | nomad-lite |
| --- | --- | --- |
| Consensus protocol | Raft (via HashiCorp Raft library) | Custom Raft implementation in Rust |
| State persistence | Durable BoltDB-backed Raft log | ✅ Optional RocksDB-backed log + snapshot via `--data-dir`; in-memory if omitted |
| Log compaction | Raft snapshots to BoltDB | In-memory prefix truncation + snapshot; persisted to RocksDB when `--data-dir` is set |
| Multi-region | Federation across regions with replication | Single cluster only |
| Leader election | Randomized timeouts | Randomized timeouts (150–300 ms) |
| Leadership transfer | `nomad operator raft transfer-leadership` | `nomad-lite cluster transfer-leader` |
| Node drain | `nomad node drain` | `nomad-lite cluster drain` |
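The 150–300 ms randomized election timeout can be sketched as follows. The tiny xorshift step keeps the example dependency-free; nomad-lite's actual implementation presumably uses a proper RNG:

```rust
// Sketch: derive a randomized election timeout in the 150-300 ms window.
// Randomization prevents followers from timing out in lockstep and
// splitting the vote round after round.
fn election_timeout_ms(seed: &mut u64) -> u64 {
    // xorshift64 step (seed must be non-zero)
    *seed ^= *seed << 13;
    *seed ^= *seed >> 7;
    *seed ^= *seed << 17;
    150 + (*seed % 151) // roughly uniform over [150, 300]
}
```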

nomad-lite now supports optional state persistence via --data-dir. When enabled, every Raft log entry, term change, voted-for value, and snapshot is written to a local RocksDB store before being acknowledged. A crashed node can rejoin, replay its log, and resume without losing any committed state. Without --data-dir the node runs in-memory only.
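The persist-before-acknowledge ordering can be sketched with a storage trait standing in for the RocksDB-backed store (names are illustrative, not nomad-lite's actual types):

```rust
// Sketch of the durability guarantee described above: a Raft log entry
// is written to durable storage before the node acknowledges it.
trait LogStore {
    fn append(&mut self, index: u64, entry: Vec<u8>);
    fn last_index(&self) -> Option<u64>;
}

// In-memory stand-in for the RocksDB store, for illustration only.
struct InMemoryStore {
    entries: Vec<(u64, Vec<u8>)>,
}

impl LogStore for InMemoryStore {
    fn append(&mut self, index: u64, entry: Vec<u8>) {
        self.entries.push((index, entry));
    }
    fn last_index(&self) -> Option<u64> {
        self.entries.last().map(|(i, _)| *i)
    }
}

// Returning `true` only after `append` completes models the ordering:
// the durable write happens first, then the acknowledgement.
fn append_then_ack(store: &mut dyn LogStore, index: u64, entry: Vec<u8>) -> bool {
    store.append(index, entry); // durable write first
    true                        // ...then the ack
}
```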


## Security

| Feature | Nomad | nomad-lite |
| --- | --- | --- |
| Transport encryption | mTLS for all RPC | ✅ mTLS implemented |
| ACL system | Token-based with policies and roles | Not supported |
| Vault integration | Secrets injection at job start | Not supported |
| Namespaces | Multi-tenant isolation | Not supported |
| Sentinel policies | Fine-grained job submission governance | Not supported |

nomad-lite implements the transport layer (mTLS) but has no authorization model. Any client that presents a valid certificate can submit jobs, drain nodes, or transfer leadership. In a controlled environment this is acceptable; in a shared environment it is not.


## Observability

| Feature | Nomad | nomad-lite |
| --- | --- | --- |
| Metrics | Prometheus endpoint (`/v1/metrics`) | Not supported |
| Distributed tracing | OpenTelemetry support | Not supported |
| Health endpoints | `/v1/agent/health` (live + ready) | `/health/live` + `/health/ready` |
| Audit logging | Immutable audit log (Enterprise) | Not supported |
| Web UI | Full job and cluster management UI | ✅ Basic dashboard (status + job list) |
| Log streaming | `nomad alloc logs -f` | Not supported |

## Client API

| Feature | Nomad | nomad-lite |
| --- | --- | --- |
| Protocol | HTTP/JSON REST + gRPC | gRPC (primary) + HTTP (dashboard only) |
| Job submission | HCL / JSON job spec | String `command` field |
| Streaming | Event stream, log tailing | `StreamJobs` gRPC streaming |
| Pagination | Cursor-based | ✅ Cursor-based `ListJobs` |
| Batch submission | Multiple allocations per job | Not supported |
| Leader redirect | `X-Nomad-Index` + redirect hints | ✅ CLI auto-redirects to leader |
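Cursor-based pagination as used by `ListJobs` can be sketched as follows, assuming jobs are kept in a sorted order and the cursor is the last ID of the previous page (types and field names are illustrative, not nomad-lite's wire format):

```rust
// Sketch of cursor-based pagination: return one page of job IDs plus
// a cursor for the next call, or `None` when the listing is exhausted.
fn list_page(jobs: &[u64], after: Option<u64>, limit: usize) -> (Vec<u64>, Option<u64>) {
    // Resume just past the cursor, or from the beginning on the first call.
    let start = match after {
        Some(cursor) => jobs.iter().position(|&j| j > cursor).unwrap_or(jobs.len()),
        None => 0,
    };
    let page: Vec<u64> = jobs[start..].iter().take(limit).copied().collect();
    // Hand back a cursor only if more jobs remain after this page.
    let next = if start + page.len() < jobs.len() {
        page.last().copied()
    } else {
        None
    };
    (page, next)
}
```

A cursor survives inserts and deletes better than an offset: the next page always starts strictly after the last ID seen, so rows are never skipped or repeated when the list shifts underneath the client.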

## Feature Summary

```text
                    ┌─────────────────────────────────────────┐
                    │         nomad-lite feature coverage     │
                    └─────────────────────────────────────────┘

 Consensus layer
   ✅  Leader election (randomized timeouts)
   ✅  Log replication (AppendEntries)
   ✅  Log compaction + snapshots
   ✅  Leadership transfer
   ✅  InstallSnapshot for lagging followers
   ✅  Dedicated heartbeat channel (no HOL blocking)
   ✅  Proposal backpressure (bounded queue, immediate RESOURCE_EXHAUSTED)

 Cluster operations
   ✅  Node drain (stop accepting, finish in-flight, transfer leadership)
   ✅  Graceful shutdown (SIGTERM/SIGINT with 30 s drain window)
   ✅  mTLS for all cluster and client communication
   ✅  Peer liveness tracking

 Job scheduling
   ✅  Distributed job assignment via Raft (all nodes agree on assignments)
   ✅  Least-loaded worker selection
   ✅  Worker heartbeat liveness (5 s window)
   ✅  Batch status update replication
   ✅  Job output stored on executing node, fetched on demand

 Client-facing
   ✅  SubmitJob / CancelJob / GetJobStatus / ListJobs (paginated) / StreamJobs
   ✅  GetClusterStatus
   ✅  GetRaftLogEntries (debug)
   ✅  CLI with automatic leader redirect
   ✅  Web dashboard
   ✅  /health/live + /health/ready endpoints

 Persistence
   ✅  State persistence (RocksDB-backed Raft log + snapshot via --data-dir)

 Not supported / Out of scope
   ⬜  Job timeouts, retries, priorities
   ⬜  Reschedule on node failure
   ⬜  Typed job payloads (HttpCallback, WasmModule, DockerImage)
   ⬜  Parameterized job templates
   ⬜  Periodic / cron jobs
   ⬜  Resource-aware placement
   ⬜  Job dependency graphs (DAG)
   ⬜  Shared output storage
   ⬜  Prometheus metrics / OpenTelemetry tracing
   ⬜  Authorization (ACL tokens)
```