Introduction

nomad-lite is a distributed job scheduler built on a custom Raft consensus implementation, similar in spirit to HashiCorp Nomad or the Kubernetes scheduler. Jobs are shell commands executed in isolated Docker containers across a cluster.

Features

  • Custom Raft Consensus - Leader election, log replication, and fault tolerance from scratch
  • Distributed Scheduling - Jobs executed across cluster with automatic failover
  • Unified CLI - Single binary for both server and client with automatic leader redirect
  • mTLS Security - Mutual TLS for all gRPC communication
  • Docker Sandboxing - Jobs run in isolated containers with restricted capabilities
  • Web Dashboard - Real-time monitoring and job management
  • gRPC + REST APIs - Type-safe client communication
  • Job Cancellation - Cancel pending or running jobs via CLI, gRPC, or REST dashboard
  • Health Endpoints - /health/live (process alive) and /health/ready (leader known) for orchestrator integration
  • Graceful Shutdown - SIGTERM/SIGINT handling with drain period for in-flight work
  • Leader Draining & Transfer - Voluntary leadership transfer and node draining for safe maintenance
  • Batch Replication - Multiple job status updates batched into a single Raft log entry for reduced consensus overhead
  • State Persistence - Optional RocksDB-backed Raft log, term, and snapshot storage via --data-dir; nodes survive restarts and rejoin without losing committed state
  • Log Compaction - Automatic in-memory log prefix truncation with snapshot transfer to slow followers
  • Proposal Backpressure - Bounded proposal queue (256 slots); clients get an immediate RESOURCE_EXHAUSTED when the leader is overloaded and a DEADLINE_EXCEEDED if a commit stalls, rather than hanging indefinitely

Requirements

| Dependency | Version | Installation |
|------------|---------|--------------|
| Rust | 1.56+ | curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs \| sh |
| protoc | 3.0+ | apt install protobuf-compiler / brew install protobuf |
| Docker | 20.0+ | apt install docker.io / brew install --cask docker |

Nomad vs nomad-lite

nomad-lite draws direct inspiration from HashiCorp Nomad, a production-grade workload orchestrator. This page documents where the two systems overlap and where they diverge.

nomad-lite is not a drop-in replacement for Nomad. It is a purpose-built implementation of the core distributed scheduling concepts — Raft consensus, distributed job assignment, worker liveness, graceful failover — written from scratch in Rust as a learning and experimentation platform. Understanding where Nomad sets the bar makes it easier to reason about which parts of that bar nomad-lite has already cleared and which remain ahead.


Scheduler Types

Nomad ships three built-in schedulers, each with a fundamentally different execution model:

| Scheduler | Nomad | nomad-lite |
|-----------|-------|------------|
| batch | Run-to-completion jobs; retries on failure | ✅ Core model |
| service | Long-running daemons; keeps N instances alive, restarts on crash | Not supported |
| system | Runs exactly one instance on every node (log shippers, exporters) | Not supported |

nomad-lite is a pure batch scheduler. Every job is expected to start, do work, and exit. Long-running services and system-wide daemons are outside its current scope.


Task Drivers (How Jobs Run)

Nomad abstracts the execution runtime through pluggable task drivers:

| Driver | Nomad | nomad-lite |
|--------|-------|------------|
| docker | Full image + entrypoint + env control | Partial: default image with per-job --image override, sh -c only |
| exec / raw_exec | Direct process execution, no container | Not supported |
| java | JVM workloads with classpath management | Not supported |
| qemu | Full virtual machine execution | Not supported |
| podman, containerd | OCI-compatible runtimes via plugins | Not supported |

nomad-lite runs every job as docker run --rm <image> sh -c <command>. The image defaults to the node-level --image setting (alpine:latest) and can be overridden per job with job submit --image; submitters cannot set a custom entrypoint.


Scheduling Features

| Feature | Nomad | nomad-lite |
|---------|-------|------------|
| Placement algorithm | Bin-packing (CPU + memory aware) | Least-loaded (running job count) |
| Resource constraints | CPU, memory, GPU, disk declared per job | Not supported |
| Node constraints | Attribute expressions (kernel.name == linux) | Not supported |
| Affinities | Soft preferences for placement | Not supported |
| Spread | Distribute across failure domains (AZ, rack) | Not supported |
| Job priorities | Integer 1–100; high priority preempts low | Not supported |
| Preemption | Evict lower-priority running jobs to place high-priority ones | Not supported |

nomad-lite’s assigner selects the worker with the fewest running jobs. This is a reasonable approximation of load balancing but ignores actual resource consumption — a node running one heavy job looks the same as one running one trivial job.
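The selection rule itself is simple enough to sketch in shell. The worker/count pairs below are made-up illustration data; the real assigner reads running-job counts from the replicated job queue:

```shell
# Pick the worker with the fewest running jobs (illustrative data, not real state).
workers='1 4
2 1
3 3'

# Sort by the running-job count (column 2) and take the first worker id.
least_loaded=$(printf '%s\n' "$workers" | sort -k2 -n | head -n 1 | cut -d' ' -f1)
echo "assign to worker $least_loaded"   # assign to worker 2
```

As the document notes, this treats one heavy job and one trivial job identically; the count is the only signal.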


Job Lifecycle

| Feature | Nomad | nomad-lite |
|---------|-------|------------|
| Job submission | HCL job spec file | Single string command (gRPC / CLI) |
| Job timeouts | kill_timeout, max_kill_timeout per task | Node-level 30 s wall-clock timeout (kills container + marks Failed); per-job configurable timeout not supported |
| Retry policy | restart stanza: attempts, delay, mode | Not supported |
| Reschedule on node loss | reschedule stanza with backoff | Not supported |
| Job cancellation | nomad job stop | nomad-lite job cancel <job-id> |
| Job priorities | 1–100 integer field | Not supported |
| Parameterized jobs | parameterized stanza; dispatch via CLI/API | Not supported |
| Periodic jobs | Cron expression in job spec | Not supported |
| Job dependencies (DAG) | Not native; requires external tooling | Not supported |
| Rolling updates | update stanza with canary, max_parallel | Not supported |
| Job versioning | Tracks job spec history, auto-reverts on failure | Not supported |

Consensus and State

| Feature | Nomad | nomad-lite |
|---------|-------|------------|
| Consensus protocol | Raft (via HashiCorp Raft library) | Custom Raft implementation in Rust |
| State persistence | Durable BoltDB-backed Raft log | ✅ Optional RocksDB-backed log + snapshot via --data-dir; in-memory if omitted |
| Log compaction | Raft snapshots to BoltDB | In-memory prefix truncation + snapshot; persisted to RocksDB when --data-dir is set |
| Multi-region | Federation across regions with replication | Single cluster only |
| Leader election | Randomized timeouts | Randomized timeouts (150–300 ms) |
| Leadership transfer | nomad operator raft transfer-leadership | nomad-lite cluster transfer-leader |
| Node drain | nomad node drain | nomad-lite cluster drain |

nomad-lite now supports optional state persistence via --data-dir. When enabled, every Raft log entry, term change, voted-for value, and snapshot is written to a local RocksDB store before being acknowledged. A crashed node can rejoin, replay its log, and resume without losing any committed state. Without --data-dir the node runs in-memory only.


Security

| Feature | Nomad | nomad-lite |
|---------|-------|------------|
| Transport encryption | mTLS for all RPC | ✅ mTLS implemented |
| ACL system | Token-based with policies and roles | Not supported |
| Vault integration | Secrets injection at job start | Not supported |
| Namespaces | Multi-tenant isolation | Not supported |
| Sentinel policies | Fine-grained job submission governance | Not supported |

nomad-lite implements the transport layer (mTLS) but has no authorization model. Any client that presents a valid certificate can submit jobs, drain nodes, or transfer leadership. In a controlled environment this is acceptable; in a shared environment it is not.


Observability

| Feature | Nomad | nomad-lite |
|---------|-------|------------|
| Metrics | Prometheus endpoint (/v1/metrics) | Not supported |
| Distributed tracing | OpenTelemetry support | Not supported |
| Health endpoints | /v1/agent/health (live + ready) | /health/live + /health/ready |
| Audit logging | Immutable audit log (Enterprise) | Not supported |
| Web UI | Full job and cluster management UI | ✅ Basic dashboard (status + job list) |
| Log streaming | nomad alloc logs -f | Not supported |

Client API

| Feature | Nomad | nomad-lite |
|---------|-------|------------|
| Protocol | HTTP/JSON REST + gRPC | gRPC (primary) + HTTP (dashboard only) |
| Job submission | HCL / JSON job spec | string command field |
| Streaming | Event stream, log tailing | StreamJobs gRPC streaming |
| Pagination | Cursor-based | ✅ Cursor-based ListJobs |
| Batch submission | Multiple allocations per job | Not supported |
| Leader redirect | X-Nomad-Index + redirect hints | ✅ CLI auto-redirects to leader |

Feature Summary

                    ┌─────────────────────────────────────────┐
                    │         nomad-lite feature coverage     │
                    └─────────────────────────────────────────┘

 Consensus layer
   ✅  Leader election (randomized timeouts)
   ✅  Log replication (AppendEntries)
   ✅  Log compaction + snapshots
   ✅  Leadership transfer
   ✅  InstallSnapshot for lagging followers
   ✅  Dedicated heartbeat channel (no HOL blocking)
   ✅  Proposal backpressure (bounded queue, immediate RESOURCE_EXHAUSTED)

 Cluster operations
   ✅  Node drain (stop accepting, finish in-flight, transfer leadership)
   ✅  Graceful shutdown (SIGTERM/SIGINT with 30 s drain window)
   ✅  mTLS for all cluster and client communication
   ✅  Peer liveness tracking

 Job scheduling
   ✅  Distributed job assignment via Raft (all nodes agree on assignments)
   ✅  Least-loaded worker selection
   ✅  Worker heartbeat liveness (5 s window)
   ✅  Batch status update replication
   ✅  Job output stored on executing node, fetched on demand

 Client-facing
   ✅  SubmitJob / CancelJob / GetJobStatus / ListJobs (paginated) / StreamJobs
   ✅  GetClusterStatus
   ✅  GetRaftLogEntries (debug)
   ✅  CLI with automatic leader redirect
   ✅  Web dashboard
   ✅  /health/live + /health/ready endpoints

 Persistence
   ✅  State persistence (RocksDB-backed Raft log + snapshot via --data-dir)

 Not supported / Out of scope
   ⬜  Job timeouts, retries, priorities
   ⬜  Reschedule on node failure
   ⬜  Typed job payloads (HttpCallback, WasmModule, DockerImage)
   ⬜  Parameterized job templates
   ⬜  Periodic / cron jobs
   ⬜  Resource-aware placement
   ⬜  Job dependency graphs (DAG)
   ⬜  Shared output storage
   ⬜  Prometheus metrics / OpenTelemetry tracing
   ⬜  Authorization (ACL tokens)

Getting Started

Quick Start

# Build and install
cargo install --path .

# Run single node with dashboard
nomad-lite server --node-id 1 --port 50051 --dashboard-port 8080

# Open http://localhost:8080

# Submit a job (in another terminal)
nomad-lite job submit "echo hello"

# Check job status
nomad-lite job status <job-id>

# View cluster status
nomad-lite cluster status

Running a Cluster

The easiest way to run a 3-node cluster is with Docker Compose. Choose between a production (mTLS) or a development (no TLS) setup.

Production Setup (mTLS)

# Generate certificates (one-time setup)
./scripts/gen-test-certs.sh ./certs

# Start cluster
docker-compose up --build

# Stop cluster
docker-compose down

Using the CLI with mTLS:

# Check cluster status
nomad-lite cluster \
  --addr "https://127.0.0.1:50051" \
  --ca-cert ./certs/ca.crt \
  --cert ./certs/client.crt \
  --key ./certs/client.key \
  status

# Find the leader node from cluster status output, then submit to it
# Example: if Node 3 is leader, use port 50053
nomad-lite job \
  --addr "https://127.0.0.1:50053" \
  --ca-cert ./certs/ca.crt \
  --cert ./certs/client.crt \
  --key ./certs/client.key \
  submit "echo hello from mTLS"

# Get job status
nomad-lite job \
  --addr "https://127.0.0.1:50051" \
  --ca-cert ./certs/ca.crt \
  --cert ./certs/client.crt \
  --key ./certs/client.key \
  status <job-id>

# List all jobs
nomad-lite job \
  --addr "https://127.0.0.1:50051" \
  --ca-cert ./certs/ca.crt \
  --cert ./certs/client.crt \
  --key ./certs/client.key \
  list

Development Setup (No TLS)

# Start cluster
docker-compose -f docker-compose.dev.yml up --build

# Stop cluster
docker-compose -f docker-compose.dev.yml down

Using the CLI without TLS:

# Check cluster status
nomad-lite cluster --addr "http://127.0.0.1:50051" status

# Find the leader node from cluster status output, then submit to it
# Example: if Node 1 is leader, use port 50051
nomad-lite job --addr "http://127.0.0.1:50051" submit "echo hello"

# Get job status
nomad-lite job --addr "http://127.0.0.1:50051" status <job-id>

# List all jobs
nomad-lite job --addr "http://127.0.0.1:50051" list

Cluster Endpoints:

| Node | gRPC | Dashboard |
|------|------|-----------|
| 1 | localhost:50051 | localhost:8081 |
| 2 | localhost:50052 | localhost:8082 |
| 3 | localhost:50053 | localhost:8083 |

Note: Auto-redirect doesn’t work with Docker Compose because nodes report internal addresses (e.g., node1:50051) that aren’t accessible from the host. Always check which node is leader using cluster status and connect directly to it for job submissions.
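When scripting against a Compose cluster, the leader can be discovered from any node's dashboard REST API instead of reading cluster status by hand. The helper below is a hypothetical convenience, and the submit line assumes the host-port mapping from the endpoint table above (gRPC on port 5005<node-id>):

```shell
# Extract leader_id from any node's /api/cluster JSON response.
leader_from_cluster_json() {
  grep -o '"leader_id": *[0-9]*' | grep -o '[0-9]*$'
}

# Usage against a running cluster:
#   leader_id=$(curl -s http://localhost:8081/api/cluster | leader_from_cluster_json)
#   nomad-lite job --addr "http://127.0.0.1:5005${leader_id}" submit "echo hello"
```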

Option 2: Local Multi-Node

Run nodes directly without Docker:

# Terminal 1
nomad-lite server --node-id 1 --port 50051 --dashboard-port 8081 \
  --peers "2:127.0.0.1:50052,3:127.0.0.1:50053"

# Terminal 2
nomad-lite server --node-id 2 --port 50052 --dashboard-port 8082 \
  --peers "1:127.0.0.1:50051,3:127.0.0.1:50053"

# Terminal 3
nomad-lite server --node-id 3 --port 50053 --dashboard-port 8083 \
  --peers "1:127.0.0.1:50051,2:127.0.0.1:50052"

With state persistence (survives restarts):

# Add --data-dir to each node; state is preserved across restarts
nomad-lite server --node-id 1 --port 50051 --dashboard-port 8081 \
  --peers "2:127.0.0.1:50052,3:127.0.0.1:50053" \
  --data-dir /var/lib/nomad-lite/node1

nomad-lite server --node-id 2 --port 50052 --dashboard-port 8082 \
  --peers "1:127.0.0.1:50051,3:127.0.0.1:50053" \
  --data-dir /var/lib/nomad-lite/node2

nomad-lite server --node-id 3 --port 50053 --dashboard-port 8083 \
  --peers "1:127.0.0.1:50051,2:127.0.0.1:50052" \
  --data-dir /var/lib/nomad-lite/node3

Each node stores its Raft log, term, voted-for, and job snapshot in a local RocksDB instance. Omit --data-dir to run with in-memory state (useful for development and testing).

With mTLS:

# Generate certificates first
./scripts/gen-test-certs.sh ./certs

# Add TLS flags to each node
--tls --ca-cert ./certs/ca.crt --cert ./certs/node1.crt --key ./certs/node1.key

CLI Reference

The nomad-lite binary provides both server and client functionality.

nomad-lite
├── server                        # Start a server node
├── job                           # Job management
│   ├── submit <COMMAND>         # Submit a new job
│   ├── status <JOB_ID>          # Get job status
│   ├── cancel <JOB_ID>          # Cancel a pending or running job
│   └── list                     # List all jobs
├── cluster                       # Cluster management
│   ├── status                   # Get cluster info
│   ├── transfer-leader          # Transfer leadership to another node
│   └── drain                    # Drain node for maintenance
└── log                           # Raft log inspection
    └── list                     # View committed log entries

Server Options

| Flag | Default | Description |
|------|---------|-------------|
| --node-id | 1 | Unique node identifier |
| --port | 50051 | gRPC server port |
| --dashboard-port | - | Web dashboard port (optional) |
| --peers | "" | Peer addresses: "id:host:port,..." |
| --image | alpine:latest | Default Docker image for jobs |
| --tls | false | Enable mTLS |
| --ca-cert | - | CA certificate path |
| --cert | - | Node certificate path |
| --key | - | Node private key path |
| --allow-insecure | false | Run without TLS if certs fail |
| --data-dir | - | RocksDB data directory for persistence (optional; omit for in-memory state) |
| --advertise-addr | 127.0.0.1:<port> | Address reported to peers and shown in cluster status; override when nodes must be reachable at a specific hostname or external IP (e.g., 192.168.1.10:50051) |

Client Options

| Flag | Default | Description |
|------|---------|-------------|
| -a, --addr | http://127.0.0.1:50051 | Server address |
| -o, --output | table | Output format: table or json |
| --ca-cert | - | CA certificate for TLS |
| --cert | - | Client certificate for mTLS |
| --key | - | Client private key for mTLS |

job submit Options

| Flag | Default | Description |
|------|---------|-------------|
| --image IMAGE | server default (alpine:latest) | Docker image to run this job in; overrides the server-wide --image setting for this job only |

Command Examples

Submit a job:

nomad-lite job submit "echo hello"
# Job submitted successfully!
# Job ID: ef319e40-c888-490d-8349-e9c05f78cf5a

# Override the Docker image for this specific job
nomad-lite job submit --image python:3.12-alpine "python3 -c 'print(42)'"
# Job submitted successfully!
# Job ID: 3b7a1c22-...

Get job status:

nomad-lite job status ef319e40-c888-490d-8349-e9c05f78cf5a
# Job ID:          ef319e40-c888-490d-8349-e9c05f78cf5a
# Status:          COMPLETED
# Exit Code:       0
# Assigned Worker: 1
# Executed By:     1
# Output:
#   hello

Cancel a job:

nomad-lite job cancel ef319e40-c888-490d-8349-e9c05f78cf5a
# job ef319e40-c888-490d-8349-e9c05f78cf5a cancelled

Cancelling a job that is already terminal returns an error:

nomad-lite job cancel ef319e40-c888-490d-8349-e9c05f78cf5a
# Error: job is already completed

List all jobs:

nomad-lite job list
# JOB ID                                 STATUS       WORKER   COMMAND
# ------------------------------------------------------------------------------
# ef319e40-c888-490d-8349-e9c05f78cf5a   COMPLETED    1        echo hello
#
# Showing 1 of 1 jobs

List jobs with pagination:

nomad-lite job list --page-size 50 --all  # Fetch all pages
nomad-lite job list --stream              # Use streaming API

Filter jobs:

# Filter by status
nomad-lite job list --status pending
nomad-lite job list --status completed

# Filter by worker node
nomad-lite job list --worker 2

# Filter by command substring (case-insensitive)
nomad-lite job list --command-substr echo

# Filter by creation time range (Unix ms timestamps)
nomad-lite job list --created-after-ms 1700000000000 --created-before-ms 1710000000000

# Combine filters
nomad-lite job list --status pending --command-substr echo --worker 1

Filter flags for job list:

| Flag | Description |
|------|-------------|
| --status STATUS | Only jobs with this status: pending, running, completed, failed, cancelled |
| --worker WORKER_ID | Only jobs whose assigned_worker or executed_by matches this node ID |
| --command-substr SUBSTR | Case-insensitive substring match on the job command |
| --created-after-ms MS | Only jobs created at or after this Unix timestamp (milliseconds) |
| --created-before-ms MS | Only jobs created at or before this Unix timestamp (milliseconds) |
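The millisecond timestamps can be computed with GNU date; this is a convenience sketch, not part of the CLI, and %N (sub-second precision) is a GNU extension not available on macOS:

```shell
# Unix time in milliseconds, for --created-after-ms / --created-before-ms.
now_ms=$(date +%s%3N)                               # GNU date only
one_hour_ago_ms=$(( ($(date +%s) - 3600) * 1000 ))  # portable: seconds * 1000

# Jobs created in the last hour:
#   nomad-lite job list --created-after-ms "$one_hour_ago_ms" --created-before-ms "$now_ms"
echo "$one_hour_ago_ms .. $now_ms"
```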

Note: Filters are not supported with --stream (only --status applies to the streaming API).

Get cluster status:

nomad-lite cluster status
# Cluster Status
# ========================================
# Term:   5
# Leader: Node 1
#
# Nodes:
# ID       ADDRESS                   STATUS
# ---------------------------------------------
# 1        0.0.0.0:50051             [+] alive
# 2        127.0.0.1:50052           [+] alive
# 3        127.0.0.1:50053           [+] alive

Transfer leadership:

# Transfer to a specific node
nomad-lite cluster -a http://127.0.0.1:50051 transfer-leader --to 2
# Leadership transferred successfully!
# New leader: Node 2

# Auto-select best candidate
nomad-lite cluster -a http://127.0.0.1:50051 transfer-leader
# Leadership transferred successfully!
# New leader: Node 3

Drain a node for maintenance:

# Drain the node: stops accepting jobs, waits for running jobs, transfers leadership
nomad-lite cluster -a http://127.0.0.1:50051 drain
# Draining node...
# Node drained successfully.

# Verify leadership moved
nomad-lite cluster -a http://127.0.0.1:50052 status

View Raft log entries:

nomad-lite log list
# Raft Log Entries
# ================================================================================
# Commit Index: 6  |  Last Log Index: 6
#
# INDEX  TERM   COMMITTED  TYPE                 DETAILS
# --------------------------------------------------------------------------------
# 1      1      yes        Noop
# 2      1      yes        SubmitJob            job_id=bd764021-..., cmd=echo job1
# 3      1      yes        SubmitJob            job_id=1cce681f-..., cmd=echo job2
# 4      1      yes        SubmitJob            job_id=26694755-..., cmd=echo job3
# 5      1      yes        BatchUpdateJobStatus 3 updates
# 6      1      yes        UpdateJobStatus      job_id=17cc39b2-..., status=Completed
#
# Showing 6 entries

View log entries after compaction:

When log compaction has occurred, earlier entries are replaced by a snapshot:

nomad-lite log list
# Raft Log Entries
# ================================================================================
# First Available: 1001  |  Commit Index: 1050  |  Last Log Index: 1050
# (Entries 1-1000 were compacted into a snapshot)
#
# INDEX  TERM   COMMITTED  TYPE                 DETAILS
# ...

View log entries with pagination:

nomad-lite log list --start-index 1 --limit 50  # Start from index 1, max 50 entries
nomad-lite log list --start-index 10            # Start from index 10

JSON output:

nomad-lite job -o json list
nomad-lite cluster -o json status
nomad-lite log -o json list

Automatic Leader Redirect

For local clusters (non-Docker), the CLI automatically redirects to the leader if you connect to a follower. This works for both submit and cancel:

# Connect to follower node (port 50052), CLI auto-redirects to leader
nomad-lite job -a http://127.0.0.1:50052 submit "echo hello"
# Redirecting to leader at http://127.0.0.1:50051...
# Job submitted successfully!

nomad-lite job -a http://127.0.0.1:50052 cancel <job-id>
# Redirecting to leader at http://127.0.0.1:50051...
# job <job-id> cancelled

API Reference

REST API (Dashboard)

The web dashboard exposes a REST API for job and cluster management.

Get cluster status:

curl http://localhost:8081/api/cluster
# Response:
# {
#   "node_id": 1,
#   "role": "leader",
#   "current_term": 5,
#   "leader_id": 1,
#   "commit_index": 3,
#   "last_applied": 3,
#   "log_length": 3,
#   "nodes": [
#     { "node_id": 1, "address": "0.0.0.0:50051", "is_alive": true },
#     { "node_id": 2, "address": "127.0.0.1:50052", "is_alive": true },
#     { "node_id": 3, "address": "127.0.0.1:50053", "is_alive": false }
#   ]
# }

Submit a job:

curl -X POST http://localhost:8081/api/jobs \
  -H "Content-Type: application/json" \
  -d '{"command": "echo hello"}'
# Response:
# {
#   "job_id": "ef319e40-c888-490d-8349-e9c05f78cf5a",
#   "status": "pending"
# }

# With a specific Docker image (overrides the server default for this job)
curl -X POST http://localhost:8081/api/jobs \
  -H "Content-Type: application/json" \
  -d '{"command": "python3 -c '\''print(42)'\''", "image": "python:3.12-alpine"}'

Cancel a job:

curl -X DELETE http://localhost:8081/api/jobs/ef319e40-c888-490d-8349-e9c05f78cf5a
# Response (success):
# {
#   "success": true,
#   "error": null
# }

# Response (already terminal):
# HTTP 400
# {
#   "success": false,
#   "error": "job is already completed"
# }

List all jobs:

curl http://localhost:8081/api/jobs
# Response:
# [
#   {
#     "id": "ef319e40-c888-490d-8349-e9c05f78cf5a",
#     "command": "echo hello",
#     "status": "completed",
#     "executed_by": 1,
#     "output": "hello\n",
#     "error": null,
#     "created_at": "2026-01-28T12:45:41.231558433+00:00",
#     "completed_at": "2026-01-28T12:45:41.678341558+00:00"
#   }
# ]

Liveness probe:

curl http://localhost:8081/health/live
# Response (always 200 while the process is alive):
# {
#   "status": "ok"
# }

Readiness probe:

curl http://localhost:8081/health/ready
# Response when a leader has been elected (200):
# {
#   "status": "ok",
#   "leader_id": 1
# }

# Response during startup or mid-election (503):
# {
#   "status": "no_leader",
#   "leader_id": null
# }
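These two endpoints map directly onto orchestrator liveness/readiness probes. For plain scripts, a small polling helper works too; wait_for_ready below is a hypothetical helper, not part of nomad-lite:

```shell
# Poll /health/ready until it returns 200 (leader elected) or give up.
# curl -f exits non-zero on the 503 returned while no leader is known.
wait_for_ready() {
  url=$1
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    if curl -sf "$url" > /dev/null; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Usage:
#   wait_for_ready http://localhost:8081/health/ready && nomad-lite job submit "echo hello"
```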

gRPC API

SchedulerService (client-facing)

| Method | Description | Leader Only |
|--------|-------------|-------------|
| SubmitJob(command, image?) | Submit a job; image overrides the server-default Docker image for this job | Yes |
| CancelJob(job_id) | Cancel a pending or running job | Yes |
| GetJobStatus(job_id) | Get job status | No |
| ListJobs(page_size, page_token, status_filter, worker_id_filter, command_filter, created_after_ms, created_before_ms) | List jobs (paginated, filterable) | No |
| StreamJobs() | Stream jobs | No |
| GetClusterStatus() | Cluster info | Forwarded to leader |
| GetRaftLogEntries() | View Raft log entries | Forwarded to leader |
| TransferLeadership(target) | Transfer leadership | Yes |
| DrainNode() | Drain node for maintenance | No |

ListJobs request fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| page_size | uint32 | 100 | Max results per page (capped at 1000) |
| page_token | string | "" | Token from the previous response for the next page |
| status_filter | JobStatus | UNSPECIFIED | Only return jobs with this status; 0/UNSPECIFIED = no filter |
| worker_id_filter | uint64 | 0 | Only return jobs whose assigned_worker or executed_by matches; 0 = no filter |
| command_filter | string | "" | Case-insensitive substring match on the command; empty = no filter |
| created_after_ms | int64 | 0 | Only return jobs created at or after this Unix timestamp (ms); 0 = no bound |
| created_before_ms | int64 | 0 | Only return jobs created at or before this Unix timestamp (ms); 0 = no bound |

total_count in the response reflects the filtered result set size (not the total queue size).

SubmitJob error codes

| gRPC status | Meaning | Client action |
|-------------|---------|---------------|
| OK | Job accepted and committed | - |
| FAILED_PRECONDITION | Node is not the leader | Redirect to the node ID in the message |
| RESOURCE_EXHAUSTED | Leader proposal queue is full (>256 pending) | Retry with exponential backoff |
| DEADLINE_EXCEEDED | Raft did not commit the entry within 5 seconds | Retry; may indicate a degraded cluster |
| UNAVAILABLE | Node is draining, or the Raft loop has stopped | Retry on a different node |
| INVALID_ARGUMENT | Empty command string, or command exceeds 1024 bytes | Fix the request |
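The two retryable statuses suggest a standard client-side loop. A shell sketch follows; retry_with_backoff is a hypothetical helper, not a nomad-lite command, and it assumes the CLI exits non-zero on any gRPC error:

```shell
# Run a command, retrying on failure with exponential backoff: 1s, 2s, 4s, ...
retry_with_backoff() {
  max_attempts=$1
  shift
  delay=1
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Usage:
#   retry_with_backoff 5 nomad-lite job submit "echo hello"
```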

CancelJob error codes

| gRPC status | Meaning | Client action |
|-------------|---------|---------------|
| OK | Job cancelled and committed | - |
| FAILED_PRECONDITION | Node is not the leader, or job is already in a terminal state | Redirect to leader / check job status |
| NOT_FOUND | Job ID does not exist | - |
| RESOURCE_EXHAUSTED | Leader proposal queue is full | Retry with exponential backoff |
| DEADLINE_EXCEEDED | Raft did not commit within 5 seconds | Retry |
| INVALID_ARGUMENT | Malformed job UUID | Fix the request |

InternalService (node-to-node, not client-facing)

| Method | Description |
|--------|-------------|
| GetJobOutput(job_id) | Fetch job output from the node that executed it |
| WorkerHeartbeat(node_id) | Worker liveness signal sent every 2 s to the leader; auto-registers on first call; workers not seen for 5 s are excluded from job assignment |
| ForwardJobStatus(updates) | Follower worker forwards completed job status to the leader for Raft replication |

RaftService (node-to-node, consensus protocol)

| Method | Description |
|--------|-------------|
| AppendEntries | Log replication and heartbeats |
| RequestVote | Leader election voting |
| TimeoutNow | Trigger immediate election on the target node (used by TransferLeadership) |
| InstallSnapshot | Transfer compacted state to slow followers |

Architecture

Cluster Overview

graph LR
    subgraph Clients
        CLI[CLI]
        REST[REST]
        GRPC[gRPC]
    end

    subgraph Cluster[3-Node Raft Cluster]
        N1[Node 1<br/>gRPC :50051<br/>Dashboard :8081]
        N2[Node 2<br/>gRPC :50052<br/>Dashboard :8082]
        N3[Node 3<br/>gRPC :50053<br/>Dashboard :8083]

        N1 <-->|Raft RPCs| N2
        N1 <-->|Raft RPCs| N3
        N2 <-->|Raft RPCs| N3
    end

    subgraph Docker[Docker Containers]
        D1[Container]
        D2[Container]
        D3[Container]
    end

    CLI -->|Submit to leader| N1
    REST -->|HTTP API| N1
    GRPC -->|gRPC API| N1

    N1 -->|docker run| D1
    N2 -->|docker run| D2
    N3 -->|docker run| D3

    classDef leader fill:#90EE90,stroke:#006400,stroke-width:2px
    classDef follower fill:#FFE4B5,stroke:#FF8C00,stroke-width:2px
    class N1 leader
    class N2,N3 follower

Node Internal Architecture

graph TB
    subgraph External[External APIs]
        GRPC[gRPC Server<br/>SubmitJob, GetStatus, ListJobs]
        DASH[Dashboard<br/>REST API + Web UI]
    end

    subgraph Core[Core Components]
        RAFT[Raft Module<br/>Leader Election<br/>Log Replication]
        LOG[Raft Log<br/>In-Memory or RocksDB]
    end

    subgraph Loops[Background Loops]
        SCHED[Scheduler Loop<br/>Event-driven<br/>Applies commits<br/>Assigns jobs on leader]
        WORKER[Worker Loop<br/>Event-driven + 2s heartbeat<br/>Executes assigned jobs]
    end

    QUEUE[Job Queue<br/>State Machine]
    DOCKER[Docker Container<br/>Sandboxed Execution]

    %% External to Core
    GRPC -->|Commands| RAFT
    DASH -->|Commands| RAFT
    DASH -->|Query| QUEUE

    %% Core
    RAFT <--> LOG

    %% Loops subscribe/interact
    SCHED -.->|Subscribe commits| RAFT
    SCHED -->|Apply entries & Assign jobs| QUEUE
    WORKER -->|Notified & Update| QUEUE
    WORKER -->|Execute| DOCKER

    classDef api fill:#87CEEB,stroke:#4682B4
    classDef core fill:#90EE90,stroke:#006400
    classDef loop fill:#FFE4B5,stroke:#FF8C00
    classDef state fill:#E6E6FA,stroke:#9370DB

    class GRPC,DASH api
    class RAFT,LOG core
    class SCHED,WORKER loop
    class QUEUE,DOCKER state

Key Points:

  • Every node runs all components (gRPC, Dashboard, Raft, Scheduler, Worker)
  • Only the leader’s Scheduler Loop assigns jobs; followers just apply committed entries
  • Job assignments travel through Raft (AssignJob command) so every node’s queue reflects the same state
  • Workers send WorkerHeartbeat gRPC to the leader every 2 s to signal liveness; workers that miss heartbeats for more than 5 s are excluded from job assignment
  • Workers discover assigned jobs by querying the local job queue directly (no node-local in-memory map needed)
  • Job output is stored only on the executing node (fetched via GetJobOutput RPC when queried)

Data Flow

sequenceDiagram
    participant Client as CLI Client
    participant N1 as Node 1 (Leader)<br/>gRPC Server
    participant Raft1 as Node 1<br/>Raft Module
    participant N2 as Node 2 (Follower)<br/>Raft Module
    participant N3 as Node 3 (Follower)<br/>Raft Module
    participant Log1 as Node 1<br/>Raft Log
    participant Queue1 as Node 1<br/>Job Queue
    participant Sched as Scheduler Loop<br/>(Leader Only)
    participant Worker as Worker Loop<br/>(Node 2)
    participant Docker as Docker Container

    Note over Client,Docker: Job Submission Flow (Write Operation)

    Client->>N1: 1. SubmitJob("echo hello")
    N1->>N1: 2. Create Job object<br/>job_id = uuid
    N1->>Raft1: 3. Propose SubmitJob command
    Raft1->>Log1: 4. Append to local log<br/>(index=N, term=T)

    par Replicate to Followers
        Raft1->>N2: 5a. AppendEntries RPC<br/>(log entry)
        Raft1->>N3: 5b. AppendEntries RPC<br/>(log entry)
    end

    N2-->>Raft1: 6a. ACK (success)
    N3-->>Raft1: 6b. ACK (success)

    Note over Raft1: Majority reached (2/3)

    Raft1-->>N1: 7. Commit confirmed
    N1->>Queue1: 8. Add job to queue<br/>Status: PENDING
    N1-->>Client: 9. Response: job_id

    Note over Client,Docker: Job Assignment Flow (Leader Scheduler Loop - event-driven)

    Sched->>Raft1: 10. Subscribe to commits
    Raft1-->>Sched: 11. Commit notification
    Sched->>Queue1: 12. Apply committed entry<br/>(idempotent add)

    Sched->>Queue1: 13. Check pending jobs
    Queue1-->>Sched: 14. Job found (PENDING)
    Sched->>Sched: 15. Select least-loaded live worker<br/>(Node 2, via WorkerHeartbeat liveness)
    Sched->>Queue1: 16. Optimistic local assign<br/>Status: RUNNING (leader only)
    Sched->>Raft1: 17. Propose AssignJob(job_id, worker=2)

    par Replicate AssignJob to Followers
        Raft1->>N2: 18a. AppendEntries (AssignJob)
        Raft1->>N3: 18b. AppendEntries (AssignJob)
    end

    N2-->>Raft1: 19a. ACK
    N3-->>Raft1: 19b. ACK

    Note over N2,N3: Followers apply AssignJob<br/>Status: RUNNING on all nodes

    Note over Client,Docker: Job Execution Flow (Worker Loop - notified on assignment)

    Worker->>Queue1: 20. Notified, query jobs_assigned_to(node2)
    Queue1-->>Worker: 21. Job assigned to me<br/>(RUNNING status)

    Worker->>Docker: 22. docker run alpine:latest<br/>--network=none --read-only<br/>--memory=256m --cpus=0.5<br/>echo hello

    Docker-->>Worker: 23. Output: "hello\n"<br/>Exit code: 0

    Worker->>Queue1: 24. Update local job result<br/>Status: COMPLETED<br/>Output stored locally

    Note over Worker,Raft1: Follower worker calls ForwardJobStatus → leader proposes to Raft

    Worker->>Raft1: 25. ForwardJobStatus (job done, exit=0)
    Raft1->>N2: 26. AppendEntries (UpdateJobStatus)
    Raft1->>N3: 26. AppendEntries (UpdateJobStatus)

    Note over Client,Docker: Job Status Query Flow (Read Operation)

    Client->>N1: 27. GetJobStatus(job_id)
    N1->>Queue1: 28. Query job queue
    Queue1-->>N1: 29. Job metadata<br/>(executed_by: Node 2)
    N1->>N2: 30. GetJobOutput RPC<br/>(fetch from executor)
    N2-->>N1: 31. Output: "hello"
    N1-->>Client: 32. Response: COMPLETED<br/>Output: "hello"

    Note over Client,Docker: Leader Election Flow (Failure Scenario)

    Note over Raft1: Leader crashes!

    Note over N2,N3: Election timeout (150-300ms)<br/>No heartbeat received

    N2->>N2: Election timeout triggered
    N2->>N2: Increment term, become candidate
    N2->>N3: RequestVote RPC (term=T+1)
    N3-->>N2: VoteGranted

    Note over N2: Won election with majority

    N2->>N2: Become Leader
    Note over N2: Scheduler loop now active<br/>(assigns jobs)

    Note over N2,N3: Node 2 now leads cluster

Security

mTLS

All gRPC communication (node-to-node, client-to-node) can be secured with mutual TLS:

  • Both parties authenticate via certificates signed by cluster CA
  • All traffic encrypted with TLS 1.2+
  • Generate certs: ./scripts/gen-test-certs.sh ./certs

Docker Sandboxing

Jobs run in isolated containers with:

| Restriction | Setting |
|---|---|
| Network | `--network=none` |
| Capabilities | `--cap-drop=ALL` |
| Filesystem | `--read-only` |
| Privileges | `--security-opt=no-new-privileges` |
| Memory | `--memory=256m` |
| CPU | `--cpus=0.5` |
| Wall-clock timeout | 30s (hardcoded; jobs exceeding this are killed and marked `Failed`) |
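These restrictions translate directly into `docker run` flags. A minimal sketch of how a worker might assemble them (the function name and shape are illustrative, not the project's actual API):

```rust
/// Illustrative sketch: build the `docker run` argument vector that
/// applies the sandbox restrictions from the table above.
fn sandbox_args(image: &str, cmd: &str) -> Vec<String> {
    [
        "run", "--rm",
        "--network=none",                    // no network access
        "--cap-drop=ALL",                    // drop all Linux capabilities
        "--read-only",                       // read-only root filesystem
        "--security-opt=no-new-privileges",  // block privilege escalation
        "--memory=256m",                     // memory cap
        "--cpus=0.5",                        // CPU cap
        image, "sh", "-c", cmd,
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}

fn main() {
    let args = sandbox_args("alpine:latest", "echo hello");
    // The worker would hand these to std::process::Command::new("docker")
    // and enforce the 30 s wall-clock timeout around the spawned child.
    println!("docker {}", args.join(" "));
}
```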

Raft Consensus

Timing

  • Election timeout: 150-300ms (randomized)
  • Heartbeat interval: 50ms
  • Peer liveness window: 3s — a node is shown as is_alive: false in GetClusterStatus if no successful AppendEntries/heartbeat has been received from it within this window
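Randomizing the election timeout is what keeps two followers from timing out and splitting the vote at the same instant. A dependency-free sketch of picking a timeout in the 150-300 ms window (the real implementation would likely use a proper RNG):

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::time::Duration;

/// Illustrative sketch: choose a randomized election timeout in the
/// 150..=300 ms window. Uses the randomly seeded std hasher as a cheap
/// entropy source to stay dependency-free.
fn election_timeout() -> Duration {
    let seed = RandomState::new().build_hasher().finish();
    Duration::from_millis(150 + (seed % 151)) // 150..=300 ms
}

fn main() {
    println!("election timeout: {:?}", election_timeout());
}
```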

Log Replication

  1. Client sends command to leader
  2. Leader appends to log and replicates via AppendEntries
  3. Majority acknowledgment → committed
  4. Applied to state machine
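Step 3 reduces to simple arithmetic over replication progress: counting the leader's own log, the highest index present on a strict majority is the lower median of all match indexes. A sketch (function name and types are assumptions):

```rust
/// Illustrative sketch: an entry is committed once a majority of nodes
/// hold it. With the leader's own last index included, that is the lower
/// median of all match indexes.
fn commit_index(leader_last: u64, follower_match: &[u64]) -> u64 {
    let mut all: Vec<u64> = follower_match.to_vec();
    all.push(leader_last);   // the leader always has its own entries
    all.sort_unstable();
    all[(all.len() - 1) / 2] // highest index replicated on a strict majority
}

fn main() {
    // 3 nodes: leader at 10, one follower caught up, one lagging -> commit 10.
    assert_eq!(commit_index(10, &[10, 4]), 10);
    // Both followers lagging -> only index 4 is majority-replicated.
    assert_eq!(commit_index(10, &[4, 4]), 4);
}
```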

Proposal backpressure

The leader’s proposal channel holds at most 256 pending commands. If the channel is full, SubmitJob returns RESOURCE_EXHAUSTED immediately — the caller should retry with exponential backoff. Additionally, if the Raft loop accepts the command but quorum is lost before the entry commits, SubmitJob returns DEADLINE_EXCEEDED after 5 seconds. Both guarantees ensure clients always get a bounded response rather than hanging indefinitely.
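The "fail fast when full" behavior maps naturally onto a bounded channel with a non-blocking send. A sketch using a std bounded channel (capacity shrunk from 256 to 2 for the demo):

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

// Illustrative sketch of the backpressure rule: `try_send` on a bounded
// channel fails immediately when the queue is full, which is what lets
// the leader map a full proposal queue to RESOURCE_EXHAUSTED instead of
// blocking the RPC handler.
fn main() {
    let (tx, _rx) = sync_channel::<&str>(2); // real capacity: 256
    tx.try_send("job-1").unwrap();
    tx.try_send("job-2").unwrap();
    match tx.try_send("job-3") {
        Err(TrySendError::Full(_)) => println!("RESOURCE_EXHAUSTED: retry with backoff"),
        _ => unreachable!("capacity 2 is exhausted"),
    }
}
```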

State Machine Commands

Each log entry carries one of the following commands applied to the job queue on every node:

| Command | Description |
|---|---|
| `SubmitJob` | Adds a new job in `Pending` state |
| `AssignJob` | Assigns a pending job to a specific worker, setting it to `Running` on all nodes |
| `RegisterWorker` | Records a worker in the cluster-wide registry |
| `UpdateJobStatus` / `BatchUpdateJobStatus` | Marks one or more jobs `Completed` or `Failed` |

AssignJob is proposed by the leader’s scheduler loop after it selects the least-loaded live worker. Because this command travels through the Raft log, every follower applies the same assignment atomically — workers discover their jobs by querying the local job queue rather than a node-local in-memory map.
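The commands above can be sketched as an enum applied deterministically to each node's job queue; the exact field names here are assumptions, but the shape follows the table:

```rust
use std::collections::HashMap;

/// Illustrative sketch of the replicated state machine: each committed
/// log entry carries one command, applied to the local job queue.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JobState { Pending, Running, Completed, Failed }

enum Command {
    SubmitJob { job_id: u64 },
    AssignJob { job_id: u64, worker: u32 },
    UpdateJobStatus { job_id: u64, success: bool },
}

fn apply(queue: &mut HashMap<u64, (JobState, Option<u32>)>, cmd: &Command) {
    match cmd {
        Command::SubmitJob { job_id } => {
            // Idempotent: re-applying after a snapshot restore is a no-op.
            queue.entry(*job_id).or_insert((JobState::Pending, None));
        }
        Command::AssignJob { job_id, worker } => {
            if let Some(slot) = queue.get_mut(job_id) {
                *slot = (JobState::Running, Some(*worker));
            }
        }
        Command::UpdateJobStatus { job_id, success } => {
            if let Some((state, _)) = queue.get_mut(job_id) {
                *state = if *success { JobState::Completed } else { JobState::Failed };
            }
        }
    }
}

fn main() {
    let mut queue = HashMap::new();
    apply(&mut queue, &Command::SubmitJob { job_id: 1 });
    apply(&mut queue, &Command::AssignJob { job_id: 1, worker: 2 });
    apply(&mut queue, &Command::UpdateJobStatus { job_id: 1, success: true });
    assert_eq!(queue[&1], (JobState::Completed, Some(2)));
}
```

Because every node replays the same command sequence, workers can discover their assignments from the local queue alone.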

Log Compaction

When the in-memory log exceeds 1000 entries, the committed prefix is replaced with a snapshot of the current state (job queue + worker registrations). This bounds memory usage regardless of throughput. Followers that fall too far behind receive the snapshot via InstallSnapshot RPC instead of replaying individual entries.
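The trigger logic amounts to swapping the committed prefix for a snapshot once the log crosses the threshold. A minimal sketch, with the threshold lowered from 1000 for the demo and `Vec<String>` standing in for real entry and snapshot types:

```rust
const COMPACT_THRESHOLD: usize = 4; // real threshold: 1000 entries

/// Illustrative sketch: fold the committed log prefix into a snapshot.
/// `snapshot` stands in for the serialized job queue + worker registry.
fn maybe_compact(log: &mut Vec<String>, commit_index: usize, snapshot: &mut Vec<String>) {
    if log.len() > COMPACT_THRESHOLD {
        // Committed prefix moves into the snapshot; the tail stays in the
        // log for normal AppendEntries replication.
        snapshot.extend(log.drain(..commit_index));
    }
}

fn main() {
    let mut log: Vec<String> = (1..=5).map(|i| format!("entry-{i}")).collect();
    let mut snapshot = Vec::new();
    maybe_compact(&mut log, 3, &mut snapshot);
    assert_eq!(log.len(), 2);      // entries after the snapshot point remain
    assert_eq!(snapshot.len(), 3); // committed prefix now lives in the snapshot
}
```

A follower whose next needed index falls inside the compacted prefix can no longer be served by AppendEntries, which is when the leader falls back to InstallSnapshot.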

State Persistence

By default, all Raft state (log, term, voted-for, snapshot) is held in memory and lost on restart. Pass --data-dir <path> to enable a local RocksDB store. On startup the node reloads its persisted term, voted-for, and committed log entries and replays any stored snapshot into the job queue before rejoining the cluster. Storage failures cause an immediate crash (crash-fail-safe) rather than silent data loss.
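One plausible way to structure this (a sketch under assumed names, not the project's actual API) is a storage trait that the Raft core writes hard state through, so the default in-memory store and the RocksDB store behind `--data-dir` are interchangeable:

```rust
/// Illustrative sketch of the persistence boundary.
trait RaftStorage {
    /// Must be durable before the node answers RPCs; a RocksDB-backed
    /// implementation would crash the process on a write failure.
    fn save_hard_state(&mut self, term: u64, voted_for: Option<u32>);
    /// Returns None on a fresh store (first boot).
    fn load_hard_state(&self) -> Option<(u64, Option<u32>)>;
}

#[derive(Default)]
struct MemStorage {
    hard: Option<(u64, Option<u32>)>,
}

impl RaftStorage for MemStorage {
    fn save_hard_state(&mut self, term: u64, voted_for: Option<u32>) {
        self.hard = Some((term, voted_for));
    }
    fn load_hard_state(&self) -> Option<(u64, Option<u32>)> {
        self.hard
    }
}

fn main() {
    let mut store = MemStorage::default();
    assert_eq!(store.load_hard_state(), None); // fresh store
    store.save_hard_state(7, Some(2));
    assert_eq!(store.load_hard_state(), Some((7, Some(2))));
}
```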

Safety Guarantees

  • Election safety: One leader per term
  • Leader append-only: Never overwrites log
  • Log matching: Same index/term = identical
  • Leader completeness: Committed entries persist

Cluster Sizing

| Nodes | Majority | Fault Tolerance |
|---|---|---|
| 3 | 2 | 1 failure |
| 5 | 3 | 2 failures |
| 7 | 4 | 3 failures |

Use odd numbers—even numbers add overhead without improving fault tolerance.
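The table follows directly from quorum arithmetic: a majority is `n / 2 + 1`, and the cluster tolerates the loss of everything beyond that majority.

```rust
/// Quorum arithmetic behind the cluster-sizing table.
fn majority(n: u32) -> u32 { n / 2 + 1 }
fn fault_tolerance(n: u32) -> u32 { n - majority(n) }

fn main() {
    assert_eq!((majority(3), fault_tolerance(3)), (2, 1));
    assert_eq!((majority(5), fault_tolerance(5)), (3, 2));
    assert_eq!((majority(7), fault_tolerance(7)), (4, 3));
    // A 4th node raises the quorum to 3 yet still tolerates only 1 failure,
    // which is why even cluster sizes add overhead without benefit.
    assert_eq!(fault_tolerance(4), fault_tolerance(3));
}
```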

Testing

```sh
cargo test                # Run all tests
cargo test --lib          # Unit tests only
cargo test --test <name>  # Specific test suite
```

Test Suites

| Suite | Description |
|---|---|
| `lib` (unit) | Config, Raft state machine, RPC serialization |
| `scheduler_tests` | Job queue, worker assignment, heartbeats |
| `raft_rpc_tests` | AppendEntries, RequestVote, InstallSnapshot, term handling, compaction boundary cases, re-election vote reset |
| `integration_tests` | Multi-node election, replication, consistency |
| `failover_tests` | Leader crash, re-election, quorum loss |
| `partition_tests` | Network partitions, split-brain prevention, healing |
| `chaos_tests` | Rapid leader churn, network flapping, cascading failures, full isolation recovery |
| `tls_tests` | mTLS certificate loading, encrypted cluster communication |
| `executor_tests` | Docker sandbox command execution |
| `internal_service_tests` | Internal node-to-node API, job output fetching, ForwardJobStatus to leader |
| `distribution_tests` | Jobs spread across nodes (not just leader), WorkerHeartbeat RPC, dead-worker exclusion |
| `dashboard_tests` | REST API endpoints |
| `leadership_transfer_tests` | Voluntary leadership transfer, auto-select, non-leader rejection |
| `drain_tests` | Node draining, job rejection during drain, leadership handoff |
| `compaction_tests` | Log compaction trigger, snapshot transfer to slow followers, multiple compaction rounds, AppendEntries fallback when below compaction threshold, state consistency |
| `backpressure_tests` | Full proposal channel returns RESOURCE_EXHAUSTED immediately; stalled Raft loop returns DEADLINE_EXCEEDED after 5 s |
| `persistence_tests` | RocksDB storage: fresh DB returns None, hard state/log reload, log truncation, snapshot compaction deletes old entries, post-compaction state reconstruction on restart, node term survives restart |