API Reference

REST API (Dashboard)

The web dashboard exposes a REST API for job and cluster management.

Get cluster status:

curl http://localhost:8081/api/cluster
# Response:
# {
#   "node_id": 1,
#   "role": "leader",
#   "current_term": 5,
#   "leader_id": 1,
#   "commit_index": 3,
#   "last_applied": 3,
#   "log_length": 3,
#   "nodes": [
#     { "node_id": 1, "address": "0.0.0.0:50051", "is_alive": true },
#     { "node_id": 2, "address": "127.0.0.1:50052", "is_alive": true },
#     { "node_id": 3, "address": "127.0.0.1:50053", "is_alive": false }
#   ]
# }
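
A client can use the `nodes` array to check whether the queried node currently sees a majority of the cluster as alive. The following is a minimal Python sketch, assuming the response shape shown above; `has_quorum` is a hypothetical helper for illustration and is not part of nomad-lite.

```python
import json


def has_quorum(cluster_status: dict) -> bool:
    """Return True if a strict majority of the listed nodes report is_alive."""
    nodes = cluster_status["nodes"]
    alive = sum(1 for node in nodes if node["is_alive"])
    return alive > len(nodes) // 2


# Sample body copied from the /api/cluster response above (trimmed to the
# fields this helper reads).
sample = json.loads("""
{
  "node_id": 1,
  "role": "leader",
  "nodes": [
    { "node_id": 1, "address": "0.0.0.0:50051", "is_alive": true },
    { "node_id": 2, "address": "127.0.0.1:50052", "is_alive": true },
    { "node_id": 3, "address": "127.0.0.1:50053", "is_alive": false }
  ]
}
""")

print(has_quorum(sample))  # True: 2 of 3 nodes are alive
```

Note that `is_alive` reflects the view of the node that answered the request, so the result is only as current as that node's information.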

Submit a job:

curl -X POST http://localhost:8081/api/jobs \
  -H "Content-Type: application/json" \
  -d '{"command": "echo hello"}'
# Response:
# {
#   "job_id": "ef319e40-c888-490d-8349-e9c05f78cf5a",
#   "status": "pending"
# }

# With a specific Docker image (overrides the server default for this job)
curl -X POST http://localhost:8081/api/jobs \
  -H "Content-Type: application/json" \
  -d '{"command": "python3 -c '\''print(42)'\''", "image": "python:3.12-alpine"}'
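
The nested shell quoting in the last example is easy to get wrong. As an alternative, the same request can be built from Python, where the JSON encoder handles the escaping. This is a sketch using only the standard library; `build_submit_request` is a hypothetical helper name, and the endpoint and fields are the ones shown in the curl examples above.

```python
import json
import urllib.request
from typing import Optional


def build_submit_request(
    base_url: str, command: str, image: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/jobs request."""
    body = {"command": command}
    if image is not None:
        # Optional per-job Docker image override.
        body["image"] = image
    return urllib.request.Request(
        url=f"{base_url}/api/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_submit_request(
    "http://localhost:8081", "python3 -c 'print(42)'", "python:3.12-alpine"
)

# To send it against a running dashboard:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```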

Cancel a job:

curl -X DELETE http://localhost:8081/api/jobs/ef319e40-c888-490d-8349-e9c05f78cf5a
# Response (success):
# {
#   "success": true,
#   "error": null
# }

# Response (already terminal):
# HTTP 400
# {
#   "success": false,
#   "error": "job is already completed"
# }

List all jobs:

curl http://localhost:8081/api/jobs
# Response:
# [
#   {
#     "id": "ef319e40-c888-490d-8349-e9c05f78cf5a",
#     "command": "echo hello",
#     "status": "completed",
#     "executed_by": 1,
#     "output": "hello\n",
#     "error": null,
#     "created_at": "2026-01-28T12:45:41.231558433+00:00",
#     "completed_at": "2026-01-28T12:45:41.678341558+00:00"
#   }
# ]

Liveness probe:

curl http://localhost:8081/health/live
# Response (always 200 while the process is alive):
# {
#   "status": "ok"
# }

Readiness probe:

curl http://localhost:8081/health/ready
# Response when a leader has been elected (200):
# {
#   "status": "ok",
#   "leader_id": 1
# }

# Response during startup or mid-election (503):
# {
#   "status": "no_leader",
#   "leader_id": null
# }
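
A deployment script might poll the readiness probe until a leader is elected before submitting work. The sketch below assumes the two response shapes shown above; `http_probe` and `wait_until_ready` are hypothetical helper names, and the attempt count and delay are arbitrary defaults.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple


def http_probe(url: str = "http://localhost:8081/health/ready") -> Tuple[int, dict]:
    """Call the readiness endpoint and return (HTTP status, JSON body)."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # urllib raises on 503; the error object still carries the JSON body.
        return err.code, json.load(err)
    except urllib.error.URLError:
        # Connection refused or similar: treat as "not ready yet".
        return 0, {}


def wait_until_ready(
    probe: Callable[[], Tuple[int, dict]] = http_probe,
    attempts: int = 10,
    delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll until the probe reports a leader; return its ID, or None on timeout."""
    for _ in range(attempts):
        status_code, body = probe()
        if status_code == 200 and body.get("leader_id") is not None:
            return body["leader_id"]
        sleep(delay_s)
    return None
```

Passing `probe` and `sleep` as parameters keeps the polling loop testable without a running cluster.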

gRPC API

SchedulerService (client-facing)

Method	Description	Leader Only
`SubmitJob(command, image?)`	Submit a job; `image` overrides the server-default Docker image for this job	Yes
`CancelJob(job_id)`	Cancel a pending or running job	Yes
`GetJobStatus(job_id)`	Get job status	No
`ListJobs(page_size, page_token, status_filter, worker_id_filter, command_filter, created_after_ms, created_before_ms)`	List jobs (paginated, filterable)	No
`StreamJobs()`	Stream jobs	No
`GetClusterStatus()`	Cluster info	Forwarded to leader
`GetRaftLogEntries()`	View Raft log entries	Forwarded to leader
`TransferLeadership(target)`	Transfer leadership	Yes
`DrainNode()`	Drain node for maintenance	No

ListJobs request fields

Field	Type	Default	Description
`page_size`	uint32	100	Max results per page (capped at 1000)
`page_token`	string	""	Token from the previous response for the next page
`status_filter`	JobStatus	UNSPECIFIED	Only return jobs with this status; 0/UNSPECIFIED = no filter
`worker_id_filter`	uint64	0	Only return jobs whose `assigned_worker` or `executed_by` matches; 0 = no filter
`command_filter`	string	""	Case-insensitive substring match on the command; empty = no filter
`created_after_ms`	int64	0	Only return jobs created at or after this Unix timestamp (ms); 0 = no bound
`created_before_ms`	int64	0	Only return jobs created at or before this Unix timestamp (ms); 0 = no bound

`total_count` in the response is the size of the filtered result set, not the total queue size.
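
Token-based pagination is usually consumed with a loop that feeds each response's token into the next request. The sketch below shows that loop in Python. The response field names `jobs` and `next_page_token` are assumptions (this reference lists only the request fields), and `list_jobs` stands in for whatever function wraps the actual `ListJobs` call.

```python
from typing import Callable, Iterator

MAX_PAGE_SIZE = 1000  # server-side cap documented in the table above


def iter_all_jobs(
    list_jobs: Callable[[int, str], dict], page_size: int = 100
) -> Iterator[dict]:
    """Yield every job by following page tokens until none is returned.

    list_jobs(page_size, page_token) is a hypothetical wrapper around the
    ListJobs RPC that returns a dict with "jobs" and "next_page_token"
    (both field names are assumed, not taken from the proto).
    """
    page_size = min(page_size, MAX_PAGE_SIZE)
    token = ""  # empty token requests the first page
    while True:
        page = list_jobs(page_size, token)
        yield from page["jobs"]
        token = page.get("next_page_token", "")
        if not token:
            return
```

Filters such as `status_filter` would be bound inside the `list_jobs` wrapper so that every page is requested with the same filter set.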

SubmitJob error codes

gRPC status	Meaning	Client action
`OK`	Job accepted and committed	—
`FAILED_PRECONDITION`	Node is not the leader	Redirect to the node ID in the message
`RESOURCE_EXHAUSTED`	Leader proposal queue is full (>256 pending)	Retry with exponential backoff
`DEADLINE_EXCEEDED`	Raft did not commit the entry within 5 seconds	Retry; may indicate a degraded cluster
`UNAVAILABLE`	Node is draining, or the Raft loop has stopped	Retry on a different node
`INVALID_ARGUMENT`	Empty command string, or command exceeds 1024 bytes	Fix the request
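
The client actions in this table can be combined into a single retry loop. The following Python sketch is one possible policy, not the project's client: status codes are passed as plain strings so the example needs no gRPC library, `submit` is a hypothetical callable wrapping the real `SubmitJob` call, and on `FAILED_PRECONDITION` it simply rotates to the next known node rather than parsing the leader ID from the message, whose format is not specified here.

```python
import time
from typing import Callable, Sequence, Tuple


def backoff_delay(attempt: int, base_s: float = 0.1, cap_s: float = 5.0) -> float:
    """Exponential backoff: base * 2^attempt, capped at cap_s."""
    return min(cap_s, base_s * (2 ** attempt))


def submit_with_retry(
    submit: Callable[[str], Tuple[str, str]],
    nodes: Sequence[str],
    max_attempts: int = 6,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call submit(node) -> (status, detail) until it returns OK.

    Returns the detail of the OK response (for example, the job ID).
    """
    node_index = 0
    for attempt in range(max_attempts):
        status, detail = submit(nodes[node_index])
        if status == "OK":
            return detail
        if status == "INVALID_ARGUMENT":
            # Retrying cannot help; the request itself must be fixed.
            raise ValueError(detail)
        if status in ("FAILED_PRECONDITION", "UNAVAILABLE"):
            # Not the leader, draining, or stopped: try a different node.
            node_index = (node_index + 1) % len(nodes)
            continue
        if status in ("RESOURCE_EXHAUSTED", "DEADLINE_EXCEEDED"):
            # Leader is overloaded or the cluster is slow: back off, same node.
            sleep(backoff_delay(attempt))
            continue
        raise RuntimeError(f"unexpected status {status}: {detail}")
    raise TimeoutError(f"gave up after {max_attempts} attempts")
```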

CancelJob error codes

gRPC status	Meaning	Client action
`OK`	Job cancelled and committed	—
`FAILED_PRECONDITION`	Node is not the leader, or job is already in a terminal state	Redirect to leader / check job status
`NOT_FOUND`	Job ID does not exist	—
`RESOURCE_EXHAUSTED`	Leader proposal queue is full	Retry with exponential backoff
`DEADLINE_EXCEEDED`	Raft did not commit within 5 seconds	Retry
`INVALID_ARGUMENT`	Malformed job UUID	Fix the request

InternalService (node-to-node, not client-facing)

Method	Description
`GetJobOutput(job_id)`	Fetch job output from the node that executed it
`WorkerHeartbeat(node_id)`	Worker liveness signal sent every 2 s to the leader; auto-registers on first call; workers not seen for 5 s are excluded from job assignment
`ForwardJobStatus(updates)`	Follower worker forwards completed job status to the leader for Raft replication
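
To illustrate the `WorkerHeartbeat` timing rules (2 s interval, 5 s exclusion window, auto-registration on first call), here is a small Python model of the leader-side bookkeeping. It is an illustrative sketch under those stated numbers, not the project's implementation; `WorkerRegistry` is a hypothetical name, and treating a worker as excluded at exactly 5 s of silence is an assumption.

```python
from typing import Dict, List

HEARTBEAT_INTERVAL_S = 2.0  # how often workers are expected to report
LIVENESS_TIMEOUT_S = 5.0    # silence after which a worker is skipped


class WorkerRegistry:
    """Tracks the last heartbeat time per worker node."""

    def __init__(self, timeout_s: float = LIVENESS_TIMEOUT_S) -> None:
        self._last_seen: Dict[int, float] = {}
        self._timeout_s = timeout_s

    def heartbeat(self, node_id: int, now: float) -> None:
        # The first heartbeat from an unknown node registers it implicitly.
        self._last_seen[node_id] = now

    def assignable(self, now: float) -> List[int]:
        """Return IDs of workers seen within the timeout window."""
        return sorted(
            node_id
            for node_id, seen in self._last_seen.items()
            if now - seen < self._timeout_s
        )


registry = WorkerRegistry()
registry.heartbeat(1, now=0.0)
registry.heartbeat(2, now=3.0)
print(registry.assignable(now=4.0))  # [1, 2]
print(registry.assignable(now=6.0))  # [2]: node 1 has been silent for 6 s
```

With a 2 s interval and a 5 s window, a worker can miss two consecutive heartbeats and still be eligible for assignment.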

RaftService (node-to-node, consensus protocol)

Method	Description
`AppendEntries`	Log replication and heartbeats
`RequestVote`	Leader election voting
`TimeoutNow`	Trigger immediate election on the target node (used by `TransferLeadership`)
`InstallSnapshot`	Transfer compacted state to slow followers
