API Reference

REST API (Dashboard)

The web dashboard exposes a REST API for job and cluster management.

Get cluster status:

curl http://localhost:8081/api/cluster
# Response:
# {
#   "node_id": 1,
#   "role": "leader",
#   "current_term": 5,
#   "leader_id": 1,
#   "commit_index": 3,
#   "last_applied": 3,
#   "log_length": 3,
#   "nodes": [
#     { "node_id": 1, "address": "0.0.0.0:50051", "is_alive": true },
#     { "node_id": 2, "address": "127.0.0.1:50052", "is_alive": true },
#     { "node_id": 3, "address": "127.0.0.1:50053", "is_alive": false }
#   ]
# }
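
A client can use this response to check whether a majority of the cluster is still reachable. The sketch below parses a trimmed copy of the sample response; `has_quorum` is a hypothetical helper (not part of the project), and it assumes that `nodes` lists every cluster member.

```python
import json

# Trimmed copy of the sample /api/cluster response shown above.
SAMPLE = """
{
  "node_id": 1,
  "role": "leader",
  "leader_id": 1,
  "nodes": [
    {"node_id": 1, "address": "0.0.0.0:50051", "is_alive": true},
    {"node_id": 2, "address": "127.0.0.1:50052", "is_alive": true},
    {"node_id": 3, "address": "127.0.0.1:50053", "is_alive": false}
  ]
}
"""

def has_quorum(status: dict) -> bool:
    """Return True if a strict majority of the listed nodes is alive."""
    nodes = status["nodes"]
    alive = sum(1 for n in nodes if n["is_alive"])
    return alive > len(nodes) // 2

status = json.loads(SAMPLE)
print(has_quorum(status))  # 2 of 3 nodes alive -> True
```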

Submit a job:

curl -X POST http://localhost:8081/api/jobs \
  -H "Content-Type: application/json" \
  -d '{"command": "echo hello"}'
# Response:
# {
#   "job_id": "ef319e40-c888-490d-8349-e9c05f78cf5a",
#   "status": "pending"
# }

# With a specific Docker image (overrides the server default for this job)
curl -X POST http://localhost:8081/api/jobs \
  -H "Content-Type: application/json" \
  -d '{"command": "python3 -c '\''print(42)'\''", "image": "python:3.12-alpine"}'
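
The nested quoting in the second curl call is easy to get wrong. As an alternative, the following sketch builds the same request with Python's standard library, letting `json.dumps` handle the escaping. `build_submit_request` is a hypothetical helper; the URL and JSON fields are taken from the curl examples above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8081"  # dashboard address used in the curl examples

def build_submit_request(command, image=None):
    """Build (but do not send) a POST /api/jobs request."""
    payload = {"command": command}
    if image is not None:
        payload["image"] = image  # per-job override of the server default image
    return urllib.request.Request(
        f"{BASE_URL}/api/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_submit_request("python3 -c 'print(42)'", image="python:3.12-alpine")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To send it against a running dashboard:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```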

Cancel a job:

curl -X DELETE http://localhost:8081/api/jobs/ef319e40-c888-490d-8349-e9c05f78cf5a
# Response (success):
# {
#   "success": true,
#   "error": null
# }

# Response (already terminal):
# HTTP 400
# {
#   "success": false,
#   "error": "job is already completed"
# }

List all jobs:

curl http://localhost:8081/api/jobs
# Response:
# [
#   {
#     "id": "ef319e40-c888-490d-8349-e9c05f78cf5a",
#     "command": "echo hello",
#     "status": "completed",
#     "executed_by": 1,
#     "output": "hello\n",
#     "error": null,
#     "created_at": "2026-01-28T12:45:41.231558433+00:00",
#     "completed_at": "2026-01-28T12:45:41.678341558+00:00"
#   }
# ]
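
The timestamps in this response carry nanosecond precision, which Python's `datetime` cannot represent (it stops at microseconds). A minimal sketch for computing a job's duration from the sample values above, assuming it is acceptable to drop the extra digits; `parse_ts` is a hypothetical helper:

```python
import re
from datetime import datetime

def parse_ts(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp, truncating fractional seconds to 6 digits."""
    trimmed = re.sub(r"(\.\d{6})\d+", r"\1", ts)
    return datetime.fromisoformat(trimmed)

# Values copied from the sample response above.
created = parse_ts("2026-01-28T12:45:41.231558433+00:00")
completed = parse_ts("2026-01-28T12:45:41.678341558+00:00")

duration_s = (completed - created).total_seconds()
print(f"{duration_s:.3f}")  # 0.447
```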

Liveness probe:

curl http://localhost:8081/health/live
# Response (always 200 while the process is alive):
# {
#   "status": "ok"
# }

Readiness probe:

curl http://localhost:8081/health/ready
# Response when a leader has been elected (200):
# {
#   "status": "ok",
#   "leader_id": 1
# }

# Response during startup or mid-election (503):
# {
#   "status": "no_leader",
#   "leader_id": null
# }
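
A deployment script might wait on the readiness probe before submitting jobs. The sketch below shows one possible polling loop. `wait_until_ready` and its `probe` callable are hypothetical, and the timeout and interval defaults are arbitrary; the probe is simulated here so the example runs without a server.

```python
import time

def wait_until_ready(probe, timeout_s=30.0, interval_s=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll `probe` until it reports readiness or the timeout expires.

    `probe` is any zero-argument callable returning (http_status, body_dict),
    for example a wrapper around GET /health/ready.
    Returns the leader_id on success, or None on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        code, body = probe()
        if code == 200 and body.get("status") == "ok":
            return body["leader_id"]
        if clock() >= deadline:
            return None
        sleep(interval_s)

# Simulated probe: two 503 responses during an election, then a 200.
responses = iter([
    (503, {"status": "no_leader", "leader_id": None}),
    (503, {"status": "no_leader", "leader_id": None}),
    (200, {"status": "ok", "leader_id": 1}),
])
leader = wait_until_ready(lambda: next(responses), sleep=lambda _s: None)
print(leader)  # 1
```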

gRPC API

SchedulerService (client-facing)

| Method | Description | Leader only |
|---|---|---|
| SubmitJob(command, image?) | Submit a job; image overrides the server-default Docker image for this job | Yes |
| CancelJob(job_id) | Cancel a pending or running job | Yes |
| GetJobStatus(job_id) | Get job status | No |
| ListJobs(page_size, page_token, status_filter, worker_id_filter, command_filter, created_after_ms, created_before_ms) | List jobs (paginated, filterable) | No |
| StreamJobs() | Stream jobs | No |
| GetClusterStatus() | Cluster info | Forwarded to leader |
| GetRaftLogEntries() | View Raft log entries | Forwarded to leader |
| TransferLeadership(target) | Transfer leadership | Yes |
| DrainNode() | Drain node for maintenance | No |

ListJobs request fields

| Field | Type | Default | Description |
|---|---|---|---|
| page_size | uint32 | 100 | Max results per page (capped at 1000) |
| page_token | string | "" | Token from the previous response for the next page |
| status_filter | JobStatus | UNSPECIFIED | Only return jobs with this status; 0/UNSPECIFIED = no filter |
| worker_id_filter | uint64 | 0 | Only return jobs whose assigned_worker or executed_by matches; 0 = no filter |
| command_filter | string | "" | Case-insensitive substring match on the command; empty = no filter |
| created_after_ms | int64 | 0 | Only return jobs created at or after this Unix timestamp (ms); 0 = no bound |
| created_before_ms | int64 | 0 | Only return jobs created at or before this Unix timestamp (ms); 0 = no bound |

total_count in the response reflects the filtered result set size (not the total queue size).
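
The filter semantics in the table can be sketched as a plain predicate. This is an illustration, not the server's implementation: the `Job` record, `matches`, and `effective_page_size` are hypothetical, the numeric status values are made up, and two behaviours are assumptions not stated above (filters are combined with AND, and a page_size of 0 selects the default).

```python
from dataclasses import dataclass

DEFAULT_PAGE_SIZE = 100  # default from the table above
MAX_PAGE_SIZE = 1000     # cap from the table above

@dataclass
class Job:  # hypothetical, simplified job record
    command: str
    status: int          # made-up numeric values; 0 is reserved for UNSPECIFIED
    assigned_worker: int
    executed_by: int
    created_at_ms: int

def effective_page_size(requested: int) -> int:
    # Assumption: 0 (the protobuf zero value) means "use the default".
    return min(requested or DEFAULT_PAGE_SIZE, MAX_PAGE_SIZE)

def matches(job, status_filter=0, worker_id_filter=0, command_filter="",
            created_after_ms=0, created_before_ms=0) -> bool:
    """Apply every non-default filter; zero or empty values mean 'no filter'."""
    if status_filter != 0 and job.status != status_filter:
        return False
    if worker_id_filter != 0 and worker_id_filter not in (job.assigned_worker, job.executed_by):
        return False
    if command_filter and command_filter.lower() not in job.command.lower():
        return False
    if created_after_ms != 0 and job.created_at_ms < created_after_ms:
        return False  # lower bound is inclusive
    if created_before_ms != 0 and job.created_at_ms > created_before_ms:
        return False  # upper bound is inclusive
    return True

jobs = [
    Job("echo hello", status=3, assigned_worker=1, executed_by=1, created_at_ms=1_000),
    Job("python3 -c 'print(42)'", status=1, assigned_worker=2, executed_by=0, created_at_ms=2_000),
]
hits = [j.command for j in jobs if matches(j, command_filter="ECHO")]
print(hits)  # ['echo hello']
```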

SubmitJob error codes

| gRPC status | Meaning | Client action |
|---|---|---|
| OK | Job accepted and committed | |
| FAILED_PRECONDITION | Node is not the leader | Redirect to the node ID in the message |
| RESOURCE_EXHAUSTED | Leader proposal queue is full (>256 pending) | Retry with exponential backoff |
| DEADLINE_EXCEEDED | Raft did not commit the entry within 5 seconds | Retry; may indicate a degraded cluster |
| UNAVAILABLE | Node is draining, or the Raft loop has stopped | Retry on a different node |
| INVALID_ARGUMENT | Empty command string, or command exceeds 1024 bytes | Fix the request |
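
One way a client might act on these codes is sketched below. Status codes are handled as plain strings so the example runs without a gRPC library. The action labels, `next_action`, and `backoff_delay` are hypothetical, and the backoff base and cap are arbitrary choices rather than values recommended by the project.

```python
import random

# Client action per SubmitJob status, following the table above.
ACTIONS = {
    "OK": "done",
    "FAILED_PRECONDITION": "redirect_to_leader",
    "RESOURCE_EXHAUSTED": "retry_with_backoff",
    "DEADLINE_EXCEEDED": "retry",
    "UNAVAILABLE": "retry_other_node",
    "INVALID_ARGUMENT": "fix_request",
}

def next_action(status: str) -> str:
    """Map a gRPC status name to a client action; unknown codes are not retried."""
    return ACTIONS.get(status, "fail")

def backoff_delay(attempt: int, base_s=0.1, cap_s=5.0, rng=random.random) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt))."""
    return rng() * min(cap_s, base_s * 2 ** attempt)

for attempt in range(4):
    print(attempt, next_action("RESOURCE_EXHAUSTED"), round(backoff_delay(attempt), 3))
```

The jitter spreads retries from many clients over time, so a full proposal queue is not hit by all of them at once.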

CancelJob error codes

| gRPC status | Meaning | Client action |
|---|---|---|
| OK | Job cancelled and committed | |
| FAILED_PRECONDITION | Node is not the leader, or job is already in a terminal state | Redirect to leader / check job status |
| NOT_FOUND | Job ID does not exist | |
| RESOURCE_EXHAUSTED | Leader proposal queue is full | Retry with exponential backoff |
| DEADLINE_EXCEEDED | Raft did not commit within 5 seconds | Retry |
| INVALID_ARGUMENT | Malformed job UUID | Fix the request |
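
A client can catch a malformed job ID locally instead of waiting for INVALID_ARGUMENT. The sketch below uses Python's `uuid` module; `normalize_job_id` is a hypothetical helper. Whether the server accepts anything other than the canonical hyphenated form is not stated here, so the helper always returns that form.

```python
import uuid

def normalize_job_id(job_id: str) -> str:
    """Return the canonical lowercase UUID string, or raise ValueError if malformed."""
    return str(uuid.UUID(job_id))

print(normalize_job_id("EF319E40-C888-490D-8349-E9C05F78CF5A"))

try:
    normalize_job_id("not-a-uuid")
except ValueError as exc:
    print("rejected before sending:", exc)
```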

InternalService (node-to-node, not client-facing)

| Method | Description |
|---|---|
| GetJobOutput(job_id) | Fetch job output from the node that executed it |
| WorkerHeartbeat(node_id) | Worker liveness signal sent every 2 s to the leader; auto-registers on first call; workers not seen for 5 s are excluded from job assignment |
| ForwardJobStatus(updates) | Follower worker forwards completed job status to the leader for Raft replication |
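
The WorkerHeartbeat timing rule can be illustrated with a small in-memory registry. This is a sketch of the described behaviour, not the leader's actual code: `WorkerRegistry` is hypothetical, and it assumes a worker is excluded once 5 s or more have passed since its last heartbeat. A fake clock keeps the example deterministic.

```python
import time

HEARTBEAT_INTERVAL_S = 2.0  # workers send a heartbeat every 2 s
LIVENESS_WINDOW_S = 5.0     # workers silent for 5 s are excluded from assignment

class WorkerRegistry:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_seen = {}  # node_id -> time of the last heartbeat

    def heartbeat(self, node_id: int) -> None:
        # The first heartbeat from a node registers it automatically.
        self._last_seen[node_id] = self._clock()

    def assignable(self) -> list:
        """Return the node IDs seen within the liveness window, sorted."""
        now = self._clock()
        return sorted(n for n, seen in self._last_seen.items()
                      if now - seen < LIVENESS_WINDOW_S)

now = [0.0]                       # fake clock, advanced manually
registry = WorkerRegistry(clock=lambda: now[0])
registry.heartbeat(1)
registry.heartbeat(2)
now[0] = 4.0
registry.heartbeat(1)             # worker 2 stays silent
now[0] = 6.0
print(registry.assignable())      # [1]
```

With a 2 s interval and a 5 s window, a worker can miss two consecutive heartbeats and still be considered alive.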

RaftService (node-to-node, consensus protocol)

| Method | Description |
|---|---|
| AppendEntries | Log replication and heartbeats |
| RequestVote | Leader election voting |
| TimeoutNow | Trigger immediate election on the target node (used by TransferLeadership) |
| InstallSnapshot | Transfer compacted state to slow followers |