The Hermes Bible (Traditional Chinese edition)

Section: Core Features · URL: https://hermesbible.com/docs/user-guide/features/api-server

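The API Server exposes hermes-agent as an OpenAI-compatible HTTP endpoint. Any frontend that speaks the OpenAI format (Open WebUI, LobeChat, LibreChat, NextChat, ChatBox, and hundreds of other tools) can connect to hermes-agent and use it as a backend.

Your agent handles requests with its full toolset (terminal, file operations, web search, memory, skills) and returns the final response. In streaming mode, tool progress indicators are shown inline so the frontend can display what the agent is doing.

Tip: one backend covers both models and tools

Hermes itself needs a configured provider and tool backends before the API Server is useful. A Nous Portal subscription covers both needs at once: 300+ models plus web/image/TTS/browser capabilities provided through the Tool Gateway. Run hermes setup --portal once before starting the API Server, and frontends such as Open WebUI or LobeChat get a fully functional tool backend.

Quick Start

1. Enable the API Server

Add the following to ~/.hermes/.env: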

API_SERVER_ENABLED=true
API_SERVER_KEY=change-me-local-dev
# Optional: only if a browser must call Hermes directly
# API_SERVER_CORS_ORIGINS=http://localhost:3000

2. Start the Gateway

hermes gateway

You should see:

[API Server] API server listening on http://127.0.0.1:8642

3. Connect a frontend

Point any OpenAI-compatible client at http://localhost:8642/v1:

# Test with curl
curl http://localhost:8642/v1/chat/completions \
  -H "Authorization: Bearer change-me-local-dev" \
  -H "Content-Type: application/json" \
  -d '{"model": "hermes-agent", "messages": [{"role": "user", "content": "Hello!"}]}'

Alternatively, connect Open WebUI, LobeChat, or another frontend; see the Open WebUI integration guide for step-by-step instructions.
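The curl test above can also be reproduced from Python's standard library. This is a minimal sketch that assumes the default host and port and the placeholder key from the quick start; `build_chat_request` and `send` are illustrative helper names, not part of Hermes.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8642/v1"  # default host/port from the quick start
API_KEY = "change-me-local-dev"        # must match API_SERVER_KEY in ~/.hermes/.env


def build_chat_request(prompt: str, model: str = "hermes-agent") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Usage (requires a running gateway):
#   print(send(build_chat_request("Hello!")))
```

Keeping request construction separate from sending makes it easy to inspect the exact URL, headers, and body before anything reaches the agent.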

API Endpoints

POST /v1/chat/completions

Standard OpenAI Chat Completions format. Stateless: each request carries the full conversation in the messages array.

Request:

{
  "model": "hermes-agent",
  "messages": [
    {"role": "system", "content": "You are a Python expert."},
    {"role": "user", "content": "Write a fibonacci function"}
  ],
  "stream": false
}

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "hermes-agent",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Here's a fibonacci function..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 50, "completion_tokens": 200, "total_tokens": 250}
}

Inline image input: a user message can send content as an array of text and image_url parts. Both remote http(s) URLs and data:image/... URLs are supported:

{
  "model": "hermes-agent",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png", "detail": "high"}}
      ]
    }
  ]
}

Uploaded files (file / input_file / file_id) and non-image data: URLs return 400 unsupported_content_type.

Streaming ("stream": true): returns token-by-token response chunks as Server-Sent Events (SSE). For Chat Completions, the stream uses standard chat.completion.chunk events plus a Hermes-specific hermes.tool.progress event for tool-start UX. For Responses, the stream uses OpenAI Responses event types such as response.created, response.output_text.delta, response.output_item.added, response.output_item.done, and response.completed.

Tool progress in streams:

Chat Completions: Hermes emits event: hermes.tool.progress to make tool starts visible without polluting the persisted assistant text.
Responses: Hermes emits spec-native function_call and function_call_output output items in the SSE stream, so clients can render structured tool UI in real time.
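A client reading the Chat Completions stream has to keep hermes.tool.progress events out of the assistant text. The sketch below shows one way to do that with a small SSE line parser. It assumes the stream ends with the OpenAI-style `data: [DONE]` sentinel, and it keeps tool-progress payloads as raw strings because their shape is not documented here. `iter_sse_events` and `split_stream` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from SSE text lines.

    Events without an explicit `event:` field get the SSE default name "message".
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                    # a blank line terminates one event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):        # SSE comment / keep-alive
            continue
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
    if data:                              # stream ended without a trailing blank line
        yield event, "\n".join(data)


def split_stream(lines: Iterable[str]) -> Tuple[str, List[str]]:
    """Return (assistant_text, raw_tool_progress_payloads)."""
    text, progress = [], []
    for event, data in iter_sse_events(lines):
        if data == "[DONE]":              # assumed OpenAI-style end marker
            break
        if event == "hermes.tool.progress":
            progress.append(data)         # payload shape is an unknown; keep it raw
            continue
        choices = json.loads(data).get("choices") or []
        if choices:
            text.append(choices[0].get("delta", {}).get("content") or "")
    return "".join(text), progress
```

Clients that ignore unknown event names only see the standard chunks, so the assistant text stays clean even without special handling for the progress events.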

POST /v1/responses

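OpenAI Responses API format. Supports server-side conversation state through previous_response_id: the server stores the full conversation history (including tool calls and results), so multi-turn context is preserved without the client managing it.

Request: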

{
  "model": "hermes-agent",
  "input": "What files are in my project?",
  "instructions": "You are a helpful coding assistant.",
  "store": true
}

Response:

{
  "id": "resp_abc123",
  "object": "response",
  "status": "completed",
  "model": "hermes-agent",
  "output": [
    {"type": "function_call", "name": "terminal", "arguments": "{\"command\": \"ls\"}", "call_id": "call_1"},
    {"type": "function_call_output", "call_id": "call_1", "output": "README.md src/ tests/"},
    {"type": "message", "role": "assistant", "content": [{"type": "output_text", "text": "Your project has..."}]}
  ],
  "usage": {"input_tokens": 50, "output_tokens": 200, "total_tokens": 250}
}

Inline image input: input[].content can contain input_text and input_image parts. Both remote URLs and data:image/... URLs are supported:

{
  "model": "hermes-agent",
  "input": [
    {
      "role": "user",
      "content": [
        {"type": "input_text", "text": "Describe this screenshot."},
        {"type": "input_image", "image_url": "data:image/png;base64,iVBORw0K..."}
      ]
    }
  ]
}

Uploaded files (input_file / file_id) and non-image data: URLs return 400 unsupported_content_type.

Multi-turn conversations with previous_response_id

Chain responses to keep full context (including tool calls) across turns:

{
  "input": "Now show me the README",
  "previous_response_id": "resp_abc123"
}

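The server rebuilds the full conversation from the stored response chain, so all earlier tool calls and results are preserved. Chained requests also share the same session, so a multi-turn conversation appears as a single entry in the dashboard and in session history.

Named conversations

Use the conversation parameter instead of tracking response IDs: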

{"input": "Hello", "conversation": "my-project"}
{"input": "What's in src/?", "conversation": "my-project"}
{"input": "Run the tests", "conversation": "my-project"}

The server automatically chains to the latest response in that conversation. This is similar to the /title command for gateway sessions.
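The two chaining styles for /v1/responses can be captured in one small payload builder. This is a sketch only: `responses_payload` is a hypothetical helper, and treating previous_response_id and conversation as mutually exclusive is an assumption, since the source does not say what happens when both are sent.

```python
from typing import Optional


def responses_payload(user_input: str,
                      previous_response_id: Optional[str] = None,
                      conversation: Optional[str] = None) -> dict:
    """Build a POST /v1/responses body, optionally continuing an earlier thread."""
    if previous_response_id and conversation:
        # Assumption: the two chaining mechanisms are alternatives.
        raise ValueError("pass previous_response_id or conversation, not both")
    body = {"model": "hermes-agent", "input": user_input}
    if previous_response_id:
        body["previous_response_id"] = previous_response_id  # chain by response ID
    if conversation:
        body["conversation"] = conversation                  # chain by name
    return body


# First turn, then a follow-up chained to the returned response ID:
first = responses_payload("What files are in my project?")
follow_up = responses_payload("Now show me the README",
                              previous_response_id="resp_abc123")
# Named conversation: no response IDs to track on the client.
named = responses_payload("Run the tests", conversation="my-project")
```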

GET /v1/responses/{id}

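Retrieves a previously stored response by ID.

DELETE /v1/responses/{id}

Deletes a stored response.

GET /v1/models

Lists the agent as an available model. The advertised model name defaults to the profile name (or hermes-agent for the default profile). Most frontends need this endpoint to discover available models.

GET /v1/capabilities

Returns a machine-readable description of the API Server's stable surface, intended for external UIs, orchestrators, and plugin bridges.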

{
  "object": "hermes.api_server.capabilities",
  "platform": "hermes-agent",
  "model": "hermes-agent",
  "auth": {"type": "bearer", "required": true},
  "features": {
    "chat_completions": true,
    "responses_api": true,
    "run_submission": true,
    "run_status": true,
    "run_events_sse": true,
    "run_stop": true
  }
}

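Use this endpoint when integrating dashboards, browser UIs, or control planes, so they can discover whether the running Hermes version supports runs, streaming, cancellation, and session continuity without depending on internal Python implementation details.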
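A client can turn the capabilities document into a simple feature gate. The sketch below works on an already-decoded JSON body shaped like the example above; `missing_features` is a hypothetical helper name, and which flags a given UI requires is up to that UI.

```python
from typing import Iterable, List


def missing_features(capabilities: dict, required: Iterable[str]) -> List[str]:
    """Return the required feature flags that are absent or false."""
    features = capabilities.get("features", {})
    return [name for name in required if not features.get(name, False)]


# The example response from this section, decoded into a dict.
capabilities = {
    "object": "hermes.api_server.capabilities",
    "platform": "hermes-agent",
    "model": "hermes-agent",
    "auth": {"type": "bearer", "required": True},
    "features": {
        "chat_completions": True,
        "responses_api": True,
        "run_submission": True,
        "run_status": True,
        "run_events_sse": True,
        "run_stop": True,
    },
}

# A UI that needs cancellable runs can check before showing a Stop button.
can_cancel = not missing_features(capabilities, ["run_submission", "run_stop"])
```

Treating an absent flag the same as a false one lets the client degrade safely against older servers that do not advertise a feature at all.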

GET /health

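Health check. Returns {"status": "ok"}. Also available at GET /v1/health for OpenAI-compatible clients that expect the /v1/ prefix.

GET /health/detailed

Extended health check that also reports active sessions, running agents, and resource usage. Useful for monitoring and observability tools.

Runs API (a streaming-friendly alternative)

In addition to /v1/chat/completions and /v1/responses, the server provides a runs API for long-running sessions. Clients can subscribe to progress events without managing the stream themselves.

POST /v1/runs

Creates a new agent run. Returns a run_id that can be used to subscribe to progress events.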

{
  "run_id": "run_abc123",
  "status": "started"
}

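A run accepts a plain input string plus optional session_id, instructions, conversation_history, or previous_response_id. When session_id is provided, Hermes surfaces it in the run status so external UIs can correlate the run with their own conversation IDs.

GET /v1/runs/{run_id}

Queries the current run status. Useful for dashboards that need status information without holding an SSE connection open, or for UIs that need to reconnect after navigation.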

{
  "object": "hermes.run",
  "run_id": "run_abc123",
  "status": "completed",
  "session_id": "space-session",
  "model": "hermes-agent",
  "output": "Done.",
  "usage": {"input_tokens": 50, "output_tokens": 200, "total_tokens": 250}
}

After a terminal state (completed, failed, or cancelled), the status is retained briefly to support polling and UI state synchronization.
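Polling the status endpoint until a terminal state can be sketched as follows. The fetch step is injected as a callable so the loop stays independent of any HTTP library. `wait_for_run`, the polling interval, and the timeout are illustrative choices, not values defined by Hermes.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed", "cancelled"}  # as listed above


def wait_for_run(fetch_status: Callable[[], dict],
                 interval: float = 1.0,
                 timeout: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the run reaches a terminal state, then return its status object.

    `fetch_status` should perform GET /v1/runs/{run_id} and return the decoded JSON.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status.get('status')!r} after {timeout}s")
        sleep(interval)
```

Because the terminal status is only retained briefly, a poller like this should start soon after the run is created rather than long after it may have finished.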

GET /v1/runs/{run_id}/events

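Server-Sent Events stream that delivers a run's tool-call progress, token deltas, and lifecycle events. Designed for dashboards and thick clients, with support for attaching and detaching without losing state.

POST /v1/runs/{run_id}/stop

Interrupts a running agent turn. The endpoint returns {"status": "stopping"} immediately, while Hermes asks the active agent to stop at the next safe interruption point.

POST /v1/runs/{run_id}/approval

Resolves a pending approval for a run that is waiting on a human decision (for example, a tool call gated by an approval policy). The request body carries the approval decision; the run resumes once the decision is recorded. This endpoint is advertised in /v1/capabilities as the run_approval feature, so external UIs can detect support before showing an approval prompt.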

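Jobs API (scheduled background work)

The server provides a lightweight jobs CRUD interface for managing scheduled or background agent runs from remote clients. All endpoints are protected by the same bearer authentication.

GET /api/jobs

Lists all scheduled jobs.

POST /api/jobs

Creates a new scheduled job. The request body accepts the same format as hermes cron: prompt, schedule, skills, provider overrides, and delivery target.

GET /api/jobs/{job_id}

Returns a single job's definition and most recent run status.

PATCH /api/jobs/{job_id}

Updates fields of an existing job (prompt, schedule, and so on). Partial updates are merged into the existing configuration.

DELETE /api/jobs/{job_id}

Removes the job and cancels any run in progress.

POST /api/jobs/{job_id}/pause

Pauses the job without deleting it. The next scheduled run timestamp is suspended until the job is resumed.

POST /api/jobs/{job_id}/resume

Resumes a previously paused job.

POST /api/jobs/{job_id}/run

Triggers the job immediately, regardless of its schedule.

Sessions API (managing sessions over REST)

External UIs can manage Hermes sessions over REST without launching the dashboard. All endpoints are protected by API_SERVER_KEY and live under /api/sessions/*.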

Method	Path	Description
`GET`	`/api/sessions`	List sessions (paginated: `limit`, `offset`, `source`, `include_children`)
`POST`	`/api/sessions`	Create an empty session
`GET`	`/api/sessions/{id}`	Read session metadata
`PATCH`	`/api/sessions/{id}`	Update the title or `end_reason`
`DELETE`	`/api/sessions/{id}`	Delete the session
`GET`	`/api/sessions/{id}/messages`	Message history for the session
`POST`	`/api/sessions/{id}/fork`	Fork the session through `SessionDB` lineage (matches CLI `/branch` semantics)
`POST`	`/api/sessions/{id}/chat`	Run one synchronous agent turn
`POST`	`/api/sessions/{id}/chat/stream`	SSE wrapper for a single turn; emits `assistant.delta`, `tool.started`, `tool.completed`, and `run.completed` events

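/v1/capabilities exposes the full API surface through session_* feature flags and endpoints.session_* entries, so external UIs can detect support and degrade gracefully. The chat and chat/stream payloads accept inline images (multimodal-aware path).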

# fork a session and run one turn
curl -X POST http://localhost:8642/api/sessions/$ID/fork \
  -H "Authorization: Bearer $API_SERVER_KEY" \
  -d '{"title": "explore alt path"}'

# stream a turn over SSE
curl -N -X POST http://localhost:8642/api/sessions/$ID/chat/stream \
  -H "Authorization: Bearer $API_SERVER_KEY" \
  -d '{"input": "what files changed in the last hour?"}'

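Skill and toolset discovery

GET /v1/skills and GET /v1/toolsets let external clients enumerate the agent's capabilities deterministically over REST, without asking the model. Both are read-only and protected by API_SERVER_KEY.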

curl http://localhost:8642/v1/skills \
  -H "Authorization: Bearer $API_SERVER_KEY"
# → [{"name": "github-pr-workflow", "description": "...", "category": "..."}, ...]

curl http://localhost:8642/v1/toolsets \
  -H "Authorization: Bearer $API_SERVER_KEY"
# → [{"name": "core", "label": "...", "description": "...", "enabled": true,
#     "configured": true, "tools": ["read_file", "write_file", ...]}, ...]

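/v1/skills returns the same metadata that the skills hub uses internally. /v1/toolsets returns the toolsets resolved for the api_server platform, including the expanded, concrete tools list for each toolset. Both are advertised under endpoints.* in /v1/capabilities.

Long-term memory scoping (`X-Hermes-Session-Key`)

Multi-user frontends such as Open WebUI need a stable, per-channel identifier for long-term memory (Honcho and similar) that is independent of the conversation-scoped X-Hermes-Session-Id, which rotates on /new. Pass X-Hermes-Session-Key on /v1/chat/completions, /v1/responses, or /v1/runs. Hermes forwards it to AIAgent(gateway_session_key=...), and the Honcho memory provider uses it to derive a stable scope.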

POST /v1/chat/completions HTTP/1.1
Authorization: Bearer ***
X-Hermes-Session-Id: transcript-alpha
X-Hermes-Session-Key: agent:main:webui:dm:user-42

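Rules: the value may be at most 256 characters, control characters (\r, \n, \x00) are rejected, and the value is echoed back in the response (JSON and SSE). /v1/capabilities advertises support through "session_key_header": "X-Hermes-Session-Key". Without a key, Honcho's per-session strategy produces a different scope for each session_id, which is exactly how Hermes behaved by default before this feature existed.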
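A frontend can apply the same rules before sending the header, so that a bad key fails locally instead of at the server. The sketch below is a client-side check only. Rejecting every ASCII control character (code points below 0x20, plus 0x7F) is an assumption that generalizes the three examples given above, and both function names are hypothetical.

```python
MAX_SESSION_KEY_LENGTH = 256  # limit stated in the rules above


def validate_session_key(value: str) -> str:
    """Return the key unchanged, or raise ValueError if it breaks the stated rules."""
    if len(value) > MAX_SESSION_KEY_LENGTH:
        raise ValueError(f"session key longer than {MAX_SESSION_KEY_LENGTH} characters")
    # Assumption: treat all ASCII control characters like \r, \n and \x00.
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in value):
        raise ValueError("session key contains a control character")
    return value


def session_headers(api_key: str, session_id: str, session_key: str) -> dict:
    """Headers for a request scoped to one transcript and one long-term memory scope."""
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Hermes-Session-Id": session_id,                          # rotates per conversation
        "X-Hermes-Session-Key": validate_session_key(session_key),  # stable per user/channel
    }
```

For example, `session_headers(key, "transcript-alpha", "agent:main:webui:dm:user-42")` reproduces the header set shown in the HTTP example above.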

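System prompt handling

When a frontend sends a system message (Chat Completions) or an instructions field (Responses API), hermes-agent layers it on top of its core system prompt. Your agent keeps all of its tools, memory, and skills; the frontend's system prompt only adds extra instructions.

This means you can customize behavior per frontend without losing any capability:

Open WebUI system prompt: "You are a Python expert. Always include type annotations."
The agent still has the terminal, file tools, web search, memory, and so on.

Authentication

Bearer token authentication through the Authorization header: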

Authorization: Bearer ***

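Set the key through the API_SERVER_KEY environment variable. If a browser needs to call Hermes directly, also set API_SERVER_CORS_ORIGINS to an explicit allowlist.

Warning: security

The API Server grants access to hermes-agent's full toolset, including terminal commands. API_SERVER_KEY is required in every deployment, including the default 127.0.0.1 loopback binding. When you explicitly allow browser callers, keep API_SERVER_CORS_ORIGINS exact so that browser access stays under control.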

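Configuration

Environment variables

Variable	Default	Description
`API_SERVER_ENABLED`	`false`	Enable the API Server
`API_SERVER_PORT`	`8642`	HTTP server port
`API_SERVER_HOST`	`127.0.0.1`	Bind address (localhost only by default)
`API_SERVER_KEY`	(required)	Bearer token used for authentication
`API_SERVER_CORS_ORIGINS`	(none)	Comma-separated list of allowed browser origins
`API_SERVER_MODEL_NAME`	(profile name)	Model name on `/v1/models`. Defaults to the profile name, or `hermes-agent` for the default profile.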

config.yaml

# Not yet supported — use environment variables.
# config.yaml support coming in a future release.

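Security headers

All responses include security headers:

X-Content-Type-Options: nosniff (prevents MIME type sniffing)
Referrer-Policy: no-referrer (prevents referrer leakage)

CORS

The API Server does not enable browser CORS by default.

If you need direct browser access, set an explicit allowlist: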

API_SERVER_CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000

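When CORS is enabled:

Preflight responses include Access-Control-Max-Age: 600 (a 10-minute cache)
SSE streaming responses include CORS headers so that browser EventSource clients work correctly
Idempotency-Key is an allowed request header; clients can use it for deduplication (responses are cached per key for 5 minutes)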
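For deduplication to help, a client has to reuse one key across retries of the same logical request. A minimal sketch, assuming any unique string is acceptable as a key (a UUID is used here) and using the hypothetical helper name `with_idempotency_key`:

```python
import uuid
from typing import Optional


def with_idempotency_key(headers: dict, key: Optional[str] = None) -> dict:
    """Return a copy of `headers` carrying an Idempotency-Key.

    Pass the key from the first attempt when retrying, so the server can
    recognize the retry as the same request within its cache window.
    """
    merged = dict(headers)
    merged["Idempotency-Key"] = key or str(uuid.uuid4())
    return merged


base = {"Authorization": "Bearer change-me-local-dev"}
first_attempt = with_idempotency_key(base)
retry = with_idempotency_key(base, key=first_attempt["Idempotency-Key"])
```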

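Most documented frontends, such as Open WebUI, connect server-to-server and do not need CORS at all.

Compatible frontends

Any frontend that supports the OpenAI API format will work. Tested or documented integrations: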

Frontend	Stars	Connection method
Open WebUI	126k	Full guide
LobeChat	73k	Custom provider endpoint
LibreChat	34k	Custom endpoint in librechat.yaml
AnythingLLM	56k	Generic OpenAI provider
NextChat	87k	BASE_URL environment variable
ChatBox	39k	API Host setting
Jan	26k	Remote model setting
HF Chat-UI	8k	OPENAI_BASE_URL
big-AGI	7k	Custom endpoint
OpenAI Python SDK	n/a	`OpenAI(base_url="http://localhost:8642/v1")`
curl	n/a	Direct HTTP requests

Multi-user setup with profiles

To give several users their own independent Hermes instance (separate configuration, memory, and skills), use profiles:

# Create a profile per user
hermes profile create alice
hermes profile create bob

# Configure each profile's API server on a different port. API_SERVER_* are env
# vars (not config.yaml keys), so write them to each profile's .env:
cat >> ~/.hermes/profiles/alice/.env <<EOF
API_SERVER_ENABLED=true
API_SERVER_PORT=8643
API_SERVER_KEY=alice-secret
EOF

cat >> ~/.hermes/profiles/bob/.env <<EOF
API_SERVER_ENABLED=true
API_SERVER_PORT=8644
API_SERVER_KEY=bob-secret
EOF

# Start each profile's gateway
hermes -p alice gateway &
hermes -p bob gateway &

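Each profile's API Server automatically advertises the profile name as its model ID:

http://localhost:8643/v1/models → model alice
http://localhost:8644/v1/models → model bob

In Open WebUI, add each profile as a separate connection. The model dropdown shows alice and bob as distinct models, each backed by a fully independent Hermes instance. See the Open WebUI guide for details.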

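Limitations

Response store: stored responses (used for previous_response_id) are persisted in SQLite and survive gateway restarts. At most 100 responses are stored (LRU eviction).
No file uploads: inline images are supported on both /v1/chat/completions and /v1/responses, but uploaded files (file, input_file, file_id) and non-image document inputs are not available through the API.
The model field is cosmetic: the model field in a request is accepted, but the LLM actually used is configured server-side in config.yaml.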
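The response store limit matters to clients because a long-lived previous_response_id can stop resolving once enough newer responses have been stored. The sketch below illustrates LRU eviction in general with an in-memory `OrderedDict`. It is not the Hermes implementation (which persists to SQLite), and the class name is hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class ResponseStoreSketch:
    """In-memory illustration of a capacity-bound store with LRU eviction."""

    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self._items: "OrderedDict[str, dict]" = OrderedDict()

    def put(self, response_id: str, response: dict) -> None:
        self._items[response_id] = response
        self._items.move_to_end(response_id)     # newest entry is most recently used
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)      # drop the least recently used entry

    def get(self, response_id: str) -> Optional[dict]:
        if response_id not in self._items:
            return None                          # evicted or never stored
        self._items.move_to_end(response_id)     # a read refreshes recency
        return self._items[response_id]


store = ResponseStoreSketch(capacity=2)
store.put("resp_a", {"text": "first"})
store.put("resp_b", {"text": "second"})
store.get("resp_a")                    # touch resp_a, so resp_b becomes the oldest
store.put("resp_c", {"text": "third"})
evicted = store.get("resp_b") is None  # resp_b was evicted to make room
```

A client that may resume very old threads should therefore be prepared for a chained request to fail and fall back to resending the history itself.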

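Proxy mode

The API Server also serves as the backend for gateway proxy mode. When another Hermes gateway instance points at this API Server through GATEWAY_PROXY_URL, it forwards all messages here instead of running its own agent. This enables split deployments: for example, a Docker container handles Matrix E2EE and then forwards to an agent on the host.

See Matrix Proxy mode for the full setup guide.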