Hermes Agent 89bc5e8c15 docs: Finalize all design documents

Signed-off-by: Hermes Agent <hermes@nosuchhost>

2026-06-16 09:00:33 -04:00

Design Document: Backend & Real-Time Sync Feature Specification

Overview

The backend service aggregates Kubernetes API data and provides real-time WebSocket connections for shell, logs, and multi-user collaboration.

Backend Architecture

Recommended Stack

Language: Go (for Kubernetes client integration)
Kubernetes Client: kubernetes/client-go
WebSockets: gorilla/websocket
CRDT Layer: Yjs + y-websocket
Storage: Redis or in-memory (single-node), y-leveldb/y-mongodb (multi-node)

Service Responsibilities

Wrap Kubernetes API access (the same API that kubectl talks to)
Provide WebSocket multiplexer for:
- Shell sessions
- Log streaming
- Watch API updates
Handle CRDT sync for shared workspace state
Broadcast user presence/awareness

Kubernetes Data Sources

Required Resources

Namespaces
Deployments
StatefulSets
DaemonSets
Services
Ingresses
ConfigMaps
Secrets (values are base64-encoded in the API's JSON; see Secret Decoding for where to decode)
PVCs
Events
CRDs

Live Updates

Implementation: Kubernetes Watch API

GET /api/v1/namespaces/{ns}/pods?watch=true
GET /api/v1/namespaces/{ns}/services?watch=true
GET /apis/apps/v1/namespaces/{ns}/deployments?watch=true
# etc for each resource type
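
The paths above follow a regular pattern: the core group lives under /api, named groups such as apps live under /apis/{group}. A minimal Go sketch of a path builder is shown here; the function name WatchPath is an assumption for illustration, and it only covers namespaced resources (cluster-scoped resources such as Namespaces and CRDs have no /namespaces/{ns} segment).

```go
package main

import "fmt"

// WatchPath builds the upstream Kubernetes watch path for a namespaced
// resource. An empty group means the core API group, which is served
// under /api instead of /apis/{group}.
// WatchPath is a hypothetical helper, not part of client-go.
func WatchPath(group, version, namespace, resource string) string {
	prefix := "/apis/" + group
	if group == "" {
		prefix = "/api"
	}
	return fmt.Sprintf("%s/%s/namespaces/%s/%s?watch=true", prefix, version, namespace, resource)
}
```

For example, WatchPath("apps", "v1", "prod", "deployments") yields the deployments path listed above with ns set to prod.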

Aggregation Strategy

Single endpoint: /api/watch that multiplexes all watch streams
Or: Resource-specific endpoints (/api/watch/pods, /api/watch/services, etc.)
Include namespace filtering in requests
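
For the single-endpoint option, the multiplexing step can be sketched as a channel fan-in. This is only an illustration under assumptions: the Event type and the Multiplex function are hypothetical names, and a real implementation would feed the input channels from client-go watches or informers.

```go
package main

import "sync"

// Event is a hypothetical envelope for one change from a watch stream.
type Event struct {
	Kind      string // e.g. "Pod", "Service"
	Namespace string
	Name      string
	Type      string // "ADDED", "MODIFIED" or "DELETED"
}

// Multiplex merges several per-resource event channels into one channel.
// The output channel is closed once every input channel has been closed.
func Multiplex(inputs ...<-chan Event) <-chan Event {
	out := make(chan Event)
	var wg sync.WaitGroup
	for _, in := range inputs {
		wg.Add(1)
		go func(in <-chan Event) {
			defer wg.Done()
			for ev := range in {
				out <- ev
			}
		}(in)
	}
	// Close the merged channel only after all forwarders have finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}
```

Ordering is preserved within one input stream but not across streams, which is usually acceptable because each event carries its own kind and name.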

Secret Decoding

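Option 1: Server-side decode before sending
Option 2: Client-side decode (safer: plaintext values never pass through backend code that might log them)
With either option: Include a decoded flag in the response so the client knows whether values are still base64-encoded
Never log decoded values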
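
A minimal sketch of Option 1, assuming the backend holds the base64 strings from the API's JSON representation. SecretPayload and DecodeSecret are hypothetical names. Note that client-go's typed Secret already exposes Data as raw bytes, so this step applies only when working with the JSON form.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// SecretPayload is a hypothetical response shape carrying the decoded flag.
type SecretPayload struct {
	Name    string            `json:"name"`
	Data    map[string]string `json:"data"`
	Decoded bool              `json:"decoded"`
}

// DecodeSecret base64-decodes every value and marks the payload as decoded.
// Error messages name the key but never include the value itself, so a
// failed decode cannot leak secret material into logs.
func DecodeSecret(name string, encoded map[string]string) (SecretPayload, error) {
	out := SecretPayload{
		Name:    name,
		Data:    make(map[string]string, len(encoded)),
		Decoded: true,
	}
	for key, value := range encoded {
		raw, err := base64.StdEncoding.DecodeString(value)
		if err != nil {
			return SecretPayload{}, fmt.Errorf("secret %q key %q: invalid base64", name, key)
		}
		out.Data[key] = string(raw)
	}
	return out, nil
}
```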

WebSocket Endpoints

Shell Endpoint

/ws/shell

Upstream: kubectl exec -it <pod> -n <ns> -- sh
Protocol: Bidirectional WebSocket
- Client → Server: Keystrokes (raw terminal input), terminal resize events
- Server → Client: stdout, stderr
Terminal: xterm.js on client
Backend: Use client-go's remotecommand package (the pods/exec subresource) with streaming
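
The document does not fix a wire format for client messages, so the following is a sketch under an assumed JSON envelope with a type field of "stdin" or "resize". ShellMessage and ParseShellMessage are hypothetical names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ShellMessage is an assumed client-to-server envelope for /ws/shell.
type ShellMessage struct {
	Type string `json:"type"`           // "stdin" or "resize"
	Data string `json:"data,omitempty"` // terminal input, for "stdin"
	Cols uint16 `json:"cols,omitempty"` // new width, for "resize"
	Rows uint16 `json:"rows,omitempty"` // new height, for "resize"
}

// ParseShellMessage decodes and validates one WebSocket text frame.
func ParseShellMessage(raw []byte) (ShellMessage, error) {
	var m ShellMessage
	if err := json.Unmarshal(raw, &m); err != nil {
		return ShellMessage{}, fmt.Errorf("decode shell message: %w", err)
	}
	switch m.Type {
	case "stdin":
		return m, nil
	case "resize":
		if m.Cols == 0 || m.Rows == 0 {
			return ShellMessage{}, errors.New("resize needs non-zero cols and rows")
		}
		return m, nil
	default:
		return ShellMessage{}, fmt.Errorf("unknown message type %q", m.Type)
	}
}
```

On the server, "stdin" messages would be written to the exec stream's stdin and "resize" messages forwarded as terminal size updates.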

Logs Endpoint

/ws/logs

Upstream: kubectl logs --follow <pod> -n <ns>
Protocol: WebSocket text frames
Streaming: Each line as separate message
Client: Auto-scroll unless user scrolled up

Watch Endpoint

/ws/watch

Purpose: Broadcast resource changes to all connected clients
Payload: JSON Patch or full object update
Filtering: Per-client namespace/resource filters
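
Per-client filtering can be as small as a set lookup evaluated before each broadcast. A sketch, assuming a hypothetical WatchFilter type in which an empty set means "no restriction":

```go
package main

// WatchFilter is a hypothetical per-client subscription filter.
// An empty set matches everything for that dimension.
type WatchFilter struct {
	Namespaces map[string]bool
	Kinds      map[string]bool
}

// Matches reports whether an event for the given namespace and kind
// should be delivered to the client that owns this filter.
func (f WatchFilter) Matches(namespace, kind string) bool {
	if len(f.Namespaces) > 0 && !f.Namespaces[namespace] {
		return false
	}
	if len(f.Kinds) > 0 && !f.Kinds[kind] {
		return false
	}
	return true
}
```

This filter is a delivery optimization only; it does not replace the server-side authorization check described under Security Considerations.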

Multi-User Sync (The Yard)

CRDT Strategy

Library: Yjs (proven CRDT implementation)
Provider and persistence:
- Single-node: in-memory or Redis
- Multi-node: y-leveldb or y-mongodb for persistence, or y-webrtc for peer-to-peer transport
WebSocket: y-websocket for real-time sync

Synced State

Only sync the structure of the workspace, not content:

✅ Krate positions (wx, wy)
✅ Krate existence (create/delete)
✅ Krate minimize state
✅ Window layout (grid positions, sizes)
✅ Room/canvas camera state (optional)
❌ Window content (logs, YAML, shell) — each client streams independently

Presence/Awareness

Broadcast ephemeral user state:

{
  userId: string,
  name: string,
  color: string,
  cursorPosition: { x, y } | null,
  activeKrateId: string | null,
  spotlightQuery: string | null,
  timestamp: number
}

Implementation:

Each client publishes to Yjs Awareness channel
Admin panel subscribes to awareness updates
Update interval: every 5 seconds or on significant change
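
If the Go backend needs to read or relay the same payload (for example for the admin panel), a matching struct might look like the sketch below. The Go type names, the millisecond timestamp unit, and the ShouldPublish helper are assumptions; the JSON field names come from the payload above. Pointer fields encode the "| null" cases.

```go
package main

import "encoding/json"

// Point is a cursor position in workspace coordinates.
type Point struct {
	X float64 `json:"x"`
	Y float64 `json:"y"`
}

// Presence mirrors the awareness payload; nil pointers marshal as null.
type Presence struct {
	UserID         string  `json:"userId"`
	Name           string  `json:"name"`
	Color          string  `json:"color"`
	CursorPosition *Point  `json:"cursorPosition"`
	ActiveKrateID  *string `json:"activeKrateId"`
	SpotlightQuery *string `json:"spotlightQuery"`
	Timestamp      int64   `json:"timestamp"` // assumed: Unix milliseconds
}

// JSON renders the payload as it would be sent on the wire.
func (p Presence) JSON() (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// publishIntervalMs is the 5-second heartbeat from the list above.
const publishIntervalMs = 5000

// ShouldPublish applies the rule "every 5 seconds or on significant change".
func ShouldPublish(lastSentMs, nowMs int64, changed bool) bool {
	return changed || nowMs-lastSentMs >= publishIntervalMs
}
```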

Real-Time Shell Implementation

Client-Side (xterm.js)

1. Connect WebSocket
2. Initialize xterm with {
   rows: 24,
   cols: 80,
   fontFamily: 'IBM Plex Mono',
   fontSize: 12
}
3. Attach to DOM: term.open(containerElement)
4. Attach xterm to WebSocket
   - term.onData() → send to server
   - WebSocket.onmessage → term.write()
5. Handle resize: term.resize(cols, rows), then send the new size to the server

Server-Side (Go + k8s client)

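1. Extract pod/ns from URL: /ws/shell?pod=xxx&ns=yyy
2. Use client-go's remotecommand package (k8s.io/client-go/tools/remotecommand):
   // v1 is k8s.io/api/core/v1; scheme is k8s.io/client-go/kubernetes/scheme
   req := clientset.CoreV1().RESTClient().Post().
     Resource("pods").Name(pod).Namespace(ns).SubResource("exec").
     VersionedParams(&v1.PodExecOptions{
       Command: []string{"sh"},
       Stdin:   true,
       Stdout:  true,
       Stderr:  false, // with a TTY, stderr is delivered on stdout
       TTY:     true,
     }, scheme.ParameterCodec)
   // Create exec stream; config is the *rest.Config used to build clientset
   exec, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())
   if err != nil { /* report the error to the client and close */ }
   // stdin and stdout are io.Reader/io.Writer adapters around the WebSocket
   err = exec.StreamWithContext(ctx, remotecommand.StreamOptions{
     Stdin: stdin, Stdout: stdout, Tty: true,
   })
3. Duplex: WebSocket ↔ Exec stream
4. Handle close gracefully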

Real-Time Logs Implementation

Client-Side

1. Connect WebSocket
2. Maintain scroll position state
3. On new message:
   - Append to log buffer
   - If wasAtBottom: scroll to bottom
   - Else: keep existing scroll position
4. Colorize lines on receive (ERROR/WARN/INFO patterns)

Server-Side

1. Same behavior as kubectl logs --follow, via the pod log subresource (client-go GetLogs with Follow: true) rather than an exec stream
2. Stream lines as WebSocket messages
3. Add keep-alive ping every 30s
4. Reconnect on disconnect (client-side)
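
Step 2 can be sketched with the standard library alone: read the log stream line by line and hand each line to a send callback. StreamLines and CollectLines are hypothetical names; in the real service the reader would be the pod log stream and the callback would write a WebSocket text frame.

```go
package main

import (
	"bufio"
	"io"
	"strings"
)

// StreamLines reads r line by line and passes each line to send.
// It stops at the first send error or when the stream ends.
func StreamLines(r io.Reader, send func(line string) error) error {
	sc := bufio.NewScanner(r)
	// Raise the per-line limit from the 64 KiB default to 1 MiB,
	// since a single log line can be long.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		if err := send(sc.Text()); err != nil {
			return err
		}
	}
	return sc.Err()
}

// CollectLines is a demo helper that uses a string as a stand-in
// for a pod log stream and gathers the emitted lines.
func CollectLines(text string) ([]string, error) {
	var lines []string
	err := StreamLines(strings.NewReader(text), func(line string) error {
		lines = append(lines, line)
		return nil
	})
	return lines, err
}
```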

Backend API Endpoints

HTTP Endpoints (REST)

GET  /api/cluster              # Cluster metadata
GET  /api/resources            # All resources (cached, for initial load)
GET  /api/resources/pods       # Filtered by namespace (optional)
GET  /api/resources/deployments
GET  /api/resources/services
GET  /api/resources/secrets
GET  /api/resources/configmaps
GET  /api/resources/namespaces
GET  /api/resources/crds
GET  /api/resource/{kind}/{name}?ns={namespace}
GET  /api/health               # Backend health check

WebSocket Endpoints

/ws/shell                      # Shell session
/ws/logs                       # Log streaming
/ws/watch                      # Resource watch updates
/ws/sync                       # CRDT sync + awareness

Error Handling

WebSocket Disconnects

Shell: Notify user, keep terminal buffer (don't clear)
Logs: Auto-reconnect with exponential backoff
Watch: Auto-reconnect, fetch delta on reconnect
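
The exponential backoff for reconnects can be sketched as follows. The reconnect logic runs on the client, so this Go version only illustrates the arithmetic; the function name and the millisecond parameters are assumptions, and the document does not specify a base delay or a cap.

```go
package main

// BackoffMillis returns the delay before reconnect attempt number
// `attempt` (0-based): baseMs doubled once per previous attempt,
// capped at maxMs. Production code would normally add random jitter
// so that many clients do not reconnect in lockstep.
func BackoffMillis(attempt int, baseMs, maxMs int64) int64 {
	delay := baseMs
	if delay > maxMs {
		return maxMs
	}
	for i := 0; i < attempt; i++ {
		delay *= 2
		if delay >= maxMs {
			return maxMs
		}
	}
	return delay
}
```

With a 500 ms base and a 30 s cap, the delays run 500, 1000, 2000, 4000 and so on until they reach the cap.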

Backpressure

Shell: Buffer output if client slow, drop old lines if buffer full
Logs: Same strategy
Watch: Batch updates if high frequency
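
The "drop old lines if buffer full" strategy for shell and logs can be sketched as a bounded buffer. LineBuffer is a hypothetical type; this version is not safe for concurrent use, so a caller would guard it with a mutex or confine it to one goroutine.

```go
package main

// LineBuffer holds at most `limit` pending lines for a slow client.
// When full, the oldest line is discarded to make room for the newest.
type LineBuffer struct {
	lines   []string
	limit   int
	dropped int
}

// NewLineBuffer creates a buffer; limits below 1 are raised to 1.
func NewLineBuffer(limit int) *LineBuffer {
	if limit < 1 {
		limit = 1
	}
	return &LineBuffer{lines: make([]string, 0, limit), limit: limit}
}

// Push appends a line, dropping the oldest one if the buffer is full.
func (b *LineBuffer) Push(line string) {
	if len(b.lines) == b.limit {
		copy(b.lines, b.lines[1:])
		b.lines = b.lines[:len(b.lines)-1]
		b.dropped++
	}
	b.lines = append(b.lines, line)
}

// Drain returns the buffered lines in order and empties the buffer.
func (b *LineBuffer) Drain() []string {
	out := make([]string, len(b.lines))
	copy(out, b.lines)
	b.lines = b.lines[:0]
	return out
}

// Dropped reports how many lines have been discarded so far, which
// lets the UI show a "lines skipped" marker.
func (b *LineBuffer) Dropped() int { return b.dropped }
```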

Connection Limits

Per-user limits: max concurrent shells/logs per namespace
Global limits: max connections
Reject with 429 if limit exceeded
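
A sketch of the two limits combined in one counter. Limiter is a hypothetical type, and the key is assumed to be a string such as "alice/default" (user plus namespace). A handler would call Acquire before upgrading the connection, respond with 429 when it returns false, and call Release when the connection closes.

```go
package main

import "sync"

// Limiter enforces a per-key limit and a global limit on open connections.
type Limiter struct {
	mu     sync.Mutex
	perKey int
	global int
	total  int
	byKey  map[string]int
}

// NewLimiter creates a limiter with the given per-key and global caps.
func NewLimiter(perKey, global int) *Limiter {
	return &Limiter{perKey: perKey, global: global, byKey: make(map[string]int)}
}

// Acquire reserves one connection slot for key. It returns false when
// either limit is already reached; the caller should then reject with 429.
func (l *Limiter) Acquire(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.total >= l.global || l.byKey[key] >= l.perKey {
		return false
	}
	l.byKey[key]++
	l.total++
	return true
}

// Release frees a slot previously reserved with Acquire.
func (l *Limiter) Release(key string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.byKey[key] == 0 {
		return // nothing to release for this key
	}
	l.byKey[key]--
	if l.byKey[key] == 0 {
		delete(l.byKey, key)
	}
	l.total--
}
```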

Security Considerations

Authentication

JWT or session cookie
Validate token on WebSocket upgrade
Bind connections to user ID

Authorization

Namespace-level RBAC
User can only access resources they have permission for
Backend must enforce (not just client-side filtering)

Secret Handling

Never log decoded secrets
Sanitize before display
Consider explicit user action to reveal (future enhancement)

Scalability

Single-Node (Development)

In-memory Yjs provider
Redis for presence
Direct k8s API calls

Multi-Node (Production)

Shared Redis for Yjs CRDT state
WebSocket leader election (one node handles all CRDT updates)
Or: y-webrtc for peer-to-peer sync (simpler, less reliable)

Health Monitoring

WebSocket connection count
Active shell sessions
API latency (p50, p95, p99)
CRDT sync lag
Error rates by endpoint
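
For the latency percentiles, a minimal sketch using the nearest-rank method over a window of samples. The function name and the millisecond unit are assumptions; a production service would more likely export histograms to a metrics system than sort raw samples.

```go
package main

import "sort"

// Percentile returns the p-th percentile (1 to 100) of the samples using
// the nearest-rank method: the value at rank ceil(p/100 * n) in sorted
// order. It returns 0 for an empty input and does not modify the input.
func Percentile(samplesMs []int64, p int) int64 {
	if len(samplesMs) == 0 {
		return 0
	}
	if p < 1 {
		p = 1
	}
	if p > 100 {
		p = 100
	}
	sorted := append([]int64(nil), samplesMs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	// Integer form of ceil(p*n/100), avoiding floating-point rounding.
	rank := (p*len(sorted) + 99) / 100
	return sorted[rank-1]
}
```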

Testing Strategy

Unit tests for Kubernetes API wrappers
Integration tests with kind (Kubernetes in Docker)
Load tests for WebSocket connections
CRDT conflict resolution tests
