Design Document: Backend & Real-Time Sync Feature Specification
Overview
The backend service aggregates Kubernetes API data and provides real-time WebSocket connections for shell, logs, and multi-user collaboration.
Backend Architecture
Recommended Stack
- Language: Go (for Kubernetes client integration)
- Kubernetes Client: kubernetes/client-go
- WebSockets: gorilla/websocket
- CRDT Layer: Yjs + y-websocket
- Storage: Redis or in-memory (single-node), y-leveldb/y-mongodb (multi-node)
Service Responsibilities
- Wrap Kubernetes API access (the operations a user would otherwise run through kubectl)
- Provide WebSocket multiplexer for:
  - Shell sessions
  - Log streaming
  - Watch API updates
- Handle CRDT sync for shared workspace state
- Broadcast user presence/awareness
Kubernetes Data Sources
Required Resources
- Namespaces
- Deployments
- StatefulSets
- DaemonSets
- Services
- Ingresses
- ConfigMaps
- Secrets (values are base64-encoded by the API; see Secret Decoding for where they are decoded)
- PVCs
- Events
- CRDs
Live Updates
Implementation: Kubernetes Watch API
GET /api/v1/namespaces/{ns}/pods?watch=true
GET /api/v1/namespaces/{ns}/services?watch=true
GET /apis/apps/v1/namespaces/{ns}/deployments?watch=true
# etc for each resource type
Aggregation Strategy
- Single endpoint: /api/watch that multiplexes all watch streams
- Or: Resource-specific endpoints (/api/watch/pods, /api/watch/services, etc.)
- Include namespace filtering in requests
Secret Decoding
- Option 1: Server-side decode before sending
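- Option 2: Client-side decode (safer: decoded values never pass through server code or logs)
- If client-side: Include a decoded flag in the response so the client knows whether values are still base64-encoded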
- Never log decoded values
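For Option 1, a minimal sketch of server-side decoding might look like the following. It assumes the backend handles the raw JSON form of a Secret, where each value in data is a base64 string (client-go's typed Secret already exposes decoded bytes). The helper names decodeSecretData and redactedSummary are hypothetical; the second one shows one way to log something useful without ever logging a value.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"sort"
	"strings"
)

// decodeSecretData decodes the base64 values of a Secret's "data" map as they
// appear in the raw Kubernetes JSON representation. (Hypothetical helper.)
func decodeSecretData(data map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(data))
	for key, encoded := range data {
		raw, err := base64.StdEncoding.DecodeString(encoded)
		if err != nil {
			// Report only the key name; the value must never reach logs.
			return nil, fmt.Errorf("secret key %q: invalid base64", key)
		}
		out[key] = string(raw)
	}
	return out, nil
}

// redactedSummary returns a log-safe description of a decoded Secret:
// key names and value lengths only, in sorted key order.
func redactedSummary(decoded map[string]string) string {
	keys := make([]string, 0, len(decoded))
	for k := range decoded {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s(%d bytes)", k, len(decoded[k])))
	}
	return strings.Join(parts, ", ")
}

func main() {
	decoded, err := decodeSecretData(map[string]string{"password": "aHVudGVyMg=="})
	if err != nil {
		panic(err)
	}
	// Log the summary, never the decoded map itself.
	fmt.Println(redactedSummary(decoded))
}
```

If Option 2 is chosen instead, the same redaction idea still applies to any server-side logging of Secret objects.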
WebSocket Endpoints
Shell Endpoint
/ws/shell
- Upstream: kubectl exec -it <pod> -n <ns> -- sh
- Protocol: Bidirectional WebSocket
- Client → Server: Keystrokes (terminal input data), terminal resize
- Server → Client: stdout, stderr
- Terminal: xterm.js on client
- Backend: Use k8s Go client's Exec API with streaming
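The spec does not fix a wire format for the two kinds of client-to-server traffic (input and resize). One possible shape, sketched here purely as an assumption, is a small JSON envelope with a type field. The clientMessage struct, its field names, and parseClientMessage are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clientMessage is a hypothetical envelope for client-to-server frames on
// /ws/shell. The field names are an assumption, not part of the spec.
type clientMessage struct {
	Type string `json:"type"`           // "input" or "resize"
	Data string `json:"data,omitempty"` // keystrokes, for "input"
	Cols uint16 `json:"cols,omitempty"` // terminal width, for "resize"
	Rows uint16 `json:"rows,omitempty"` // terminal height, for "resize"
}

// parseClientMessage decodes one WebSocket frame and validates it.
func parseClientMessage(frame []byte) (clientMessage, error) {
	var m clientMessage
	if err := json.Unmarshal(frame, &m); err != nil {
		return m, fmt.Errorf("malformed frame: %w", err)
	}
	switch m.Type {
	case "input":
		return m, nil
	case "resize":
		if m.Cols == 0 || m.Rows == 0 {
			return m, fmt.Errorf("resize needs non-zero cols and rows")
		}
		return m, nil
	default:
		return m, fmt.Errorf("unknown message type %q", m.Type)
	}
}

func main() {
	frames := []string{
		`{"type":"input","data":"ls\r"}`,
		`{"type":"resize","cols":120,"rows":40}`,
		`{"type":"bogus"}`,
	}
	for _, frame := range frames {
		msg, err := parseClientMessage([]byte(frame))
		fmt.Println(msg.Type, err)
	}
}
```

On the server, "input" frames would be written to the exec stream's stdin and "resize" frames forwarded as terminal size changes.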
Logs Endpoint
/ws/logs
- Upstream: kubectl logs --follow <pod> -n <ns>
- Protocol: WebSocket text frames
- Streaming: Each line as separate message
- Client: Auto-scroll unless user scrolled up
Watch Endpoint
/ws/watch
- Purpose: Broadcast resource changes to all connected clients
- Payload: JSON Patch or full object update
- Filtering: Per-client namespace/resource filters
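Per-client filtering can be sketched as a predicate applied to each event before it is broadcast. The types below (watchEvent, clientFilter) and the convention that an empty filter set means "match everything" are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"sort"
)

// watchEvent is a simplified stand-in for a Kubernetes watch event.
type watchEvent struct {
	Type      string // ADDED, MODIFIED, or DELETED
	Kind      string // e.g. "Deployment"
	Namespace string // empty for cluster-scoped resources
	Name      string
}

// clientFilter holds one client's subscription. An empty set matches all
// values (an assumed convention).
type clientFilter struct {
	Namespaces map[string]bool
	Kinds      map[string]bool
}

// matches reports whether the event passes the client's filters.
func (f clientFilter) matches(ev watchEvent) bool {
	if len(f.Namespaces) > 0 && !f.Namespaces[ev.Namespace] {
		return false
	}
	if len(f.Kinds) > 0 && !f.Kinds[ev.Kind] {
		return false
	}
	return true
}

// fanOut returns the sorted IDs of the clients that should receive ev.
func fanOut(ev watchEvent, clients map[string]clientFilter) []string {
	var ids []string
	for id, f := range clients {
		if f.matches(ev) {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	clients := map[string]clientFilter{
		"a": {Namespaces: map[string]bool{"prod": true}},
		"b": {Kinds: map[string]bool{"Service": true}},
		"c": {}, // no filters: receives everything
	}
	ev := watchEvent{Type: "MODIFIED", Kind: "Deployment", Namespace: "prod", Name: "api"}
	fmt.Println(fanOut(ev, clients))
}
```

Note that with this convention a namespace filter also excludes cluster-scoped resources such as Namespaces and CRDs, whose Namespace field is empty; a real implementation would need to decide how to treat them.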
Multi-User Sync (The Yard)
CRDT Strategy
- Library: Yjs (proven CRDT implementation)
- Provider:
  - Single-node: in-memory or Redis
  - Multi-node: y-leveldb, y-mongodb, or y-webrtc
- WebSocket: y-websocket for real-time sync
Synced State
Only sync the structure of the workspace, not content:
- ✅ Krate positions (wx, wy)
- ✅ Krate existence (create/delete)
- ✅ Krate minimize state
- ✅ Window layout (grid positions, sizes)
- ✅ Room/canvas camera state (optional)
- ❌ Window content (logs, YAML, shell) — each client streams independently
Presence/Awareness
Broadcast ephemeral user state:
{
  userId: string,
  name: string,
  color: string,
  cursorPosition: { x: number, y: number } | null,
  activeKrateId: string | null,
  spotlightQuery: string | null,
  timestamp: number
}
Implementation:
- Each client publishes to Yjs Awareness channel
- Admin panel subscribes to awareness updates
- Update interval: every 5 seconds or on significant change
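If the Go backend (for example, the admin panel feed) needs to inspect awareness state, it could model the payload above as a struct with pointer fields for the nullable values. This is a sketch under assumptions: it presumes the state has already been extracted as JSON, that timestamp is Unix milliseconds, and that a user is considered stale after a TTL chosen by the caller. None of these are stated in the spec, and the names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cursor holds canvas coordinates; float64 is an assumption.
type cursor struct {
	X float64 `json:"x"`
	Y float64 `json:"y"`
}

// awarenessState mirrors the presence payload. Pointer fields are nil when
// the JSON value is null.
type awarenessState struct {
	UserID         string  `json:"userId"`
	Name           string  `json:"name"`
	Color          string  `json:"color"`
	CursorPosition *cursor `json:"cursorPosition"`
	ActiveKrateID  *string `json:"activeKrateId"`
	SpotlightQuery *string `json:"spotlightQuery"`
	Timestamp      int64   `json:"timestamp"` // assumed: Unix milliseconds
}

// parseAwareness decodes one presence payload.
func parseAwareness(raw []byte) (awarenessState, error) {
	var s awarenessState
	err := json.Unmarshal(raw, &s)
	return s, err
}

// isStale reports whether the state is older than ttlMs at time nowMs.
func isStale(s awarenessState, nowMs, ttlMs int64) bool {
	return nowMs-s.Timestamp > ttlMs
}

func main() {
	raw := []byte(`{"userId":"u1","name":"Ada","color":"#ff0000",
		"cursorPosition":{"x":10,"y":20},"activeKrateId":null,
		"spotlightQuery":null,"timestamp":1000}`)
	s, err := parseAwareness(raw)
	if err != nil {
		panic(err)
	}
	// With a 5-second update interval, a TTL of three missed updates
	// (15000 ms) is one plausible choice.
	fmt.Println(s.Name, s.CursorPosition.X, s.ActiveKrateID == nil, isStale(s, 20000, 15000))
}
```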
Real-Time Shell Implementation
Client-Side (xterm.js)
1. Connect WebSocket
2. Initialize xterm with {
  rows: 24,
  cols: 80,
  fontFamily: 'IBM Plex Mono',
  fontSize: 12
}
3. Attach xterm to WebSocket
  - term.onData() → send to server
  - WebSocket.onmessage → term.write()
4. Handle resize: term.resize(cols, rows), then send the new size to the server
5. Attach to DOM: term.open(element)
Server-Side (Go + k8s client)
1. Extract pod/ns from URL: /ws/shell?pod=xxx&ns=yyy
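2. Build the exec request with client-go (corev1 "k8s.io/api/core/v1", "k8s.io/client-go/kubernetes/scheme", "k8s.io/client-go/tools/remotecommand"):
req := clientset.CoreV1().RESTClient().Post().
    Resource("pods").Name(pod).Namespace(ns).SubResource("exec").
    VersionedParams(&corev1.PodExecOptions{
        Command: []string{"sh"},
        Stdin:   true,
        Stdout:  true,
        Stderr:  true,
        TTY:     true,
    }, scheme.ParameterCodec)
// Create exec stream (config is assumed to be the *rest.Config used to build clientset)
exec, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())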
3. Duplex: WebSocket ↔ Exec stream
4. Handle close gracefully
Real-Time Logs Implementation
Client-Side
1. Connect WebSocket
2. Maintain scroll position state
3. On new message:
- Append to log buffer
- If wasAtBottom: scroll to bottom
- Else: keep existing scroll position
4. Colorize lines on receive (ERROR/WARN/INFO patterns)
Server-Side
1. Equivalent of kubectl logs --follow via the pod log subresource (client-go GetLogs with Follow: true), not an exec stream
2. Stream lines as WebSocket messages
3. Add keep-alive ping every 30s
4. Reconnect on disconnect (client-side)
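Step 2 above can be sketched with the standard library alone: read the log stream line by line and hand each line to a send callback. In the real service the reader would be the stream returned by client-go's pod log request and send would write a WebSocket text frame; here both are stand-ins, and the names streamLines and collectLines are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// streamLines reads r line by line and calls send once per line, so each log
// line becomes a separate message. It returns the number of lines sent.
func streamLines(r io.Reader, send func(line string) error) (int, error) {
	sc := bufio.NewScanner(r)
	// Raise the scanner's default 64 KiB token limit so long log lines
	// (up to 1 MiB here, an arbitrary choice) do not abort the stream.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	sent := 0
	for sc.Scan() {
		if err := send(sc.Text()); err != nil {
			return sent, err // e.g. the client disconnected
		}
		sent++
	}
	return sent, sc.Err()
}

// collectLines is a small demo helper that gathers the messages in a slice
// instead of writing them to a WebSocket.
func collectLines(input string) ([]string, error) {
	var out []string
	_, err := streamLines(strings.NewReader(input), func(line string) error {
		out = append(out, line)
		return nil
	})
	return out, err
}

func main() {
	lines, err := collectLines("INFO starting\nWARN slow request\nERROR boom\n")
	if err != nil {
		panic(err)
	}
	for i, l := range lines {
		fmt.Printf("message %d: %s\n", i+1, l)
	}
}
```

The 30-second keep-alive ping would run alongside this loop, typically in its own goroutine, since the loop blocks while waiting for new log output.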
Backend API Endpoints
HTTP Endpoints (REST)
GET /api/cluster # Cluster metadata
GET /api/resources # All resources (cached, for initial load)
GET /api/resources/pods # Filtered by namespace (optional)
GET /api/resources/deployments
GET /api/resources/services
GET /api/resources/secrets
GET /api/resources/configmaps
GET /api/resources/namespaces
GET /api/resources/crds
GET /api/resource/{kind}/{name}?ns={namespace}
GET /api/health # Backend health check
WebSocket Endpoints
/ws/shell # Shell session
/ws/logs # Log streaming
/ws/watch # Resource watch updates
/ws/sync # CRDT sync + awareness
Error Handling
WebSocket Disconnects
- Shell: Notify user, keep terminal buffer (don't clear)
- Logs: Auto-reconnect with exponential backoff
- Watch: Auto-reconnect, fetch delta on reconnect
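The reconnect delay for logs and watch could be computed as follows. The reconnect itself runs on the client, so this Go version only illustrates the arithmetic; the base delay, cap, and "equal jitter" scheme are assumptions, not values from this spec.

```go
package main

import (
	"fmt"
	"math/rand"
)

// backoffMillis returns baseMs * 2^attempt, capped at maxMs.
// The loop stops at the cap, so the value cannot overflow.
func backoffMillis(attempt, baseMs, maxMs int) int {
	delay := baseMs
	for i := 0; i < attempt; i++ {
		delay *= 2
		if delay >= maxMs {
			return maxMs
		}
	}
	if delay > maxMs {
		return maxMs
	}
	return delay
}

// jitter keeps half of the delay fixed and scales the other half by r,
// where r is a random number in [0, 1). This spreads out clients that
// were all disconnected at the same moment.
func jitter(delayMs int, r float64) int {
	half := delayMs / 2
	return half + int(r*float64(half))
}

func main() {
	for attempt := 0; attempt < 8; attempt++ {
		d := backoffMillis(attempt, 500, 30000)
		fmt.Printf("attempt %d: base delay %d ms, with jitter %d ms\n",
			attempt, d, jitter(d, rand.Float64()))
	}
}
```

The attempt counter should reset to zero once a connection has stayed up for a while, so that a later disconnect starts again from the base delay.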
Backpressure
- Shell: Buffer output if client slow, drop old lines if buffer full
- Logs: Same strategy
- Watch: Batch updates if high frequency
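The "drop old lines if buffer full" strategy for shell and logs amounts to a bounded queue that evicts its oldest entry. A minimal, non-concurrent sketch follows; lineBuffer and its methods are hypothetical names, and a real per-connection buffer would need a mutex or a channel-based design because the producer and the WebSocket writer run in different goroutines.

```go
package main

import "fmt"

// lineBuffer holds at most capacity pending lines for a slow client.
type lineBuffer struct {
	lines    []string
	capacity int
	dropped  int // how many lines were discarded, useful for metrics
}

func newLineBuffer(capacity int) *lineBuffer {
	if capacity < 1 {
		capacity = 1
	}
	return &lineBuffer{lines: make([]string, 0, capacity), capacity: capacity}
}

// push appends a line, discarding the oldest one when the buffer is full.
func (b *lineBuffer) push(line string) {
	if len(b.lines) == b.capacity {
		// Shift left by one in place so the backing array never grows.
		copy(b.lines, b.lines[1:])
		b.lines = b.lines[:len(b.lines)-1]
		b.dropped++
	}
	b.lines = append(b.lines, line)
}

// drain returns the buffered lines in order and empties the buffer.
func (b *lineBuffer) drain() []string {
	out := make([]string, len(b.lines))
	copy(out, b.lines)
	b.lines = b.lines[:0]
	return out
}

func main() {
	buf := newLineBuffer(3)
	for i := 1; i <= 5; i++ {
		buf.push(fmt.Sprintf("line %d", i))
	}
	fmt.Println(buf.drain(), "dropped:", buf.dropped)
}
```

Surfacing the dropped count to the client (for example as a "N lines skipped" marker) makes the data loss visible rather than silent.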
Connection Limits
- Per-user limits: max concurrent shells/logs per namespace
- Global limits: max connections
- Reject with 429 if limit exceeded
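A sketch of how the per-user and global limits might be enforced before the WebSocket upgrade is shown below. It assumes the limit key is something like "userID/namespace" and that the wrapped handler blocks for the lifetime of the connection, so the slot is released when it returns. The limiter type and keyOf callback are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// limiter tracks open connections per key and in total.
type limiter struct {
	mu         sync.Mutex
	perKeyMax  int
	globalMax  int
	perKey     map[string]int
	totalCount int
}

func newLimiter(perKeyMax, globalMax int) *limiter {
	return &limiter{perKeyMax: perKeyMax, globalMax: globalMax, perKey: map[string]int{}}
}

// acquire reserves a slot for key; it returns false if either limit is hit.
func (l *limiter) acquire(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.totalCount >= l.globalMax || l.perKey[key] >= l.perKeyMax {
		return false
	}
	l.perKey[key]++
	l.totalCount++
	return true
}

// release frees a slot previously reserved with acquire.
func (l *limiter) release(key string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.perKey[key] == 0 {
		return
	}
	l.perKey[key]--
	l.totalCount--
	if l.perKey[key] == 0 {
		delete(l.perKey, key)
	}
}

// wrap rejects requests over the limit with 429 before next (the handler
// that performs the WebSocket upgrade) runs. keyOf derives the limit key
// from the authenticated request.
func (l *limiter) wrap(keyOf func(*http.Request) string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key := keyOf(r)
		if !l.acquire(key) {
			http.Error(w, "connection limit exceeded", http.StatusTooManyRequests)
			return
		}
		defer l.release(key)
		next.ServeHTTP(w, r)
	})
}

func main() {
	l := newLimiter(1, 2)
	fmt.Println(l.acquire("alice/default")) // first shell for alice
	fmt.Println(l.acquire("alice/default")) // over the per-key limit
	fmt.Println(l.acquire("bob/default"))   // fills the global limit
}
```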
Security Considerations
Authentication
- JWT or session cookie
- Validate token on WebSocket upgrade
- Bind connections to user ID
Authorization
- Namespace-level RBAC
- User can only access resources they have permission for
- Backend must enforce (not just client-side filtering)
Secret Handling
- Never log decoded secrets
- Sanitize before display
- Consider explicit user action to reveal (future enhancement)
Scalability
Single-Node (Development)
- In-memory Yjs provider
- Redis for presence
- Direct k8s API calls
Multi-Node (Production)
- Shared Redis for Yjs CRDT state
- WebSocket leader election (one node handles all CRDT updates)
- Or: y-webrtc for peer-to-peer sync (simpler, less reliable)
Health Monitoring
- WebSocket connection count
- Active shell sessions
- API latency (p50, p95, p99)
- CRDT sync lag
- Error rates by endpoint
Testing Strategy
- Unit tests for Kubernetes API wrappers
- Integration tests with kind (Kubernetes in Docker)
- Load tests for WebSocket connections
- CRDT conflict resolution tests