feat: Add comprehensive design documentation for Krates

- Canvas: Infinite zoomable workspace with LOD and navigation
- Spotlight: Fuzzy search with type filters and view shortcuts
- Krate: Window group container with non-overlapping placement
- Detail Window: YAML/Describe/Logs/Shell views with maximize
- Top Bar: Cluster info, user presence, admin toggle
- Admin Drawer: Multi-user presence and spectate functionality
- Minimap: Browse and navigate canvas overview
- Collection Window: List/tree views with filtering and sorting
- Shell/Logs: Real-time terminal and log streaming
- Backend: Go service with K8s API, WebSocket handlers, CRDT sync
- Architecture: Full project structure and tech stack
Hermes Agent
2026-06-16 08:32:47 -04:00
parent b31fa3cde5
commit 78f19cde7d
12 changed files with 2709 additions and 2 deletions

design/backend.md

@@ -0,0 +1,264 @@
# Backend & Real-Time Sync Feature Specification
## Overview
The backend service aggregates Kubernetes API data and provides real-time WebSocket connections for shell, logs, and multi-user collaboration.
## Backend Architecture
### Recommended Stack
- **Language**: Go (for Kubernetes client integration)
- **Kubernetes Client**: kubernetes/client-go
- **WebSockets**: gorilla/websocket
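- **CRDT Layer**: Yjs + y-websocket (JavaScript libraries; the reference y-websocket server runs on Node.js, so it would run alongside the Go service unless a compatible server implementation is used)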
- **Storage**: Redis or in-memory (single-node), y-leveldb/y-mongodb (multi-node)
### Service Responsibilities
1. Wrap Kubernetes API access (via client-go)
2. Provide a WebSocket multiplexer for:
   - Shell sessions
   - Log streaming
   - Watch API updates
3. Handle CRDT sync for shared workspace state
4. Broadcast user presence/awareness
## Kubernetes Data Sources
### Required Resources
- Namespaces
- Pods
- Deployments
- StatefulSets
- DaemonSets
- Services
- Ingresses
- ConfigMaps
- Secrets (base64-encoded; see Secret Decoding for where values are decoded)
- PVCs
- Events
- CRDs
### Live Updates
Implementation: Kubernetes Watch API
```
GET /api/v1/namespaces/{ns}/pods?watch=true
GET /api/v1/namespaces/{ns}/services?watch=true
GET /apis/apps/v1/namespaces/{ns}/deployments?watch=true
# etc. for each resource type
```
### Aggregation Strategy
- Single endpoint: `/ws/watch` that multiplexes all watch streams
- Or: resource-specific endpoints (`/ws/watch/pods`, `/ws/watch/services`, etc.)
- Include namespace filtering in requests
### Secret Decoding
- Option 1: Server-side decode before sending
- Option 2: Client-side decode (safer, never log)
- If client-side: include a `decoded: false` flag in the response so the client knows the values are still base64-encoded
- **Never log decoded values**
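If Option 1 is chosen, server-side decoding could look like the sketch below. The Kubernetes API returns each entry of a Secret's `data` field as a base64 string; `SecretPayload` and its `Decoded` flag are a hypothetical response shape, not an existing type. Errors name the offending key but never include the value.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// SecretPayload is a hypothetical response shape. Decoded tells the client
// whether Data holds plain text or still-encoded base64 strings.
type SecretPayload struct {
	Name    string            `json:"name"`
	Decoded bool              `json:"decoded"`
	Data    map[string]string `json:"data"`
}

// decodeSecretData base64-decodes every value of a Secret's data field.
// Error messages name the key only, never the value.
func decodeSecretData(data map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(data))
	for k, v := range data {
		raw, err := base64.StdEncoding.DecodeString(v)
		if err != nil {
			return nil, fmt.Errorf("secret key %q: invalid base64", k)
		}
		out[k] = string(raw)
	}
	return out, nil
}
```

With client-go's typed objects, `corev1.Secret.Data` is already a `map[string][]byte` of decoded bytes, so this step only applies when the backend handles raw JSON.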
## WebSocket Endpoints
### Shell Endpoint
```
/ws/shell
```
- **Upstream**: `kubectl exec -it <pod> -n <ns> -- sh`
- **Protocol**: Bidirectional WebSocket
  - Client → Server: keystrokes, terminal resize events
  - Server → Client: stdout, stderr
- **Terminal**: xterm.js on client
- **Backend**: client-go's `pods/exec` subresource, streamed with the `remotecommand` package
### Logs Endpoint
```
/ws/logs
```
- **Upstream**: `kubectl logs --follow <pod> -n <ns>`
- **Protocol**: WebSocket text frames
- **Streaming**: Each line as separate message
- **Client**: Auto-scroll unless user scrolled up
### Watch Endpoint
```
/ws/watch
```
- **Purpose**: Broadcast resource changes to all connected clients
- **Payload**: JSON Patch or full object update
- **Filtering**: Per-client namespace/resource filters
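A minimal fan-out hub for `/ws/watch`, assuming each connected client registers a namespace/kind filter and has its own writer goroutine draining a buffered channel. `ResourceEvent`, `Filter` and `Hub` are hypothetical names; `Object` can hold either a full object or a JSON Patch, matching the payload options above.

```go
package main

import (
	"encoding/json"
	"sync"
)

// ResourceEvent is one message on /ws/watch.
type ResourceEvent struct {
	Kind      string          `json:"kind"` // e.g. "pods"
	Namespace string          `json:"namespace"`
	Type      string          `json:"type"`   // ADDED, MODIFIED or DELETED
	Object    json.RawMessage `json:"object"` // full object or JSON Patch
}

// Filter selects events for one client; an empty set matches everything.
type Filter struct {
	Namespaces map[string]bool
	Kinds      map[string]bool
}

func (f Filter) Match(ev ResourceEvent) bool {
	if len(f.Namespaces) > 0 && !f.Namespaces[ev.Namespace] {
		return false
	}
	return len(f.Kinds) == 0 || f.Kinds[ev.Kind]
}

type client struct {
	filter Filter
	send   chan ResourceEvent // drained by this client's WebSocket writer goroutine
}

type Hub struct {
	mu      sync.RWMutex
	clients map[*client]struct{}
}

func NewHub() *Hub { return &Hub{clients: make(map[*client]struct{})} }

func (h *Hub) Subscribe(f Filter, buffer int) *client {
	c := &client{filter: f, send: make(chan ResourceEvent, buffer)}
	h.mu.Lock()
	h.clients[c] = struct{}{}
	h.mu.Unlock()
	return c
}

// Unsubscribe removes c and closes its channel; no Broadcast can still be
// sending to it because removal happens under the write lock.
func (h *Hub) Unsubscribe(c *client) {
	h.mu.Lock()
	delete(h.clients, c)
	h.mu.Unlock()
	close(c.send)
}

// Broadcast delivers ev to every matching client without blocking: if a
// client's buffer is full, the event is dropped for that client and counted.
func (h *Hub) Broadcast(ev ResourceEvent) (dropped int) {
	h.mu.RLock()
	defer h.mu.RUnlock()
	for c := range h.clients {
		if !c.filter.Match(ev) {
			continue
		}
		select {
		case c.send <- ev:
		default:
			dropped++
		}
	}
	return dropped
}
```

Dropping an event leaves that client's view stale, so a real implementation might instead disconnect a slow client and let its reconnect path resync it.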
## Multi-User Sync (The Yard)
### CRDT Strategy
- **Library**: Yjs (proven CRDT implementation)
- **Provider**:
  - Single-node: in-memory or Redis
  - Multi-node: y-leveldb, y-mongodb, or y-webrtc
- **WebSocket**: y-websocket for real-time sync
### Synced State
Only sync the *structure* of the workspace, not content:
- ✅ Krate positions (wx, wy)
- ✅ Krate existence (create/delete)
- ✅ Krate minimize state
- ✅ Window layout (grid positions, sizes)
- ✅ Room/canvas camera state (optional)
- ❌ Window content (logs, YAML, shell) — each client streams independently
### Presence/Awareness
Broadcast ephemeral user state:
```ts
interface Presence {
  userId: string;
  name: string;
  color: string;
  cursorPosition: { x: number; y: number } | null;
  activeKrateId: string | null;
  spotlightQuery: string | null;
  timestamp: number;
}
```
Implementation:
- Each client publishes to the Yjs Awareness channel
- Admin panel subscribes to awareness updates
- Update interval: every 5 seconds or on significant change
## Real-Time Shell Implementation
### Client-Side (xterm.js)
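```
1. Connect WebSocket: /ws/shell?pod=<pod>&ns=<ns>
2. Initialize xterm with {
     rows: 24,
     cols: 80,
     fontFamily: 'IBM Plex Mono',
     fontSize: 12
   }
3. Attach xterm to WebSocket
   - term.onData(data) → send to server (user keystrokes)
   - WebSocket.onmessage → term.write(data) (pod output)
4. Handle resize: term.resize(cols, rows) (or the fit addon),
   then send the new cols/rows to the server
5. Attach to DOM: term.open(container)
```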
### Server-Side (Go + k8s client)
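```
1. Extract pod/ns from URL: /ws/shell?pod=xxx&ns=yyy
2. Build the exec request with client-go:
   req := clientset.CoreV1().RESTClient().Post().
       Resource("pods").Name(pod).Namespace(ns).SubResource("exec").
       VersionedParams(&corev1.PodExecOptions{
           Command: []string{"sh"},
           Stdin:   true,
           Stdout:  true,
           TTY:     true, // with a TTY, stderr is merged into stdout
       }, scheme.ParameterCodec)
3. Create the exec stream (k8s.io/client-go/tools/remotecommand):
   exec, err := remotecommand.NewSPDYExecutor(restConfig, "POST", req.URL())
   err = exec.StreamWithContext(ctx, remotecommand.StreamOptions{
       Stdin:             wsReader,  // keystrokes from the WebSocket
       Stdout:            wsWriter,  // output back to the WebSocket
       Tty:               true,
       TerminalSizeQueue: sizeQueue, // resize events from the client
   })
4. Duplex: WebSocket ↔ exec stream
5. Handle close gracefully
```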
## Real-Time Logs Implementation
### Client-Side
```
1. Connect WebSocket
2. Maintain scroll position state
3. On new message:
   - Check wasAtBottom before appending
   - Append to log buffer
   - If wasAtBottom: scroll to bottom
   - Else: keep existing scroll position
4. Colorize lines on receive (ERROR/WARN/INFO patterns)
```
### Server-Side
```
1. Open a follow stream: clientset.CoreV1().Pods(ns).GetLogs(pod, &corev1.PodLogOptions{Follow: true}).Stream(ctx)
2. Stream lines as WebSocket messages
3. Add keep-alive ping every 30s
4. Reconnect on disconnect (client-side)
```
## Backend API Endpoints
### HTTP Endpoints (REST)
```
GET /api/cluster # Cluster metadata
GET /api/resources # All resources (cached, for initial load)
GET /api/resources/pods # Filtered by namespace (optional)
GET /api/resources/deployments
GET /api/resources/services
GET /api/resources/secrets
GET /api/resources/configmaps
GET /api/resources/namespaces
GET /api/resources/crds
GET /api/resource/{kind}/{name}?ns={namespace}
GET /api/health # Backend health check
```
### WebSocket Endpoints
```
/ws/shell # Shell session
/ws/logs # Log streaming
/ws/watch # Resource watch updates
/ws/sync # CRDT sync + awareness
```
## Error Handling
### WebSocket Disconnects
- Shell: Notify user, keep terminal buffer (don't clear)
- Logs: Auto-reconnect with exponential backoff
- Watch: Auto-reconnect, fetch delta on reconnect
### Backpressure
- Shell: Buffer output if client slow, drop old lines if buffer full
- Logs: Same strategy
- Watch: Batch updates if high frequency
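The "drop old lines if buffer full" policy is a fixed-capacity ring buffer that evicts the oldest entry. A sketch for log lines, assuming one goroutine owns the buffer (it has no locking) and capacity is greater than zero; for raw shell output, dropping arbitrary bytes can cut ANSI escape sequences in half, so the shell case may need a coarser policy.

```go
package main

// dropOldest is a fixed-capacity FIFO: when full, Push evicts the oldest entry.
type dropOldest struct {
	buf     []string
	start   int // index of the oldest entry
	n       int // number of stored entries
	dropped int // total entries evicted (useful as a metric)
}

func newDropOldest(capacity int) *dropOldest { return &dropOldest{buf: make([]string, capacity)} }

func (q *dropOldest) Push(s string) {
	if q.n == len(q.buf) {
		q.buf[q.start] = s // overwrite the oldest; it becomes the newest
		q.start = (q.start + 1) % len(q.buf)
		q.dropped++
		return
	}
	q.buf[(q.start+q.n)%len(q.buf)] = s
	q.n++
}

// Drain returns buffered entries oldest-first and empties the buffer.
func (q *dropOldest) Drain() []string {
	out := make([]string, 0, q.n)
	for i := 0; i < q.n; i++ {
		out = append(out, q.buf[(q.start+i)%len(q.buf)])
	}
	q.start, q.n = 0, 0
	return out
}
```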
### Connection Limits
- Per-user limits: max concurrent shells/logs per namespace
- Global limits: max connections
- Reject the WebSocket upgrade with HTTP 429 (Too Many Requests) if a limit is exceeded
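The per-user and global caps can share one mutex-guarded counter, checked before the WebSocket upgrade so the rejection is an ordinary HTTP 429 response. Names are hypothetical, and the middleware assumes the wrapped handler blocks for the life of the connection so that `release` runs when the socket closes.

```go
package main

import (
	"net/http"
	"sync"
)

// connLimiter caps concurrent sessions per key (for example "user/namespace")
// and across the whole process.
type connLimiter struct {
	mu        sync.Mutex
	perKey    map[string]int
	total     int
	maxPerKey int
	maxTotal  int
}

func newConnLimiter(maxPerKey, maxTotal int) *connLimiter {
	return &connLimiter{perKey: make(map[string]int), maxPerKey: maxPerKey, maxTotal: maxTotal}
}

// Acquire reserves a slot for key. The returned release is safe to call more
// than once; only the first call frees the slot.
func (l *connLimiter) Acquire(key string) (release func(), ok bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.total >= l.maxTotal || l.perKey[key] >= l.maxPerKey {
		return nil, false
	}
	l.total++
	l.perKey[key]++
	var once sync.Once
	return func() {
		once.Do(func() {
			l.mu.Lock()
			defer l.mu.Unlock()
			l.total--
			l.perKey[key]--
			if l.perKey[key] == 0 {
				delete(l.perKey, key)
			}
		})
	}, true
}

// limitUpgrades answers 429 before the upgrade when a cap is hit. keyFor is a
// hypothetical function deriving the limiter key from the request.
func limitUpgrades(l *connLimiter, keyFor func(*http.Request) string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		release, ok := l.Acquire(keyFor(r))
		if !ok {
			http.Error(w, "too many connections", http.StatusTooManyRequests)
			return
		}
		defer release() // assumes next blocks until the socket closes
		next.ServeHTTP(w, r)
	})
}
```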
## Security Considerations
### Authentication
- JWT or session cookie
- Validate token on WebSocket upgrade
- Bind connections to user ID
### Authorization
- Namespace-level RBAC
- User can only access resources they have permission for
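- Backend must enforce this (not just client-side filtering), e.g. by impersonating the user on Kubernetes API calls or by checking access with a SubjectAccessReview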
### Secret Handling
- Never log decoded secrets
- Sanitize before display
- Consider explicit user action to reveal (future enhancement)
## Scalability
### Single-Node (Development)
- In-memory Yjs provider
- Redis for presence
- Direct k8s API calls
### Multi-Node (Production)
- Shared Redis for Yjs CRDT state
- WebSocket leader election (one node handles all CRDT updates)
- or: y-webrtc for peer-to-peer (simpler, less reliable)
## Health Monitoring
- WebSocket connection count
- Active shell sessions
- API latency (p50, p95, p99)
- CRDT sync lag
- Error rates by endpoint
## Testing Strategy
1. Unit tests for Kubernetes API wrappers
2. Integration tests with kind (Kubernetes in Docker)
3. Load tests for WebSocket connections
4. CRDT conflict resolution tests