feat: Add comprehensive design documentation for Krates

- Canvas: Infinite zoomable workspace with LOD and navigation
- Spotlight: Fuzzy search with type filters and view shortcuts
- Krate: Window group container with non-overlapping placement
- Detail Window: YAML/Describe/Logs/Shell views with maximize
- Top Bar: Cluster info, user presence, admin toggle
- Admin Drawer: Multi-user presence and spectate functionality
- Minimap: Browse and navigate canvas overview
- Collection Window: List/tree views with filtering and sorting
- Shell/Logs: Real-time terminal and log streaming
- Backend: Go service with K8s API, WebSocket handlers, CRDT sync
- Architecture: Full project structure and tech stack
2026-06-16 08:32:47 -04:00
parent b31fa3cde5
commit 78f19cde7d
12 changed files with 2709 additions and 2 deletions
--- a/design/backend.md
+++ b/design/backend.md
@@ -0,0 +1,264 @@
+# Backend & Real-Time Sync Feature Specification
+
+## Overview
+The backend service aggregates Kubernetes API data and provides real-time WebSocket connections for shell, logs, and multi-user collaboration.
+
+## Backend Architecture
+
+### Recommended Stack
+- **Language**: Go (for Kubernetes client integration)
+- **Kubernetes Client**: kubernetes/client-go
+- **WebSockets**: gorilla/websocket
+- **CRDT Layer**: Yjs + y-websocket
+- **Storage**: Redis or in-memory (single-node), y-leveldb/y-mongodb (multi-node)
+
+### Service Responsibilities
+1. Wrap Kubernetes API access (via client-go, not the kubectl CLI)
+2. Provide WebSocket multiplexer for:
+   - Shell sessions
+   - Log streaming
+   - Watch API updates
+3. Handle CRDT sync for shared workspace state
+4. Broadcast user presence/awareness
+
+## Kubernetes Data Sources
+
+### Required Resources
+- Namespaces
+- Pods
+- Deployments
+- StatefulSets
+- DaemonSets
+- Services
+- Ingresses
+- ConfigMaps
+- Secrets (values are base64-encoded by the API; see Secret Decoding)
+- PVCs
+- Events
+- CRDs
+
+### Live Updates
+Implementation: Kubernetes Watch API
+```
+GET /api/v1/namespaces/{ns}/pods?watch=true
+GET /api/v1/namespaces/{ns}/services?watch=true
+GET /apis/apps/v1/namespaces/{ns}/deployments?watch=true
+# etc for each resource type
+```
+
+### Aggregation Strategy
+- Single endpoint: `/api/watch` that multiplexes all watch streams
+- Or: Resource-specific endpoints (`/api/watch/pods`, `/api/watch/services`, etc.)
+- Include namespace filtering in requests
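+
+The single-endpoint option amounts to a fan-in: one goroutine per upstream watch forwards events into a shared channel that feeds the WebSocket writer. A minimal sketch of that pattern using only the standard library; the `Event` type and the `multiplex` helper are hypothetical names, not part of client-go:
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync"
+)
+
+// Event is a hypothetical envelope for one watch notification.
+type Event struct {
+	Resource  string // e.g. "pods"
+	Namespace string
+	Type      string // ADDED, MODIFIED, or DELETED
+	Name      string
+}
+
+// multiplex merges several per-resource event channels into one channel.
+// The output closes once every input channel has been closed.
+func multiplex(sources ...<-chan Event) <-chan Event {
+	out := make(chan Event)
+	var wg sync.WaitGroup
+	for _, src := range sources {
+		wg.Add(1)
+		go func(c <-chan Event) {
+			defer wg.Done()
+			for ev := range c {
+				out <- ev
+			}
+		}(src)
+	}
+	go func() {
+		wg.Wait()
+		close(out)
+	}()
+	return out
+}
+
+func main() {
+	pods := make(chan Event, 1)
+	svcs := make(chan Event, 1)
+	pods <- Event{"pods", "default", "ADDED", "web-0"}
+	svcs <- Event{"services", "default", "MODIFIED", "web"}
+	close(pods)
+	close(svcs)
+	// Arrival order across sources is not guaranteed.
+	for ev := range multiplex(pods, svcs) {
+		fmt.Printf("%s %s/%s (%s)\n", ev.Type, ev.Namespace, ev.Name, ev.Resource)
+	}
+}
+```
+
+Ordering is preserved within a single source but not across sources, which is usually acceptable because each event names the resource it applies to.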
+
+### Secret Decoding
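+- Option 1: Decode server-side before sending
+- Option 2: Decode client-side (safer: decoded values never reach server code that might log them)
+- If decoding client-side: include a `decoded` flag in the response indicating whether values have been decoded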
+- **Never log decoded values**
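+
+In the Kubernetes API's JSON representation, Secret `data` values are base64-encoded, so decoding is a plain base64 pass over each key. A minimal sketch of that step with a `decoded` flag; the `SecretPayload` shape and `decodeSecret` helper are assumptions for illustration, not an existing API:
+
+```go
+package main
+
+import (
+	"encoding/base64"
+	"fmt"
+)
+
+// SecretPayload is a hypothetical response shape. Decoded tells the
+// receiver whether Data still needs base64 decoding.
+type SecretPayload struct {
+	Name    string            `json:"name"`
+	Decoded bool              `json:"decoded"`
+	Data    map[string]string `json:"data"`
+}
+
+// decodeSecret returns a copy of p with every value base64-decoded.
+// It never logs values; on failure the error names only the key.
+func decodeSecret(p SecretPayload) (SecretPayload, error) {
+	if p.Decoded {
+		return p, nil
+	}
+	out := SecretPayload{Name: p.Name, Decoded: true, Data: make(map[string]string, len(p.Data))}
+	for k, v := range p.Data {
+		raw, err := base64.StdEncoding.DecodeString(v)
+		if err != nil {
+			return SecretPayload{}, fmt.Errorf("secret %q: key %q is not valid base64", p.Name, k)
+		}
+		out.Data[k] = string(raw)
+	}
+	return out, nil
+}
+
+func main() {
+	enc := SecretPayload{Name: "db", Data: map[string]string{"user": "YWRtaW4="}}
+	dec, err := decodeSecret(enc)
+	// Print only metadata, never the decoded values.
+	fmt.Println(dec.Decoded, len(dec.Data), err)
+}
+```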
+
+## WebSocket Endpoints
+
+### Shell Endpoint
+```
+/ws/shell
+```
+- **Upstream**: `kubectl exec -it <pod> -n <ns> -- sh`
+- **Protocol**: Bidirectional WebSocket
+  - Client → Server: Keystrokes (stdin), terminal resize events
+  - Server → Client: stdout, stderr
+- **Terminal**: xterm.js on client
+- **Backend**: Use k8s Go client's Exec API with streaming
+
+### Logs Endpoint
+```
+/ws/logs
+```
+- **Upstream**: `kubectl logs --follow <pod> -n <ns>`
+- **Protocol**: WebSocket text frames
+- **Streaming**: Each line as separate message
+- **Client**: Auto-scroll unless user scrolled up
+
+### Watch Endpoint
+```
+/ws/watch
+```
+- **Purpose**: Broadcast resource changes to all connected clients
+- **Payload**: JSON Patch or full object update
+- **Filtering**: Per-client namespace/resource filters
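+
+Per-client filtering can be a simple membership check applied before each broadcast. A minimal sketch, assuming a hypothetical `Filter` type in which an empty set means "match everything":
+
+```go
+package main
+
+import "fmt"
+
+// Filter is a hypothetical per-client subscription.
+// An empty set matches every namespace or resource.
+type Filter struct {
+	Namespaces map[string]bool
+	Resources  map[string]bool
+}
+
+// Matches reports whether an update for the given namespace and resource
+// should be delivered to the client that owns this filter.
+func (f Filter) Matches(ns, resource string) bool {
+	nsOK := len(f.Namespaces) == 0 || f.Namespaces[ns]
+	resOK := len(f.Resources) == 0 || f.Resources[resource]
+	return nsOK && resOK
+}
+
+func main() {
+	f := Filter{Namespaces: map[string]bool{"prod": true}}
+	fmt.Println(f.Matches("prod", "pods"), f.Matches("dev", "pods")) // true false
+}
+```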
+
+## Multi-User Sync (The Yard)
+
+### CRDT Strategy
+- **Library**: Yjs (proven CRDT implementation)
+- **Provider**:
+  - Single-node: in-memory or Redis
+  - Multi-node: y-leveldb, y-mongodb, or y-webrtc
+- **WebSocket**: y-websocket for real-time sync
+
+### Synced State
+Only sync the *structure* of the workspace, not content:
+- ✅ Krate positions (wx, wy)
+- ✅ Krate existence (create/delete)
+- ✅ Krate minimize state
+- ✅ Window layout (grid positions, sizes)
+- ✅ Room/canvas camera state (optional)
+- ❌ Window content (logs, YAML, shell) — each client streams independently
+
+### Presence/Awareness
+Broadcast ephemeral user state:
+```
+{
+  userId: string,
+  name: string,
+  color: string,
+  cursorPosition: { x, y } | null,
+  activeKrateId: string | null,
+  spotlightQuery: string | null,
+  timestamp: number
+}
+```
+
+Implementation:
+- Each client publishes to Yjs Awareness channel
+- Admin panel subscribes to awareness updates
+- Update interval: every 5 seconds or on significant change
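+
+If the Go service needs to read or relay presence records (for example, for the admin panel), the payload above maps onto a struct with nullable fields. A minimal sketch; the Go type names are hypothetical, and the millisecond unit for `timestamp` is an assumption:
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"time"
+)
+
+// Point is a canvas coordinate.
+type Point struct {
+	X float64 `json:"x"`
+	Y float64 `json:"y"`
+}
+
+// Presence mirrors the awareness payload. Pointer fields encode as
+// JSON null when unset.
+type Presence struct {
+	UserID         string  `json:"userId"`
+	Name           string  `json:"name"`
+	Color          string  `json:"color"`
+	CursorPosition *Point  `json:"cursorPosition"`
+	ActiveKrateID  *string `json:"activeKrateId"`
+	SpotlightQuery *string `json:"spotlightQuery"`
+	Timestamp      int64   `json:"timestamp"` // Unix milliseconds (assumed unit)
+}
+
+func main() {
+	p := Presence{
+		UserID:         "u1",
+		Name:           "Ada",
+		Color:          "#ff8800",
+		CursorPosition: &Point{X: 120, Y: 48},
+		Timestamp:      time.Now().UnixMilli(),
+	}
+	b, err := json.Marshal(p)
+	if err != nil {
+		panic(err)
+	}
+	fmt.Println(string(b))
+}
+```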
+
+## Real-Time Shell Implementation
+
+### Client-Side (xterm.js)
+```
+1. Connect WebSocket
+2. Initialize xterm with {
+   rows: 24,
+   cols: 80,
+   fontFamily: 'IBM Plex Mono',
+   fontSize: 12
+}
+3. Attach to DOM: term.open(element)
+4. Attach xterm to WebSocket
+   - term.onData(data) → WebSocket.send(data)
+   - WebSocket.onmessage → term.write(data)
+5. Handle resize: term.resize(cols, rows), then send the new size to the server
+```
+
+### Server-Side (Go + k8s client)
+```
+1. Extract pod/ns from URL: /ws/shell?pod=xxx&ns=yyy
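+2. Build the exec request with client-go (k8s.io/client-go/tools/remotecommand):
+   req := clientset.CoreV1().RESTClient().Post().
+     Resource("pods").Name(pod).Namespace(ns).SubResource("exec")
+   req.VersionedParams(&corev1.PodExecOptions{
+     Command: []string{"sh"},
+     Stdin: true,
+     Stdout: true,
+     Stderr: true,
+     TTY: true,
+   }, scheme.ParameterCodec)
+   // Create exec stream
+   exec, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())
+3. Duplex: WebSocket ↔ exec.StreamWithContext(ctx, remotecommand.StreamOptions{...})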
+4. Handle close gracefully
+```
+
+## Real-Time Logs Implementation
+
+### Client-Side
+```
+1. Connect WebSocket
+2. Maintain scroll position state
+3. On new message:
+   - Append to log buffer
+   - If wasAtBottom: scroll to bottom
+   - Else: keep existing scroll position
+4. Colorize lines on receive (ERROR/WARN/INFO patterns)
+```
+
+### Server-Side
+```
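+1. Follow logs via the pod log subresource (the equivalent of kubectl logs --follow):
+   clientset.CoreV1().Pods(ns).GetLogs(pod, &corev1.PodLogOptions{Follow: true}).Stream(ctx)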
+2. Stream lines as WebSocket messages
+3. Add keep-alive ping every 30s
+4. Reconnect on disconnect (client-side)
+```
+
+## Backend API Endpoints
+
+### HTTP Endpoints (REST)
+```
+GET  /api/cluster              # Cluster metadata
+GET  /api/resources            # All resources (cached, for initial load)
+GET  /api/resources/pods       # Filtered by namespace (optional)
+GET  /api/resources/deployments
+GET  /api/resources/services
+GET  /api/resources/secrets
+GET  /api/resources/configmaps
+GET  /api/resources/namespaces
+GET  /api/resources/crds
+GET  /api/resource/{kind}/{name}?ns={namespace}
+GET  /api/health               # Backend health check
+```
+
+### WebSocket Endpoints
+```
+/ws/shell                      # Shell session
+/ws/logs                       # Log streaming
+/ws/watch                      # Resource watch updates
+/ws/sync                       # CRDT sync + awareness
+```
+
+## Error Handling
+
+### WebSocket Disconnects
+- Shell: Notify user, keep terminal buffer (don't clear)
+- Logs: Auto-reconnect with exponential backoff
+- Watch: Auto-reconnect, fetch delta on reconnect
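+
+Reconnection happens in the browser client, but the delay calculation is language-neutral. A minimal sketch in Go, for consistency with the rest of this spec; the `backoff` helper and the chosen base and cap values are illustrative assumptions:
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// backoff returns the delay before reconnect attempt n (0-based):
+// base doubled n times, capped at limit. Real clients would usually
+// add random jitter to avoid synchronized reconnects.
+func backoff(attempt int, base, limit time.Duration) time.Duration {
+	d := base
+	for i := 0; i < attempt; i++ {
+		d *= 2
+		if d >= limit {
+			return limit
+		}
+	}
+	if d > limit {
+		return limit
+	}
+	return d
+}
+
+func main() {
+	for n := 0; n < 6; n++ {
+		fmt.Println(n, backoff(n, 500*time.Millisecond, 10*time.Second))
+	}
+}
+```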
+
+### Backpressure
+- Shell: Buffer output if client slow, drop old lines if buffer full
+- Logs: Same strategy
+- Watch: Batch updates if high frequency
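+
+The "buffer, then drop old lines" strategy can be sketched as a bounded queue that evicts from the front. The `LineBuffer` type below is a hypothetical illustration; it is not safe for concurrent use without a mutex:
+
+```go
+package main
+
+import "fmt"
+
+// LineBuffer is a hypothetical bounded queue of output lines. When full,
+// Push discards the oldest line so a slow client cannot grow memory
+// without bound.
+type LineBuffer struct {
+	lines   []string
+	limit   int
+	Dropped int // number of lines discarded so far
+}
+
+// NewLineBuffer creates a buffer holding at most limit lines (minimum 1).
+func NewLineBuffer(limit int) *LineBuffer {
+	if limit < 1 {
+		limit = 1
+	}
+	return &LineBuffer{limit: limit}
+}
+
+// Push appends a line, evicting the oldest one if the buffer is full.
+func (b *LineBuffer) Push(line string) {
+	if len(b.lines) == b.limit {
+		b.lines = b.lines[1:]
+		b.Dropped++
+	}
+	b.lines = append(b.lines, line)
+}
+
+// Drain returns all buffered lines in order and empties the buffer.
+func (b *LineBuffer) Drain() []string {
+	out := b.lines
+	b.lines = nil
+	return out
+}
+
+func main() {
+	buf := NewLineBuffer(2)
+	for _, l := range []string{"a", "b", "c"} {
+		buf.Push(l)
+	}
+	fmt.Println(buf.Drain(), buf.Dropped) // [b c] 1
+}
+```
+
+Tracking `Dropped` lets the server tell the client that output was truncated rather than losing lines silently.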
+
+### Connection Limits
+- Per-user limits: max concurrent shells/logs per namespace
+- Global limits: max connections
+- Reject with 429 if limit exceeded
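+
+A per-key counter is enough to enforce these limits before the WebSocket upgrade. A minimal sketch, assuming a hypothetical `Limiter` keyed by a string such as `user/namespace`; the key format and the `statusFor` helper are illustrative:
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/http"
+	"sync"
+)
+
+// Limiter caps concurrent connections per key (for example "user/namespace").
+type Limiter struct {
+	mu     sync.Mutex
+	active map[string]int
+	limit  int
+}
+
+func NewLimiter(limit int) *Limiter {
+	return &Limiter{active: make(map[string]int), limit: limit}
+}
+
+// Acquire reserves a slot for key and reports whether one was available.
+func (l *Limiter) Acquire(key string) bool {
+	l.mu.Lock()
+	defer l.mu.Unlock()
+	if l.active[key] >= l.limit {
+		return false
+	}
+	l.active[key]++
+	return true
+}
+
+// Release frees a slot previously reserved with Acquire.
+func (l *Limiter) Release(key string) {
+	l.mu.Lock()
+	defer l.mu.Unlock()
+	if l.active[key] > 0 {
+		l.active[key]--
+	}
+}
+
+// statusFor maps an Acquire result to the HTTP status for the upgrade request.
+func statusFor(ok bool) int {
+	if ok {
+		return http.StatusSwitchingProtocols
+	}
+	return http.StatusTooManyRequests
+}
+
+func main() {
+	lim := NewLimiter(1)
+	fmt.Println(statusFor(lim.Acquire("alice/default"))) // 101
+	fmt.Println(statusFor(lim.Acquire("alice/default"))) // 429
+}
+```
+
+A handler would call `Release` when the connection closes, typically via `defer` right after a successful `Acquire`.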
+
+## Security Considerations
+
+### Authentication
+- JWT or session cookie
+- Validate token on WebSocket upgrade
+- Bind connections to user ID
+
+### Authorization
+- Namespace-level RBAC
+- User can only access resources they have permission for
+- Backend must enforce (not just client-side filtering)
+
+### Secret Handling
+- Never log decoded secrets
+- Sanitize before display
+- Consider explicit user action to reveal (future enhancement)
+
+## Scalability
+
+### Single-Node (Development)
+- In-memory Yjs provider
+- Redis for presence
+- Direct k8s API calls
+
+### Multi-Node (Production)
+- Shared Redis for Yjs CRDT state
+- WebSocket leader election (one node handles all CRDT updates)
+- Or: y-webrtc for peer-to-peer sync (simpler, less reliable)
+
+## Health Monitoring
+- WebSocket connection count
+- Active shell sessions
+- API latency (p50, p95, p99)
+- CRDT sync lag
+- Error rates by endpoint
+
+## Testing Strategy
+1. Unit tests for Kubernetes API wrappers
+2. Integration tests with kind (Kubernetes in Docker)
+3. Load tests for WebSocket connections
+4. CRDT conflict resolution tests