# Backend & Real-Time Sync Feature Specification

## Overview

The backend service aggregates Kubernetes API data and provides real-time WebSocket connections for shell, logs, and multi-user collaboration.

## Backend Architecture

### Recommended Stack

- **Language**: Go (for Kubernetes client integration)
- **Kubernetes Client**: kubernetes/client-go
- **WebSockets**: gorilla/websocket
- **CRDT Layer**: Yjs + y-websocket
- **Storage**: Redis or in-memory (single-node), y-leveldb/y-mongodb (multi-node)

### Service Responsibilities

1. Wrap Kubernetes API access
2. Provide a WebSocket multiplexer for:
   - Shell sessions
   - Log streaming
   - Watch API updates
3. Handle CRDT sync for shared workspace state
4. Broadcast user presence/awareness

## Kubernetes Data Sources

### Required Resources

- Namespaces
- Pods
- Deployments
- StatefulSets
- DaemonSets
- Services
- Ingresses
- ConfigMaps
- Secrets (base64-decoded; see Secret Decoding)
- PVCs
- Events
- CRDs

### Live Updates

Implementation: Kubernetes Watch API

```
GET /api/v1/namespaces/{ns}/pods?watch=true
GET /api/v1/namespaces/{ns}/services?watch=true
GET /apis/apps/v1/namespaces/{ns}/deployments?watch=true
# etc for each resource type
```

### Aggregation Strategy

- Single endpoint: `/api/watch` that multiplexes all watch streams
- Or: Resource-specific endpoints (`/api/watch/pods`, `/api/watch/services`, etc.)
- Include namespace filtering in requests

### Secret Decoding

- Option 1: Server-side decode before sending
- Option 2: Client-side decode (safer, since decoded values never reach server logs)
- If client-side: Include a decoded flag in the response so the client knows whether values are still base64-encoded
- **Never log decoded values**

## WebSocket Endpoints

### Shell Endpoint

```
/ws/shell
```

- **Upstream**: `kubectl exec -it <pod> -n <namespace> -- sh`
- **Protocol**: Bidirectional WebSocket
  - Client → Server: Keycodes, terminal resize
  - Server → Client: stdout, stderr
- **Terminal**: xterm.js on client
- **Backend**: Use the k8s Go client's Exec API with streaming

### Logs Endpoint

```
/ws/logs
```

- **Upstream**: `kubectl logs --follow <pod> -n <namespace>`
- **Protocol**: WebSocket text frames
- **Streaming**: Each line as a separate message
- **Client**: Auto-scroll unless the user has scrolled up

### Watch Endpoint

```
/ws/watch
```

- **Purpose**: Broadcast resource changes to all connected clients
- **Payload**: JSON Patch or full object update
- **Filtering**: Per-client namespace/resource filters

## Multi-User Sync (The Yard)

### CRDT Strategy

- **Library**: Yjs (proven CRDT implementation)
- **Provider**:
  - Single-node: in-memory or Redis
  - Multi-node: y-leveldb, y-mongodb, or y-webrtc
- **WebSocket**: y-websocket for real-time sync

### Synced State

Only sync the *structure* of the workspace, not content:

- ✅ Krate positions (wx, wy)
- ✅ Krate existence (create/delete)
- ✅ Krate minimize state
- ✅ Window layout (grid positions, sizes)
- ✅ Room/canvas camera state (optional)
- ❌ Window content (logs, YAML, shell); each client streams these independently

### Presence/Awareness

Broadcast ephemeral user state:

```
{
  userId: string,
  name: string,
  color: string,
  cursorPosition: { x, y } | null,
  activeKrateId: string | null,
  spotlightQuery: string | null,
  timestamp: number
}
```

Implementation:

- Each client publishes to the Yjs Awareness channel
- Admin panel subscribes to awareness updates
- Update interval: every 5 seconds or on significant change

## Real-Time Shell Implementation

### Client-Side (xterm.js)

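The client sends both keystrokes and terminal resize events over the same WebSocket, so the two have to be distinguishable on the wire. This spec does not define a framing. The sketch below assumes a hypothetical JSON envelope (the `type`, `data`, `cols`, and `rows` fields and the `ParseClientMessage` helper are illustrative names, not part of the spec) and shows how the Go side might decode and validate what the client sends.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ClientMessage is a hypothetical envelope for client → server shell traffic.
// The field names are assumptions; the spec does not fix a wire format.
type ClientMessage struct {
	Type string `json:"type"`           // "input" or "resize"
	Data string `json:"data,omitempty"` // keystrokes, used when Type is "input"
	Cols uint16 `json:"cols,omitempty"` // terminal width, used when Type is "resize"
	Rows uint16 `json:"rows,omitempty"` // terminal height, used when Type is "resize"
}

// ParseClientMessage decodes one WebSocket text frame and rejects
// unknown message types and resize events without dimensions.
func ParseClientMessage(frame []byte) (ClientMessage, error) {
	var m ClientMessage
	if err := json.Unmarshal(frame, &m); err != nil {
		return m, fmt.Errorf("decode frame: %w", err)
	}
	switch m.Type {
	case "input":
		return m, nil
	case "resize":
		if m.Cols == 0 || m.Rows == 0 {
			return m, errors.New("resize needs non-zero cols and rows")
		}
		return m, nil
	default:
		return m, fmt.Errorf("unknown message type %q", m.Type)
	}
}

func main() {
	frames := []string{
		`{"type":"input","data":"ls\r"}`,
		`{"type":"resize","cols":120,"rows":40}`,
	}
	for _, frame := range frames {
		m, err := ParseClientMessage([]byte(frame))
		fmt.Println(m.Type, err)
	}
}
```

With a framing like this, the client would wrap xterm input as `input` messages and send a `resize` message whenever the terminal dimensions change; a binary framing would work equally well and avoids JSON overhead on every keystroke.
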
```
1. Connect WebSocket
2. Initialize xterm with {
     rows: 24,
     cols: 80,
     fontFamily: 'IBM Plex Mono',
     fontSize: 12
   }
3. Attach xterm to WebSocket
   - term.onData() → send to server
   - WebSocket.onmessage → term.write()
4. Handle resize: term.resize(cols, rows), then send the new size to the server
5. Attach to DOM
```

### Server-Side (Go + k8s client)

```
1. Extract pod/ns from URL: /ws/shell?pod=xxx&ns=yyy
2. Build the exec request with client-go:
   // corev1 = k8s.io/api/core/v1, scheme = k8s.io/client-go/kubernetes/scheme
   req := clientset.CoreV1().RESTClient().Post().
       Resource("pods").Name(pod).Namespace(ns).SubResource("exec")
   req.VersionedParams(&corev1.PodExecOptions{
       Command: []string{"sh"},
       Stdin:   true,
       Stdout:  true,
       Stderr:  true,
       TTY:     true,
   }, scheme.ParameterCodec)
   // Create exec stream (k8s.io/client-go/tools/remotecommand)
   exec, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())
3. Duplex: WebSocket ↔ Exec stream
4. Handle close gracefully
```

## Real-Time Logs Implementation

### Client-Side

```
1. Connect WebSocket
2. Maintain scroll position state
3. On new message:
   - Append to log buffer
   - If wasAtBottom: scroll to bottom
   - Else: keep existing scroll position
4. Colorize lines on receive (ERROR/WARN/INFO patterns)
```

### Server-Side

```
1. Follow pod logs via the client-go log API (equivalent to kubectl logs --follow):
   clientset.CoreV1().Pods(ns).GetLogs(pod, &corev1.PodLogOptions{Follow: true}).Stream(ctx)
2. Stream lines as WebSocket messages
3. Add keep-alive ping every 30s
4. Reconnect on disconnect (client-side)
```

## Backend API Endpoints

### HTTP Endpoints (REST)

```
GET /api/cluster                  # Cluster metadata
GET /api/resources                # All resources (cached, for initial load)
GET /api/resources/pods           # Filtered by namespace (optional)
GET /api/resources/deployments
GET /api/resources/services
GET /api/resources/secrets
GET /api/resources/configmaps
GET /api/resources/namespaces
GET /api/resources/crds
GET /api/resource/{kind}/{name}?ns={namespace}
GET /api/health                   # Backend health check
```

### WebSocket Endpoints

```
/ws/shell   # Shell session
/ws/logs    # Log streaming
/ws/watch   # Resource watch updates
/ws/sync    # CRDT sync + awareness
```

## Error Handling

### WebSocket Disconnects

- Shell: Notify user, keep terminal buffer (don't clear)
- Logs: Auto-reconnect with exponential backoff
- Watch: Auto-reconnect, fetch delta on reconnect

### Backpressure

- Shell: Buffer output if the client is slow, drop old lines if the buffer is full
- Logs: Same strategy
- Watch: Batch updates if high frequency

### Connection Limits

- Per-user limits: max concurrent shells/logs per namespace
- Global limits: max connections
- Reject the upgrade request with HTTP 429 if a limit is exceeded

## Security Considerations

### Authentication

- JWT or session cookie
- Validate token on WebSocket upgrade
- Bind connections to user ID

### Authorization

- Namespace-level RBAC
- Users can only access resources they have permission for
- Backend must enforce this (not just client-side filtering)

### Secret Handling

- Never log decoded secrets
- Sanitize before display
- Consider explicit user action to reveal (future enhancement)

## Scalability

### Single-Node (Development)

- In-memory Yjs provider
- Redis for presence
- Direct k8s API calls

### Multi-Node (Production)

- Shared Redis for Yjs CRDT state
- WebSocket leader election (one node handles all CRDT updates)
- Or: y-webrtc for peer-to-peer (simpler, less reliable)

## Health Monitoring

- WebSocket connection count
- Active shell sessions
- API latency (p50, p95, p99)
- CRDT sync lag
- Error rates by endpoint

## Testing Strategy

1. Unit tests for Kubernetes API wrappers
2. Integration tests with kind (Kubernetes in Docker)
3. Load tests for WebSocket connections
4. CRDT conflict resolution tests
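
One piece that lends itself to unit testing is the drop-oldest buffering described under Backpressure. The following is a minimal sketch, assuming a fixed-capacity ring buffer per connection; `LineBuffer` and its methods are hypothetical names, and a real implementation would also need locking or a single owning goroutine, which is omitted here.

```go
package main

import "fmt"

// LineBuffer is a hypothetical fixed-capacity ring buffer that drops the
// oldest line when full. It is not safe for concurrent use.
type LineBuffer struct {
	lines   []string
	start   int // index of the oldest buffered line
	count   int // number of buffered lines
	dropped int // total lines discarded so far
}

// NewLineBuffer returns a buffer holding at most capacity lines (minimum 1).
func NewLineBuffer(capacity int) *LineBuffer {
	if capacity < 1 {
		capacity = 1
	}
	return &LineBuffer{lines: make([]string, capacity)}
}

// Push appends a line, overwriting the oldest one if the buffer is full.
func (b *LineBuffer) Push(line string) {
	if b.count == len(b.lines) {
		b.lines[b.start] = line
		b.start = (b.start + 1) % len(b.lines)
		b.dropped++
		return
	}
	b.lines[(b.start+b.count)%len(b.lines)] = line
	b.count++
}

// Drain returns the buffered lines oldest first and empties the buffer.
func (b *LineBuffer) Drain() []string {
	out := make([]string, 0, b.count)
	for i := 0; i < b.count; i++ {
		out = append(out, b.lines[(b.start+i)%len(b.lines)])
	}
	b.start, b.count = 0, 0
	return out
}

// Dropped reports how many lines have been discarded since creation.
func (b *LineBuffer) Dropped() int { return b.dropped }

func main() {
	buf := NewLineBuffer(3)
	for i := 1; i <= 5; i++ {
		buf.Push(fmt.Sprintf("line %d", i))
	}
	fmt.Println(buf.Drain(), buf.Dropped())
}
```

Tracking the dropped count lets the server tell a slow client that output was skipped rather than silently losing lines, and it gives the load tests a concrete number to assert on.
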