
Architecture

Deep dive into the Rust core and distributed state layer

Last updated: 2/8/2026

High-Level Overview

The chatbot platform follows a microservices-inspired architecture with clear separation of concerns:

Core Components

1. Web Server Auth Middleware

Ensures correct authentication. The layer distinguishes between user access levels and restricts access accordingly.

  • Login and docs routes are protected by UI API keys
  • Logged-in users are authenticated with JWTs
  • Chat endpoints require JWTs or customer-/persona-specific API keys
  • Chat JWTs can be requested from the /api/chat/get_token endpoint with an API key
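A minimal sketch of the access matrix described above. It assumes the credential has already been extracted from the request and verified, that the route prefixes are `/api/login`, `/docs`, and `/api/chat/` (hypothetical, apart from `/api/chat/get_token`), and that the key used to obtain a chat JWT is a customer- or persona-specific key. `Credential`, `RouteGroup`, `classify`, and `is_allowed` are illustrative names, not the platform's API.

```rust
/// Credential found on the request after header parsing and verification
/// (signature and expiry checks are assumed to have happened already).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Credential {
    UiApiKey,      // key issued for the web UI
    UserJwt,       // JWT of a logged-in user
    ChatJwt,       // JWT obtained from /api/chat/get_token
    PersonaApiKey, // customer- or persona-specific API key
    Anonymous,     // nothing presented
}

/// Route groups with distinct access rules (path prefixes are hypothetical).
#[derive(Debug, PartialEq)]
pub enum RouteGroup {
    LoginOrDocs,
    ChatToken,
    Chat,
    UserApi,
}

pub fn classify(path: &str) -> RouteGroup {
    if path == "/api/chat/get_token" {
        RouteGroup::ChatToken
    } else if path.starts_with("/api/chat/") {
        RouteGroup::Chat
    } else if path.starts_with("/api/login") || path.starts_with("/docs") {
        RouteGroup::LoginOrDocs
    } else {
        RouteGroup::UserApi
    }
}

/// Access matrix from the rules above: any combination not listed is rejected.
pub fn is_allowed(path: &str, cred: Credential) -> bool {
    matches!(
        (classify(path), cred),
        (RouteGroup::LoginOrDocs, Credential::UiApiKey)
            | (RouteGroup::ChatToken, Credential::PersonaApiKey)
            | (RouteGroup::Chat, Credential::ChatJwt | Credential::PersonaApiKey)
            | (RouteGroup::UserApi, Credential::UserJwt)
    )
}
```

Keeping the rules in one match makes the default explicit: a credential that does not appear next to a route group is refused.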

2. Web Server Layer

  • Purpose: HTTP API gateway and request routing
  • Responsibilities:
    • REST API endpoints for chat & user/admin operations
    • Configuration management endpoints
    • Request/response serialization
    • Chat streaming support
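Chat streaming support could use Server-Sent Events (SSE); the transport is not specified here, so this is an assumption. The sketch covers only the wire framing: each streamed chunk becomes one event, a multi-line payload is split across several `data:` lines as the SSE format requires, and a blank line ends each event. The `done` event name and the `[DONE]` marker are hypothetical.

```rust
/// Formats one Server-Sent Events message. Each line of the payload gets its
/// own `data:` field; the trailing blank line terminates the event.
pub fn format_sse_event(event: Option<&str>, payload: &str) -> String {
    let mut out = String::new();
    if let Some(name) = event {
        out.push_str("event: ");
        out.push_str(name);
        out.push('\n');
    }
    for line in payload.split('\n') {
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    out.push('\n');
    out
}

/// Frames a sequence of streamed token chunks, followed by a (hypothetical)
/// end-of-stream event so the client knows the answer is complete.
pub fn frame_stream<'a>(chunks: impl IntoIterator<Item = &'a str>) -> String {
    let mut body = String::new();
    for chunk in chunks {
        body.push_str(&format_sse_event(None, chunk));
    }
    body.push_str(&format_sse_event(Some("done"), "[DONE]"));
    body
}
```

Clients strip a single space after `data:`, so a chunk with a leading space (" there") arrives intact.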

3. Chat Engine

  • Purpose: Core conversation management and AI integration
  • Responsibilities:
    • Chat session lifecycle management
    • AI provider abstraction and integration
    • Token streaming implementation
    • Context assembly and prompt construction
    • Tool execution and function calling
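Context assembly and prompt construction (steps 5 and 6 of the chat request flow) can be pictured as building an ordered message list. This sketch assumes a chat-style message format with system, user, and assistant roles, and a hypothetical byte budget that keeps the highest-ranked retrieved chunks and drops the rest. The types and the `Context:` layout are illustrative only.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    System,
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub content: String,
}

/// Builds the prompt: persona system prompt, then as many retrieved context
/// chunks as fit in `max_context_bytes` (in ranking order), then the prior
/// history, then the new user message.
pub fn build_prompt(
    system_prompt: &str,
    context_chunks: &[&str],
    max_context_bytes: usize,
    history: &[Message],
    user_message: &str,
) -> Vec<Message> {
    let mut used = 0;
    let mut included = Vec::new();
    for chunk in context_chunks {
        if used + chunk.len() > max_context_bytes {
            break; // keep the highest-ranked chunks, drop the rest
        }
        used += chunk.len();
        included.push(*chunk);
    }
    let mut system = system_prompt.to_string();
    if !included.is_empty() {
        system.push_str("\n\nContext:\n");
        system.push_str(&included.join("\n---\n"));
    }
    let mut messages = vec![Message { role: Role::System, content: system }];
    messages.extend(history.iter().cloned());
    messages.push(Message { role: Role::User, content: user_message.to_string() });
    messages
}
```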

4. Configuration System

  • Purpose: Dynamic persona/RAG/MCP and system configuration
  • Responsibilities:
    • Chat persona definitions
    • Tool configurations
    • Context document associations
    • Chat parameters (temperature, max tokens, etc.)
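A persona record holding the settings listed above might look like the struct below, with a `validate` method of the kind the configuration management flow calls for. The field names, the temperature range of 0.0 to 2.0, and the error messages are assumptions; the accepted range in particular differs between AI providers.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct PersonaConfig {
    pub id: String,
    pub system_prompt: String,
    pub tools: Vec<String>,             // names of enabled tools / MCP servers
    pub context_documents: Vec<String>, // IDs of associated RAG documents
    pub temperature: f32,
    pub max_tokens: u32,
    pub moderation_enabled: bool,
}

impl PersonaConfig {
    /// Collects every problem instead of stopping at the first one,
    /// so an admin UI can report them all at once.
    pub fn validate(&self) -> Result<(), Vec<String>> {
        let mut errors = Vec::new();
        if self.id.trim().is_empty() {
            errors.push("id must not be empty".to_string());
        }
        if self.system_prompt.trim().is_empty() {
            errors.push("system_prompt must not be empty".to_string());
        }
        // Also rejects NaN, since NaN is not contained in any range.
        if !(0.0..=2.0).contains(&self.temperature) {
            errors.push(format!("temperature {} is outside 0.0..=2.0", self.temperature));
        }
        if self.max_tokens == 0 {
            errors.push("max_tokens must be greater than 0".to_string());
        }
        if errors.is_empty() {
            Ok(())
        } else {
            Err(errors)
        }
    }
}
```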

5. RAG System

  • Purpose: Context retrieval and document management
  • Responsibilities:
    • Document ingestion and vectorization
    • Semantic search and retrieval
    • Context ranking and filtering
    • Document metadata management
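Qdrant performs the actual similarity search; this sketch only shows what scoring, ranking, and filtering mean for retrieved chunks: each candidate embedding is scored by cosine similarity against the query, candidates below a minimum score are dropped, and the best `k` are kept. The document IDs, toy three-dimensional embeddings, and `min_score` threshold are hypothetical.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns up to `k` (doc_id, score) pairs with score >= `min_score`,
/// best match first.
pub fn top_k<'a>(
    query: &[f32],
    docs: &'a [(&'a str, Vec<f32>)],
    k: usize,
    min_score: f32,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = docs
        .iter()
        .map(|(id, emb)| (*id, cosine(query, emb)))
        .filter(|(_, score)| *score >= min_score)
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1)); // descending by score
    scored.truncate(k);
    scored
}
```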

6. Data Layer

  • PostgreSQL: Configuration, chat history, user management, document storage
  • Qdrant Vector Database: Document embeddings and similarity search
  • File System: Static assets

Data Flow

Chat Request Flow

  1. Request Reception: Web server receives chat request with persona ID
  2. Authentication: Auth middleware extracts and validates the API key or JWT
  3. Chat Routing:
    • Chat invocation uses an API key or a JWT that contains all the information required to identify the persona.
    • Request routing therefore typically requires no database access.
    • Chat personas are cached or loaded on demand from PostgreSQL.
    • Session context is cached.
  4. Request Moderation: If configured for the persona, the request is checked for inappropriate content.
  5. Context Assembly: Query RAG system for relevant context
  6. Prompt Construction: Build complete prompt with system, context, and user message
  7. AI Processing: Send to AI provider (streaming or batch)
    • MCP/function-call handling: if the model requests a tool or function call, it is executed and the result may be sent back in follow-up LLM requests.
  8. Response Handling: Process and return/stream response to client
  9. History Storage: Save (brief) conversation history to session/database.
  10. Accounting: Record token counts and other usage data.
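The follow-up LLM requests in step 7 form a loop: send the conversation to the model and, while it answers with a tool call, execute the tool, append the result, and ask again. In this sketch plain closures stand in for the AI provider and the MCP/function executor; `ModelReply`, the transcript strings, and the `max_rounds` cap are hypothetical.

```rust
/// What the model returned for one request.
#[derive(Debug, Clone, PartialEq)]
pub enum ModelReply {
    Text(String),
    ToolCall { name: String, arguments: String },
}

/// Runs the model/tool loop until the model produces text or the
/// `max_rounds` cap is reached.
pub fn run_with_tools(
    mut call_model: impl FnMut(&[String]) -> ModelReply,
    mut call_tool: impl FnMut(&str, &str) -> String,
    user_message: &str,
    max_rounds: usize,
) -> Result<String, String> {
    let mut transcript = vec![format!("user: {user_message}")];
    for _ in 0..max_rounds {
        match call_model(transcript.as_slice()) {
            ModelReply::Text(answer) => return Ok(answer),
            ModelReply::ToolCall { name, arguments } => {
                // Execute the requested tool and feed the result back.
                let result = call_tool(name.as_str(), arguments.as_str());
                transcript.push(format!("tool_call: {name}({arguments})"));
                transcript.push(format!("tool_result: {result}"));
            }
        }
    }
    Err(format!("no final answer after {max_rounds} rounds"))
}
```

The round cap is a safeguard assumed here, not a documented feature: without it, a model that keeps requesting tools would never produce a response.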

Configuration Management Flow

  1. CRUD Operations: Create/read/update/delete persona configurations
  2. Validation: Ensure configuration integrity and required fields
  3. Hot Reload: Update active configurations without restart
  4. Versioning: Track configuration changes and support rollback
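Hot reload, versioning, and rollback can share one small structure: each request takes a cheap `Arc` snapshot of the active configuration, so publishing a new version never disturbs requests already in flight, and every published version stays available for rollback. `ConfigStore` is a hypothetical name, and this in-memory version assumes a single process; with the documented PostgreSQL storage and several server instances, versions would be persisted and other instances notified (not shown).

```rust
use std::sync::{Arc, RwLock};

struct Versions<T> {
    history: Vec<Arc<T>>, // every published version, oldest first
    current: usize,       // index of the active version
}

pub struct ConfigStore<T> {
    inner: RwLock<Versions<T>>,
}

impl<T> ConfigStore<T> {
    pub fn new(initial: T) -> Self {
        let versions = Versions { history: vec![Arc::new(initial)], current: 0 };
        ConfigStore { inner: RwLock::new(versions) }
    }

    /// Snapshot for one request: cloning the Arc copies a pointer, and a
    /// later publish does not change what this request sees.
    pub fn current(&self) -> Arc<T> {
        let v = self.inner.read().unwrap();
        Arc::clone(&v.history[v.current])
    }

    /// Activates a new (already validated) version and returns its number.
    pub fn publish(&self, config: T) -> usize {
        let mut v = self.inner.write().unwrap();
        v.history.push(Arc::new(config));
        let newest = v.history.len() - 1;
        v.current = newest;
        newest
    }

    /// Re-activates an earlier version; returns false if it does not exist.
    pub fn rollback(&self, version: usize) -> bool {
        let mut v = self.inner.write().unwrap();
        if version < v.history.len() {
            v.current = version;
            true
        } else {
            false
        }
    }
}
```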

Key Design Principles

1. Async-First

  • All I/O operations are asynchronous, using the Tokio runtime
  • Non-blocking database operations
  • Concurrent request handling
  • Streaming responses where applicable

2. Configuration-Driven

  • All chat behavior controlled by database configuration
  • No hardcoded prompts or parameters
  • Dynamic persona switching
  • Runtime configuration updates

3. Provider Agnostic

  • Abstract AI provider interface
  • Support multiple AI backends
  • Easy provider switching
  • Consistent API regardless of backend
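One way to express the abstract provider interface is a trait the chat engine depends on, with one implementation per backend and a registry keyed by name, so a persona's configuration can select its backend at runtime. The trait, the `CompletionRequest` type, and both stand-in backends are hypothetical; real implementations would make asynchronous HTTP calls instead of returning canned results.

```rust
use std::collections::HashMap;

/// Provider-neutral request built by the chat engine.
pub struct CompletionRequest {
    pub model: String,
    pub prompt: String,
}

/// Every backend implements this; the engine only sees `dyn ChatProvider`.
pub trait ChatProvider {
    fn name(&self) -> &str;
    fn complete(&self, req: &CompletionRequest) -> Result<String, String>;
}

/// Stand-in backend that echoes the prompt.
struct EchoProvider;
impl ChatProvider for EchoProvider {
    fn name(&self) -> &str {
        "echo"
    }
    fn complete(&self, req: &CompletionRequest) -> Result<String, String> {
        Ok(format!("[{}] {}", req.model, req.prompt))
    }
}

/// Stand-in backend that is always down, to exercise error handling.
struct UnavailableProvider;
impl ChatProvider for UnavailableProvider {
    fn name(&self) -> &str {
        "offline"
    }
    fn complete(&self, _req: &CompletionRequest) -> Result<String, String> {
        Err("backend unavailable".to_string())
    }
}

/// Registry keyed by provider name.
pub fn registry() -> HashMap<String, Box<dyn ChatProvider>> {
    let providers: Vec<Box<dyn ChatProvider>> =
        vec![Box::new(EchoProvider), Box::new(UnavailableProvider)];
    providers.into_iter().map(|p| (p.name().to_string(), p)).collect()
}
```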

4. Scalable Architecture

  • Stateless web server design
  • Horizontal scaling capability
  • Efficient resource utilization
  • Caching strategies for performance
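The caching noted in the chat request flow (personas cached or loaded on demand from PostgreSQL) can be sketched as a read-through cache with a time-to-live. The `load` closure stands in for the database query; `PersonaCache` and the TTL value are assumptions. A TTL bounds how stale a persona can be on each instance, while `invalidate` covers the case where an instance knows a persona has just changed.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

pub struct PersonaCache<V> {
    ttl: Duration,
    entries: HashMap<String, (Instant, V)>, // key -> (load time, value)
}

impl<V: Clone> PersonaCache<V> {
    pub fn new(ttl: Duration) -> Self {
        PersonaCache { ttl, entries: HashMap::new() }
    }

    /// Returns the cached value if it is younger than the TTL; otherwise
    /// calls `load` (e.g. a database query), stores the result, and returns it.
    pub fn get_or_load<E>(
        &mut self,
        key: &str,
        load: impl FnOnce(&str) -> Result<V, E>,
    ) -> Result<V, E> {
        if let Some((loaded_at, value)) = self.entries.get(key) {
            if loaded_at.elapsed() < self.ttl {
                return Ok(value.clone());
            }
        }
        let value = load(key)?; // errors are returned, not cached
        self.entries.insert(key.to_string(), (Instant::now(), value.clone()));
        Ok(value)
    }

    /// Drops an entry, e.g. after an admin updates the persona.
    pub fn invalidate(&mut self, key: &str) {
        self.entries.remove(key);
    }
}
```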

Security Considerations

  • API authentication and authorization
  • Input validation and sanitization
  • SQL injection prevention
  • Rate limiting and abuse prevention
  • Secure configuration storage
  • Audit logging for compliance
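Rate limiting is often implemented with a token bucket; the algorithm the platform uses is not stated, so treat this as one possible approach. A bucket per API key or JWT subject (a hypothetical keying choice) refills at a fixed rate up to a burst capacity, and a request is admitted only when a whole token is available. The current time is passed in explicitly so the behavior is deterministic and easy to test.

```rust
use std::time::Instant;

pub struct TokenBucket {
    capacity: f64,       // maximum burst size
    tokens: f64,         // tokens currently available
    refill_per_sec: f64, // sustained request rate
    last: Instant,       // time of the last refill
}

impl TokenBucket {
    /// Starts full, so a new client can use its whole burst immediately.
    pub fn new(capacity: f64, refill_per_sec: f64, now: Instant) -> Self {
        TokenBucket { capacity, tokens: capacity, refill_per_sec, last: now }
    }

    /// Refills according to the time elapsed since the last call, then
    /// tries to take one token. Returns true if the request is allowed.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = self.last.max(now); // ignore out-of-order timestamps
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```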

Performance Targets

  • Latency: < 200ms for configuration retrieval
  • Throughput: 1000+ concurrent chat sessions
  • Streaming: < 50ms first token latency
  • Database: < 10ms query response time
  • Memory: Efficient resource usage with connection pooling