AI and RAG pipeline

Purpose

Orient contributors to local inference, tool dispatch, embeddings, and retrieval without dumping every method in AIOrchestrator.

Code location

  • Orchestration: Mnemo.Infrastructure/Services/AI/AIOrchestrator.cs, OrchestrationLayerService
  • Models / servers: LlamaCppServerManager, LlamaCppHttpTextService, ModelRegistry, AIModelsSetupService
  • Tools / skills: SkillRegistry, ToolDispatcher, feature *ToolService classes registered in Bootstrapper
  • Embeddings / vector: OnnxEmbeddingService (IEmbeddingService), SqliteVectorStore (IVectorStore)
  • Knowledge facade: KnowledgeService (IKnowledgeService)

Main interfaces / classes

  • IAIOrchestrator: high-level assistant flows coordinating tools and models
  • ITextGenerationService: delegates between the local llama HTTP path and teacher/cloud paths (DelegatingTextGenerationService)
  • IKnowledgeService: retrieval and ingestion orchestration over the vector store
  • ISkillRegistry: registry of discoverable agent skills/tools
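The delegation behind ITextGenerationService can be sketched roughly as below. This is a minimal illustration of the pattern only: the interface name, method signatures, and the health check are assumptions for this sketch, not Mnemo's actual API.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical shape of the text-generation abstraction; the real
// ITextGenerationService in the codebase may differ.
public interface ITextGen
{
    Task<string> GenerateAsync(string prompt);
}

// Stand-in for the local llama.cpp HTTP path.
public sealed class LocalTextGen : ITextGen
{
    public bool IsHealthy { get; set; } = true;
    public Task<string> GenerateAsync(string prompt) =>
        Task.FromResult($"[local] {prompt}");
}

// Stand-in for the teacher/cloud path.
public sealed class CloudTextGen : ITextGen
{
    public Task<string> GenerateAsync(string prompt) =>
        Task.FromResult($"[cloud] {prompt}");
}

// Routes to the local server when it is healthy, otherwise falls back
// to the cloud path; the pattern DelegatingTextGenerationService embodies.
public sealed class DelegatingTextGen : ITextGen
{
    private readonly LocalTextGen _local;
    private readonly CloudTextGen _cloud;

    public DelegatingTextGen(LocalTextGen local, CloudTextGen cloud)
    {
        _local = local;
        _cloud = cloud;
    }

    public Task<string> GenerateAsync(string prompt) =>
        _local.IsHealthy ? _local.GenerateAsync(prompt)
                         : _cloud.GenerateAsync(prompt);
}

public static class Demo
{
    public static async Task Main()
    {
        var svc = new DelegatingTextGen(new LocalTextGen(), new CloudTextGen());
        Console.WriteLine(await svc.GenerateAsync("hello")); // [local] hello
    }
}
```

Callers depend only on the abstraction, so routing policy can change without touching orchestration code.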

Startup / registration flow

Bootstrapper registers the AI infrastructure early as singletons; LlamaCppServerManager may spawn server processes lazily when a generation route is first hit (see the comments in the bootstrap code). ResourceGovernor constrains how much of this work runs concurrently.
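The eager-singleton / lazy-process split can be sketched with Lazy&lt;T&gt;. The class and member names here are illustrative stand-ins for LlamaCppServerManager, not its real API:

```csharp
using System;

// Illustrative stand-in for LlamaCppServerManager: the singleton is
// registered at bootstrap, but the server process only starts on first use.
public sealed class ServerManager
{
    private readonly Lazy<string> _server;
    public int StartCount { get; private set; }

    public ServerManager()
    {
        // Nothing expensive happens at registration time.
        _server = new Lazy<string>(() =>
        {
            StartCount++;                  // where a real process would spawn
            return "llama-server:8080";
        });
    }

    // The first call pays the cold-start cost; later calls reuse the process.
    public string GetEndpoint() => _server.Value;
}

public static class Demo
{
    public static void Main()
    {
        var mgr = new ServerManager();        // registered early, cheap
        Console.WriteLine(mgr.StartCount);    // 0: no process yet
        Console.WriteLine(mgr.GetEndpoint()); // first generation route hit
        mgr.GetEndpoint();
        Console.WriteLine(mgr.StartCount);    // still 1: single process
    }
}
```

This is why registration stays cheap at startup while the first generation request is slow (see Gotchas).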

How to extend

  • New tool: register a handler via the appropriate *ToolRegistrar from an IModule.RegisterTools path; keep orchestration side effects out of ViewModels.
  • New retrieval source: extend the knowledge ingestion pipeline through KnowledgeService hooks and the vector store schema; avoid duplicating embedding logic in the UI.
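The tool-registration flow above can be sketched as follows. The registry, dispatcher, and module names are hypothetical stand-ins for SkillRegistry / ToolDispatcher / IModule.RegisterTools, chosen only to show the shape:

```csharp
using System;
using System.Collections.Generic;

// Illustrative tool registry; the real SkillRegistry/ToolDispatcher API may differ.
public sealed class ToolRegistry
{
    private readonly Dictionary<string, Func<string, string>> _tools = new();

    public void Register(string name, Func<string, string> handler) =>
        _tools[name] = handler;

    public string Dispatch(string name, string args) =>
        _tools.TryGetValue(name, out var tool)
            ? tool(args)
            : throw new InvalidOperationException($"Unknown tool: {name}");
}

// A feature module registers its tools at startup, keeping handler logic
// out of ViewModels (mirrors the IModule.RegisterTools idea).
public static class WeatherModule
{
    public static void RegisterTools(ToolRegistry registry) =>
        registry.Register("weather.lookup", city => $"Sunny in {city}");
}

public static class Demo
{
    public static void Main()
    {
        var registry = new ToolRegistry();
        WeatherModule.RegisterTools(registry); // bootstrap wires each module
        Console.WriteLine(registry.Dispatch("weather.lookup", "Oslo"));
    }
}
```

The point of the indirection: the orchestrator dispatches by name and never references a feature's types directly.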

Gotchas

  • First-call latency: cold-starting local servers can look like a hang; surface status in the UI rather than blocking silently.
  • GPU / ONNX: embedding and inference hardware probes can fail open (silently falling back to a slower path); check the logs when users report slow RAG.
  • Disposal: native resources for embeddings and runtimes are disposed from the App exit handler; match that lifetime when adding new native-backed singletons.
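The disposal gotcha amounts to the standard IDisposable lifetime contract. A minimal sketch, with a hypothetical EmbeddingRuntime standing in for any native-backed singleton:

```csharp
using System;

// Illustrative native-backed singleton: IDisposable so the App exit handler
// can release native handles deterministically.
public sealed class EmbeddingRuntime : IDisposable
{
    public bool Disposed { get; private set; }

    public float[] Embed(string text)
    {
        // Using native resources after app exit is a bug; fail loudly.
        if (Disposed) throw new ObjectDisposedException(nameof(EmbeddingRuntime));
        return new float[] { text.Length }; // stand-in for real inference
    }

    public void Dispose() => Disposed = true; // release native handles here
}

public static class Demo
{
    public static void Main()
    {
        var runtime = new EmbeddingRuntime();
        Console.WriteLine(runtime.Embed("hello")[0]);
        // App exit handler: dispose native-backed singletons in one place,
        // in a deterministic order.
        runtime.Dispose();
    }
}
```

New native-backed services should plug into that same exit path rather than relying on finalizers.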

Related: Infrastructure, Startup flow