AI and RAG pipeline
Purpose
Orient contributors to local inference, tool dispatch, embeddings, and retrieval without dumping every method in AIOrchestrator.
Code location
| Concern | Typical paths |
|---|---|
| Orchestration | Mnemo.Infrastructure/Services/AI/AIOrchestrator.cs, OrchestrationLayerService |
| Models / servers | LlamaCppServerManager, LlamaCppHttpTextService, ModelRegistry, AIModelsSetupService |
| Tools / skills | SkillRegistry, ToolDispatcher, feature *ToolService classes registered in Bootstrapper |
| Embeddings / vector | OnnxEmbeddingService (IEmbeddingService), SqliteVectorStore (IVectorStore) |
| Knowledge facade | KnowledgeService (IKnowledgeService) |
Main interfaces / classes
| Type | Role |
|---|---|
| IAIOrchestrator | High-level assistant flows coordinating tools and models |
| ITextGenerationService | Delegates between local Llama HTTP and teacher/cloud paths (DelegatingTextGenerationService) |
| IKnowledgeService | Retrieval and ingestion orchestration over the vector store |
| ISkillRegistry | Discoverable agent skills/tools |
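The retrieval half of these interfaces can be sketched as follows. This is a minimal sketch: the method names (`EmbedAsync`, `SearchAsync`, `RetrieveAsync`) and signatures are assumptions for illustration, not the actual Mnemo contracts.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical contracts for the retrieval path: embed the query text,
// then search the vector store. Real Mnemo signatures may differ.
public interface IEmbeddingService
{
    Task<float[]> EmbedAsync(string text);
}

public interface IVectorStore
{
    Task<IReadOnlyList<(string Chunk, double Score)>> SearchAsync(float[] query, int topK);
}

public sealed class KnowledgeService
{
    private readonly IEmbeddingService _embeddings;
    private readonly IVectorStore _store;

    public KnowledgeService(IEmbeddingService embeddings, IVectorStore store)
        => (_embeddings, _store) = (embeddings, store);

    // Embed the question once, then return the topK most similar chunks.
    public async Task<IReadOnlyList<(string Chunk, double Score)>> RetrieveAsync(
        string question, int topK = 5)
    {
        var vector = await _embeddings.EmbedAsync(question);
        return await _store.SearchAsync(vector, topK);
    }
}
```

The point of the facade is that callers never touch embeddings or the store directly; they ask `KnowledgeService` for relevant chunks.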
Startup / registration flow
Bootstrapper registers the AI infrastructure early, as singletons; LlamaCppServerManager may spawn server processes lazily when a generation route is first hit (see the comments in the bootstrap code). ResourceGovernor helps constrain concurrent AI work.
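The composition described above can be sketched without assuming a particular DI container. The type names come from the tables above, but the constructor shapes, stub bodies, and the `Build` helper are illustrative, not the actual Bootstrapper code.

```csharp
// Stub types standing in for the real Mnemo services; bodies and
// constructor shapes are placeholders for this sketch only.
public sealed class OnnxEmbeddingService { }
public sealed class SqliteVectorStore { }
public sealed class KnowledgeService
{
    public KnowledgeService(OnnxEmbeddingService embeddings, SqliteVectorStore store) { }
}
// Spawns the llama.cpp server process lazily, on first generation call.
public sealed class LlamaCppServerManager { }
public sealed class DelegatingTextGenerationService
{
    public DelegatingTextGenerationService(LlamaCppServerManager servers) { }
}
public sealed class AIOrchestrator
{
    public AIOrchestrator(DelegatingTextGenerationService textGen, KnowledgeService knowledge) { }
}

public static class BootstrapSketch
{
    // Each service is constructed once and shared (singleton lifetime),
    // mirroring the early registration order in Bootstrapper.
    public static AIOrchestrator Build()
    {
        var embeddings = new OnnxEmbeddingService();
        var store = new SqliteVectorStore();
        var knowledge = new KnowledgeService(embeddings, store);
        var servers = new LlamaCppServerManager();
        var textGen = new DelegatingTextGenerationService(servers);
        return new AIOrchestrator(textGen, knowledge);
    }
}
```

Note that constructing `LlamaCppServerManager` does not start any process; process spawn is deferred to the first generation request, which is what produces the first-call latency discussed under Gotchas.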
How to extend
- New tool: register a handler via the appropriate `*ToolRegistrar` from an `IModule.RegisterTools` path; keep orchestration side effects out of ViewModels.
- New retrieval source: extend the knowledge ingestion pipeline through `KnowledgeService` hooks and the vector store schema; avoid duplicating embedding logic in the UI.
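The tool-registration path can be sketched as follows, assuming hypothetical `ISkillRegistry` and `IModule` shapes; the real registrar signatures live next to the feature `*ToolService` classes and may differ.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical contracts; the real ISkillRegistry / IModule shapes may differ.
public interface ISkillRegistry
{
    void Register(string name, Func<string, Task<string>> handler);
}

public interface IModule
{
    void RegisterTools(ISkillRegistry registry);
}

// A feature module registering one tool. The handler runs in the
// orchestration layer, so it never touches ViewModel state.
public sealed class NotesModule : IModule
{
    public void RegisterTools(ISkillRegistry registry)
    {
        // "notes.search" is an illustrative tool name; the handler would
        // delegate to the feature's *ToolService in real code.
        registry.Register("notes.search",
            query => Task.FromResult($"results for {query}"));
    }
}
```

Keeping the handler thin (delegate to a `*ToolService`) is what keeps orchestration side effects out of the view layer.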
Gotchas
- First-call latency: cold-starting a local server can look like a hang; surface status in the UI rather than blocking silently.
- GPU / ONNX: hardware probes for embedding and inference can fail open; check the logs when users report slow RAG.
- Disposal: embedding/runtime native resources are disposed from the `App` exit handler; match that lifetime when adding new native-backed singletons.
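For the disposal gotcha, here is a sketch of a native-backed singleton whose cleanup belongs in the `App` exit handler. `NativeBackedService` and the commented `OnExit` override are illustrative, not real Mnemo code; the pattern is what matters: native memory held for the process lifetime, released exactly once.

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative native-backed singleton. The unmanaged buffer stands in
// for an ONNX session or llama context that must be freed exactly once.
public sealed class NativeBackedService : IDisposable
{
    private IntPtr _buffer = Marshal.AllocHGlobal(1024);
    private bool _disposed;

    public void Dispose()
    {
        if (_disposed) return;   // idempotent: safe if exit paths overlap
        _disposed = true;
        Marshal.FreeHGlobal(_buffer);
        _buffer = IntPtr.Zero;
    }
}

// In App's exit handler (shape is illustrative):
//
// protected override void OnExit(ExitEventArgs e)
// {
//     _nativeBackedService.Dispose(); // same place the embedding runtime is torn down
//     base.OnExit(e);
// }
```

New native-backed singletons should be disposed from that same exit path rather than relying on finalizers, so teardown order stays deterministic.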
Related: Infrastructure, Startup flow