AI and RAG pipeline

Purpose

Orient contributors to local inference, tool dispatch, embeddings, and retrieval without dumping every method in AIOrchestrator.

Code location

  • Orchestration: Mnemo.Infrastructure/Services/AI/AIOrchestrator.cs, OrchestrationLayerService
  • Models / servers: LlamaCppServerManager, LlamaCppHttpTextService, ModelRegistry, AIModelsSetupService
  • Tools / skills: SkillRegistry, ToolDispatcher, feature *ToolService classes registered in Bootstrapper
  • Embeddings / vector: OnnxEmbeddingService (IEmbeddingService), SqliteVectorStore (IVectorStore)
  • Knowledge facade: KnowledgeService (IKnowledgeService)

Main interfaces / classes

  • IAIOrchestrator: high-level assistant flows coordinating tools and models
  • ITextGenerationService: delegates between the local llama HTTP path and teacher/cloud paths (DelegatingTextGenerationService)
  • IKnowledgeService: retrieval and ingestion orchestration over the vector store
  • ISkillRegistry: registry of discoverable agent skills/tools
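The delegation behind ITextGenerationService can be sketched roughly as below. This is a minimal illustration of the pattern only: the interface name, method signatures, and the health check are assumptions for this sketch, not Mnemo's actual API.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical shape of the text-generation abstraction; the real
// ITextGenerationService in the codebase may differ.
public interface ITextGen
{
    Task<string> GenerateAsync(string prompt);
}

// Stand-in for the local llama.cpp HTTP path.
public sealed class LocalTextGen : ITextGen
{
    public bool IsHealthy { get; set; } = true;
    public Task<string> GenerateAsync(string prompt) =>
        Task.FromResult($"[local] {prompt}");
}

// Stand-in for the teacher/cloud path.
public sealed class CloudTextGen : ITextGen
{
    public Task<string> GenerateAsync(string prompt) =>
        Task.FromResult($"[cloud] {prompt}");
}

// Routes to the local server when it is healthy, otherwise falls back
// to the cloud path; the pattern DelegatingTextGenerationService embodies.
public sealed class DelegatingTextGen : ITextGen
{
    private readonly LocalTextGen _local;
    private readonly CloudTextGen _cloud;

    public DelegatingTextGen(LocalTextGen local, CloudTextGen cloud)
    {
        _local = local;
        _cloud = cloud;
    }

    public Task<string> GenerateAsync(string prompt) =>
        _local.IsHealthy ? _local.GenerateAsync(prompt)
                         : _cloud.GenerateAsync(prompt);
}

public static class Demo
{
    public static async Task Main()
    {
        var svc = new DelegatingTextGen(new LocalTextGen(), new CloudTextGen());
        Console.WriteLine(await svc.GenerateAsync("hello")); // [local] hello
    }
}
```

Callers depend only on the abstraction, so routing policy can change without touching orchestration code.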

Startup / registration flow

Bootstrapper registers the AI infrastructure early as singletons; LlamaCppServerManager may spawn server processes lazily when a generation route is first hit (see the comments in the bootstrap code). ResourceGovernor constrains how much of this work runs concurrently.
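The eager-singleton / lazy-process split can be sketched with Lazy&lt;T&gt;. The class and member names here are illustrative stand-ins for LlamaCppServerManager, not its real API:

```csharp
using System;

// Illustrative stand-in for LlamaCppServerManager: the singleton is
// registered at bootstrap, but the server process only starts on first use.
public sealed class ServerManager
{
    private readonly Lazy<string> _server;
    public int StartCount { get; private set; }

    public ServerManager()
    {
        // Nothing expensive happens at registration time.
        _server = new Lazy<string>(() =>
        {
            StartCount++;                  // where a real process would spawn
            return "llama-server:8080";
        });
    }

    // The first call pays the cold-start cost; later calls reuse the process.
    public string GetEndpoint() => _server.Value;
}

public static class Demo
{
    public static void Main()
    {
        var mgr = new ServerManager();        // registered early, cheap
        Console.WriteLine(mgr.StartCount);    // 0: no process yet
        Console.WriteLine(mgr.GetEndpoint()); // first generation route hit
        mgr.GetEndpoint();
        Console.WriteLine(mgr.StartCount);    // still 1: single process
    }
}
```

This is why registration stays cheap at startup while the first generation request is slow (see Gotchas).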

How to extend

  • New tool: register a handler via the appropriate *ToolRegistrar from an IModule.RegisterTools path; keep orchestration side effects out of ViewModels.
  • New retrieval source: extend the knowledge ingestion pipeline through KnowledgeService hooks and the vector store schema; avoid duplicating embedding logic in the UI.
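The tool-registration flow above can be sketched as follows. The registry, dispatcher, and module names are hypothetical stand-ins for SkillRegistry / ToolDispatcher / IModule.RegisterTools, chosen only to show the shape:

```csharp
using System;
using System.Collections.Generic;

// Illustrative tool registry; the real SkillRegistry/ToolDispatcher API may differ.
public sealed class ToolRegistry
{
    private readonly Dictionary<string, Func<string, string>> _tools = new();

    public void Register(string name, Func<string, string> handler) =>
        _tools[name] = handler;

    public string Dispatch(string name, string args) =>
        _tools.TryGetValue(name, out var tool)
            ? tool(args)
            : throw new InvalidOperationException($"Unknown tool: {name}");
}

// A feature module registers its tools at startup, keeping handler logic
// out of ViewModels (mirrors the IModule.RegisterTools idea).
public static class WeatherModule
{
    public static void RegisterTools(ToolRegistry registry) =>
        registry.Register("weather.lookup", city => $"Sunny in {city}");
}

public static class Demo
{
    public static void Main()
    {
        var registry = new ToolRegistry();
        WeatherModule.RegisterTools(registry); // bootstrap wires each module
        Console.WriteLine(registry.Dispatch("weather.lookup", "Oslo"));
    }
}
```

The point of the indirection: the orchestrator dispatches by name and never references a feature's types directly.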

Gotchas

  • First-call latency: cold-starting local servers can look like a hang; surface status in the UI rather than blocking silently.
  • GPU / ONNX: embedding and inference hardware probes can fail open (silently falling back to a slower path); check the logs when users report slow RAG.
  • Disposal: native resources for embeddings and runtimes are disposed from the App exit handler; match that lifetime when adding new native-backed singletons.
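The disposal gotcha amounts to the standard IDisposable lifetime contract. A minimal sketch, with a hypothetical EmbeddingRuntime standing in for any native-backed singleton:

```csharp
using System;

// Illustrative native-backed singleton: IDisposable so the App exit handler
// can release native handles deterministically.
public sealed class EmbeddingRuntime : IDisposable
{
    public bool Disposed { get; private set; }

    public float[] Embed(string text)
    {
        // Using native resources after app exit is a bug; fail loudly.
        if (Disposed) throw new ObjectDisposedException(nameof(EmbeddingRuntime));
        return new float[] { text.Length }; // stand-in for real inference
    }

    public void Dispose() => Disposed = true; // release native handles here
}

public static class Demo
{
    public static void Main()
    {
        var runtime = new EmbeddingRuntime();
        Console.WriteLine(runtime.Embed("hello")[0]);
        // App exit handler: dispose native-backed singletons in one place,
        // in a deterministic order.
        runtime.Dispose();
    }
}
```

New native-backed services should plug into that same exit path rather than relying on finalizers.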

Related: Infrastructure, Startup flow