Glossary

Key terms in LLMOps, AI pipelines, and production AI governance.

AI Agent Orchestration

AI agent orchestration is the coordination of multiple AI agents working together on a task, managing their communication, task delegation, and output synthesis.
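A minimal sketch of the pattern, using plain functions as stand-ins for model-backed agents (all names here are illustrative):

```python
# Hypothetical agents: plain functions stand in for model-backed agents.
def research_agent(task: str) -> str:
    return f"findings for: {task}"

def writer_agent(context: str) -> str:
    return f"draft based on ({context})"

def orchestrate(task: str) -> str:
    # Task delegation: each agent handles one stage of the work.
    findings = research_agent(task)
    # Output synthesis: the writer consumes the researcher's output.
    return writer_agent(findings)

print(orchestrate("summarize Q3 incidents"))
```

Real orchestrators add message passing, retries, and dynamic routing between agents; the coordinator-delegates-then-synthesizes shape stays the same.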

AI Governance

AI governance is the framework of policies, processes, and technical controls that ensure AI systems operate safely, ethically, and in compliance with regulations.

AI Pipeline

An AI pipeline is a sequence of connected processing steps that transforms inputs into AI-generated outputs, including data retrieval, model inference, post-processing, and quality evaluation.
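The stages above can be sketched as composed functions; `fake_model` is a stub standing in for a real inference call:

```python
# Pipeline sketch: each stage is a function; names are illustrative.
def retrieve(query: str) -> list:            # data retrieval
    return [f"doc about {query}"]

def fake_model(prompt: str) -> str:          # model inference (stubbed)
    return f"answer derived from: {prompt}"

def postprocess(text: str) -> str:           # post-processing
    return text.strip()

def evaluate(output: str) -> dict:           # quality evaluation
    return {"non_empty": bool(output)}

def run_pipeline(query: str):
    context = retrieve(query)[0]
    raw = fake_model(f"{query} | {context}")
    output = postprocess(raw)
    return output, evaluate(output)
```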

Eval Gate

An eval gate is an automated quality checkpoint that runs evaluation suites against an AI pipeline and blocks deployment if quality thresholds are not met.
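At its core the gate is a threshold check over metric scores; a minimal sketch (metric names are examples):

```python
def eval_gate(scores: dict, thresholds: dict) -> bool:
    """Pass only if every metric meets or exceeds its threshold;
    a missing metric counts as a failure."""
    return all(scores.get(metric, 0.0) >= t for metric, t in thresholds.items())

thresholds = {"faithfulness": 0.90, "relevance": 0.80}
print(eval_gate({"faithfulness": 0.95, "relevance": 0.85}, thresholds))  # True
print(eval_gate({"faithfulness": 0.95, "relevance": 0.70}, thresholds))  # False
```

In a CI/CD setting, a `False` result would fail the job and block the deployment.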

LLM Evaluation

LLM evaluation is the systematic process of measuring language model output quality across dimensions like accuracy, faithfulness, relevance, and safety.
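As one concrete (and deliberately crude) example of a single dimension, token overlap can serve as a cheap proxy for relevance; production evals typically combine task-specific metrics with model-based judges:

```python
def token_overlap(answer: str, reference: str) -> float:
    # Crude lexical proxy for relevance against a reference answer.
    ans = set(answer.lower().split())
    ref = set(reference.lower().split())
    return len(ans & ref) / len(ref) if ref else 0.0

print(token_overlap("paris is the capital", "the capital is paris"))  # 1.0
print(token_overlap("rome", "the capital is paris"))                  # 0.0
```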

LLM Observability

LLM observability is the practice of collecting, analyzing, and visualizing traces, metrics, and logs from language model applications to understand system behavior in production.
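A toy illustration of the analysis half, assuming trace records with latency and error fields (the schema here is invented for the example):

```python
from statistics import mean

# Hypothetical trace records as an application might export them.
traces = [
    {"latency_ms": 420, "error": False},
    {"latency_ms": 910, "error": False},
    {"latency_ms": 1300, "error": True},
]

def summarize(traces: list) -> dict:
    # Roll raw traces up into the metrics a dashboard would plot.
    return {
        "avg_latency_ms": mean(t["latency_ms"] for t in traces),
        "error_rate": sum(t["error"] for t in traces) / len(traces),
    }

print(summarize(traces))
```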

LLM Tracing

LLM tracing is the practice of recording the full execution path of a language model request, including prompt construction, model calls, tool use, and response generation, as a structured trace.
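A bare-bones sketch of trace capture, recording each step as a named span with a duration (real tracing SDKs add nesting, attributes, and export to a backend):

```python
import time
import uuid

class Trace:
    """Records named spans with durations for one request."""
    def __init__(self):
        self.trace_id = str(uuid.uuid4())
        self.spans = []

    def span(self, name: str):
        trace = self
        class _Span:
            def __enter__(self):
                self._start = time.perf_counter()
            def __exit__(self, *exc):
                trace.spans.append(
                    {"name": name,
                     "duration_s": time.perf_counter() - self._start})
        return _Span()

trace = Trace()
with trace.span("prompt_construction"):
    prompt = "Summarize: ..."
with trace.span("model_call"):
    time.sleep(0.01)  # stand-in for real inference
print([s["name"] for s in trace.spans])  # ['prompt_construction', 'model_call']
```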

LLM-as-a-Judge

LLM-as-a-Judge is an evaluation pattern where a language model scores or ranks the outputs of another language model against defined criteria.
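A sketch of the pattern with a stubbed judge model; the prompt wording and the `call_model` parameter are illustrative, where `call_model` is any function that sends a prompt to the judging LLM:

```python
JUDGE_PROMPT = (
    "Rate the answer's faithfulness to the context on a 1-5 scale.\n"
    "Context: {context}\nAnswer: {answer}\nReply with only the number."
)

def judge(context: str, answer: str, call_model) -> int:
    # Fill the rubric prompt, call the judge model, parse its score.
    reply = call_model(JUDGE_PROMPT.format(context=context, answer=answer))
    return int(reply.strip())

# Stubbed judge model for demonstration:
score = judge("Paris is the capital of France.",
              "France's capital is Paris.",
              call_model=lambda prompt: " 5 ")
print(score)  # 5
```

Production judges usually request structured output and validate it, since a free-text reply may not parse cleanly into a score.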

LLMOps

LLMOps is the set of practices, tools, and infrastructure for deploying, monitoring, evaluating, and governing large language models in production.

Prompt Management

Prompt management is the practice of versioning, testing, and deploying prompts as first-class software artifacts with change tracking and rollback capabilities.
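An in-memory sketch of the idea, versioning prompts with rollback (real systems persist versions and tie them to deployments):

```python
class PromptStore:
    """Illustrative store: versioned prompts with rollback."""
    def __init__(self):
        self._versions = {}  # name -> list of prompt texts, oldest first

    def publish(self, name: str, text: str) -> int:
        self._versions.setdefault(name, []).append(text)
        return len(self._versions[name])  # 1-based version number

    def get(self, name: str, version=None) -> str:
        history = self._versions[name]
        return history[(version or len(history)) - 1]

    def rollback(self, name: str) -> str:
        self._versions[name].pop()       # discard the latest version
        return self.get(name)

store = PromptStore()
store.publish("support_reply", "You are a helpful support agent.")
store.publish("support_reply", "You are a concise support agent.")
print(store.rollback("support_reply"))  # back to the first version
```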

Proof Bundle

A proof bundle is an immutable record that packages evaluation results, approval decisions, and deployment metadata into a single auditable artifact for AI pipeline governance.
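One way to sketch such an artifact, with immutability modeled by a frozen dataclass and auditability by a content hash (field names are examples, not a standard schema):

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)   # frozen models the bundle's immutability
class ProofBundle:
    pipeline: str
    eval_results: tuple       # e.g. (("faithfulness", 0.95),)
    approved_by: str
    deployed_version: str

    def digest(self) -> str:
        # A content hash makes later tampering detectable in an audit.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

bundle = ProofBundle("rag-support", (("faithfulness", 0.95),),
                     "jane@example.com", "v42")
print(bundle.digest()[:12])
```

Identical bundle contents always hash to the same digest, so the digest alone can be stored or signed as the audit reference.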

RAG Evaluation

RAG evaluation measures the quality of retrieval-augmented generation systems across retrieval accuracy, context relevance, answer faithfulness, and end-to-end response quality.
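Two of these dimensions sketched as toy metrics; the lexical faithfulness proxy is deliberately crude, since real faithfulness checks typically use an LLM judge or an NLI model:

```python
def retrieval_recall(retrieved_ids, relevant_ids) -> float:
    # Share of relevant documents the retriever actually returned.
    hits = len(set(retrieved_ids) & set(relevant_ids))
    return hits / len(relevant_ids) if relevant_ids else 0.0

def faithfulness_proxy(answer: str, context: str) -> float:
    # Crude lexical proxy: share of answer tokens grounded in the context.
    words = answer.lower().split()
    grounded = set(context.lower().split())
    return sum(w in grounded for w in words) / len(words) if words else 0.0

print(retrieval_recall(["d1", "d3"], ["d1", "d2"]))               # 0.5
print(faithfulness_proxy("paris is big", "paris is a big city"))  # 1.0
```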