Failure Intelligence

Failure intelligence is the capability to detect, remember, and act on failures as knowledge, not just telemetry.

What Kakveda adds (layer model)

Most stacks already have:

LLM runtime (Ollama/OpenAI/etc.)
tracing/logging (OpenTelemetry, logs)
evaluation pipelines

Kakveda adds:

a failure knowledge base
pre-flight matching that flags “this failed before” ahead of execution
pattern detection (recurrence, trends)
system health scoring over time

Pre-flight memory check

Before a run executes, Kakveda can:

normalize inputs
compute a fingerprint
match against known failures/patterns
apply policy → allow/warn/block/route
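
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Kakveda's actual implementation: the names KnownFailure, normalize, fingerprint, and preflight, the 1 to 5 severity scale, and the in-memory dict used as the knowledge base are all hypothetical. Routing is left out to keep the sketch short.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownFailure:
    fingerprint: str
    description: str
    severity: int  # hypothetical scale: 1 (low) to 5 (critical)


def normalize(run: dict) -> dict:
    """Lower-case string values and collapse whitespace so trivially
    different inputs compare equal."""
    return {
        key: " ".join(value.lower().split()) if isinstance(value, str) else value
        for key, value in run.items()
    }


def fingerprint(run: dict) -> str:
    """SHA-256 over the normalized run, independent of key order."""
    canonical = json.dumps(normalize(run), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def preflight(run: dict, known: dict) -> tuple:
    """Return (decision, matched_failure) for a run about to execute."""
    match: Optional[KnownFailure] = known.get(fingerprint(run))
    if match is None:
        return "allow", None
    if match.severity >= 4:  # assumed blocking threshold
        return "block", match
    return "warn", match


# Usage: a run that failed earlier is recorded, then a near-identical run is checked.
failed_run = {"model": "example-model", "prompt": "Return JSON  only"}
fp = fingerprint(failed_run)
known = {fp: KnownFailure(fp, "parser rejected output", severity=4)}

decision, match = preflight({"model": "example-model", "prompt": "return json only"}, known)
```

Because normalization runs before hashing, the second prompt matches the stored failure despite the differences in case and spacing.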

Why it matters

Reduces repeated failures in production
Gives teams a shared memory of failure modes
Enables preventive controls (warn/block) and safer routing
Improves long-term reliability by managing systems against a health score tracked over time

Practical examples

Prompt or tool configuration that repeatedly causes parsing errors
Recurring tool outages (timeouts, 5xx): detect the pattern and route to a safer tool
Model regressions between versions: detect rising severity in the matched patterns
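
To make the last two examples concrete, here is one possible way to detect recurrence and rising severity. It is a sketch only: the function names, the (timestamp, key) event shape, the ten-minute window, and the threshold of three are assumptions chosen for illustration, not values taken from Kakveda.

```python
from collections import Counter
from datetime import datetime, timedelta


def recurring_failures(events, now, window=timedelta(minutes=10), threshold=3):
    """Return the keys that failed at least `threshold` times in the window
    ending at `now`. `events` is an iterable of (timestamp, key) pairs."""
    cutoff = now - window
    counts = Counter(key for ts, key in events if cutoff <= ts <= now)
    return {key for key, n in counts.items() if n >= threshold}


def severity_rising(severities, min_points=4):
    """True when the mean severity of the later half of the series exceeds
    the mean of the earlier half. Too few points counts as not rising."""
    if len(severities) < min_points:
        return False
    mid = len(severities) // 2
    earlier, later = severities[:mid], severities[mid:]
    return sum(later) / len(later) > sum(earlier) / len(earlier)


# Usage with made-up events: three recent timeouts for one tool, one old
# timeout outside the window, and a single 5xx from another tool.
now = datetime(2024, 1, 1, 12, 0)
events = [(now - timedelta(minutes=m), "search_tool:timeout") for m in (1, 3, 8)]
events.append((now - timedelta(minutes=30), "search_tool:timeout"))
events.append((now - timedelta(minutes=2), "calc_tool:5xx"))

flagged = recurring_failures(events, now)
```

A key that shows up in the flagged set is a candidate for routing, while a True result from severity_rising over a pattern's history is one simple signal of a regression between model versions.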

Extensibility

Matching can be:

deterministic (hash/fingerprint)
rule-based (rules/heuristics)
embedding similarity (optional extension)
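
The three matching styles could sit behind small, interchangeable functions. The sketch below is illustrative only: every function name is hypothetical, the rules are plain regular expressions, and the similarity matcher assumes the caller already has embedding vectors from some model of its choice (none is computed here).

```python
import math
import re


def match_exact(candidate_fp: str, known_fps: set) -> bool:
    """Deterministic: the fingerprint is either known or it is not."""
    return candidate_fp in known_fps


def match_rules(text: str, rules: dict) -> list:
    """Rule-based: return the names of rules whose regex matches the text."""
    return [
        name for name, pattern in rules.items()
        if re.search(pattern, text, re.IGNORECASE)
    ]


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_similar(vector, known_vectors: dict, threshold: float = 0.9) -> list:
    """Similarity: ids of known failures whose stored vector is close enough.
    The 0.9 threshold is an arbitrary illustrative default."""
    return [fid for fid, v in known_vectors.items() if cosine(vector, v) >= threshold]


# Usage with made-up rules and toy two-dimensional vectors.
rules = {"parse_error": r"jsondecodeerror|expecting value", "timeout": r"timed out"}
rule_hits = match_rules("JSONDecodeError: Expecting value", rules)
similar_hits = match_similar([1.0, 0.0], {"f1": [0.9, 0.1], "f2": [0.0, 1.0]})
```

Exact matching is cheap and predictable but misses paraphrases; rules and similarity trade some precision for broader recall, which is presumably why similarity is listed as an optional extension.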

Policy can be:

warn-only
warn + confirmation gate
block high-severity failures
route to safer model/tools
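
One way to express these policy options is a small configuration object plus a decision function. This is a sketch under assumptions, not Kakveda's configuration format: the Policy fields, the mode names, the 1 to 5 severity scale, and the action strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    mode: str = "warn"              # "warn", "confirm", "block", or "route"
    block_at_severity: int = 4      # hypothetical 1-5 severity scale
    fallback: Optional[str] = None  # safer model/tool name used by "route"


def decide(policy: Policy, severity: Optional[int], confirmed: bool = False) -> dict:
    """Map a matched failure's severity (None means no match) to an action."""
    if severity is None:
        return {"action": "allow"}
    if policy.mode == "warn":
        return {"action": "warn"}
    if policy.mode == "confirm":
        # Warn, and hold the run until someone has confirmed it.
        return {"action": "warn"} if confirmed else {"action": "await_confirmation"}
    if policy.mode == "block":
        if severity >= policy.block_at_severity:
            return {"action": "block"}
        return {"action": "warn"}
    if policy.mode == "route":
        if policy.fallback is None:
            raise ValueError("route mode needs a fallback target")
        return {"action": "route", "target": policy.fallback}
    raise ValueError(f"unknown policy mode: {policy.mode}")


# Usage: the same high-severity match under two different policies.
blocked = decide(Policy(mode="block"), severity=5)
routed = decide(Policy(mode="route", fallback="backup-model"), severity=5)
```

Keeping the policy separate from the matcher means a team can start in warn-only mode and tighten to blocking or routing later without touching the matching logic.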