Failure intelligence is the capability to detect, remember, and act on failures as knowledge, not just telemetry.
Most stacks already have:
- LLM runtime (Ollama/OpenAI/etc.)
- tracing/logging (OpenTelemetry, logs)
- evaluation pipelines
Kakveda adds:
- a failure knowledge base
- pre-flight matching (“this failed before”) before executing
- pattern detection (recurrence, trends)
- system health scoring over time
Before a run executes, Kakveda can:
- normalize inputs
- compute a fingerprint
- match against known failures/patterns
- apply policy → allow/warn/block/route
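
The pre-flight steps above can be sketched roughly as follows. This is a minimal illustration, not Kakveda's actual implementation: the names `normalize`, `fingerprint`, `record_failure`, `preflight`, and the in-memory `KNOWN_FAILURES` store are all hypothetical, and the normalization rules (lowercasing, whitespace collapsing, sorted tool names) are assumptions chosen for the example.

```python
import hashlib
import json
import re

# Hypothetical in-memory store: fingerprint -> failure record.
# A real deployment would presumably use a persistent knowledge base.
KNOWN_FAILURES: dict[str, dict] = {}


def normalize(prompt: str, tools: list[str]) -> dict:
    """Collapse whitespace, lowercase the prompt, and sort tool names
    so that trivially different runs compare equal."""
    text = re.sub(r"\s+", " ", prompt).strip().lower()
    return {"prompt": text, "tools": sorted(tools)}


def fingerprint(normalized: dict) -> str:
    """Stable SHA-256 digest of the normalized input."""
    payload = json.dumps(normalized, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def record_failure(prompt: str, tools: list[str], severity: str) -> None:
    """Remember that this input failed, with an assumed severity label."""
    KNOWN_FAILURES[fingerprint(normalize(prompt, tools))] = {"severity": severity}


def preflight(prompt: str, tools: list[str]) -> str:
    """Return 'allow', 'warn', or 'block' before the run executes."""
    record = KNOWN_FAILURES.get(fingerprint(normalize(prompt, tools)))
    if record is None:
        return "allow"
    return "block" if record["severity"] == "high" else "warn"
```

Because the fingerprint is computed over the normalized input, a prompt that differs only in casing, spacing, or tool order from a recorded failure still matches it.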
Benefits:
- Reduces repeated failures in production
- Gives teams a shared memory of failure modes
- Enables preventive controls (warn/block) and safer routing
- Improves long-term system reliability (health-based management)
Example use cases:
- Prompt or tool configuration that repeatedly causes parsing errors
- Tool outages that recur (timeouts, 5xx): detect the pattern and route
- Model regressions between versions: detect rising severity in patterns
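
Recurrence and rising severity, as mentioned in the cases above, can be approximated with very simple statistics. The sketch below is one possible approach under stated assumptions: failure events are dicts with a `pattern` key, severities are numeric, and the helper names, the `min_count` threshold, and the window size are all hypothetical.

```python
from collections import Counter


def recurring_patterns(events: list[dict], min_count: int = 3) -> set[str]:
    """Return pattern ids that appear at least `min_count` times."""
    counts = Counter(event["pattern"] for event in events)
    return {pattern for pattern, n in counts.items() if n >= min_count}


def severity_rising(severities: list[float], window: int = 3) -> bool:
    """Compare the mean of the last `window` severities with the mean
    of the `window` values immediately before them."""
    if len(severities) < 2 * window:
        return False  # not enough history to judge a trend
    recent = severities[-window:]
    earlier = severities[-2 * window:-window]
    return sum(recent) / window > sum(earlier) / window
```

A recurring `tool_timeout` pattern could then trigger routing, while a rising severity trend after a model upgrade could flag a possible regression.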
Matching can be:
- deterministic (hash/fingerprint)
- rules/heuristics
- embedding similarity (optional extension)
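
The three matching strategies can be tried in order, from cheapest to most permissive. The following is an illustrative sketch only: the rule patterns, labels, and the 0.8 threshold are assumptions, and a bag-of-words cosine similarity stands in for real embedding similarity so that the example stays self-contained.

```python
import hashlib
import math
import re
from collections import Counter
from typing import Optional

# Hypothetical rules: label -> pattern over the failure text.
RULES = [
    ("tool_timeout", re.compile(r"timeout|timed out", re.IGNORECASE)),
    ("upstream_5xx", re.compile(r"\b5\d\d\b")),
]


def sha(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def rule_match(text: str) -> Optional[str]:
    """Return the label of the first rule that matches, if any."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return None


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match(text: str, known_texts: list[str], threshold: float = 0.8) -> Optional[str]:
    """Try deterministic, then rule-based, then similarity matching."""
    if sha(text) in {sha(k) for k in known_texts}:
        return "exact"
    label = rule_match(text)
    if label is not None:
        return f"rule:{label}"
    query = bag_of_words(text)
    if any(cosine(query, bag_of_words(k)) >= threshold for k in known_texts):
        return "similar"
    return None
```

Ordering the tiers this way keeps the common case cheap: a hash lookup settles exact repeats, and the similarity pass only runs when nothing more precise applies.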
Policy can be:
- warn-only
- warn + confirmation gate
- block high-severity failures
- route to safer model/tools
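
One way to picture these policy options is as a function from a matched failure's severity and a configured mode to a decision. This is a hedged sketch rather than Kakveda's API: the `Decision` type, the mode names (`warn_only`, `confirm`, `block_high`, `route`), and the `fallback-model` target are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    action: str                     # "allow", "warn", "confirm", "block", or "route"
    route_to: Optional[str] = None  # set only when action == "route"


def decide(
    severity: Optional[str],
    mode: str,
    fallback: str = "fallback-model",  # hypothetical safer target
) -> Decision:
    """Map a matched failure's severity (None = no match) to a decision."""
    if severity is None:
        return Decision("allow")
    if mode == "warn_only":
        return Decision("warn")
    if mode == "confirm":
        # Warn and require explicit confirmation before executing.
        return Decision("confirm")
    if mode == "block_high":
        return Decision("block" if severity == "high" else "warn")
    if mode == "route":
        return Decision("route", route_to=fallback)
    raise ValueError(f"unknown policy mode: {mode}")
```

Keeping the policy a pure function of the match result makes it straightforward to start in a warn-only mode and tighten to blocking or routing later.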