Commit 541c110

docs: Address PR open-policy-agent#7929 review feedback
- Fix misleading 'aggregated' terminology - use 'instance-level' instead
- Remove per-query metrics section from monitoring.md, add cross-references
- Focus metrics documentation on commonly used regex and http.send built-ins
- Add missing counter_rego_builtin_regex_interquery_value_cache_hits metric
- Move admonition to after example in REST API documentation
- Simplify and reduce scope of metrics documentation per reviewer guidance

Signed-off-by: Anivar A Aravind <[email protected]>
1 parent 657f565 commit 541c110

3 files changed: +39 -112 lines changed

docs/docs/monitoring.md

Lines changed: 7 additions & 21 deletions
@@ -90,42 +90,28 @@ When Prometheus is enabled in the status plugin (see [Configuration](./configura

 OPA provides two ways to access performance metrics:

-1. **System-wide metrics** via the `/metrics` Prometheus endpoint - Aggregated metrics across all OPA operations
+1. **System-wide metrics** via the `/metrics` Prometheus endpoint - Instance-level metrics across all OPA operations
 2. **Per-query metrics** via API responses with `?metrics=true` - Metrics for individual query executions

-These serve different purposes: system metrics for monitoring and alerting, per-query metrics for debugging and optimization.
+These serve different purposes: system metrics for OPA instance monitoring and alerting, per-query metrics for debugging and optimization.


 ## Accessing Metrics

 ### System-Wide Metrics (Prometheus Endpoint)

-Access aggregated metrics across all OPA operations:
+Access instance-level metrics across all OPA operations:

 - **URL**: `http://localhost:8181/metrics` (default configuration)
 - **Method**: HTTP GET
 - **Format**: Prometheus text format
-- **Contents**: All counters, timers, histograms, Go runtime metrics
+- **Contents**: Instance-level counters, timers, histograms, Go runtime metrics
 - **Use case**: Monitoring dashboards, alerting, performance trends

-### Per-Query Metrics
-
-Get metrics for individual policy evaluations:
-
-- **Method**: Add `?metrics=true` parameter to API requests
-- **Supported APIs**: `/v1/data`, `/v0/data`, `/v1/query`, `/v1/compile`
-- **Contents**: Query-specific parse, compile, and eval timers
-- **Use case**: Debugging slow queries, performance optimization
-
-Example:
-```http
-POST /v1/data/example?metrics=true HTTP/1.1
-```
-
-For details on interpreting per-query metrics, see [REST API Performance Metrics](./rest-api#performance-metrics).
-
-### Other Metric Sources
+### Additional Resources

+- **Per-query metrics**: See [REST API Performance Metrics](./rest-api#performance-metrics) for debugging individual queries
+- **Policy performance**: See [Policy Performance](./policy-performance#performance-metrics) for optimization guidance
 - **Status API**: Includes subset of metrics in status reports
 - **Decision logs**: Can include request-level metrics when configured
 - **CLI tools**: `opa eval --metrics` and `opa bench --metrics`
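
Reviewer-style illustration (not part of the commit): the monitoring.md text above says the `/metrics` endpoint is read with an HTTP GET and returns Prometheus text format. A minimal sketch of consuming that output might look like the following. The helper names `parse_prometheus_text` and `fetch_metrics` are hypothetical, the `SAMPLE` payload is made up for illustration, and the parser only handles the simple `name{labels} value` line shape rather than the full exposition format.

```python
from typing import Dict
from urllib.request import urlopen

# Illustrative sample only; real output from an OPA instance will differ.
SAMPLE = """\
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 12
http_request_duration_seconds_count{code="200",handler="v1/data",method="post"} 3
"""


def parse_prometheus_text(body: str) -> Dict[str, float]:
    """Map each series (name plus labels) to its sample value."""
    samples: Dict[str, float] = {}
    for raw in body.splitlines():
        line = raw.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        end = line.rfind("}")
        if end != -1:
            # Labelled series: everything up to the closing brace is the key.
            series, rest = line[: end + 1], line[end + 1:].split()
        else:
            # Unlabelled series: first token is the name.
            series, *rest = line.split()
        if not rest:
            continue
        # The first token after the series is the value; a timestamp may follow.
        samples[series] = float(rest[0])
    return samples


def fetch_metrics(url: str = "http://localhost:8181/metrics") -> Dict[str, float]:
    """GET the endpoint (default URL taken from the docs above) and parse it."""
    with urlopen(url, timeout=5) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for series, value in parse_prometheus_text(SAMPLE).items():
        print(series, value)
```

For production monitoring a Prometheus server or an existing client library would normally do this scraping; the sketch is only meant to show what "Prometheus text format" means in practice.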

docs/docs/policy-performance.md

Lines changed: 30 additions & 89 deletions
@@ -970,117 +970,58 @@ Users are recommended to do performance testing to determine the optimal configu

 ## Performance Metrics

-OPA exposes metrics for each phase of policy evaluation:
+OPA exposes metrics for policy evaluation performance. These are available through:

-- **System-wide metrics** are available at the `/metrics` Prometheus endpoint
-- **Per-query metrics** are returned with individual API responses when `?metrics=true` is specified
+- **System-wide metrics** at the `/metrics` Prometheus endpoint
+- **Per-query metrics** with individual API responses when `?metrics=true` is specified

-See [Monitoring](./monitoring#metrics-overview) for the distinction between these metric types.
+See [Monitoring](./monitoring#metrics-overview) for more details.

-### Query Evaluation Metrics
+### Common Built-in Function Metrics

-Query evaluation phases:
-
-- `timer_rego_query_parse_ns` - Time spent parsing the query string into AST
-- `timer_rego_query_compile_ns` - Time spent compiling the query for evaluation
-- `timer_rego_query_eval_ns` - Time spent executing the compiled query
-
-Compilation time often dominates in complex policies.
-
-### Module and Policy Metrics
-
-Policy compilation and parsing:
-
-- `timer_rego_module_parse_ns` - Time to parse policy modules from source
-- `timer_rego_module_compile_ns` - Time to compile parsed modules into evaluation form
-- `timer_rego_data_parse_ns` - Time to parse data documents
-- `timer_rego_input_parse_ns` - Time to parse input documents
-
-Module compilation runs once at load time. Slow compilation impacts bundle updates.
-
-### File and Bundle Loading
-
-Policy and data loading:
-
-- `timer_rego_load_files_ns` - Time to load policy files from disk
-- `timer_rego_load_bundles_ns` - Time to load and activate bundles
-- `timer_bundle_request_ns` - Time spent downloading bundles
-
-### Compilation and Partial Evaluation Metrics
-
-Compilation and partial evaluation:
-
-- `timer_rego_partial_eval_ns` - Total partial evaluation time
-- `timer_compile_prep_partial_ns` - Time preparing for partial evaluation
-- `timer_compile_eval_constraints_ns` - Time evaluating constraints
-- `timer_compile_translate_queries_ns` - Time translating queries
-- `timer_compile_extract_annotations_unknowns_ns` - Time extracting unknowns
-- `timer_compile_extract_annotations_mask_ns` - Time extracting masks
-- `timer_compile_eval_mask_rule_ns` - Time evaluating mask rules
-- `timer_compile_stage_check_imports_ns` - Time checking imports
-- `counter_compile_stage_comprehension_index_build` - Comprehension indices built
-
-High partial evaluation times indicate optimization opportunities.
-
-### Evaluation Operation Metrics
-
-Evaluation operations produce both timer and histogram metrics:
+#### HTTP Built-ins

-**Timers** (measure total time):
-- `timer_eval_op_plug_ns` - Time spent in plugging operations
-- `timer_eval_op_resolve_ns` - Time resolving references
-- `timer_eval_op_rule_index_ns` - Time spent in rule indexing
-- `timer_eval_op_builtin_call_ns` - Time spent calling built-in functions
-- `timer_partial_op_save_unify_ns` - Time saving unification in partial eval
-- `timer_partial_op_save_set_contains_ns` - Time for set contains in partial eval
-- `timer_partial_op_save_set_contains_rec_ns` - Time for recursive set contains
-- `timer_partial_op_copy_propagation_ns` - Time for copy propagation optimization
+`http.send` metrics help identify I/O bottlenecks:

-**Histograms** (track time distribution):
-- `histogram_eval_op_plug` - Distribution of plugging operation times
-- `histogram_eval_op_resolve` - Distribution of reference resolution times
-- `histogram_eval_op_rule_index` - Distribution of rule indexing times
-- `histogram_eval_op_builtin_call` - Distribution of built-in function call times
-- `histogram_partial_op_save_unify` - Distribution of unification save times
-- `histogram_partial_op_save_set_contains` - Distribution of set contains times
-- `histogram_partial_op_save_set_contains_rec` - Distribution of recursive set contains times
-- `histogram_partial_op_copy_propagation` - Distribution of copy propagation times
+- `timer_rego_builtin_http_send_ns` - Total time spent in http.send calls
+- `counter_rego_builtin_http_send_interquery_cache_hits` - Inter-query cache hits
+- `counter_rego_builtin_http_send_network_requests` - Actual network requests made

-Histograms show percentiles: 50%, 75%, 90%, 95%, 99%, 99.9%, 99.99%.
+High cache hit ratios indicate effective caching and reduced network overhead.

-### Built-in Function Metrics
+#### Regex Built-ins

-#### HTTP Built-ins
+Regex operation metrics help optimize pattern matching:

-`http.send` metrics:
+- `timer_rego_builtin_regex_interquery_ns` - Time spent in regex operations
+- `counter_rego_builtin_regex_interquery_cache_hits` - Regex pattern cache hits
+- `counter_rego_builtin_regex_interquery_value_cache_hits` - Regex value cache hits

-- `timer_rego_builtin_http_send_ns` - Total time spent in http.send calls
-- `counter_rego_builtin_http_send_interquery_cache_hits` - Inter-query cache hits
-- `counter_rego_builtin_http_send_network_requests` - Actual network requests made
+Effective regex caching improves performance when the same patterns are used repeatedly.

-High cache hit ratios indicate effective caching.
+### Core Query Metrics

-#### External Data Resolution
+Basic query evaluation phases:

-External data resolution:
+- `timer_rego_query_parse_ns` - Time parsing the query string
+- `timer_rego_query_compile_ns` - Time compiling the query
+- `timer_rego_query_eval_ns` - Time executing the compiled query

-- `timer_rego_external_resolve_ns` - Time resolving external data references
+Compilation time often dominates in complex policies.

-### SDK and Server Metrics
+### High-Level Metrics

-High-level evaluation:
+Server-level metrics for overall performance:

 - `timer_server_handler_ns` - Total request handler execution time
-- `timer_sdk_decision_eval_ns` - SDK decision evaluation time
 - `counter_server_query_cache_hit` - Server-level query cache hits

-### Using Metrics
+### Using Metrics for Optimization

-1. Compare parse, compile, and eval times to find slow phases
-2. High operation counts indicate complex queries
-3. Low cache hit rates suggest tuning opportunities
-4. High `http.send` counts indicate I/O bottlenecks
-5. Bundle activation times show deployment latency
+1. **Query phases**: Compare parse, compile, and eval times to identify bottlenecks
+2. **Cache effectiveness**: Low cache hit rates suggest tuning opportunities
+3. **I/O bottlenecks**: High `http.send` network request counts indicate caching issues
+4. **Pattern matching**: Monitor regex cache hits for frequently used patterns

 Access metrics via:
 - REST API: Add `?metrics=true` to policy evaluation requests
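
Reviewer-style illustration (not part of the commit): the "Using Metrics for Optimization" steps in the diff above suggest comparing the parse, compile, and eval timers and watching cache effectiveness for `http.send`. A rough sketch of both calculations follows. The metric names come from the documentation text; the function names and sample numbers are hypothetical, and the hit-ratio formula assumes every `http.send` lookup ends either as an inter-query cache hit or as a network request, which the source does not state.

```python
from typing import Dict, Optional

# Timer names as listed under "Core Query Metrics" in the diff above.
PHASE_TIMERS = {
    "parse": "timer_rego_query_parse_ns",
    "compile": "timer_rego_query_compile_ns",
    "eval": "timer_rego_query_eval_ns",
}


def phase_shares(metrics: Dict[str, float]) -> Dict[str, float]:
    """Return each phase's fraction of the combined parse/compile/eval time."""
    total = sum(metrics.get(key, 0) for key in PHASE_TIMERS.values())
    if total == 0:
        return {}
    return {
        phase: metrics.get(key, 0) / total
        for phase, key in PHASE_TIMERS.items()
    }


def http_send_hit_ratio(metrics: Dict[str, float]) -> Optional[float]:
    """Estimate the http.send cache hit ratio.

    Assumption (not confirmed by the source): lookups = hits + network requests.
    """
    hits = metrics.get("counter_rego_builtin_http_send_interquery_cache_hits", 0)
    requests = metrics.get("counter_rego_builtin_http_send_network_requests", 0)
    lookups = hits + requests
    return hits / lookups if lookups else None


# Made-up numbers for illustration only.
example = {
    "timer_rego_query_parse_ns": 1000,
    "timer_rego_query_compile_ns": 6000,
    "timer_rego_query_eval_ns": 3000,
    "counter_rego_builtin_http_send_interquery_cache_hits": 9,
    "counter_rego_builtin_http_send_network_requests": 1,
}

if __name__ == "__main__":
    shares = phase_shares(example)
    slowest = max(shares, key=shares.get)
    print("slowest phase:", slowest)
    print("http.send hit ratio:", http_send_hit_ratio(example))
```

Returning `None` when there were no lookups keeps "no traffic" distinguishable from "zero hits", which matters if the ratio feeds an alert.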

docs/docs/rest-api.md

Lines changed: 2 additions & 2 deletions
@@ -2223,8 +2223,6 @@ restarts, a **Redo** Trace Event is emitted.

 ## Performance Metrics

-**Note**: These are per-query metrics returned inline with API responses. For system-wide aggregated metrics, see the `/metrics` Prometheus endpoint described in [Monitoring](./monitoring#prometheus).
-
 OPA can report detailed performance metrics at runtime. Performance metrics can
 be requested on individual API calls and are returned inline with the API
 response. To enable performance metric collection on an API call, specify the
@@ -2265,6 +2263,8 @@ Content-Type: application/json
 }
 ```

+> **Note**: These are per-query metrics returned inline with API responses. For system-wide instance metrics, see the `/metrics` Prometheus endpoint described in [Monitoring](./monitoring#prometheus).
+
 OPA provides the following query performance metrics:

 ### Core Query Metrics
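
Reviewer-style illustration (not part of the commit): the rest-api.md text above describes per-query metrics that are requested on an individual API call and returned inline with the response. The sketch below shows one way a client could add the `metrics=true` parameter and pull the metrics out of a response body. The helper names are hypothetical, the sample response is fabricated for illustration, and the top-level `metrics` key is an assumption about the response shape that should be checked against the REST API reference.

```python
import json
from typing import Dict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_metrics(url: str) -> str:
    """Return the URL with metrics=true added, keeping existing parameters."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["metrics"] = "true"
    return urlunsplit(parts._replace(query=urlencode(query)))


def extract_metrics(response_body: str) -> Dict[str, float]:
    """Pull the inline metrics object out of a JSON response body.

    Assumes the metrics sit under a top-level "metrics" key.
    """
    return json.loads(response_body).get("metrics", {})


# Fabricated response body for illustration only.
SAMPLE_RESPONSE = json.dumps({
    "result": {"allow": True},
    "metrics": {
        "timer_rego_query_parse_ns": 1000,
        "timer_rego_query_compile_ns": 6000,
        "timer_rego_query_eval_ns": 3000,
    },
})

if __name__ == "__main__":
    print(with_metrics("http://localhost:8181/v1/data/example"))
    print(extract_metrics(SAMPLE_RESPONSE))
```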
