Skip to content

Add comprehensive PromQL extraction with multiple fallback patterns#21

Open
yashpal2104 wants to merge 2 commits intosodafoundation:mainfrom
yashpal2104:feat/improve-promql-generation
Open

Add comprehensive PromQL extraction with multiple fallback patterns#21
yashpal2104 wants to merge 2 commits intosodafoundation:mainfrom
yashpal2104:feat/improve-promql-generation

Conversation

@yashpal2104
Copy link
Copy Markdown

Fixes #20

  • Implement robust syntax validation for common PromQL errors
  • Detect 'over' vs 'by'/'without' aggregation mistakes
  • Validate bracket/parentheses/brace balancing
  • Check for range vectors used directly in aggregations
  • Verify over-time functions have required time ranges
  • Improve PromQL cleaning: strip comments, placeholders, and language markers
  • Add flexible Prometheus config handling for multiple instance formats
  • Update prompt templates with PromQL syntax reference and clearer instructions

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespace from that line:

/kind new feature

/kind bug fix

/kind cleanup
/kind revert change
/kind design
/kind documentation

/kind enhancement

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #20

Test Report Added?:

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespace from that line:
/kind TESTED
/kind NOT-TESTED

Test Report:

Special notes for your reviewer:
image
image
image

Copilot AI review requested due to automatic review settings February 10, 2026 07:36
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the robustness of extracting executable PromQL from LLM (Ollama) responses and updates the DynamicPrompt templates to steer the model toward syntactically valid PromQL, while also expanding Prometheus config parsing to support multiple-instance config formats.

Changes:

  • Added multiple fallback extraction patterns and basic PromQL syntax heuristics in get_promql_from_ollama.
  • Updated Prometheus connection setup to accept prometheus_instances-based config and additional base URL key fallbacks.
  • Refined prompt templates to request “PromQL-only” output and added a PromQL syntax reference to reduce LLM formatting/syntax errors.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
pkg/copilot/DP_logic/dp_logic.py Adds multi-pattern PromQL extraction/cleanup + syntax heuristics; expands Prometheus config parsing logic.
pkg/copilot/DP_logic/DynamicPrompt/dynamic_prompt/prompt_builder.py Reorders prompt content to more directly instruct the model to generate PromQL for the user question.
pkg/copilot/DP_logic/DynamicPrompt/config/template_sections/postamble.md Tightens instructions to output only a PromQL code block and avoid placeholders/comments.
pkg/copilot/DP_logic/DynamicPrompt/config/template_sections/domain.md Adds an embedded PromQL syntax reference and common pitfalls to guide query generation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 19 to +23
Step 5: Create the core `query` expression. This is the PromQL string. If necessary, build the query using aggregation operators, functions, or vector selectors, based on what the input text describes. Explicitly explain how you construct this expression.

Step 6: Display the final `query` expression in a labelled markdown block like so:
```promql
[final query here]
```

IMPORTANT: Do not ask the user for clarification or additional details. Always infer the best possible query and format.
Step 6: Output ONLY the PromQL query expression inside a promql markdown code block.
The code block must contain the actual executable PromQL query you generated - nothing else.

Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The instructions are internally inconsistent: Step 5 asks to "Explicitly explain" how the query is constructed, but Step 6 now requires outputting ONLY the PromQL code block and nothing else. This conflict can cause the LLM to emit extra text and break downstream extraction; consider removing the Step 5 explanation requirement or clarifying that any explanation must be outside the final response (or omitted entirely).

Copilot uses AI. Check for mistakes.
Comment on lines +97 to +100
# Clean up any square brackets or placeholder text that might have been included
promql = re.sub(r'^\s*\[\s*', '', promql) # Remove leading [
promql = re.sub(r'\s*\]\s*$', '', promql) # Remove trailing ]
promql = promql.strip()
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cleanup that strips a trailing ']' will break valid PromQL expressions that legitimately end with a range selector (e.g., metric[5m]), turning them into invalid syntax. Consider removing this normalization, or only stripping surrounding brackets when the entire extracted string is clearly a placeholder wrapper (e.g., starts with '[' and ends with ']' and contains no other '['/']' pairs).

Suggested change
# Clean up any square brackets or placeholder text that might have been included
promql = re.sub(r'^\s*\[\s*', '', promql) # Remove leading [
promql = re.sub(r'\s*\]\s*$', '', promql) # Remove trailing ]
promql = promql.strip()
# Clean up any placeholder-style square bracket wrappers like "[ query ]"
promql = promql.strip()
if promql.startswith("[") and promql.endswith("]") and promql.count("[") == 1 and promql.count("]") == 1:
promql = promql[1:-1].strip()

Copilot uses AI. Check for mistakes.
Comment on lines +94 to +95
# Remove comments (# and everything after on the same line)
promql = re.sub(r'#.*', '', promql)
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing everything after # can corrupt otherwise-valid PromQL when # appears inside a quoted label value (e.g., {label="a#b"}). Since PromQL doesn’t have comments, it may be better to avoid this transformation or only strip lines where # is the first non-whitespace character and not within quotes.

Suggested change
# Remove comments (# and everything after on the same line)
promql = re.sub(r'#.*', '', promql)
# Remove comment lines where '#' is the first non-whitespace character
promql = "\n".join(
line for line in promql.splitlines()
if not line.lstrip().startswith("#")
)

Copilot uses AI. Check for mistakes.
Comment on lines +169 to 185
# Handle both direct config and prometheus_instances structure
if "prometheus_instances" in prom_config:
# Extract first instance from list
instance = prom_config["prometheus_instances"][0]
base_url = instance["base_url"]
disable_ssl = instance.get("disable_ssl", False)
else:
# Fallback logic: check for 'base_url', then 'prometheus_url', then 'url'
base_url = prom_config.get("base_url") or \
prom_config.get("prometheus_url") or \
prom_config.get("url")
disable_ssl = prom_config.get("disable_ssl", True)

prom = PrometheusConnect(
url=prom_config["base_url"],
disable_ssl=True
url=base_url,
disable_ssl=disable_ssl
)
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prometheus_instances handling assumes the list is non-empty and that base_url is present; otherwise this will raise (IndexError/KeyError) or pass None into PrometheusConnect. Also, any per-instance headers in config (supported elsewhere in the repo) are ignored here, which can break auth-required Prometheus setups. Suggest validating the config shape (non-empty list, required keys) and passing through headers when constructing PrometheusConnect.

Copilot uses AI. Check for mistakes.
@yashpal2104 yashpal2104 changed the title - Add comprehensive PromQL extraction with multiple fallback patterns Add comprehensive PromQL extraction with multiple fallback patterns Feb 11, 2026
@sushanthakumar
Copy link
Copy Markdown
Collaborator

@rohithvaidya @Anagha9812 pls help to review

@sushanthakumar
Copy link
Copy Markdown
Collaborator

@yashpal2104 updating this PR with reworked code?

@yashpal2104
Copy link
Copy Markdown
Author

@sushanthakumar reworked as in?

@rohithvaidya
Copy link
Copy Markdown
Collaborator

Hi @yashpal2104, Thanks for raising a PR for the fix, by adding syntax checking and promql extraction routines. Please incorporate the below points for modularity and provide clarifications for the functionality being added.

  1. Instead of having multiple if blocks, can this be added as a separate extractor module and syntax checking module so that the individual patterns can be in an array and iteratively checked?
  2. If there is a sub-query like sum(rate(x[5m])) will it be flagged?
  3. Can we use the module promql-parser (https://pypi.org/project/promql-parser/) for this purpose instead of regex?
  4. Will this allow a selector? For eg. http_requests_total{job="api"}
  5. Are patterns 2 and 1, doing the same thing, newline and space enforcement? Can one be removed?

@yashpal2104 yashpal2104 force-pushed the feat/improve-promql-generation branch from 3502f24 to 31d3e37 Compare March 9, 2026 11:24
@yashpal2104
Copy link
Copy Markdown
Author

  1. Instead of having multiple if blocks added a separate extractor module and syntax checking module so that the individual patterns can be in an array and iteratively checked
  2. If there is a sub-query like sum(rate(x[5m])) Yes it will be flagged this is a false positive, trying to fix it
  3. Yes I think using py-promql parser would be good instead of regex
  4. removed 1, the 2nd one takes care of it

- Implement robust syntax validation for common PromQL errors
  - Detect 'over' vs 'by'/'without' aggregation mistakes
  - Validate bracket/parentheses/brace balancing
  - Check for range vectors used directly in aggregations
  - Verify over-time functions have required time ranges
- Improve PromQL cleaning: strip comments, placeholders, and language markers
- Add flexible Prometheus config handling for multiple instance formats
- Update prompt templates with PromQL syntax reference and clearer instructions

Signed-off-by: Yash Pal <yashpal2104@gmail.com>
…r modularity

Signed-off-by: Yash Pal <yashpal2104@gmail.com>
@yashpal2104 yashpal2104 force-pushed the feat/improve-promql-generation branch from 31d3e37 to 8a0d632 Compare March 10, 2026 07:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: PromQL Query Parsing Failures Leading to Error Responses

4 participants