Analyzer_agent #67

Open
Jul434 wants to merge 1 commit into ebzych:main from Jul434:analyzer_llm

Collaborator

Jul434 commented Mar 12, 2026

Add agent for optional analysis with LLM

Jul434 force-pushed the analyzer_llm branch 2 times, most recently from d2f0444 to 7b6bfa4, March 12, 2026 10:22
Jul434 requested a review from ebzych April 10, 2026 11:27
Owner

ebzych left a comment

I think we need to migrate to OpenAI instead of GigaChat and generalize the agent to work with any model, as is done in perf_analyzer.

Comment on lines +123 to +137
def create_model():
    """Create GigaChat model."""
    credentials = os.getenv("GIGACHAT_CREDENTIALS")
    if not credentials:
        raise ValueError("GigaChat credentials environment variable not set")

    model = GigaChat(
        credentials=credentials,
        scope="GIGACHAT_API_PERS",
        model="GigaChat-2-pro",
        verify_ssl_certs=False,
        timeout=120,
        temperature=0.3,
    )
    return model
Owner

Think about how the tool will be used: a user may want to use their own model.
Look at perf_analyzer: it is implemented generically there via environment variables. If generalizing with LangChain is a problem, try OpenAI.
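
A minimal sketch of what environment-driven model selection could look like. The variable names (`LLM_API_KEY`, `LLM_MODEL`, `LLM_BASE_URL`, `LLM_TEMPERATURE`, `LLM_TIMEOUT`) and the `ModelConfig` helper are assumptions made up for illustration; they are not taken from perf_analyzer. `create_model` assumes the `langchain-openai` package is installed and imports it lazily, so the configuration part works without it.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelConfig:
    """Connection settings for an OpenAI-compatible chat endpoint."""

    model: str
    api_key: str
    base_url: Optional[str] = None
    temperature: float = 0.3
    timeout: float = 120.0


def load_model_config(env: Optional[Mapping[str, str]] = None) -> ModelConfig:
    """Read model settings from the environment (variable names are hypothetical)."""
    env = os.environ if env is None else env
    api_key = env.get("LLM_API_KEY")
    model = env.get("LLM_MODEL")
    if not api_key or not model:
        raise ValueError("LLM_API_KEY and LLM_MODEL must be set")
    return ModelConfig(
        model=model,
        api_key=api_key,
        # An empty string is treated the same as an unset variable.
        base_url=env.get("LLM_BASE_URL") or None,
        temperature=float(env.get("LLM_TEMPERATURE", "0.3")),
        timeout=float(env.get("LLM_TIMEOUT", "120")),
    )


def create_model(config: ModelConfig):
    """Build a chat model; requires the langchain-openai package."""
    from langchain_openai import ChatOpenAI  # lazy import: optional dependency

    return ChatOpenAI(
        model=config.model,
        api_key=config.api_key,
        base_url=config.base_url,
        temperature=config.temperature,
        timeout=config.timeout,
    )
```

With this shape, pointing the agent at a different provider would only mean changing `LLM_BASE_URL` and `LLM_MODEL`, provided the provider exposes an OpenAI-compatible API.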

import yaml
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage
from langchain_core.tools import tool
from langchain_gigachat.chat_models import GigaChat
Owner

Why do we need to hard-wire GigaChat into the tool?

Comment on lines +20 to +26
"""
Returns file tree in the project. Each line contains relative path to one file.
Returns max 300 files to avoid token limits.

Args:
    proj_path: Absolute path to the project directory
"""
Owner

This is Google docstring style; we use reST style everywhere, check the other modules.

Don't forget about types:
:param str proj_path: bubuububu
:rtype:
:return:
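
For comparison, the same docstring in reST field-list style might look like the sketch below. The function body follows the lines quoted from the PR, without the `@tool` decorator so that it runs standalone; the `:return:` wording and the value of `MAX_FILES_IN_TREE` (taken from the "max 300 files" sentence) are assumptions.

```python
from pathlib import Path

MAX_FILES_IN_TREE = 300  # assumed from the "max 300 files" sentence in the docstring


def directory_tree(proj_path: str) -> str:
    """Return the file tree of the project, one relative path per line.

    At most ``MAX_FILES_IN_TREE`` files are returned to avoid token limits.

    :param str proj_path: Absolute path to the project directory
    :rtype: str
    :return: Newline-separated relative paths of files in the project
    """
    base = Path(proj_path)
    paths = [str(p.relative_to(base)) for p in base.rglob("*") if p.is_file()]
    return "\n".join(sorted(paths)[:MAX_FILES_IN_TREE])
```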

Owner

I hope this does not cause problems with the tool description that is passed to the LLM.

    proj_path: Absolute path to the project directory
"""
base = Path(proj_path)
paths = [str(p.relative_to(base)) for p in base.rglob("*") if p.is_file()]
Owner

What if the LLM calls this tool several times but still cannot find a file in the project root, because rglob is implemented as DFS? Look at general.BuildSystem.find_relative_path; BFS is implemented there.

"""
base = Path(proj_path)
paths = [str(p.relative_to(base)) for p in base.rglob("*") if p.is_file()]
return "\n".join(sorted(paths)[:MAX_FILES_IN_TREE])
Owner

E.g. the src directory in the grpc repository has ~4700 files; if src is the first directory walked, you may never find CMakeLists.txt.
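
A breadth-first listing could be sketched roughly as follows, so that files near the project root are reported before anything deep inside a large subdirectory. This is not the code from general.BuildSystem.find_relative_path (which is not shown here); the name `directory_tree_bfs` and the `limit` parameter are hypothetical.

```python
from collections import deque
from pathlib import Path

MAX_FILES_IN_TREE = 300  # cap mentioned in the tool's docstring


def directory_tree_bfs(proj_path: str, limit: int = MAX_FILES_IN_TREE) -> str:
    """List up to ``limit`` files, visiting directories level by level."""
    base = Path(proj_path)
    queue = deque([base])
    found = []
    while queue and len(found) < limit:
        current = queue.popleft()
        try:
            entries = sorted(current.iterdir(), key=lambda p: p.name)
        except OSError:
            continue  # unreadable directory: skip it
        for entry in entries:
            if entry.is_symlink():
                continue  # skip symlinks (files and directories) to avoid cycles
            if entry.is_dir():
                queue.append(entry)  # visited only after the current level
            elif entry.is_file() and len(found) < limit:
                found.append(entry.relative_to(base).as_posix())
    return "\n".join(found)
```

With this ordering, a root-level CMakeLists.txt is listed before any file under src, whatever the size of src.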

Comment on lines +99 to +106
1. Use directory_tree to discover the project structure
2. Get information about project by presence of files and directories
3. Use get_file_content to examine build configs (CMakeLists.txt, meson.build, Makefile, etc.) and CI files
4. Analyze CMakeLists.txt, meson.build, CI configs, etc. for test/benchmark paths
5. Analyze all build system files to find what systems are used
6. Analyze third-party directory, CMakeLists.txt find_package, etc. to find dependencies
7. Put found information in YAML file format
8. Repeat until you have all information
Owner

OpenAI supports skills. With them you can write clearer and more comprehensive instructions for a specific build system or anything else, because they do not inflate the system prompt and are loaded only when their condition is satisfied.

except ValueError as e:
    raise e

model_with_tools = model.bind_tools(TOOLS)
Owner

Maybe TOOLS_MAP would be better? You may also need to specify model.tool_choice, because it is None by default.
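
One possible shape for a TOOLS_MAP, as a sketch only: a name-to-callable dictionary used to dispatch the calls the model requests. The two tools here are plain stand-in functions, not the PR's `@tool`-decorated ones, and `run_tool_calls` is a hypothetical helper. The `name` / `args` / `id` keys follow the tool-call dictionaries that LangChain exposes on `AIMessage.tool_calls`.

```python
from typing import Any, Callable, Dict, List


def directory_tree(proj_path: str) -> str:
    """Stand-in tool: a real one would list files under proj_path."""
    return f"tree of {proj_path}"


def get_file_content(file_path: str) -> str:
    """Stand-in tool: a real one would read file_path."""
    return f"content of {file_path}"


# Name -> callable lookup, so a tool call is dispatched without scanning a list.
TOOLS_MAP: Dict[str, Callable[..., str]] = {
    "directory_tree": directory_tree,
    "get_file_content": get_file_content,
}


def run_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Execute each requested call and pair the output with its call id."""
    results = []
    for call in tool_calls:
        func = TOOLS_MAP.get(call["name"])
        if func is None:
            # Report the problem back to the model instead of raising.
            output = f"Unknown tool: {call['name']}"
        else:
            output = func(**call["args"])
        results.append({"tool_call_id": call["id"], "content": output})
    return results
```

The list passed to `bind_tools` could then be derived from the map's values, which keeps the two in sync.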

messages.append(result)

if len(messages) > MAX_MESSAGES:
    messages = messages[-MAX_MESSAGES:]
Owner

Do you keep the messages from the end so that the part with the YAML formatting is preserved?
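
Slicing with `messages[-MAX_MESSAGES:]` keeps the tail, so the latest reply survives, but it also drops the first message. If that first message is the system prompt (an assumption; the quoted lines do not show how `messages` is built), a variant that pins it could look like this sketch. `trim_history` and the value of `MAX_MESSAGES` are hypothetical.

```python
from typing import List, TypeVar

T = TypeVar("T")

MAX_MESSAGES = 20  # hypothetical cap; the PR's actual value is not shown


def trim_history(messages: List[T], max_messages: int = MAX_MESSAGES) -> List[T]:
    """Keep the first message plus the most recent ones, max_messages in total."""
    if max_messages < 2:
        raise ValueError("max_messages must be at least 2")
    if len(messages) <= max_messages:
        return list(messages)
    # First message (assumed system prompt) + the newest max_messages - 1 items.
    return [messages[0]] + list(messages[-(max_messages - 1):])
```

This sketch does not check whether the trimmed window starts with a tool result whose originating call was cut off, which some providers may reject.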

last_message = messages[-1]
content = last_message.content
if isinstance(content, list):
    output_text = str(content)
Owner

maybe "\n".join(content)?
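
Note that `"\n".join(content)` only works when every item is a string. In LangChain, list-valued message content may mix strings and dictionaries, so a slightly more defensive version might look like this sketch. `content_to_text` is a hypothetical helper, and reading the text from a `"text"` key is an assumption about the block layout.

```python
from typing import Any, List, Union


def content_to_text(content: Union[str, List[Any]]) -> str:
    """Flatten message content into plain text.

    Handles a plain string, a list of strings, and a list of dict blocks
    that carry their text under a "text" key; other blocks are skipped.
    """
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and isinstance(block.get("text"), str):
            parts.append(block["text"])
    return "\n".join(parts)
```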

from langchain_core.tools import tool
from langchain_gigachat.chat_models import GigaChat

LLM_ANALYSIS_FILE = "amphimixis_llm.analyzed"
Owner

Maybe centralize this data in one file so that other modules can reuse it?

Comment thread amphimixis/analyzer.py
Comment on lines +53 to +57
if use_llm is True:
    _logger.info("Analyzing with llm")
    analyze_with_agent(proj_path)
    _logger.info("Analyzing with llm done")

Owner

You don't return after this.
I understand that you run a primary heuristic analysis in case the LLM fails, but I don't see the LLM output being reflected in the final results in any way.
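
One way the two analyses could be combined, as a rough sketch: run the heuristics first, then overlay whatever the LLM returns, and keep the heuristic result if the LLM step fails. The `analyze` signature, the `Analysis` alias and the two callables are hypothetical; the real analyzer.py may structure its results differently.

```python
import logging
from typing import Callable, Dict, Optional

_logger = logging.getLogger(__name__)

Analysis = Dict[str, object]


def analyze(
    proj_path: str,
    use_llm: bool,
    heuristic: Callable[[str], Analysis],
    llm: Callable[[str], Optional[Analysis]],
) -> Analysis:
    """Run heuristics first, then overlay LLM findings when they are available."""
    result = dict(heuristic(proj_path))
    if not use_llm:
        return result
    try:
        llm_result = llm(proj_path)
    except Exception:  # broad on purpose: any LLM failure falls back to heuristics
        _logger.exception("LLM analysis failed, keeping heuristic results")
        return result
    if llm_result:
        result.update(llm_result)  # LLM values win on conflicting keys
    return result
```

Whether LLM values should override heuristic ones, or only fill in missing keys, is a design choice this sketch does not settle.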

ebzych added the "fix wanted" (PR wants to be fixed) label May 1, 2026