
Commit 95beb3e

startup checks (#2716)
1 parent 4367889 commit 95beb3e

File tree

40 files changed: +18802 −106 lines

.github/ISSUE_TEMPLATE/bug_report.md

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ Please answer the following questions for yourself before submitting an issue.
 - [ ] I checked to make sure that this issue has not already been filed
 - [ ] I'm reporting the issue to the correct repository (for multi-repository projects)
 - [ ] I have read and checked all configs (with all optional parts)
+- [ ] I asked [deepwiki](https://deepwiki.com/kevoreilly/CAPEv2) and found no solution to my issue


 # Expected Behavior

.github/workflows/auto_answer.yml

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
name: Auto Answer Bot (using uv run)

on:
  issues:
    types: [opened]

jobs:
  answer:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository code
        uses: actions/checkout@v4

      - name: Set up Python with caching
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
          cache: 'pip'

      - name: Install uv
        uses: astral-sh/setup-uv@v6
        with:
          enable-cache: true

      - name: Install the project
        run: uv run pip install -r requirements.txt

      - name: Run the answer bot with uv run
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
          ISSUE_NUMBER: ${{ github.event.issue.number }}
          REPO_NAME: ${{ github.repository }}
        # This single step installs dependencies (if needed) and runs the script
        run: cd KnowledgeBaseBot && uv run python auto_answer_bot.py
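
The bot consumes exactly the four environment variables set in the env block above, and the workflow fails in non-obvious ways if any of them is absent. A minimal local smoke test for the wiring, offered as a sketch and not part of this commit:

import os
import sys

# The four variables auto_answer_bot.py reads, per the env block above
required = ["GITHUB_TOKEN", "GEMINI_API_KEY", "ISSUE_NUMBER", "REPO_NAME"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    sys.exit(f"Missing environment variables: {', '.join(missing)}")

# auto_answer_bot.py exits if ISSUE_NUMBER does not parse as an integer
int(os.environ["ISSUE_NUMBER"])
print("Environment wiring looks good.")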

KnowledgeBaseBot/all_texts.json

Lines changed: 3014 additions & 0 deletions
Large diffs are not rendered by default.
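
Although the diff is not rendered, the builder script later in this commit shows the shape of the data: all_texts.json is a flat JSON array of text chunks, indexed in parallel with metadata.json. A hypothetical two-entry sketch (entries invented for illustration):

# Hypothetical contents, inferred from the builder script below
all_texts = [
    "Title: Analyzer fails\nBody: ...\nComment: ...",  # one full issue thread per entry
    "Installation\n============\nCAPE requires ...",   # one ~1000-character documentation chunk per entry
]
metadata = [
    {"source": "issue", "number": 1234, "url": "https://github.com/kevoreilly/CAPEv2/issues/1234"},
    {"source": "documentation", "file": "../docs/installation.rst"},
]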

KnowledgeBaseBot/auto_answer.yml

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
name: Auto Answer Bot (using uv run)

on:
  issues:
    types: [opened]

jobs:
  answer:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository code
        uses: actions/checkout@v4

      - name: Set up Python with caching
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
          cache: 'pip'

      - name: Install uv
        uses: astral-sh/setup-uv@v6
        with:
          enable-cache: true

      - name: Install the project
        run: uv run pip install -r requirements.txt

      - name: Run the answer bot with uv run
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
          ISSUE_NUMBER: ${{ github.event.issue.number }}
          REPO_NAME: ${{ github.repository }}
        # This single step installs dependencies (if needed) and runs the script
        run: cd KnowledgeBaseBot && uv run python auto_answer_bot.py

KnowledgeBaseBot/auto_answer_bot.py

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
import os
import json
import faiss
from github import Github
from sentence_transformers import SentenceTransformer
import google.generativeai as genai

# --- Configuration ---
GITHUB_TOKEN = os.getenv('GITHUB_TOKEN')
REPO_NAME = os.getenv('REPO_NAME')
try:
    ISSUE_NUMBER = int(os.getenv('ISSUE_NUMBER'))
except (TypeError, ValueError):
    print("Error: Invalid or missing ISSUE_NUMBER environment variable. Exiting.")
    exit(1)
MODEL_NAME = 'all-MiniLM-L6-v2'
K_NEAREST_NEIGHBORS = 5  # Number of similar items to retrieve
GEMINI_API_KEY = os.getenv('GEMINI_API_KEY')
if not GEMINI_API_KEY:
    exit("Missing GEMINI_API_KEY environment variable")
genai.configure(api_key=GEMINI_API_KEY)

# --- Initialization ---
g = Github(GITHUB_TOKEN)
repo = g.get_repo(REPO_NAME)
issue = repo.get_issue(number=ISSUE_NUMBER)
llm_model = SentenceTransformer(MODEL_NAME)

# --- Load the Unified Knowledge Base ---
index = faiss.read_index("unified_index.faiss")
with open("metadata.json", "r") as f:
    metadata = json.load(f)
with open("all_texts.json", "r") as f:
    all_texts = json.load(f)

# --- Process the New Issue ---
new_issue_text = f"Title: {issue.title}\nBody: {issue.body}"
new_issue_embedding = llm_model.encode([new_issue_text]).astype('float32')

# --- Semantic Search ---
distances, indices = index.search(new_issue_embedding, K_NEAREST_NEIGHBORS)
context_pieces = []

for i in indices[0]:
    source_metadata = metadata[i]
    source_text = all_texts[i]

    if source_metadata['source'] == 'documentation':
        context_pieces.append(f"--- Context from Documentation (file: {source_metadata['file']}) ---\n{source_text}")
    elif source_metadata['source'] == 'issue':
        context_pieces.append(f"--- Context from a Similar Issue ({source_metadata['url']}) ---\n{source_text}")

context = "\n\n".join(context_pieces)

# --- Generate Answer with LLM (Improved Prompt) ---
prompt = f"""
You are an expert GitHub Triage Assistant for an open-source project. Your primary goal is to ensure every issue is actionable for the developers. Your secondary goal is to answer questions using the provided context. Follow these steps in order:

**Step 1: Triage the User's Issue Quality**
First, analyze the "New Issue" content. Does it contain enough detail to be actionable?
- **GOOD ISSUE:** It has a clear description, steps to reproduce, error messages, or a specific question.
- **POOR ISSUE:** It's a short, vague question (e.g., "it doesn't work"), it's missing crucial details, or the user has deleted the issue template.

**Step 2: Take Action Based on Triage**

<if_issue_is_poor>
- Gently inform the user that more information is needed for the community to help.
- Explain *why* details are important (e.g., "to understand the context and reproduce the problem").
- Provide a clear, actionable list of what's missing. Use the official issue template as a guide (e.g., "Please provide steps to reproduce, the version you are using, and any error logs.").
- Politely remind them that this is a community-supported open-source project and that clear, detailed reports are the best way to get helpful and fast responses from volunteers.
- **Do NOT attempt to answer the question.** Your only goal is to improve the quality of the issue report.
</if_issue_is_poor>

<if_issue_is_good>
- Acknowledge their well-detailed report.
- Now, analyze the "Relevant Context" provided (from documentation and past issues).
- Generate a clear and helpful response based **strictly** on this context.
- If the documentation provides an answer, summarize it and cite the source file.
- If a past issue offers a solution, explain it and provide the URL to that issue.
- If the context doesn't seem to fully resolve their detailed question, state that and mention a maintainer will look into it.
- Conclude by thanking them for their contribution to the project.
</if_issue_is_good>

**New Issue:**
{new_issue_text}

**Relevant Context (from Documentation and Past Issues):**
{context}

**Suggested Answer (include links to sources if available):**
"""

# Earlier OpenAI variant, kept for reference:
"""
response = openai.chat.completions.create(
    model="gpt-4",  # Or "gpt-3.5-turbo", or any other model you prefer
    messages=[
        {"role": "system", "content": "You are an expert GitHub support assistant. Your mission is to answer user issues based solely on official documentation and the history of resolved issues."},
        {"role": "user", "content": prompt}
    ]
)
"""

# The system prompt from OpenAI is handled by system_instruction in Gemini
system_instruction = "You are an expert GitHub support assistant. Your mission is to answer user issues based solely on official documentation and the history of resolved issues."

# https://ai.google.dev/gemini-api/docs/models
# Create the model with the system instruction
generative_model = genai.GenerativeModel(
    model_name="gemini-2.5-flash",
    system_instruction=system_instruction
)

response = generative_model.generate_content(prompt)

# --- Post the Comment ---
final_comment = f"Hello @{issue.user.login}, thanks for reaching out.\n\n"
final_comment += response.text
final_comment += "\n\n---\n*This is an automated message generated from our documentation and issue history. If this doesn't solve your problem, someone will try to help you soon. Please make sure you have also checked other issues for the same problem.* 🤖"

issue.create_comment(final_comment)

print(f"Enriched answer posted to issue #{ISSUE_NUMBER}")

Lines changed: 130 additions & 0 deletions

@@ -0,0 +1,130 @@
import os
import json
import faiss
import numpy as np
from github import Auth
from github import Github
from sentence_transformers import SentenceTransformer
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import DirectoryLoader
from datetime import datetime, timezone

# --- Configuration ---
GITHUB_TOKEN = os.getenv('GITHUB_TOKEN')
REPO_NAME = "kevoreilly/CAPEv2"
DOCS_PATH = "../docs"  # Path to the folder with your documentation files (e.g., .rst)
MODEL_NAME = 'all-MiniLM-L6-v2'  # An efficient embedding model

# --- File Paths for State ---
INDEX_FILE = "unified_index.faiss"
METADATA_FILE = "metadata.json"
TEXTS_FILE = "all_texts.json"
STATE_FILE = "kb_state.json"

# init pandoc
# from pypandoc.pandoc_download import download_pandoc
# download_pandoc()

# --- Initialization ---
auth = Auth.Token(GITHUB_TOKEN)
g = Github(auth=auth)
repo = g.get_repo(REPO_NAME)
model = SentenceTransformer(MODEL_NAME)

# --- Load Existing Knowledge Base or Initialize a New One ---
if os.path.exists(INDEX_FILE):
    print("Loading existing knowledge base...")
    index = faiss.read_index(INDEX_FILE)
    with open(METADATA_FILE, "r") as f:
        metadata = json.load(f)
    with open(TEXTS_FILE, "r") as f:
        all_texts = json.load(f)
    with open(STATE_FILE, "r") as f:
        last_update_time = datetime.fromisoformat(json.load(f))
    print(f"Knowledge base loaded. Last update was at: {last_update_time}")
    new_issues = repo.get_issues(state='all', since=last_update_time, sort='created', direction='asc')
else:
    print("No existing knowledge base found. Creating a new one.")
    index = None
    metadata = []
    all_texts = []
    # Set a very old date to fetch all issues for the first time
    last_update_time = datetime(1970, 1, 1, tzinfo=timezone.utc)
    new_issues = repo.get_issues(state='all', sort='updated', direction='asc')
    # Initial processing of documentation (only on first build)
    print("Processing documentation for the first time...")
    loader = DirectoryLoader(DOCS_PATH, glob="**/*.rst")
    docs = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    doc_chunks = text_splitter.split_documents(docs)
    for chunk in doc_chunks:
        all_texts.append(chunk.page_content)
        metadata.append({'source': 'documentation', 'file': chunk.metadata.get('source', 'N/A')})

# --- Fetch New Issues from GitHub ---
print(f"Fetching issues updated since {last_update_time.isoformat()}...")
# The 'since' parameter fetches issues updated on or after the given time.
# Be aware that 'since' might not always behave as expected.

new_issue_texts = []
new_issue_metadata = []
latest_issue_time = last_update_time
existing_issue_urls = {m['url'] for m in metadata if m.get('source') == 'issue'}

for issue in new_issues:
    # We check the updated_at time to ensure we save the most recent timestamp.
    # ToDo: this doesn't work properly
    # if issue.updated_at.replace(tzinfo=timezone.utc) > latest_issue_time:
    #     latest_issue_time = issue.updated_at.replace(tzinfo=timezone.utc)

    # Simple logic to avoid adding duplicates. For a robust system, you might check IDs.
    issue_url = issue.html_url
    if issue_url in existing_issue_urls:
        print(f"Skipping issue #{issue.number} as it might be a duplicate or minor update.")
        continue

    print(f"Processing new/updated issue #{issue.number}")
    full_text = f"Title: {issue.title}\nBody: {issue.body}"
    for comment in issue.get_comments():
        full_text += f"\nComment: {comment.body}"

    new_issue_texts.append(full_text)
    new_issue_metadata.append({'source': 'issue', 'number': issue.number, 'url': issue.html_url})

# --- Add New Issues to the Knowledge Base ---
if new_issue_texts:
    print(f"Found {len(new_issue_texts)} new/updated issues to add.")

    # Generate embeddings for new issues only
    new_embeddings = model.encode(new_issue_texts, show_progress_bar=True)
    new_embeddings = np.array(new_embeddings).astype('float32')

    # If the index is new, create it
    if index is None:
        dimension = new_embeddings.shape[1]
        index = faiss.IndexFlatL2(dimension)

    # Add new embeddings to the index and update metadata lists
    index.add(new_embeddings)
    all_texts.extend(new_issue_texts)
    metadata.extend(new_issue_metadata)

    print("Knowledge base updated.")
else:
    print("No new issues found. Knowledge base is already up-to-date.")

# --- Save the Updated Knowledge Base and State ---
print("Saving knowledge base and state...")
faiss.write_index(index, INDEX_FILE)
with open(METADATA_FILE, "w") as f:
    json.dump(metadata, f, indent=2)
with open(TEXTS_FILE, "w") as f:
    json.dump(all_texts, f, indent=2)
# Save the timestamp of the latest issue we processed for the next run
with open(STATE_FILE, "w") as f:
    json.dump(latest_issue_time.isoformat(), f)

print("Process complete!")

KnowledgeBaseBot/kb_state.json

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
"1970-01-01T00:00:00+00:00"
