feat: add selective image roast analysis#183

Open
TaiLaaa wants to merge 5 commits into
SXP-Simon:mainfrom
TaiLaaa:image-roast-analysis

Conversation

@TaiLaaa

@TaiLaaa TaiLaaa commented May 8, 2026

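Changes

  • Add an image roast toggle, a count limit, and prompt configuration
  • Extract candidate images from the group chat, filtering out bot daily-report images and sticker/emoji images
  • Screen images with a vision model; posters, wallpapers, and ordinary pretty pictures return SKIP and are left out of the daily report
  • Show image roasts as their own section after the golden quotes in the scrapbook template, so they do not run together with the text roasts

Verification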

  • python3 -m json.tool _conf_schema.json
  • python3 -m py_compile src/domain/models/data_models.py src/domain/services/statistics_service.py src/infrastructure/config/config_manager.py src/application/services/analysis_application_service.py src/infrastructure/reporting/generators.py
  • git diff --check

Summary by Sourcery

Add image roast extraction, analysis, and rendering to daily group chat reports.

New Features:

  • Introduce ImageSummaryItem model to represent roasted images in analysis results.
  • Extract candidate images from group messages while filtering out bot daily report images and emoji-like images for potential roasting.
  • Analyze candidate images with a dedicated vision-enabled provider using configurable prompts to select only highlight-worthy images and generate short roasts.
  • Render an "image moments" section in scrapbook HTML templates to display selected images and their roasts separately from text golden quotes.

Enhancements:

  • Add configuration options to toggle image roasting, control maximum image count, and customize the image roast prompt template.
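
The three configuration options above can be pictured as simple accessors with defaults. The following is a minimal sketch, not the PR's `ConfigManager`: the class name `ImageSummaryConfig`, the prompt keys `image_summary_prompt` / `image_summary_prompts`, and the English default prompt are assumptions for illustration. The keys `image_summary_enabled` and `max_image_summaries`, the off-by-default behavior, and the fallback of 5 come from this conversation.

```python
from typing import Any, Mapping

# Hypothetical default prompt; the PR ships a Chinese template instead.
DEFAULT_IMAGE_SUMMARY_PROMPT = (
    "Reply SKIP for ordinary images; otherwise roast the image in under 60 characters."
)


class ImageSummaryConfig:
    """Read-only view over a plugin config mapping (hypothetical layout)."""

    def __init__(self, raw: Mapping[str, Any]) -> None:
        self._raw = raw

    def get_image_summary_enabled(self) -> bool:
        # Off by default, so the feature costs nothing until someone opts in.
        return bool(self._raw.get("image_summary_enabled", False))

    def get_max_image_summaries(self) -> int:
        # Clamp to a non-negative integer; fall back to 5 on bad input.
        try:
            return max(0, int(self._raw.get("max_image_summaries", 5)))
        except (TypeError, ValueError):
            return 5

    def get_image_summary_prompt(self, style: str = "") -> str:
        # Prefer a per-style prompt, then the shared prompt, then the default.
        per_style = self._raw.get("image_summary_prompts") or {}
        return (
            per_style.get(style)
            or self._raw.get("image_summary_prompt")
            or DEFAULT_IMAGE_SUMMARY_PROMPT
        )
```

Keeping the defaults inside the accessors means callers never have to repeat them, which is one way to avoid `getattr(..., lambda: 5)` style fallbacks at call sites.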

@sourcery-ai

sourcery-ai Bot commented May 8, 2026

Contributor

Reviewer's Guide

Adds a configurable image roast (image summary) feature that selects candidate chat images (excluding bot reports and emojis), runs them through a visual LLM for roast-style summaries, and renders them as a separate “image moments” section in the scrapbook reports, controlled via new config flags and prompt templates.
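
As a rough illustration of the rendering step, the sketch below builds the HTML for the image section from items that may be dicts or objects, with a fallback caption. It is an assumption-laden sketch rather than the PR's generator code: the function signature, the `_field` helper, the `image-moment` CSS class, and the `DEFAULT_CAPTION` text are hypothetical; only the field names `url`, `model_summary`, and `description` come from the class diagram in this review.

```python
from html import escape
from typing import Any

DEFAULT_CAPTION = "No roast available"  # hypothetical fallback caption


def _field(item: Any, name: str) -> str:
    """Read a field from either a dict or an object, always as a string."""
    value = item.get(name) if isinstance(item, dict) else getattr(item, name, None)
    return str(value or "")


def build_image_summary_html(items: list, limit: int) -> str:
    """Render up to `limit` image cards; returns '' when nothing can be shown."""
    cards = []
    for item in items[: max(0, limit)]:
        url = _field(item, "url")
        if not url:
            continue  # an entry without an image cannot be rendered
        caption = (
            _field(item, "model_summary")
            or _field(item, "description")
            or DEFAULT_CAPTION
        )
        # Escape both values: captions come from an LLM, URLs from chat messages.
        cards.append(
            '<div class="image-moment">'
            f'<img src="{escape(url, quote=True)}" alt="">'
            f"<p>{escape(caption)}</p>"
            "</div>"
        )
    return "".join(cards)
```

Returning an empty string when there are no items lets a template skip the section with a plain truthiness check.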

Sequence diagram for daily analysis with selective image roast

sequenceDiagram
    actor User
    participant Bot as BotService
    participant AnalysisApp as AnalysisApplicationService
    participant Config as ConfigManager
    participant Stats as StatisticsService
    participant LLM as VisualLLMProvider
    participant ReportGen as ReportingGenerator

    User->>Bot: request_daily_report()
    Bot->>AnalysisApp: execute_daily_analysis()

    Note over AnalysisApp: existing text analysis flow omitted

    AnalysisApp->>Config: get_image_summary_enabled()
    Config-->>AnalysisApp: enabled_flag
    alt image_summary_enabled
        AnalysisApp->>Config: get_bot_self_ids()
        Config-->>AnalysisApp: config_bot_ids
        AnalysisApp->>AnalysisApp: read_runtime_bot_ids()
        AnalysisApp->>Config: get_max_image_summaries()
        Config-->>AnalysisApp: max_image_summaries

        AnalysisApp->>Stats: extract_image_summaries(unified_messages, limit, bot_self_ids)
        Stats->>Stats: filter_bot_daily_report_images()
        Stats->>Stats: filter_emoji_like_images()
        Stats-->>AnalysisApp: raw_image_summaries

        loop for each image_summary
            AnalysisApp->>Config: get_image_summary_prompt()
            Config-->>AnalysisApp: prompt_template
            AnalysisApp->>LLM: llm_generate(provider_id, prompt, image_url, system_prompt)
            LLM-->>AnalysisApp: completion_text
            AnalysisApp->>AnalysisApp: decide_keep_or_skip()
            alt keep
                AnalysisApp->>AnalysisApp: set model_summary and description
            else skip
                AnalysisApp->>AnalysisApp: drop image
            end
        end

        AnalysisApp->>AnalysisApp: statistics.image_summaries = kept_items
    else image_summary_disabled
        AnalysisApp->>AnalysisApp: statistics.image_summaries = []
    end

    AnalysisApp-->>Bot: analysis_result (includes image_summaries)
    Bot->>ReportGen: _prepare_render_data(analysis_result)
    ReportGen->>ReportGen: build_image_summary_html(image_summaries)
    ReportGen-->>Bot: scrapbook_html
    Bot-->>User: send_scrapbook_report(scrapbook_html)
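
The `decide_keep_or_skip()` step in the diagram can be sketched as a small pure function. The normalization, the skip words, and the 160-character cap mirror the snippet quoted in this review; the function name `roast_or_none` and the constant names are assumptions.

```python
from typing import Optional

SKIP_WORDS = {"跳过", "不保留", "忽略"}  # skip markers taken from the quoted PR snippet
STRIP_CHARS = "`*_ -。.!!"  # markdown and punctuation noise around the answer
MAX_ROAST_CHARS = 160


def roast_or_none(completion_text: Optional[str]) -> Optional[str]:
    """Return the roast to keep, or None when the model asked to skip the image."""
    text = (completion_text or "").strip()
    normalized = text.lower().strip(STRIP_CHARS)
    if not text or normalized.startswith("skip") or normalized in SKIP_WORDS:
        return None
    return text[:MAX_ROAST_CHARS]
```

One consequence of the prefix check is that a genuine roast that happens to begin with "skip" (for example "Skipping class again") is also dropped; an exact-match check would be stricter but less tolerant of chatty model output.
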

Class diagram for new image summary data model and services

classDiagram
    class ImageSummaryItem {
        +str url
        +str sender
        +str sender_id
        +str description
        +str model_summary
    }

    class StatisticsService {
        +extract_image_summaries(messages, limit, bot_self_ids) list~ImageSummaryItem~
        +_is_bot_daily_report_image(msg, bot_self_ids) bool
        +_is_emoji_like_image(raw_data) bool
    }

    class AnalysisApplicationService {
        +execute_daily_analysis(...)
        +_enrich_image_summaries(image_summaries, unified_msg_origin, keep_limit) list
    }

    class ConfigManager {
        +get_image_summary_enabled() bool
        +get_max_image_summaries() int
        +get_image_summary_prompt(style) str
        +get_max_golden_quotes() int
        +get_bot_self_ids() list
    }

    class ReportingGenerator {
        +_prepare_render_data(analysis_result, stats, activity_viz, chat_quality_review) dict
    }

    class LLMAnalyzerContext {
        +llm_generate(chat_provider_id, prompt, image_urls, system_prompt) LLMResponse
    }

    class LLMResponse {
        +str completion_text
    }

    AnalysisApplicationService --> StatisticsService : uses
    AnalysisApplicationService --> ConfigManager : reads_config
    AnalysisApplicationService --> LLMAnalyzerContext : calls
    StatisticsService --> ImageSummaryItem : creates
    ReportingGenerator --> ConfigManager : reads_limits
    ReportingGenerator --> ImageSummaryItem : renders
    LLMAnalyzerContext --> LLMResponse : returns
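
For reference, the `ImageSummaryItem` box in the diagram maps naturally onto a dataclass. This is a sketch based only on the five fields shown; the empty-string defaults are an assumption, and the PR's actual definition may differ.

```python
from dataclasses import dataclass


@dataclass
class ImageSummaryItem:
    """One candidate image plus the roast attached to it later."""

    url: str
    sender: str = ""
    sender_id: str = ""
    description: str = ""    # default caption set at extraction time
    model_summary: str = ""  # roast text filled in after the vision model call


item = ImageSummaryItem(url="https://example.com/a.png", sender="alice", sender_id="1001")
item.model_summary = "A meme this cursed deserves its own museum."
```
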

File-Level Changes

Change Details Files
Introduce an ImageSummaryItem model and extraction pipeline to collect candidate images from chat history while filtering out bot report and emoji-like images.
  • Add ImageSummaryItem dataclass to represent images and their roast metadata.
  • Add _is_bot_daily_report_image helper to detect and exclude bot/daily-report images based on sender metadata, text, and raw image metadata.
  • Add extract_image_summaries method to walk recent messages in reverse, skip bot/report/emoji-like images, resolve image URLs from multiple raw formats, and build ImageSummaryItem instances with a default description, honoring a configurable limit.
src/domain/models/data_models.py
src/domain/services/statistics_service.py
Wire image summary generation into the daily analysis flow with a configurable toggle, limits, and a dedicated image-summary LLM provider and prompt.
  • Extend execute_daily_analysis to, when enabled, collect candidate images using statistics_service, over-sample them, and then enrich/filter them via an image-summary LLM call.
  • Implement _enrich_image_summaries to choose an image-summary provider, apply a system prompt plus configurable prompt template, call llm_generate with image URLs, skip images whose response indicates SKIP, and attach the roast text back to items with length limiting and error handling.
  • Include the resulting image_summaries in the analysis_result payload for downstream reporting.
src/application/services/analysis_application_service.py
Render a dedicated “image moments” section in scrapbook reports using the image summaries, decoupled from text golden quotes.
  • Generate HTML for an image roast section that limits the number of displayed items based on config, supports both dict and object forms of image summary items, and falls back to a default caption when no roast text is present.
  • Inject the rendered image_summary_html into the render context and templates so it appears after golden quotes but before chat quality review.
src/infrastructure/reporting/generators.py
src/infrastructure/reporting/templates/scrapbook/html_template.html
src/infrastructure/reporting/templates/scrapbook/image_template.html
Add configuration surface for enabling image summaries, controlling their count, and customizing the prompt.
  • Expose get_image_summary_enabled, get_max_image_summaries, and get_image_summary_prompt on ConfigManager, with sensible defaults and support for per-style prompt variants.
  • Define a default Chinese prompt template that instructs the LLM to SKIP non-meme images and to roast suitable images in under 60 characters in an on-brand tone.
src/infrastructure/config/config_manager.py
_conf_schema.json
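
The extraction step listed above (walk recent messages in reverse, skip bot and emoji-like images, honor a limit) might look roughly like the sketch below. `ChatMessage` is a hypothetical stand-in for the project's unified message type, and the `is_emoji` flag replaces the PR's raw-metadata heuristics, which are not shown in this conversation.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class ChatMessage:
    """Hypothetical, simplified message shape used only for this sketch."""

    sender_id: str
    sender_name: str
    image_urls: list = field(default_factory=list)
    is_emoji: bool = False  # stands in for the emoji-like image check


def extract_candidates(
    messages: list, limit: int, bot_self_ids: Iterable[str]
) -> list:
    """Collect up to `limit` candidate images, newest messages first."""
    bot_ids = {str(b) for b in bot_self_ids}
    candidates = []
    for msg in reversed(messages):
        if len(candidates) >= limit:
            break
        # Skip the bot's own output (such as daily report images) and stickers.
        if str(msg.sender_id) in bot_ids or msg.is_emoji:
            continue
        for url in msg.image_urls:
            if not url:
                continue
            candidates.append(
                {"url": url, "sender": msg.sender_name, "sender_id": str(msg.sender_id)}
            )
            if len(candidates) >= limit:
                break
    return candidates
```

Filtering before any model call keeps the cheap checks in front of the expensive one, which is the cost argument made later in this conversation.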

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • The _is_bot_daily_report_image heuristic hardcodes a fairly long list of sender/text markers; consider moving these marker lists into configuration (or at least a module-level constant) so they can be tuned per deployment without code changes.
  • The new image summary flow uses very loose typing (list[Any], duck-typed item in _enrich_image_summaries and generator HTML); tightening this to ImageSummaryItem | dict (or a dedicated protocol) would make it easier to catch misuse and refactors at compile-time.
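
One way to act on the typing suggestion is a `Protocol` for object-style items plus a `TypedDict` for dict-style items, so a static type checker such as mypy can flag misuse. The names `ImageSummaryLike`, `ImageSummaryDict`, and `roast_text` are hypothetical; this is a sketch of the idea, not code from the PR.

```python
from typing import Protocol, TypedDict, Union


class ImageSummaryDict(TypedDict, total=False):
    """Dict form of an image summary, as it may appear in a result payload."""

    url: str
    description: str
    model_summary: str


class ImageSummaryLike(Protocol):
    """Object form: anything exposing these attributes qualifies."""

    url: str
    description: str
    model_summary: str


ImageSummary = Union[ImageSummaryLike, ImageSummaryDict]


def roast_text(item: ImageSummary) -> str:
    """Return the roast if present, else the default description, else ''."""
    if isinstance(item, dict):
        return item.get("model_summary") or item.get("description") or ""
    return item.model_summary or item.description or ""
```

Python has no compile step in the usual sense, so the benefit shows up when a type checker runs over the code rather than at import time.
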
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="src/infrastructure/reporting/generators.py" line_range="831-833" />
<code_context>
+        # Build the image roast HTML as its own block, separate from the golden quotes, so it does not run into the last text roast
+        image_summary_html = ""
+        image_summaries = analysis_result.get("image_summaries") or []
+        max_image_summaries = min(
+            self.config_manager.get_max_golden_quotes(),
+            getattr(self.config_manager, "get_max_image_summaries", lambda: 5)(),
+        )
+        display_image_summaries = image_summaries[:max_image_summaries]
</code_context>
<issue_to_address>
**question (bug_risk):** Coupling the max image summaries to `max_golden_quotes` may unintentionally constrain image sections.

Because `max_image_summaries` is computed as `min(max_golden_quotes, get_max_image_summaries())`, a low `max_golden_quotes` will cap images even if `max_image_summaries` is configured higher. If this coupling is intentional, please document it in the config semantics; otherwise, consider using only `get_max_image_summaries()` here so image limits are independent of golden quotes:

```python
max_image_summaries = getattr(self.config_manager, "get_max_image_summaries", lambda: 5)()
```
</issue_to_address>
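
To make the effect of the coupling concrete, here is a small standalone comparison. The helper names and the numbers are made up for illustration; only the `min(...)` expression reflects the code under review.

```python
def coupled_limit(max_golden_quotes: int, max_image_summaries: int) -> int:
    """Limit as computed in the reviewed code: the smaller setting wins."""
    return min(max_golden_quotes, max_image_summaries)


def independent_limit(max_image_summaries: int) -> int:
    """Limit that depends only on the image setting."""
    return max_image_summaries


images = [f"img{i}" for i in range(8)]

# With max_golden_quotes=3 and max_image_summaries=6:
print(len(images[: coupled_limit(3, 6)]))    # 3 images shown
print(len(images[: independent_limit(6)]))   # 6 images shown
```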

### Comment 2
<location path="src/application/services/analysis_application_service.py" line_range="370" />
<code_context>
+                text = (getattr(resp, "completion_text", "") or "").strip()
+                normalized = text.strip().lower().strip("`*_ -。.!!")
+                if not text or normalized.startswith("skip") or normalized in {"跳过", "不保留", "忽略"}:
+                    logger.info(f"图片锐评筛选跳过普通图片: {url}")
+                    continue
+                item.model_summary = text[:160]
</code_context>
<issue_to_address>
**🚨 suggestion (security):** Logging full image URLs for skipped pictures might be too verbose or leak sensitive links.

To reduce this risk, consider omitting the URL, logging only a derived ID (e.g., hash or index), or lowering the log level. For example:

```python
logger.debug("图片锐评筛选跳过普通图片", extra={"image_idx": idx})
```

Suggested implementation:

```python
                if not text or normalized.startswith("skip") or normalized in {"跳过", "不保留", "忽略"}:
                    logger.debug("图片锐评筛选跳过普通图片")
                    continue

```

If the surrounding loop exposes an `idx` or other stable identifier, you can further adjust the log to include it without leaking the URL, for example:
- Change the call to `logger.debug("图片锐评筛选跳过普通图片", extra={"image_idx": idx})` when `idx` is available.
- Alternatively, if you prefer a hash-based ID, compute a hash from `url` (e.g., using `hashlib.sha256(url.encode()).hexdigest()`) and log that instead: `extra={"image_id": image_id}`.
These adjustments should be made where the loop variables are defined.
</issue_to_address>
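
The hash-based alternative can be sketched end to end as follows. The logger name, the English log message, the helper names, and the 12-character truncation are assumptions; `hashlib.sha256` and the `extra` argument of the standard `logging` module are used as documented.

```python
import hashlib
import logging

logger = logging.getLogger("image_roast")  # hypothetical logger name


def image_log_id(url: str, length: int = 12) -> str:
    """Derive a short, stable identifier that does not reveal the URL."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:length]


def log_skipped(url: str, idx: int) -> None:
    # Only the index and the derived ID are logged, never the URL itself.
    logger.debug(
        "image roast skipped an ordinary image",
        extra={"image_idx": idx, "image_id": image_log_id(url)},
    )
```

Fields passed through `extra` only appear in the output if the configured formatter references them, so the format string would need `%(image_id)s` or similar for the ID to be visible.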

@SXP-Simon

Owner

Nice idea. Could you share some screenshots of the actual result?

@TaiLaaa

TaiLaaa commented May 8, 2026

Author

> Nice idea. Could you share some screenshots of the actual result?

[Attached images: Screenshot_2026-05-08-19-00-50-396, Screenshot_2026-05-08-19-00-34-053, Screenshot_2026-05-08-19-00-29-354, Image_1778238423660]

@TaiLaaa

TaiLaaa commented May 8, 2026

Author

Uploading Image_1778238423660.jpg…

@SXP-Simon

Owner

I have midterm exams at the moment and will review this when I have time. Great PR, thanks for contributing.

@Liangyu-G

Contributor

Thanks for the work on the image roast flow. The separation from text golden quotes looks useful.

One configuration detail I’d like to confirm before merge: the code now looks up image_summary_provider_id for the vision-enabled image analysis path, but I don’t see an explicit config schema entry or user-facing explanation for that key in _conf_schema.json. Could you please either add the schema/documentation for image_summary_provider_id, or clarify whether it is intentionally expected to reuse/fall back to the main Provider when unset?

This would help users understand what needs to be configured for image input support and avoid silent fallback/misconfiguration surprises.
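
One way to make the fallback explicit rather than silent is sketched below. This is not the project's `get_provider_id_with_fallback`, whose signature is not shown in this conversation; `resolve_image_provider_id`, its parameters, and the logger name are hypothetical. Only the key `image_summary_provider_id` comes from the PR.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("image_roast")  # hypothetical logger name


def resolve_image_provider_id(
    llm_config: Mapping[str, str], main_provider_id: Optional[str]
) -> Optional[str]:
    """Pick the provider for image analysis and make any fallback visible."""
    dedicated = (llm_config.get("image_summary_provider_id") or "").strip()
    if dedicated:
        return dedicated
    if main_provider_id:
        # The main provider may not accept image input, so say so in the log.
        logger.warning(
            "image_summary_provider_id is not set; falling back to main provider %s, "
            "which may not support image input",
            main_provider_id,
        )
        return main_provider_id
    return None  # nothing usable: the caller should skip image roasting
```

A warning at this point gives users a visible hint about where to look when image roasts come back empty.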

@Liangyu-G

Contributor

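I looked over this set of changes during a routine check. The overall direction is valuable: image roasting is off by default, candidate images are filtered before they reach the vision model, and max_concurrent_image_llm has been added. All of these help with cost and risk control.

Before merging, I suggest two small fixes that would reduce future maintenance risk:

  1. The vision Provider config entry needs to be added, or the fallback semantics stated explicitly in the docs
    _enrich_image_summaries() calls get_provider_id_with_fallback(..., "image_summary_provider_id", ...), but in this round's diff I did not see image_summary_provider_id added to _conf_schema.json or ConfigManager. If the intent is "fall back to the main LLM when no dedicated vision Provider is set", please say so in the config hint / README. If users should be able to choose a vision model separately, the field needs to be added to the llm config group, so that users who enable image roasting know where to configure a model that supports image input.

  2. A string in the ATRI template looks accidentally changed
    In src/infrastructure/reporting/templates/ATRI/image_template.html:

    亚托莉偷偷记 en 小本子上的宝藏名言!

    This looks like it should read "记在小本子上" (written down in her little notebook). It is a low-risk typo; please fix it while you are at it.

Also, the two points the bot raised earlier, "do not log the full URL when skipping an image" and "do not cap the image count by the golden-quote count", appear to be largely addressed in the latest commits (debug-level logging, no more min(max_golden_quotes, ...)). Both are a plus.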
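
The concurrency cap mentioned in this comment (`max_concurrent_image_llm`) is typically implemented with a semaphore. The sketch below shows the general pattern with `asyncio.Semaphore`; it is not the PR's code, and `roast_all`, `roast_one`, and the default of 2 are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def roast_all(
    urls: list,
    roast_one: Callable[[str], Awaitable[str]],
    max_concurrent_image_llm: int = 2,
) -> list:
    """Run one vision-model call per URL, at most N at a time."""
    semaphore = asyncio.Semaphore(max(1, max_concurrent_image_llm))

    async def guarded(url: str) -> Optional[str]:
        async with semaphore:  # waits here while N calls are already in flight
            try:
                return await roast_one(url)
            except Exception:
                return None  # one failed image should not sink the whole report

    # gather preserves input order, so results line up with `urls`.
    return await asyncio.gather(*(guarded(u) for u in urls))
```

Swallowing exceptions per image is a design choice for this sketch; a real implementation would likely log the failure before returning `None`.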

@Liangyu-G

Contributor

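I came across this PR during a scheduled collaboration check. The overall direction is valuable; in particular, keeping image moments off by default, separating the candidate count from the display count, and limiting concurrency are all solid choices.

I skimmed the diff and have two small points for reference:

  1. _conf_schema.json currently adds only image_summary_enabled / max_image_summaries / max_image_candidates, but the implementation uses image_summary_provider_id as the Provider key (get_provider_id_with_fallback(..., "image_summary_provider_id", ...) / call_provider_with_retry(... provider_id_key="image_summary_provider_id")). If users should explicitly pick a model that supports vision input, I suggest adding an llm.image_summary_provider_id (_special: select_provider) config entry; otherwise the README / config hint should state that it falls back to the main Provider / session Provider, so users do not enable the feature and end up on a model that cannot accept image input.
  2. src/infrastructure/reporting/templates/ATRI/image_template.html has a string that looks accidentally changed: 亚托莉偷偷记 en 小本子上的宝藏名言!, which should probably read 记在小本子上.

No blocking action taken; these are only inspection suggestions.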

@Liangyu-G Liangyu-G left a comment

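Maintainer check: the image roast feature is valuable, but this PR currently mixes in many large template changes (HatsuneMiku mobile layout, external image link replacement, and so on) that were already handled in #184 or are unrelated to image roasting, and the vision model provider config key is not exposed in the schema, so the conditions for a low-risk merge are not yet met. Please first narrow the PR down to the image roast logic only and redo it on top of the latest main. The URL leak and count coupling issues raised by Sourcery appear to be largely addressed in later commits, but the final diff still needs to be confirmed.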

4 participants