Skip to content

feat(workflow): Consolidated multi-platform research pipeline — expand XResearch into SourceDiscovery #258

@petermuidev

Description

@petermuidev

Feature Description

Expand the current XResearch workflow into a unified SourceDiscovery pipeline that collects trending content from multiple platforms (X/Twitter, TikTok, YouTube Shorts, Instagram Reels, Reddit) and consolidates it into a single research feed before feeding downstream workflows.

Motivation

Currently XResearch only scrapes X/Twitter trending topics → AI-analyzed posts → Telegram reports. The Video Sourcing + Retouch workflow (see companion issue) needs broader source discovery — not just text posts but also video content — to find viral clips across platforms, extract key themes, and feed them as source material for retouch/re-upload.

A consolidated research pipeline would:

  • Find video content (not just text posts) across TikTok, YouTube Shorts, Instagram Reels
  • Cross-reference trends across platforms — see the same trend on X, TikTok, and Reddit
  • Deduplicate and rank source material by engagement metrics across all platforms
  • Store source metadata (URL, platform, views, engagement, thumbnail, duration) in a structured format that downstream workflows can consume
  • Send consolidated research reports to Telegram (like current XResearch)

Proposed Solution

New class: src/classes/SourceDiscovery.py

SourceDiscovery(fp_profile_path)
  ├── discover_x_trends()        # existing XResearch logic, refactored
  ├── discover_tiktok_trends()   # scrape TikTok trending/creative center
  ├── discover_youtube_trends()  # scrape YouTube Shorts trending
  ├── discover_reddit_trends()   # scrape Reddit popular/rising posts
  ├── consolidate_sources()      # merge + dedupe + rank across platforms
  └── run() → List[SourceItem]   # unified pipeline

SourceItem schema (replaces current XResearch post dict)

{
  "id": "src-<platform>-<hash>",
  "platform": "tiktok|x|youtube|reddit|instagram",
  "url": "https://...",
  "title": "...",
  "content": "...",
  "views": 0,
  "engagement": { "likes": 0, "shares": 0, "comments": 0 },
  "has_video": true,
  "video_url": "https://...",
  "thumbnail_url": "https://...",
  "duration_seconds": 30,
  "trend_name": "...",
  "ai_analysis": { "main_topic": "...", "keywords": [], "angle": "...", "summary": "..." },
  "discovered_at": "2026-04-25T..."
}

New script: scripts/run_source_discovery.py

CLI entry point mirroring scripts/xresearch.py pattern, with flags:

  • --platforms x,tiktok,youtube,reddit
  • --min-engagement 1000
  • --save-json (persist SourceItems to .mp/sources/)
  • --send-telegram

Storage: .mp/sources/ directory

Each run saves sources_YYYY-MM-DD.json so downstream workflows can pick up discovered sources.

Current XResearch → Migration

  • XResearch.py stays functional but internal methods get refactored into SourceDiscovery
  • xresearch.py script stays as shortcut: --platforms x
  • Full migration path: old script works, new script is the supercharged version

Alternatives Considered

  • Per-platform separate scripts → rejected (cross-platform dedup/ranking is core value)
  • API-based discovery → rejected (most platforms require auth; Selenium matches existing pattern)
  • Just extend XResearch → rejected (scope too different; new class keeps things clean)

Dependencies

  • Companion issue: Video Sourcing + Retouch workflow
  • Requires yt-dlp for video metadata extraction

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions