Portal.ai fashion SKU crawler. Cafe24 (Playwright) + Shopify (JSON) harvester writing into Supabase + R2.
portal/app (Next.js) consumes this data via Supabase. There is no direct API between the two: the DB is the contract.
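With the database as the only interface, a crawler re-run needs to update existing rows rather than append duplicates. Below is a minimal sketch of one way to do that. It assumes a hypothetical `products` row keyed on `(platform, external_id)`; the real columns come from portal.ai's migrations. `Upserter` stands in for a supabase-js call such as `supabase.from("products").upsert(rows, { onConflict: "platform,external_id" })`, and the batch size is illustrative.

```typescript
// Hypothetical row shape; the real columns are defined by portal.ai's migrations.
export interface ProductRow {
  platform: string;     // site key from configs/platforms.ts
  external_id: string;  // product id on the source site
  title: string;
  price: number | null;
  image_keys: string[]; // R2 object keys
}

// Stand-in for the DB write. In production this could wrap
// supabase.from("products").upsert(rows, { onConflict: "platform,external_id" }).
export type Upserter = (rows: ProductRow[]) => Promise<void>;

// Split items into fixed-size batches so each request stays small.
export function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) throw new RangeError("size must be a positive integer");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Collapse duplicate (platform, external_id) pairs; the last occurrence wins.
export function dedupe(rows: readonly ProductRow[]): ProductRow[] {
  const byKey = new Map<string, ProductRow>();
  for (const row of rows) byKey.set(`${row.platform}:${row.external_id}`, row);
  return [...byKey.values()];
}

// Write all rows batch by batch; resolves to the number of unique rows sent.
export async function writeProducts(
  rows: readonly ProductRow[],
  upsert: Upserter,
  batchSize = 500,
): Promise<number> {
  const unique = dedupe(rows);
  for (const batch of chunk(unique, batchSize)) await upsert(batch);
  return unique.length;
}
```

Deduplicating before the write matters because Postgres rejects an `INSERT ... ON CONFLICT DO UPDATE` that would touch the same row twice in one statement.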
```text
[Crawler / EC2 batch]                [Supabase + R2]                  [Vercel / Next.js]
─────────────────────                ───────────────                  ──────────────────
Cafe24 engine (Playwright)        →  products / brands / images    →  portal.ai
Shopify engine (/products.json)      R2 bucket (image binaries)       search & recommendation
configs/platforms.ts (32 sites)
```
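The Shopify engine reads each storefront's public `/products.json`. Below is a minimal sketch of a paginated fetch. It assumes the store honours the `limit` (max 250) and `page` query parameters. The `NormalizedProduct` shape, the `maxPages` cap, and all helper names are illustrative. `fetchImpl` is injected so the loop can be exercised without network access.

```typescript
// Subset of the public /products.json payload (only the fields used here).
interface ShopifyVariant { id: number; sku: string | null; price: string }
interface ShopifyImage { src: string }
interface ShopifyProduct {
  id: number;
  title: string;
  handle: string;
  vendor: string;
  variants: ShopifyVariant[];
  images: ShopifyImage[];
}

// Hypothetical engine-neutral record.
export interface NormalizedProduct {
  externalId: string;
  title: string;
  brand: string;
  url: string;
  minPrice: number | null;
  imageUrls: string[];
}

type FetchLike = (url: string) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

export function productsUrl(host: string, page: number, limit = 250): string {
  const url = new URL("/products.json", `https://${host}`);
  url.searchParams.set("limit", String(limit));
  url.searchParams.set("page", String(page));
  return url.toString();
}

function normalize(host: string, p: ShopifyProduct): NormalizedProduct {
  // Shopify serialises prices as strings such as "30.00".
  const prices = p.variants.map((v) => Number.parseFloat(v.price)).filter(Number.isFinite);
  return {
    externalId: String(p.id),
    title: p.title,
    brand: p.vendor,
    url: `https://${host}/products/${p.handle}`,
    minPrice: prices.length > 0 ? Math.min(...prices) : null,
    imageUrls: p.images.map((img) => img.src),
  };
}

// Walk pages until a short page comes back (or maxPages is hit as a safety net).
export async function fetchAllProducts(
  host: string,
  fetchImpl: FetchLike,
  limit = 250,
  maxPages = 200,
): Promise<NormalizedProduct[]> {
  const all: NormalizedProduct[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const res = await fetchImpl(productsUrl(host, page, limit));
    if (!res.ok) throw new Error(`${host} page ${page}: HTTP ${res.status}`);
    const body = (await res.json()) as { products?: ShopifyProduct[] };
    const products = body.products ?? [];
    all.push(...products.map((p) => normalize(host, p)));
    if (products.length < limit) break;
  }
  return all;
}
```

In the crawler this would be called as `fetchAllProducts("shop.example.com", fetch)`. Stopping on a short page avoids one extra empty request per store compared with waiting for an empty page.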
```bash
pnpm install
pnpm exec playwright install chromium
cp .env.example .env
# fill in Supabase + R2 keys
pnpm tsx src/cli.ts crawl --platform=<key> --dry-run
pnpm tsx src/cli.ts crawl --platform=<key>
```

```bash
pnpm typecheck
pnpm lint
```

Deploy: EC2 (c6i.large Spot recommended) + systemd timer / cron. See docs/operations.md (TBA).
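The crawl commands take the shape `crawl --platform=<key> [--dry-run]`. Below is a minimal sketch of how an entrypoint could parse that with Node's built-in `util.parseArgs` (Node 18.3+). It is an assumption, not the repo's actual `src/cli.ts`. The `CrawlArgs` shape is hypothetical, only the `crawl` subcommand is modelled, and the meaning of `dryRun` (crawl without writing) is inferred from the flag name.

```typescript
import { parseArgs } from "node:util";

export interface CrawlArgs {
  platform: string; // site key from PLATFORMS
  dryRun: boolean;  // assumed: crawl but skip Supabase/R2 writes
}

// Parse e.g. ["crawl", "--platform=<key>", "--dry-run"].
export function parseCrawlArgs(argv: string[]): CrawlArgs {
  const { values, positionals } = parseArgs({
    args: argv,
    options: {
      platform: { type: "string" },
      "dry-run": { type: "boolean" },
    },
    allowPositionals: true, // the subcommand ("crawl") arrives as a positional
  });
  if (positionals[0] !== "crawl") {
    throw new Error(`expected "crawl", got ${JSON.stringify(positionals[0])}`);
  }
  if (!values.platform) throw new Error("--platform=<key> is required");
  return { platform: values.platform, dryRun: values["dry-run"] === true };
}
```

Inside an entrypoint this would be called as `parseCrawlArgs(process.argv.slice(2))`. `parseArgs` runs in strict mode by default, so a mistyped flag such as `--dryrun` throws instead of being silently ignored.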
```text
src/
├── cli.ts                  # entrypoint
└── commands/               # crawl, import-products, probe-reviews, ...
engines/
├── cafe24/                 # Playwright engine + per-site parsers
│   ├── index.ts
│   └── parsers/{detail,review}/
└── shopify/                # /products.json fetcher
    └── index.ts
configs/
├── platforms.ts            # PLATFORMS: SiteConfig[] (one entry = one site)
└── analyze-prompt.ts
lib/
├── types.ts
├── database.types.ts       # supabase gen types output
├── body-info-extractor.ts
└── product-analyzer.ts
output/                     # gitignored (per-run cache)
```
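One chore a Cafe24 detail parser likely faces is turning a displayed price label into a number. Below is a minimal sketch, assuming prices are shown in won with comma grouping (e.g. "39,000원" or "KRW 39,000"). Both function names are hypothetical and not taken from `parsers/detail`.

```typescript
// Matches "1,234,567"-style grouped numbers first, then plain digit runs.
const PRICE_RE = /\d{1,3}(?:,\d{3})+|\d+/;

// Hypothetical helper: first price in a label such as "39,000원", or null
// when the label has no digits (e.g. "SOLD OUT").
export function parseKrwPrice(text: string): number | null {
  const match = text.match(PRICE_RE);
  if (!match) return null;
  return Number.parseInt(match[0].replace(/,/g, ""), 10);
}

// When a label shows list and sale prices together ("59,000원 39,000원"),
// take the lowest value as the selling price.
export function parseLowestKrwPrice(text: string): number | null {
  const matches = text.match(new RegExp(PRICE_RE.source, "g")) ?? [];
  const values = matches.map((m) => Number.parseInt(m.replace(/,/g, ""), 10));
  return values.length > 0 ? Math.min(...values) : null;
}
```

Both helpers pick up any digits in the label, so a selector should target the price element itself. Labels such as "10% 할인" would otherwise yield 10.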
| Metric | Value (at split from portal.ai) |
|---|---|
| Platforms | 32 (22 Cafe24 KR + 10 Shopify global) |
| SKUs | ~81,000 (45k KR + 35k global) |
| Brands | 697 |
Roadmap: ZARA, H&M, 29CM, Musinsa, Uniqlo, Furutsu.
| Area | Choice |
|---|---|
| Runtime | Node.js + tsx (runs TypeScript directly, no build step) |
| Browser automation | Playwright ^1.58 |
| HTTP fetch | native fetch (Shopify) |
| DB write | @supabase/supabase-js (service role) |
| Image storage | Cloudflare R2 (S3-compatible SDK) |
| Language | TypeScript |
| Lint / format | ESLint / Prettier (TBA) |
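Images go to R2 through an S3-compatible SDK. One way to keep re-crawls from uploading the same binary twice is to derive the object key from the image bytes. Below is a minimal sketch, assuming a hypothetical `products/<platform>/<xx>/<sha256>.<ext>` layout; the repo's real key scheme is not documented here.

```typescript
import { createHash } from "node:crypto";

// Map common image MIME types to file extensions; anything else gets ".bin".
const EXT_BY_TYPE: Record<string, string> = {
  "image/jpeg": "jpg",
  "image/png": "png",
  "image/webp": "webp",
  "image/gif": "gif",
  "image/avif": "avif",
};

// Content-addressed key: identical bytes always map to the same object,
// and the two-character prefix spreads keys across "folders".
export function imageObjectKey(platform: string, bytes: Uint8Array, contentType: string): string {
  const digest = createHash("sha256").update(bytes).digest("hex");
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  const ext = EXT_BY_TYPE[mime] ?? "bin";
  return `products/${platform}/${digest.slice(0, 2)}/${digest}.${ext}`;
}
```

With `@aws-sdk/client-s3`, for example, the key would be passed to `new PutObjectCommand({ Bucket, Key, Body, ContentType })` on an `S3Client` configured with `region: "auto"` and the endpoint `https://<ACCOUNT_ID>.r2.cloudflarestorage.com`.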
To add a site:

- Append a `SiteConfig` object to `configs/platforms.ts`.
  - Cafe24: try the defaults first, then override `selectors` if needed.
  - Shopify: only `host` is required.
- Run `pnpm tsx src/cli.ts crawl --platform=<key> --dry-run` to verify.
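The rules for a new entry (Cafe24 falls back to default selectors, Shopify needs only a host) map naturally onto a discriminated union. Below is a minimal sketch of that shape. Apart from `SiteConfig`, `PLATFORMS`, `host`, `selectors` and the `--platform=<key>` lookup, every name here is hypothetical. The default selectors are illustrative Cafe24 skin classes, not the crawler's real ones.

```typescript
// Hypothetical selector set for a Cafe24 storefront.
export interface Cafe24Selectors {
  productLink: string;
  name: string;
  price: string;
  image: string;
}

interface BaseSite { key: string; host: string }
export interface Cafe24Site extends BaseSite { platform: "cafe24"; selectors?: Partial<Cafe24Selectors> }
export interface ShopifySite extends BaseSite { platform: "shopify" }
export type SiteConfig = Cafe24Site | ShopifySite;

// Illustrative defaults only.
export const DEFAULT_CAFE24_SELECTORS: Cafe24Selectors = {
  productLink: ".prdList .thumbnail a",
  name: ".headingArea h2",
  price: "#span_product_price_text",
  image: ".keyImg img",
};

// Defaults first; any per-site override wins.
export function resolveSelectors(site: Cafe24Site): Cafe24Selectors {
  return { ...DEFAULT_CAFE24_SELECTORS, ...site.selectors };
}

// Example entries (hosts and keys are placeholders).
export const PLATFORMS: SiteConfig[] = [
  { platform: "cafe24", key: "example-kr", host: "www.example.co.kr", selectors: { price: ".price strong" } },
  { platform: "shopify", key: "example-global", host: "example-shop.com" },
];

// Resolve the value passed as --platform=<key>.
export function findSite(key: string): SiteConfig {
  const site = PLATFORMS.find((s) => s.key === key);
  if (!site) throw new Error(`unknown --platform=${key}`);
  return site;
}
```

Keeping the override optional means a new Cafe24 entry can start as just `{ platform, key, host }` and gain selectors only once a dry run shows the defaults miss.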
| Project | Path | Role |
|---|---|---|
| portal.ai | endurance-ai/portal.ai | Next.js search & recommendation web (consumer) |
| ai-server | endurance-ai/ai-server | FastAPI search server (FashionSigLIP + pgvector) |
- Public repo: never commit `.env`. Only `.env.example` is tracked.
- DB schema is owned by `endurance-ai/portal.ai` (`supabase/migrations/`).
- Ported from `endurance-ai/portal.ai` @ `5e3e7a0` on 2026-05-05.
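Because `.env` is never committed, a fresh checkout can be missing keys, and that could surface only when a key is first used mid-crawl. Below is a minimal sketch of a fail-fast check at startup. The variable names are assumptions, so align them with whatever `.env.example` actually lists.

```typescript
// Hypothetical variable names; match them to .env.example.
export const REQUIRED_ENV = [
  "SUPABASE_URL",
  "SUPABASE_SERVICE_ROLE_KEY",
  "R2_ACCOUNT_ID",
  "R2_ACCESS_KEY_ID",
  "R2_SECRET_ACCESS_KEY",
  "R2_BUCKET",
] as const;

export type Env = Record<(typeof REQUIRED_ENV)[number], string>;

// Report every missing or blank variable at once, then return trimmed values.
export function loadEnv(source: Record<string, string | undefined>): Env {
  const missing = REQUIRED_ENV.filter((name) => !source[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`missing env vars: ${missing.join(", ")} (copy .env.example to .env)`);
  }
  return Object.fromEntries(REQUIRED_ENV.map((name) => [name, (source[name] ?? "").trim()])) as Env;
}
```

Called as `loadEnv(process.env)` at the top of the entrypoint, this turns a forgotten key into one clear error before any browser or network work starts.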