Hurthv/PlayCast

PlayCast

PlayCast is a flexible Playwright-based framework for extracting structured product data from multiple Russian e-commerce websites. It provides a simple API for parsing sites like Citilink, DNS, OZON, Avito, and 28bit.

What it does

  • Site-specific parsers for Citilink, DNS, OZON, Avito, and 28bit
  • Uses Playwright and browser automation to fetch pages and collect search results
  • Includes shared utilities for human-like typing, scrolling, and selector handling
  • Simple import and usage: from playcast.Parser import ParseOzon
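
As an illustration of what "human-like typing" can look like, here is a minimal sketch. It is not the code in common/utils.py: the function name human_type and the delay defaults are assumptions made for this example. It only relies on page.click and page.keyboard.type, which are part of Playwright's async API.

```python
import asyncio
import random


async def human_type(page, selector, text, min_delay=0.05, max_delay=0.25):
    """Type `text` into `selector` one character at a time with random pauses.

    `page` is expected to behave like a Playwright async Page.
    The function name and the delay defaults are illustrative assumptions,
    not the project's actual helper.
    """
    await page.click(selector)  # focus the input before typing
    for char in text:
        await page.keyboard.type(char)
        # Pause for a random interval so keystroke timing is not uniform.
        await asyncio.sleep(random.uniform(min_delay, max_delay))
```

Typing character by character with a random pause is one common way to avoid perfectly uniform input timing; the right delay range depends on the target site.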

Why this project

The goal is to make a reusable scraping toolkit instead of a one-off script. In the future, this project can grow into an extensible library with:

  • a common parser base class
  • plugin-style parser registration
  • configuration-driven selectors
  • optional LLM assistance for selector discovery and page analysis
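
The first three ideas could fit together roughly as follows. This is only a sketch of one possible design: BaseParser, register, PARSERS, ExampleParser, and the selector values are all hypothetical names invented for the example and do not exist in the project.

```python
from abc import ABC, abstractmethod

# Hypothetical registry mapping a site name to its parser class.
PARSERS = {}


def register(name):
    """Class decorator that stores a parser class in PARSERS under `name`."""
    def decorator(cls):
        PARSERS[name] = cls
        return cls
    return decorator


class BaseParser(ABC):
    """Hypothetical common base class for site parsers."""

    # Configuration-driven selectors: plain data that could be loaded from a file.
    selectors = {}

    @abstractmethod
    async def get_cards(self, query):
        """Return a list of product dicts for `query`."""


@register("example")
class ExampleParser(BaseParser):
    # Made-up selector values, for illustration only.
    selectors = {"card": "div.product", "title": "a.title"}

    async def get_cards(self, query):
        # A real parser would drive Playwright here; this stub echoes its inputs.
        return [{"query": query, "selector": self.selectors["card"]}]
```

With a registry like this, adding a site would mean writing one decorated class, and callers could look parsers up by name, for example PARSERS["example"]().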

Installation

python -m pip install -U pip
python -m pip install playwright beautifulsoup4 fake-useragent httpx lxml requests undetected-playwright playwright-stealth rebrowser-playwright
python -m playwright install chromium

Usage

Import and use a parser:

import asyncio
from playcast.Parser import ParseOzon

async def main():
    results = await ParseOzon.get_cards_by_placeholder("RTX 5080")
    print(results)

asyncio.run(main())

This will launch Playwright, search for "RTX 5080" on OZON, extract product data, and return the results.
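
To run several searches in one go, a small wrapper around asyncio can help. The sketch below is an assumption-laden example, not part of the project: search_many and max_concurrent are hypothetical names, and whether the parsers tolerate concurrent browser sessions has not been verified here, so the limit defaults to a small number.

```python
import asyncio


async def search_many(search, queries, max_concurrent=2):
    """Run `search(query)` for each query, at most `max_concurrent` at a time.

    `search` is any coroutine function that takes a query string.
    Returns a dict mapping each query to its results.
    This helper is hypothetical and not part of the project.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(query):
        # The semaphore caps how many searches are in flight at once.
        async with semaphore:
            return query, await search(query)

    pairs = await asyncio.gather(*(run_one(q) for q in queries))
    return dict(pairs)
```

Under these assumptions, passing ParseOzon.get_cards_by_placeholder as search and a list of query strings as queries would return one result set per query.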

Project structure

  • playcast/Parser.py - Main parser classes with get_cards methods
  • parsers/ - Individual parser implementations
  • common/utils.py - Shared helper functions
  • data/config.py - URLs, default search keywords, and constants

Future ideas

A possible future improvement is to use a large language model (LLM) to analyze a page's HTML and automatically suggest selectors or attribute patterns. This could make the parsers more resilient to site changes and reduce the amount of manual CSS/XPath configuration. The main obstacle is model size: large language models may be too large for this use.

Contribution

  1. Add a new parser under parsers/
  2. Create a corresponding class in playcast/Parser.py
  3. Reuse parsers/base.py for shared behaviors
  4. Keep site-specific selectors and actions isolated
  5. Add tests for new parser output and utilities
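
For step 5, a test for parser output might start from a small shape check like the one below. This is a sketch under assumptions: the README does not specify the format of the returned results, so the list-of-dicts shape, the validate_cards helper, and the keys title, price, and url are all hypothetical and should be adjusted to what the parsers really return.

```python
def validate_cards(cards, required_keys=("title", "price", "url")):
    """Return a list of problems found in parser output; empty means it passed.

    The expected shape (a list of dicts) and the required keys are
    assumptions made for this example, not the project's documented format.
    """
    if not isinstance(cards, list):
        return ["output is not a list"]
    problems = []
    for i, card in enumerate(cards):
        if not isinstance(card, dict):
            problems.append(f"card {i} is not a dict")
            continue
        # A key counts as missing when it is absent or has an empty value.
        missing = [key for key in required_keys if not card.get(key)]
        if missing:
            problems.append(f"card {i} is missing {missing}")
    return problems
```

A test can then assert that validate_cards(results) returns an empty list for output captured from a saved page fixture.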

License

This repository is available under the MIT License.

About

Playwright-based framework for scraping e-commerce websites (Russia + International)
