This project is a Telegram web app bot called "Sannie." Sannie crawls the BBC Burmese website to display news content, allowing Telegram users to read the news without leaving the app. The entire user interface is displayed in Burmese, providing a native experience for Burmese-speaking users.
A Telegram account is required.
- Direct Link: Chat with the bot directly
- Search Method: Find the bot in Telegram's search bar: `@presenter_sannie_bot`

- Inline Button - Interactive buttons within messages
- Keyboard Button - Custom keyboard for easy navigation
- Inline Mode - Search and share content directly from any chat
- `/start` - greet and return the main web app
- `/help` - describe how to use this bot
- `/keyboard` - return the keyboard button
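As an illustration only (the handler names, reply text, and URL below are hypothetical, not taken from the project's `app.py`), the three commands could be modeled as a simple dispatch that builds Telegram-style reply payloads:

```python
WEB_APP_URL = "https://example.github.io/sannie"  # placeholder, not the real hosted frontend

def handle_command(command: str) -> dict:
    """Return a minimal Telegram-style reply payload for a slash command."""
    if command == "/start":
        # Greet and attach an inline button that opens the main web app.
        return {
            "text": "Welcome to Sannie!",
            "reply_markup": {
                "inline_keyboard": [[{"text": "Open", "web_app": {"url": WEB_APP_URL}}]]
            },
        }
    if command == "/help":
        return {"text": "Pick a topic, choose a page, then press Read."}
    if command == "/keyboard":
        # Attach a persistent custom keyboard for navigation.
        return {
            "text": "Keyboard enabled.",
            "reply_markup": {"keyboard": [["Topics", "Insert Link"]], "resize_keyboard": True},
        }
    return {"text": "Unknown command. Try /help."}
```

The actual bot wires equivalent handlers into the Telegram Bot API; this sketch only shows the command-to-reply mapping.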
Once the bot is started, it will automatically greet with a direct link to the website. The user can then:
- Browse by topic: Choose a topic, then a page, then click the "Read" button to view the content/article (Topic mode)
- Enter a single link to read content/article directly (Insert Link mode)
- Navigate between pages using Previous/Next buttons without returning to page selection
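The Previous/Next navigation above amounts to index arithmetic over a topic's page list. A minimal sketch (the page identifiers are made up; the real bot handles this in its callback handlers):

```python
def turn_page(pages: list, current: int, direction: str) -> int:
    """Move within a topic's pages, clamping at the ends instead of wrapping."""
    if direction == "next":
        return min(current + 1, len(pages) - 1)
    if direction == "previous":
        return max(current - 1, 0)
    return current

pages = ["page-1", "page-2", "page-3"]  # placeholder page identifiers
i = 0
i = turn_page(pages, i, "next")  # -> 1
i = turn_page(pages, i, "next")  # -> 2
i = turn_page(pages, i, "next")  # stays at 2 (already on the last page)
```

Clamping (rather than wrapping) means the user can press Next on the last page without looping back to the start.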
Note: All UI elements, buttons, and navigation are displayed in Burmese for better accessibility.
The web scraper can
- scrape all topics (all pages of each topic) from BBC Burmese,
- scrape Burmese content/articles with a filter,
- export scraped data to a spreadsheet,
- store scraped data in DynamoDB (permanent storage),
- cache frequently accessed data in Redis (fast cache).
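One simple way to implement a Burmese-content filter (an illustrative heuristic, not necessarily the project's actual filter) is to check what fraction of a text's letters fall in the Myanmar Unicode block (U+1000–U+109F):

```python
def is_burmese(text: str, threshold: float = 0.5) -> bool:
    """Heuristic: treat text as Burmese if enough of its letter-like
    characters fall in the Myanmar Unicode block (U+1000-U+109F)."""
    letters = [ch for ch in text if ch.isalpha() or "\u1000" <= ch <= "\u109f"]
    if not letters:
        return False
    burmese = sum(1 for ch in letters if "\u1000" <= ch <= "\u109f")
    return burmese / len(letters) >= threshold
```

Scraped pages whose text fails the check (e.g. English boilerplate or navigation fragments) can then be skipped before export or storage.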
- `/.github/workflows` - CI/CD pipeline (Ruff lint, ESLint, pytest on push/PR to main)
  - `ci.yml` - runs lint (Python + JS) and tests
- `/caching prototypes` - Development and testing files for caching system
- `/db` - DynamoDB Local database files and scripts
- `/docs` - Frontend files for GitHub Pages hosting
- `/img` - Project images and assets
- `/notebooks` - Jupyter notebooks for webscraper development and documentation
- `/spreadsheets` - Exported data (ignored in version control)
- `/telegram-bot` - Telegram bot scripts (requires `.env` file for tokens)
  - `app.py` - Main bot application
  - `credentials.py` - Environment variable handler
  - `Dockerfile` - Bot container configuration
  - `Procfile` - Bot deployment configuration (Railway / Render)
  - `requirements.txt` - Bot-specific dependencies
- `/tests` - Pytest tests
  - `conftest.py` - shared fixtures (mocked Redis, DynamoDB, scraper; FastAPI client)
  - `test_api.py` - FastAPI endpoint tests
  - `test_dynamo_helpers.py` - `db.dynamo_helpers` tests
  - `test_telegram_bot.py` - bot handler tests
- `/webscraper` - Main Python web scraping scripts and modules
  - `/modules` - Modular scraping scripts
- `api.py` - FastAPI server for web scraping endpoints
- `docker-compose.yml` - Orchestrates all Docker services (FastAPI, Redis, Telegram Bot)
- `Dockerfile` - FastAPI app container configuration
- `flow.md` - Control flow documentation
- `DEVELOPMENT_GUIDE.md` - Local development setup guide
- `DEPLOYMENT_GUIDE.md` - Production deployment guide
- `Procfile` - FastAPI app deployment configuration (Railway / Render)
- `pyproject.toml` - Ruff linter configuration
- `pytest.ini` - Pytest configuration
- `requirements.txt` - Runtime dependencies
- `requirements-dev.txt` - Development and test dependencies
- `.eslintrc.json` - ESLint configuration for JavaScript
- `.eslintignore` - Paths ignored by ESLint
- Web Scraper Development - Check all notebooks in the `/notebooks` folder for detailed documentation on how the customized web crawler evolved from scratch
- Integration - The web scraper is combined with multiple components:
- Frontend (web-hosted with GitHub Pages) with improved navigation and direct view buttons
- Telegram bot (created to use the hosted frontend)
- Redis cache (fast in-memory storage for frequently accessed data)
- DynamoDB (permanent storage for all scraped data)
- Three-tier caching strategy: Redis -> DynamoDB -> Web scraping
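The three-tier lookup can be sketched with stand-in backends (plain dicts for Redis and DynamoDB, a stub for the scraper); the real project uses actual Redis and DynamoDB clients, but the fall-through logic is the same idea:

```python
redis_cache = {}                              # stand-in for Redis (fast, in-memory)
dynamo_store = {"news:1": "cached article"}   # stand-in for DynamoDB (permanent)

def scrape(key: str) -> str:
    """Stub for the web scraper, the slowest tier."""
    return f"freshly scraped content for {key}"

def get_article(key: str) -> str:
    # Tier 1: Redis hit -> return immediately.
    if key in redis_cache:
        return redis_cache[key]
    # Tier 2: DynamoDB hit -> promote into Redis, then return.
    if key in dynamo_store:
        redis_cache[key] = dynamo_store[key]
        return dynamo_store[key]
    # Tier 3: scrape the site, then persist to both storage tiers.
    value = scrape(key)
    dynamo_store[key] = value
    redis_cache[key] = value
    return value
```

Each miss populates the faster tiers on the way back, so repeated reads of the same article never reach the scraper again.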
- CONTROL_FLOW.md - Control flow documentation and user journey
- DEVELOPMENT_GUIDE.md - Local development setup and commands
- DEPLOYMENT_GUIDE.md - Production deployment instructions (Railway/Render)
This project is intended for educational purposes.