A Python web scraper to collect product registration data from the Indonesian Food and Drug Authority (BPOM) website and store it in a PostgreSQL database.
- Scrapes 639,024+ product records from BPOM website
- Handles pagination automatically
- Stores data in PostgreSQL database
- Progress tracking with progress bar
- Error handling and retry mechanism
- Batch processing for efficient database operations
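The retry mechanism listed above can be illustrated with a minimal sketch. The scraper's actual retry code is not shown here, so the function name `call_with_retries`, the `base_delay` parameter, and the exponential backoff schedule are all assumptions made for illustration; only the default of three retries mirrors the documented MAX_RETRIES setting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_retries: int = 3,          # mirrors the documented MAX_RETRIES default
    base_delay: float = 1.0,       # hypothetical: not a documented setting
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying up to `max_retries` times after a failure.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...).
    """
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. In practice `operation` would wrap a single HTTP request, and it would be reasonable to catch only network-related exceptions rather than every `Exception`.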
- Clone the repository
- Install dependencies:
    pip install -r requirements.txt
- Create .env file from .env.example:
    cp .env.example .env
- Update .env with your database credentials if different
Run the scraper:
    python main.py
The scraper will:
- Connect to the PostgreSQL database
- Create the necessary table if it doesn't exist
- Fetch all product data from BPOM API
- Store the data in the database with batch processing
- Show progress with a progress bar
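The flow above (fetch pages, group records, insert in batches) can be sketched roughly as follows. This is not the scraper's actual code: `iter_records`, `batched`, `run_pipeline`, and the injected `fetch_page` and `insert_batch` callables are hypothetical names, and the "stop at the first empty page" rule is an assumption about how the end of the data is detected.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


def iter_records(fetch_page: Callable[[int], List[Record]]) -> Iterator[Record]:
    """Yield records page by page until a page comes back empty (assumed stop rule)."""
    page = 0
    while True:
        rows = fetch_page(page)
        if not rows:
            return
        yield from rows
        page += 1


def batched(records: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Group a stream of records into lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def run_pipeline(
    fetch_page: Callable[[int], List[Record]],
    insert_batch: Callable[[List[Record]], None],
    batch_size: int = 100,  # mirrors the documented BATCH_SIZE default
) -> int:
    """Fetch everything, insert it batch by batch, and return the number stored."""
    stored = 0
    for batch in batched(iter_records(fetch_page), batch_size):
        insert_batch(batch)
        stored += len(batch)
    return stored
```

Here `insert_batch` stands in for whatever performs the database write, for example a single multi-row insert per batch, and the returned count is the kind of value a progress bar could be advanced by.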
The scraper creates a table bpom_products with all fields from the BPOM API response.
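Since the field list of the API response is not given here, one way to picture "a table with all fields" is to derive the column list from the keys of a sample record. The sketch below only builds the SQL text; the helper names, the choice of TEXT for every column, and the sample field names are assumptions, not the scraper's real schema.

```python
from typing import Iterable


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_create_table_sql(table: str, columns: Iterable[str]) -> str:
    """Build a CREATE TABLE IF NOT EXISTS statement with one TEXT column per field."""
    column_lines = ",\n".join(f"    {quote_ident(col)} TEXT" for col in columns)
    return f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} (\n{column_lines}\n)"


# Hypothetical field names, for illustration only.
sample_record = {"product_id": "1", "product_name": "Example", "registration_number": "X"}
create_sql = build_create_table_sql("bpom_products", sample_record.keys())
```

Storing every field as TEXT avoids guessing types for data whose shape is unknown; a real schema would more likely pick specific types and a primary key.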
Edit the .env file to configure:
- DATABASE_URL: PostgreSQL connection string
- BATCH_SIZE: Number of records to insert per batch (default: 100)
- REQUEST_TIMEOUT: HTTP request timeout in seconds (default: 30)
- MAX_RETRIES: Maximum number of retry attempts (default: 3)
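As a minimal sketch of how these settings might be read, the snippet below pulls them from the process environment and applies the documented defaults. It assumes the .env values have already been loaded into the environment (for example by a library such as python-dotenv); the `Settings` class and `load_settings` function are hypothetical names, and treating a missing DATABASE_URL as an error is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    batch_size: int = 100       # records per insert batch
    request_timeout: int = 30   # seconds
    max_retries: int = 3


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read scraper settings from `env`, falling back to the documented defaults."""
    database_url = env.get("DATABASE_URL")
    if not database_url:
        # Assumption: no safe default exists for the connection string.
        raise RuntimeError("DATABASE_URL is not set")
    return Settings(
        database_url=database_url,
        batch_size=int(env.get("BATCH_SIZE", "100")),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "30")),
        max_retries=int(env.get("MAX_RETRIES", "3")),
    )
```

Accepting any mapping makes the function easy to exercise with a plain dictionary, such as `load_settings({"DATABASE_URL": "postgresql://user:password@localhost:5432/bpom"})`, where the connection string is a placeholder.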
Data is scraped from: https://cekbpom.pom.go.id/produk-dt/all
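The request format of this endpoint is not documented here. The "-dt" in the path hints at a DataTables-style server-side endpoint, which conventionally pages with `draw`, `start`, and `length` parameters; whether the BPOM endpoint accepts exactly these is an assumption. Under that assumption, the paging parameters for a known record count could be generated like this (`page_params` is a hypothetical helper):

```python
from typing import Dict, Iterator


def page_params(total: int, page_size: int) -> Iterator[Dict[str, int]]:
    """Yield one parameter set per page, covering `total` records.

    Parameter names follow the DataTables server-side convention (assumed).
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    for draw, start in enumerate(range(0, total, page_size), start=1):
        yield {
            "draw": draw,                              # request sequence number
            "start": start,                            # offset of the first record
            "length": min(page_size, total - start),   # records requested
        }
```

With the 639,024 records mentioned earlier and a page size of 100, this yields 6,391 parameter sets, the last one requesting the final 24 records.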