This is a crawler (spider) for collecting company data from Yellow Pages, written in Python with Scrapy.
Install Scrapy before running it. The other packages it uses (`csv` and `datetime`) are part of the Python standard library, so they need no separate installation.
To see which packages are already installed, run:

    pip3 list

The output looks like this:

    Package            Version
    ------------------ ---------
    Scrapy             2.5.0
    ...                ...

If Scrapy is not in the list, install it with:

    pip3 install scrapy

| Required Package |
|---|
| csv |
| datetime |
| scrapy |

| Tools | Version |
|---|---|
| Python | 3.9.6 |
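
Before crawling, it can help to confirm that the environment matches the tables above. The following is a minimal sketch of a check script, assuming you save it under a hypothetical name such as `check_env.py`; it is not part of this project, and the helper `missing_packages` is illustrative only. It uses `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found.

```python
# Hypothetical helper script (not part of this repository): prints the
# Python version and lists any required package that cannot be imported.
import importlib.util
import sys

# Packages from the "Required Package" table.
REQUIRED_PACKAGES = ["csv", "datetime", "scrapy"]


def missing_packages(names):
    """Return the names for which no importable module spec is found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing:", ", ".join(missing), "- try: pip3 install scrapy")
    else:
        print("All required packages are available.")
```

Run it with `python3 check_env.py`. Only `scrapy` should ever be reported as missing, since `csv` and `datetime` ship with Python.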

Run the crawler with the command below. By default it searches Yellow Pages for the keyword 體檢 ("health check-up"), which is URL-encoded as `%E9%AB%94%E6%AA%A2`:

    scrapy crawl clinic_spider

To search for a different keyword, pass it with Scrapy's `-a` custom argument flag. Scrapy expects each `-a` value in the form `NAME=VALUE`, where `NAME` is the argument name that `clinic_spider` reads:

    scrapy crawl clinic_spider -a {NAME}={KEYWORD_TO_BE_SEARCH}

Each crawl writes its results to a CSV file named `company_YYYYMMDD_HHmmss.csv`, stamped with the date and time of the run.
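
Both the encoded keyword and the output file name follow standard-library conventions. The sketch below reproduces them with `urllib.parse.quote` (UTF-8 percent-encoding) and `datetime.strftime`. The helper names `encode_keyword` and `output_filename` are hypothetical, and the spider may build these values differently.

```python
# Hypothetical helpers illustrating the keyword encoding and file naming
# described above; they are not taken from clinic_spider itself.
from datetime import datetime
from urllib.parse import quote


def encode_keyword(keyword):
    """Percent-encode a search keyword as UTF-8 for use in a URL."""
    return quote(keyword)


def output_filename(now=None):
    """Build a file name in the company_YYYYMMDD_HHmmss.csv format."""
    now = now or datetime.now()
    return now.strftime("company_%Y%m%d_%H%M%S.csv")


if __name__ == "__main__":
    print(encode_keyword("體檢"))  # %E9%AB%94%E6%AA%A2
    print(output_filename(datetime(2021, 7, 1, 9, 5, 30)))  # company_20210701_090530.csv
```

Because the timestamp goes down to the second, consecutive crawls produce separate files instead of overwriting earlier results.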
- Search: Search by Keyword (Done)
- Search: Search by Category (Not yet started)