This project extracts structured product data from ShopWSS.com, covering shoes, clothing, and accessories in one consistent format. It helps teams collect rich retail catalog data without relying on limited or unavailable public APIs. Built for speed and scale, it turns complex storefront pages into clean, usable datasets.
Created by Bitbash to showcase our approach to scraping and automation.
The ShopWSS.com Scraper is designed to collect detailed product information from a large online footwear and apparel retailer. It solves the problem of manually gathering or maintaining product catalogs by automating data extraction at scale. This tool is ideal for developers, data teams, and ecommerce analysts who need reliable product-level data.
- ShopWSS does not offer a public, free product API
- Product listings span collections, search results, and detail pages
- Manual catalog tracking is time-consuming and error-prone
- Structured data is required for analytics, syncing, or research
| Feature | Description |
|---|---|
| Collection crawling | Extracts all products from category and collection pages. |
| Search-based extraction | Retrieves products using keyword-driven search results. |
| Full product details | Captures descriptions, images, variants, options, and availability. |
| Variant-level data | Includes size, color, pricing, stock, and SKU per variant. |
| Pagination control | Scrape specific page ranges or limit total items collected. |
| Extensible output | Supports custom mapping and transformation of extracted data. |
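Pagination control can be pictured as a loop over a page range that stops early once an item cap is reached. The sketch below illustrates the idea in Node.js and is not this project's implementation. The option names `startPage`, `endPage`, and `maxItems` and the `collectPages` helper are hypothetical. `fetchPage` is kept synchronous so the example runs offline; a real crawler would `await` each page.

```javascript
// Hypothetical sketch: option names (startPage, endPage, maxItems) are
// illustrative assumptions, not this project's documented settings.
function collectPages(fetchPage, { startPage = 1, endPage = Infinity, maxItems = Infinity } = {}) {
  const items = [];
  for (let page = startPage; page <= endPage && items.length < maxItems; page++) {
    const batch = fetchPage(page); // expected to return an array of products for this page
    if (!Array.isArray(batch) || batch.length === 0) break; // empty page: no more results
    for (const item of batch) {
      if (items.length >= maxItems) break; // item cap reached mid-page
      items.push(item);
    }
  }
  return items;
}

// Usage with a fake source of 3 pages x 4 items, so the example runs without network access.
const fakeFetch = (page) => (page <= 3 ? [1, 2, 3, 4].map((n) => `p${page}-item${n}`) : []);
console.log(collectPages(fakeFetch, { startPage: 2, maxItems: 6 }));
```

Checking the item cap both in the loop condition and inside the page means no extra page is requested once the limit is hit, which matters when each page costs a network round trip.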
| Field Name | Field Description |
|---|---|
| id | Unique product identifier. |
| url | Direct link to the product page. |
| title | Product title or name. |
| description | Plain-text product description. |
| descriptionHTML | Rich HTML description content. |
| vendor | Brand or manufacturer name. |
| availableForSale | Indicates if the product can be purchased. |
| collections | Categories and collections the product belongs to. |
| tags | Product tags and labels. |
| variants | Variant-level pricing, stock, SKU, size, and color data. |
| images | Product image URLs with dimensions. |
| options | Available option types such as size and color. |
| createdAt | Product creation timestamp. |
| updatedAt | Last update timestamp. |
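To check extracted records against the field list above, a lightweight shape check can help. This is a hypothetical sketch. Which fields count as required and the expected JavaScript types (for example, `id` as a string) are assumptions, not a schema published by this project.

```javascript
// Hypothetical check: the required fields and expected types below are
// assumptions based on the field table, not a published schema.
const EXPECTED_TYPES = {
  id: "string",
  url: "string",
  title: "string",
  vendor: "string",
  availableForSale: "boolean",
  tags: "array",
  variants: "array",
  images: "array",
};

// typeof reports arrays and null as "object", so distinguish them explicitly.
function typeOf(value) {
  if (Array.isArray(value)) return "array";
  if (value === null) return "null";
  return typeof value;
}

// Returns a list of human-readable problems; an empty list means the record passed.
function findSchemaProblems(record) {
  const problems = [];
  for (const [field, expected] of Object.entries(EXPECTED_TYPES)) {
    if (!(field in record)) {
      problems.push(`missing ${field}`);
    } else if (typeOf(record[field]) !== expected) {
      problems.push(`${field} should be ${expected}, got ${typeOf(record[field])}`);
    }
  }
  return problems;
}
```

Running this over every record and counting the non-empty results gives a simple way to measure missing fields in your own runs.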
```json
[
  {
    "type": "product",
    "url": "https://www.shopwss.com/products/a03061c",
    "slug": "a03061c",
    "vendor": "CONVERSE",
    "availableForSale": true,
    "quantity": 182,
    "tags": ["Classic", "CONVERSE", "on-sale", "Womens"],
    "variants": [
      {
        "sku": "11155400021",
        "price": "69.98",
        "availableQuantity": 39,
        "options": {
          "Size": "06.0",
          "Color": "Squirrel Friend/Black/White"
        }
      }
    ],
    "images": [
      "https://cdn.shopify.com/s/files/1/0069/3442/9751/products/A03061C_1.jpg"
    ]
  }
]
```
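Because variants are nested inside each product, analytics tools often need one flat row per variant. The sketch below flattens records shaped like the sample output. The `flattenVariants` name and the chosen columns are illustrative. It also assumes option keys are named `Size` and `Color` as in the sample, which may not hold for every product.

```javascript
// Turn nested product records (shaped like the sample output) into one
// flat row per variant, e.g. for a spreadsheet or a SQL table.
function flattenVariants(products) {
  const rows = [];
  for (const product of products) {
    for (const variant of product.variants ?? []) {
      rows.push({
        url: product.url,
        vendor: product.vendor,
        sku: variant.sku,
        price: Number(variant.price), // price is a string in the sample output
        availableQuantity: variant.availableQuantity,
        size: variant.options?.Size, // option names assumed from the sample
        color: variant.options?.Color,
      });
    }
  }
  return rows;
}

const sample = [
  {
    url: "https://www.shopwss.com/products/a03061c",
    vendor: "CONVERSE",
    variants: [
      {
        sku: "11155400021",
        price: "69.98",
        availableQuantity: 39,
        options: { Size: "06.0", Color: "Squirrel Friend/Black/White" },
      },
    ],
  },
];
console.log(flattenVariants(sample));
```

Converting `price` to a number at this step keeps sorting and aggregation correct downstream, since string prices compare lexicographically.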
```
ShopWSS.com Scraper/
├── src/
│   ├── index.js
│   ├── crawler/
│   │   ├── productCrawler.js
│   │   ├── collectionCrawler.js
│   │   └── searchCrawler.js
│   ├── parsers/
│   │   ├── productParser.js
│   │   └── variantParser.js
│   ├── utils/
│   │   ├── pagination.js
│   │   └── helpers.js
│   └── config/
│       └── settings.example.json
├── data/
│   ├── sample-input.json
│   └── sample-output.json
├── package.json
├── package-lock.json
└── README.md
```
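With separate crawler modules for products, collections, and search, an entry point such as `src/index.js` needs some way to route each input URL to the right one. The following is a hypothetical sketch of that routing, not the project's actual code. The `classifyUrl` helper is illustrative. The path patterns assume standard Shopify storefront URLs (`/products/`, `/collections/`, `/search?q=`), inferred from the Shopify CDN image URLs in the sample output.

```javascript
// Hypothetical dispatcher: maps an input URL to one of the three crawler types
// in src/crawler/. Path patterns assume standard Shopify storefront URLs.
function classifyUrl(input) {
  const url = new URL(input);
  if (url.pathname.startsWith("/products/")) return "product";
  if (url.pathname.startsWith("/collections/")) {
    // Shopify can also serve product pages under /collections/<handle>/products/<handle>
    return url.pathname.includes("/products/") ? "product" : "collection";
  }
  if (url.pathname === "/search" && url.searchParams.has("q")) return "search";
  return "unknown";
}

// Example: "product" -> productCrawler.js, "collection" -> collectionCrawler.js,
// "search" -> searchCrawler.js (mapping assumed from the file names).
console.log(classifyUrl("https://www.shopwss.com/products/a03061c"));
```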
- Ecommerce analysts use it to monitor product availability and pricing, so they can track trends and stock changes.
- Developers use it to sync external catalogs into internal systems, so data stays consistent across platforms.
- Market researchers use it to analyze brand presence and assortment, so they can make informed decisions.
- Retail aggregators use it to build comparison tools, so users can evaluate products across stores.
- Data teams use it to generate datasets for reporting and modeling, so insights are based on real inventory data.
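For the monitoring use case, a common pattern is to compare two runs and report what changed per SKU. This is a sketch under the assumption that each run is an array of product records shaped like the sample output. The helper names are hypothetical.

```javascript
// Build a SKU -> { price, qty } index from one scraping run.
function indexBySku(products) {
  const bySku = new Map();
  for (const product of products) {
    for (const variant of product.variants ?? []) {
      bySku.set(variant.sku, { price: Number(variant.price), qty: variant.availableQuantity });
    }
  }
  return bySku;
}

// Compare two runs and list price changes, stock changes, and added/removed SKUs.
function diffSnapshots(before, after) {
  const oldIndex = indexBySku(before);
  const newIndex = indexBySku(after);
  const changes = [];
  for (const [sku, now] of newIndex) {
    const prev = oldIndex.get(sku);
    if (!prev) {
      changes.push({ sku, type: "added" });
      continue;
    }
    if (prev.price !== now.price) changes.push({ sku, type: "price", from: prev.price, to: now.price });
    if (prev.qty !== now.qty) changes.push({ sku, type: "stock", from: prev.qty, to: now.qty });
  }
  for (const sku of oldIndex.keys()) {
    if (!newIndex.has(sku)) changes.push({ sku, type: "removed" });
  }
  return changes;
}
```

Keying on SKU rather than product URL lets the diff catch a single size selling out even when the product as a whole is still available.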
What types of pages are supported? The scraper supports collection pages, keyword search results, and individual product detail pages. Each page type is handled with dedicated logic for accuracy.
Can I limit how much data is collected? Yes. You can restrict scraping by page range or maximum item count, which is useful for testing or targeted extraction.
Does it handle product variants correctly? Yes. Each variant is captured individually, including its size, color, price, stock availability, and SKU.
Is proxy usage required? A proxy is strongly recommended: it keeps large-scale runs stable and helps avoid request blocking.
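One common way to use proxies is to rotate through a pool and retry a failed request on the next proxy. The sketch below illustrates that pattern only. The helper names, the `request` callback, and the proxy URLs are placeholders. `request` is synchronous here so the example runs offline; a real crawler would pass the proxy to its HTTP client and `await` the response.

```javascript
// Hypothetical sketch: round-robin over a proxy pool.
function createProxyRotator(proxies) {
  if (proxies.length === 0) throw new Error("proxy pool is empty");
  let next = 0;
  return () => {
    const proxy = proxies[next];
    next = (next + 1) % proxies.length; // wrap around to the first proxy
    return proxy;
  };
}

// Try the request through successive proxies, up to maxAttempts times.
function withProxyRetries(request, nextProxy, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const proxy = nextProxy();
    try {
      return request(proxy);
    } catch (err) {
      lastError = err; // e.g. blocked or timed out: move on to the next proxy
    }
  }
  throw lastError;
}

// Usage with placeholder proxies: the first one "blocks", the second succeeds.
const rotate = createProxyRotator(["http://proxy-a.example:8000", "http://proxy-b.example:8000"]);
console.log(withProxyRetries((proxy) => {
  if (proxy.includes("proxy-a")) throw new Error("blocked");
  return `ok via ${proxy}`;
}, rotate));
```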
Primary Metric: Processes roughly 100 product listings in about 2 minutes under normal conditions.
Reliability Metric: Maintains a high completion rate when run with stable proxy routing and valid inputs.
Efficiency Metric: Optimized request flow minimizes redundant page loads, keeping resource usage low.
Quality Metric: Extracted records consistently include complete product, variant, and image data with minimal missing fields.
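At the stated rate of roughly 100 listings in about 2 minutes (about 0.8 listings per second), skipping duplicate page loads adds up quickly, for example when the same product appears in several collections. The sketch below shows one way to avoid redundant loads within a run and is not the project's documented request flow. The helper names are hypothetical. Dropping the query string from `/products/` URLs (such as `?variant=...`) so that variant links share one cache entry is an assumption about URL structure.

```javascript
// Hypothetical normalization: strip the fragment everywhere, and the query
// string only on product URLs (search URLs need their ?q= parameter).
function canonicalUrl(input) {
  const url = new URL(input);
  url.hash = "";
  if (url.pathname.includes("/products/")) url.search = "";
  return url.toString();
}

// Wrap a fetch function so each canonical URL is loaded at most once per run.
// If fetchPage is async, the cached value is the promise itself, so concurrent
// requests for the same page share a single load.
function createDedupingFetcher(fetchPage) {
  const cache = new Map();
  let networkCalls = 0;
  const fetchOnce = (input) => {
    const key = canonicalUrl(input);
    if (!cache.has(key)) {
      networkCalls++;
      cache.set(key, fetchPage(key));
    }
    return cache.get(key);
  };
  fetchOnce.networkCalls = () => networkCalls;
  return fetchOnce;
}
```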
