A complete PostgreSQL-based SQL project using a real-world e-commerce inventory dataset from Zepto, one of Indiaβs fastest-growing quick-commerce startups. This project demonstrates end-to-end data analytics skills β from data exploration to business-focused insights β using real analyst techniques like CASE, JOINS, CTEs, and Subqueries.
- β Realistic SQL project with retail use-case
- π§Ή Data cleaning & transformation (e.g. paise β rupees)
- π Business insights: revenue, stock status, product value
- π§ Uses advanced SQL techniques:
CASE
,JOIN
,CTEs
,Subqueries
- πΌ Perfect for interview prep, portfolio building, or SQL learning
- Source: Kaggle (scraped from Zeptoβs mobile app)
- Real-world e-commerce catalog structure
- Duplicate product names with different packaging/weights
- Each row = unique SKU (Stock Keeping Unit)
Column | Description |
---|---|
sku_id | Unique product ID |
name | Product name |
category | Product category (e.g. Fruits, Snacks) |
mrp | Max Retail Price (βΉ) |
discountPercent | % discount on MRP |
discountedSellingPrice | Final discounted price |
availableQuantity | Inventory units available |
weightInGms | Product weight (grams) |
outOfStock | Boolean - Is the product out of stock? |
quantity | No. of units per SKU |
- Removed products with MRP or price = 0
- Converted
mrp
&discountedSellingPrice
from paise to rupees - Handled null values and inconsistent entries
- Counted total rows
- Identified distinct categories
- Compared stock vs out-of-stock
- Detected repeated product names (same product, different SKU)
- Top discounted products
- High MRP but out-of-stock products
- Revenue by category
- Expensive products with low discount
- Avg. discount by category
- Price-per-gram value ranking
SELECT DISTINCT name, mrp, discountPercent
FROM zepto
ORDER BY discountPercent DESC
LIMIT 10;

SELECT DISTINCT name, mrp
FROM zepto
WHERE outOfStock = TRUE AND mrp > 300
ORDER BY mrp DESC;

SELECT category,
SUM(discountedSellingPrice * availableQuantity) AS total_revenue
FROM zepto
GROUP BY category
ORDER BY total_revenue DESC;

SELECT DISTINCT name, mrp, discountPercent
FROM zepto
WHERE mrp > 500 AND discountPercent < 10
ORDER BY mrp DESC, discountPercent DESC;

SELECT category,
ROUND(AVG(discountPercent), 2) AS avg_discount
FROM zepto
GROUP BY category
ORDER BY avg_discount DESC
LIMIT 5;

SELECT DISTINCT name, weightInGms, discountedSellingPrice,
ROUND(discountedSellingPrice / weightInGms, 2) AS price_per_gram
FROM zepto
WHERE weightInGms >= 100
ORDER BY price_per_gram;

SELECT DISTINCT name, weightInGms,
CASE
WHEN weightInGms < 1000 THEN 'Low'
WHEN weightInGms > 5000 THEN 'Medium'
ELSE 'Bulk'
END AS weight_category
FROM zepto;

SELECT category,
SUM(weightInGms * availableQuantity) AS total_weight
FROM zepto
GROUP BY category
ORDER BY total_weight DESC;
