This project is dedicated to analyzing the sales funnel of an online store based on the User Funnels Dataset. The goal of the analysis is to track the user's journey from the first visit to the site to a successful purchase, identify critical drop-off points, and provide sound recommendations for improving conversion.
Data source: Kaggle: User Funnels Dataset
Database: PostgreSQL (SQL for ETL, cleansing and analysis).
Programming language: Python 3.x.
Libraries: Pandas, Matplotlib, NumPy.
1. Data Acquisition and Loading (ETL)
At this stage, the analysis infrastructure was prepared. Data from the raw CSV file was imported into a relational database to ensure integrity and the ability to write complex queries.
Tools: PostgreSQL, SQL.
What was done: The user_funnels table schema was defined, data types were configured, and the import was performed using the COPY command.
File: 01_create_table.sql
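The actual loading is done in PostgreSQL with `COPY` (see `01_create_table.sql`); as a self-contained illustration of the same two-column schema, here is a minimal Python sketch using `sqlite3` and an in-memory CSV sample. The column names `user_id` and `stage` are assumptions for the example.

```python
import csv
import io
import sqlite3

# Stand-in for the PostgreSQL schema + COPY step (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_funnels (user_id INTEGER, stage TEXT)")

raw_csv = "user_id,stage\n1,Homepage\n1,Product Page\n2,Homepage\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))
conn.executemany("INSERT INTO user_funnels VALUES (:user_id, :stage)", rows)

count = conn.execute("SELECT COUNT(*) FROM user_funnels").fetchone()[0]
```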
2. Data Cleaning and Validation
Before any calculations, the data was checked for quality, since the reliability of the results depends directly on the cleanliness of the source data.
Checks:
- Finding and processing NULL values and empty strings.
- Checking for duplicates (uniqueness of user_id + stage pairs).
- Business logic validation: funnel stages match the specified list.
Result: The dataset is considered valid (17,175 records, 0 missing values).
File: 02_data_cleaning.sql
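The same three checks can be sketched in pandas on a tiny invented sample (the stage names here are assumptions, not taken from the dataset):

```python
import pandas as pd

# Hypothetical sample mirroring the checks described above.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "stage":   ["Homepage", "Product Page", "Homepage", "Homepage", None],
})

nulls = df["stage"].isna().sum()                   # NULL / empty values
dups = df.duplicated(["user_id", "stage"]).sum()   # repeated (user_id, stage) pairs
valid_stages = {"Homepage", "Product Page", "Cart", "Checkout", "Purchase"}
invalid = (~df["stage"].dropna().isin(valid_stages)).sum()  # business-logic check
```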
3. Exploratory Analysis (EDA)
Initial review of the data to understand general trends and distributions.
Metrics:
- Total number of unique users at each stage.
- Percentage of completed and incomplete conversions.
- User Journey Mapping for visual sequence verification.
File: 03_eda.sql
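The first EDA metric, unique users per stage, reduces to a group-by with a distinct count. A pandas sketch with invented events (stage names are assumptions):

```python
import pandas as pd

# Hypothetical funnel events for three users.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "stage":   ["Homepage", "Product Page", "Cart",
                "Homepage", "Product Page", "Homepage"],
})

# Count distinct users who reached each stage.
users_per_stage = df.groupby("stage")["user_id"].nunique()
```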
4. Deep Funnel Analysis
The main analytical section, where the product's key performance indicators (KPIs) are calculated.
Calculations:
- Step-to-Step Conversion: The percentage of users who move from the current step to the next.
- Drop-off Rate: The percentage of users who drop off at each step.
- Cumulative Conversion (Overall CR): The total conversion rate from the first step to a purchase.
Methods: Using window functions (LAG, FIRST_VALUE) to compare data between rows.
File: 04_funnel_analysis.sql
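The three calculations above can be sketched in pandas, where `.shift()` plays the role of SQL `LAG` and the first element that of `FIRST_VALUE`. The per-stage counts below are invented for illustration:

```python
import pandas as pd

# Hypothetical number of unique users reaching each stage.
funnel = pd.Series(
    [1000, 400, 120, 60, 22],
    index=["Homepage", "Product Page", "Cart", "Checkout", "Purchase"],
)

step_cr = (funnel / funnel.shift(1) * 100).round(2)   # step-to-step conversion, %
drop_off = (100 - step_cr).round(2)                   # share lost at each step, %
overall = round(funnel.iloc[-1] / funnel.iloc[0] * 100, 2)  # cumulative CR, %
```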
5. User Segmentation
Dividing the audience into groups based on their behavior (funnel depth) for targeted marketing.
Segments: Bounce (homepage only), Browser (product viewer), Cart Abandoner (abandoned cart), Buyer.
Result: It was found that 49% of users are bouncers, indicating a traffic or content relevance issue on the homepage.
File: 05_segmentation.sql
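Segmentation amounts to mapping each user's deepest stage to a label. A pandas sketch using the segment names above (the stage order, the mapping of Checkout to "Cart Abandoner", and the sample events are assumptions):

```python
import pandas as pd

# Assumed stage order and deepest-stage -> segment mapping.
order = {"Homepage": 0, "Product Page": 1, "Cart": 2, "Checkout": 3, "Purchase": 4}
segment_of = {0: "Bounce", 1: "Browser", 2: "Cart Abandoner",
              3: "Cart Abandoner", 4: "Buyer"}

events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "stage":   ["Homepage", "Product Page", "Homepage",
                "Homepage", "Product Page", "Purchase"],
})

depth = events["stage"].map(order).groupby(events["user_id"]).max()
segments = depth.map(segment_of)
```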
6. Calculating Key Business Metrics
Based on the cleaned data, key business metrics were calculated to assess the effectiveness of the product's sales funnel.
File: 06_advanced_metrics.sql
According to the analysis:
Overall Conversion Rate (Overall CR): 2.25%.
Highest churn: Occurs at Product Page → Cart (70% loss) and Cart → Checkout (70% loss).
Segmentation: Almost 50% of users leave the site after viewing only the main page (Bounce rate).
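These figures compound: step retentions multiply into the overall CR, so two consecutive 70%-loss steps alone already cap overall conversion at 9%, before any further losses. A one-line sketch:

```python
# Retention at the two worst steps (70% loss each, per the figures above).
step_retention = [0.30, 0.30]

cap = 1.0
for r in step_retention:
    cap *= r  # chained retention = upper bound on cumulative conversion
```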
7. Visualization
Transforming dry numbers into clear visuals for the business.
- Visual funnel with drop-off
- Step conversion & drop-off pie
- Cumulative conversion curve
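The funnel chart itself can be sketched with Matplotlib (one of the project's listed libraries); the per-stage counts here are invented, and a horizontal bar chart stands in for the figure saved as `funnel_chart.png`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display required
import matplotlib.pyplot as plt

# Hypothetical per-stage counts for illustration.
stages = ["Homepage", "Product Page", "Cart", "Checkout", "Purchase"]
counts = [1000, 400, 120, 60, 22]

fig, ax = plt.subplots()
ax.barh(stages[::-1], counts[::-1])  # widest bar on top
ax.set_xlabel("Unique users")
ax.set_title("Sales funnel with drop-off")
fig.savefig("funnel_chart.png")
```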
User-Funnels/                      — project directory
├── data/                          — project data
│   ├── raw/                       — raw data
│   └── processed/                 — cleaned data
│
├── sql/                           — SQL queries
│   ├── 01_create_table.sql        # Schema + data loading
│   ├── 02_data_cleaning.sql       # Validation & quality checks
│   ├── 03_eda.sql                 # Exploratory data analysis
│   ├── 04_funnel_analysis.sql     # Core funnel metrics
│   ├── 05_segmentation.sql        # User segmentation
│   └── 06_advanced_metrics.sql    # Window functions & advanced SQL
│
├── images/                        — visualizations
│   ├── funnel_chart.png           # Visual funnel with drop-off
│   ├── conversion_charts.png      # Step conversion + drop-off pie
│   └── cumulative_conversion.png  # Cumulative conversion curve
│
├── notebooks/                     — Colab or Jupyter notebook files
│   └── notebook.ipynb             # Creating visualizations
├── reports/                       — project report
│   └── report.pdf                 # Project report file
├── Project-logo.png               — project cover
├── LICENSE                        — MIT License
├── requirements.txt               — libraries required to run the project
└── README.md                      — project description
- Clone the repository.
- Run the SQL scripts from the /sql folder in your PostgreSQL environment.
- To create the charts, open notebook.ipynb and run it in Colab or Jupyter Notebook.
Name: Andrii Isachenko
LinkedIn: Andrii Isachenko
E-mail: isao.datastudio@gmail.com
- Thanks to Amir Motefaker for providing this rich dataset for the data community.
- Special thanks to the Kaggle platform for hosting the data.
- Thanks to the GoIT Data Analyst course, of which this project was a part.
Project Status: Completed.
License: MIT License.



