Skip to content

ampojohnfranz/ecommerce-sql-python-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

E-Commerce Revenue and Delivery Analysis

Portfolio project using the Olist Brazilian E-Commerce dataset.

Business Problem

An e-commerce marketplace wants to understand what drives revenue, where orders come from, and whether delivery performance affects customer satisfaction.

This project answers four business questions:

  1. Which product categories generate the most revenue?
  2. Which customer states produce the highest order volume and revenue?
  3. How does revenue trend over time?
  4. Do late deliveries receive lower review scores?

Dataset

Source: Brazilian E-Commerce Public Dataset by Olist on Kaggle

The dataset contains about 100k orders from 2016 to 2018 across multiple Brazilian marketplaces. It includes orders, customers, payments, reviews, products, sellers, and delivery timestamps.

Place the Kaggle CSV files inside:

data/raw/

Expected files:

olist_customers_dataset.csv
olist_geolocation_dataset.csv
olist_order_items_dataset.csv
olist_order_payments_dataset.csv
olist_order_reviews_dataset.csv
olist_orders_dataset.csv
olist_products_dataset.csv
olist_sellers_dataset.csv
product_category_name_translation.csv

Methodology

  1. Load raw CSV files into a local SQLite database.
  2. Use SQL to join business tables and answer core questions.
  3. Load SQL results into pandas.
  4. Clean dates, missing values, and calculated fields.
  5. Create charts that explain the business story.

How to Run

Create and activate a virtual environment if you want one, then install:

pip install -r requirements.txt

Check whether the dataset files are in the right folder:

python src/00_check_data.py

Build the SQLite database:

python src/01_build_database.py

Run SQL exploration and export query results:

python src/02_sql_exploration.py

Clean, analyze, and generate charts:

python src/03_clean_analyze_visualize.py

Outputs are saved in:

outputs/

Project Structure

ecommerce-sql-python-analysis/
├── data/
│   ├── raw/
│   └── processed/
├── docs/
├── outputs/
│   ├── charts/
│   └── query_results/
├── sql/
├── src/
├── README.md
└── requirements.txt

Key Findings

  • The analysis covered 96,478 delivered orders with total delivered revenue of R$ 15,489,665.55.
  • health_beauty was the highest revenue product category, followed by watches_gifts and bed_bath_table.
  • SP was the strongest customer state by revenue, generating R$ 5,769,703.15 from delivered orders.
  • Average delivery time was 12.0 days, and 7.9% of delivered orders arrived later than the estimated delivery date.
  • Late deliveries had a much lower average review score of 2.27, compared with 4.29 for on-time or early deliveries.
  • This suggests delivery performance is strongly connected to customer satisfaction and should be monitored as a business KPI.

Skills Demonstrated

  • SQL joins, aggregations, filtering, and date logic
  • Python data loading and cleaning with pandas
  • Business analysis and metric definition
  • Data visualization with matplotlib and seaborn
  • Reproducible project organization

About

Portfolio project analyzing e-commerce revenue and delivery performance using SQL and Python on the Olist dataset.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages