Portfolio project using the Olist Brazilian E-Commerce dataset.
An e-commerce marketplace wants to understand what drives revenue, where orders come from, and whether delivery performance affects customer satisfaction.
This project answers four business questions:
- Which product categories generate the most revenue?
- Which customer states produce the highest order volume and revenue?
- How does revenue trend over time?
- Do late deliveries receive lower review scores?
Source: Brazilian E-Commerce Public Dataset by Olist on Kaggle
The dataset contains about 100k orders from 2016 to 2018 across multiple Brazilian marketplaces. It includes orders, customers, payments, reviews, products, sellers, and delivery timestamps.
Place the Kaggle CSV files inside:
data/raw/
Expected files:
olist_customers_dataset.csv
olist_geolocation_dataset.csv
olist_order_items_dataset.csv
olist_order_payments_dataset.csv
olist_order_reviews_dataset.csv
olist_orders_dataset.csv
olist_products_dataset.csv
olist_sellers_dataset.csv
product_category_name_translation.csv
- Load raw CSV files into a local SQLite database.
- Use SQL to join business tables and answer core questions.
- Load SQL results into pandas.
- Clean dates, missing values, and calculated fields.
- Create charts that explain the business story.
Create and activate a virtual environment if you want one, then install:
pip install -r requirements.txtCheck whether the dataset files are in the right folder:
python src/00_check_data.pyBuild the SQLite database:
python src/01_build_database.pyRun SQL exploration and export query results:
python src/02_sql_exploration.pyClean, analyze, and generate charts:
python src/03_clean_analyze_visualize.pyOutputs are saved in:
outputs/
ecommerce-sql-python-analysis/
├── data/
│ ├── raw/
│ └── processed/
├── docs/
├── outputs/
│ ├── charts/
│ └── query_results/
├── sql/
├── src/
├── README.md
└── requirements.txt
- The analysis covered
96,478delivered orders with total delivered revenue ofR$ 15,489,665.55. health_beautywas the highest revenue product category, followed bywatches_giftsandbed_bath_table.SPwas the strongest customer state by revenue, generatingR$ 5,769,703.15from delivered orders.- Average delivery time was
12.0 days, and7.9%of delivered orders arrived later than the estimated delivery date. - Late deliveries had a much lower average review score of
2.27, compared with4.29for on-time or early deliveries. - This suggests delivery performance is strongly connected to customer satisfaction and should be monitored as a business KPI.
- SQL joins, aggregations, filtering, and date logic
- Python data loading and cleaning with pandas
- Business analysis and metric definition
- Data visualization with matplotlib and seaborn
- Reproducible project organization