Analyze website traffic, user engagement, and purchase behavior to answer 8 business questions and turn raw analytics data into clear insights.
- π― Business Question: Which traffic sources drive the most revenue - and how do user engagement patterns differ between purchasers and non-purchasers?
- π¬ Domain: E-commerce & Digital Marketing
- π οΈ Tools: SQL (Google BigQuery)
π€ Author: BαΊ‘ch Minh Nam
Objective:
- This project uses SQL (Google BigQuery) to analyze Google Analytics 4 (GA4) data from the Google Merchandise Store e-commerce website
- It answers 8 specific business questions covering Traffic Performance, User Engagement, Revenue Analysis, and Conversion Funnel Optimization
- The goal is to turn raw session and event data into clear, actionable insights for marketing and product teams
Main business question:
This project uses SQL to analyze website traffic, engagement, and revenue data from Google Analytics to:
- Track changes in visits, pageviews, and transactions over time
- Evaluate which traffic sources generate the most revenue and engagement
- Compare user behavior between purchasers and non-purchasers
- Identify cross-selling opportunities and conversion funnel bottlenecks
π€ Who is this project for?
- Data analysts & business analysts who want a reference for writing analytical SQL (CTEs, window functions, cohort analysis, UNNEST operations)
- Digital marketing teams who need insights into traffic source performance and ROI
- E-commerce managers & stakeholders who need quick insights into revenue trends, user engagement, and conversion rates
- Business intelligence teams building dashboards and reporting systems
The analysis is based on Google Analytics 4 (GA4) data exported to Google BigQuery, representing the Google Merchandise Store, a real e-commerce website selling branded merchandise. It contains data on user sessions, page views, product interactions, transactions, and revenue across multiple months in 2017.
To answer the 8 business questions in this project, 6 core data structures from the GA4 export schema were used. The table below lists only the columns that were actually used in the queries.
| Schema | Table / Struct | Columns Used | Used In | Purpose |
|---|---|---|---|---|
| Sessions | ga_sessions_2017* |
date, fullVisitorId |
Q1, Q2, Q4, Q5, Q6, Q8 | Base session table tracking unique users and session timestamps for all temporal analysis. |
| Sessions | totals |
visits, pageviews, transactions, bounces |
Q1, Q2, Q4, Q5, Q6 | Aggregate metrics per session - visits, pageviews, bounce count, transaction count for KPI calculations. |
| Sessions | trafficSource |
source |
Q2, Q3 | Identifies traffic channel origin (organic search, direct, referral, paid ads) to analyze channel performance. |
| Hits | hits |
eCommerceAction |
Q8 | Unnested to capture individual user actions within a session (product view, add to cart, purchase). |
| Hits | eCommerceAction |
action_type |
Q8 | Action type codes ('2'=View, '3'=Add to Cart, '6'=Purchase) to build conversion funnel analysis. |
| Product | product |
v2ProductName, productRevenue, productQuantity |
Q3, Q4, Q6, Q7, Q8 | Unnested product-level data to track revenue, quantities sold, and product-specific insights. |
π Full Documentation: For the complete explanation of all available fields in the GA4 BigQuery export schema, please refer to the Official Google Analytics BigQuery Export schema.
Question: Calculate total visits, pageviews, and transactions for January, February, and March 2017.
Tracking monthly traffic metrics helps the business understand seasonal demand patterns and measure the impact of marketing campaigns across the first quarter.
SELECT
FORMAT_DATE('%Y%m', PARSE_DATE('%Y%m%d', date)) AS month,
COUNT(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_2017*`
WHERE _table_suffix BETWEEN '0101' AND '0331'
GROUP BY month
ORDER BY monthπ‘ Observations:
The data from the first quarter of 2017 shows that the number of visitors and activity on the website increased over time. In January, there were around 64,000 visits and 257,000 pageviews. By March, the visits grew to nearly 70,000.
The most interesting thing is that the number of transactions went up quite a lot - from about 700 to nearly 1,000. This means not only did more people visit the website, but customers also bought more things.
This increase could be because of a promotion or sale in March, or maybe new products were added to the website. Whatever happened, it made customers more likely to make a purchase. The store should figure out what worked in March and repeat it.
Question: Calculate the bounce rate per traffic source in July 2017.
High bounce rates indicate poor landing page relevance or user experience issues. Identifying which traffic sources bounce most helps prioritize optimization efforts and reallocate budget from underperforming channels.
SELECT
trafficSource.source AS source,
SUM(totals.visits) AS total_visits,
SUM(totals.bounces) AS total_no_of_bounces,
ROUND(SUM(totals.bounces) / SUM(totals.visits) * 100, 3) AS bounce_rate
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`
GROUP BY source
ORDER BY total_visits DESCπ‘ Observations:
Google sends the most traffic (38,000 visits) but 51% leave without doing anything. Direct traffic is better with only 43% bounce rate - these are loyal customers.
Direct traffic is different. These are people who come back to the website or type the URL directly. They have a lower bounce rate of about 43%, which means they stay longer and are more likely to buy something.
YouTube traffic has the worst bounce rate at 67%, meaning the landing page doesn't match what viewers expect.
The main issue is with Google and YouTube traffic. Even though they send a lot of people, too many of them leave without doing anything. The website should improve the landing pages to make it better for these visitors so more of them will stay and make a purchase.
Question: Calculate revenue by traffic source by week and by month in June 2017.
Breaking revenue down by traffic source and time period reveals which channels are most profitable and when peak revenue occurs. This guides budget allocation and campaign timing decisions.
WITH month_data AS (
SELECT
'Month' AS time_type,
FORMAT_DATE('%Y%m', PARSE_DATE('%Y%m%d', date)) AS month,
trafficSource.source AS source,
SUM(p.productRevenue) / 1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201706*`,
UNNEST(hits) AS hits,
UNNEST(product) AS p
WHERE p.productRevenue IS NOT NULL
GROUP BY 1, 2, 3
),
week_data AS (
SELECT
'Week' AS time_type,
FORMAT_DATE('%Y%W', PARSE_DATE('%Y%m%d', date)) AS week,
trafficSource.source AS source,
SUM(p.productRevenue) / 1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201706*`,
UNNEST(hits) AS hits,
UNNEST(product) AS p
WHERE p.productRevenue IS NOT NULL
GROUP BY 1, 2, 3
)
SELECT * FROM month_data
UNION ALL
SELECT * FROM week_data
ORDER BY time_type, revenue DESCπ‘ Observations:
Direct traffic makes the most money - about 97,000 in June. These are loyal customers who keep coming back, so they spend more than other visitors.
Google is second with around 18,700 in revenue. This shows that Google ads are working and worth the investment.
But YouTube and mail.google.com bring traffic that doesn't make money. They send people to the website, but these visitors don't buy anything. So the website should focus on direct traffic and Google, and maybe stop or change YouTube ads because they're not profitable.
Question: Calculate average number of pageviews by purchaser type (purchasers vs non-purchasers) in June and July 2017.
Comparing engagement between buyers and non-buyers reveals the page view threshold needed to drive conversion. Higher pageview counts among purchasers signal deeper product exploration before purchase.
WITH
base AS (
SELECT
FORMAT_DATE('%Y%m', PARSE_DATE('%Y%m%d', date)) AS month,
totals.transactions,
product.productRevenue,
totals.pageviews,
fullVisitorId
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_2017*`,
UNNEST(hits) AS hits,
UNNEST(product) AS product
WHERE _table_suffix BETWEEN '0601' AND '0731'
),
purchase AS (
SELECT
month,
ROUND(SUM(pageviews) / COUNT(DISTINCT fullVisitorId), 8) AS avg_pageviews_purchase
FROM base
WHERE transactions >= 1
AND productRevenue IS NOT NULL
GROUP BY month
),
non_purchase AS (
SELECT
month,
ROUND(SUM(pageviews) / COUNT(DISTINCT fullVisitorId), 8) AS avg_pageviews_non_purchase
FROM base
WHERE transactions IS NULL
AND productRevenue IS NULL
GROUP BY month
)
SELECT *
FROM purchase
FULL JOIN non_purchase USING (month)
ORDER BY monthπ‘ Observations:
In June, people who bought something looked at about 94 pages, but people who didn't buy looked at 317 pages - 3 times more. July shows the same pattern.
This means high pageviews don't mean more sales. Some visitors know what they want and buy quickly. Others just browse without buying.
The website should focus on better signals like time spent on products or cart additions, not just pageview count.
Question: Calculate the average number of transactions per user that made a purchase in July 2017.
Understanding repeat purchase frequency within a month reveals customer loyalty and multi-purchase behavior. Higher repeat rates indicate strong product satisfaction and cross-sell success.
SELECT
FORMAT_DATE('%Y%m', PARSE_DATE('%Y%m%d', date)) AS month,
ROUND(SUM(totals.transactions) / COUNT(DISTINCT fullVisitorId), 4) AS avg_total_transactions_per_user
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`,
UNNEST(hits) AS hits,
UNNEST(product) AS product
WHERE totals.transactions >= 1
AND product.productRevenue IS NOT NULL
GROUP BY monthπ‘ Observations:
In July, customers bought an average of 4 times each. This is really good - it means people who buy once come back and buy more.
This could be because of subscriptions, bulk orders, or good product recommendations. Either way, the website should focus on keeping these customers happy with loyalty programs and personalized emails to get them to buy more.
Question: Calculate the average amount of money spent per session (purchasers only) in July 2017.
Revenue per session reveals the monetary value each visit generates. Higher values indicate strong product pricing, effective upselling, or high-value customer segments.
SELECT
FORMAT_DATE('%Y%m', PARSE_DATE('%Y%m%d', date)) AS month,
ROUND((SUM(product.productRevenue) / SUM(totals.visits)) / 1000000, 2) AS avg_revenue_by_user_per_visit
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`,
UNNEST(hits) AS hits,
UNNEST(product) AS product
WHERE totals.transactions >= 1
AND product.productRevenue IS NOT NULL
GROUP BY monthπ‘ Observations:
In July, each customer visit made about $44 in revenue. That's pretty good for an e-commerce site - it means people who come to buy are spending real money.
The website should put more money into marketing channels where people already want to buy, like Google branded search, email, and direct traffic. These channels work better than trying to get random people to visit.
Question: Calculate other products purchased by customers who also bought "YouTube Men's Vintage Henley" in July 2017.
Market basket analysis identifies which products are frequently purchased together. This drives product bundling, upsell strategies, and personalized recommendation engine training.
WITH
buyer_list AS (
SELECT DISTINCT fullVisitorId
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`,
UNNEST(hits) AS hits,
UNNEST(product) AS product
WHERE product.v2ProductName = "YouTube Men's Vintage Henley"
AND totals.transactions >= 1
AND product.productRevenue IS NOT NULL
)
SELECT
product.v2ProductName AS other_purchased_products,
SUM(product.productQuantity) AS quantity
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`,
UNNEST(hits) AS hits,
UNNEST(product) AS product
JOIN buyer_list USING (fullVisitorId)
WHERE product.v2ProductName != "YouTube Men's Vintage Henley"
AND product.productRevenue IS NOT NULL
AND totals.transactions >= 1
GROUP BY other_purchased_products
ORDER BY quantity DESCπ‘ Observations:
People who bought the YouTube Men's Vintage Henley also bought Google Sunglasses most often (20 times), then the Google Women's Vintage Hero Tee (7 times) and lip balm (6 times).
This shows that customers like Google branded products together. The website should create bundles like "Henley + Sunglasses" with a discount, or send emails to Henley buyers recommending sunglasses. This could make customers spend more per order.
Question: Generate a cohort map of the checkout funnel (Product View β Add to Cart β Purchase) for JanβMar 2017.
Conversion funnel analysis identifies where users drop off during the purchase journey. High drop-off rates at specific funnel stages highlight optimization priorities (e.g., cart abandonment recovery, checkout simplification).
WITH
data_overview AS (
SELECT
FORMAT_DATE('%Y%m', PARSE_DATE('%Y%m%d', date)) AS month,
eCommerceAction.action_type AS action_type,
totals.transactions,
product.productRevenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_2017*`,
UNNEST(hits) AS hits,
UNNEST(product) AS product
WHERE _table_suffix BETWEEN '0101' AND '0331'
),
data_count AS (
SELECT
month,
COUNTIF(action_type = '2') AS num_product_view,
COUNTIF(action_type = '3') AS num_addtocart,
COUNTIF(action_type = '6' AND productRevenue IS NOT NULL) AS num_purchase
FROM data_overview
GROUP BY month
ORDER BY month
)
SELECT
*,
ROUND(num_addtocart / num_product_view * 100.0, 2) AS add_to_cart_rate,
ROUND(num_purchase / num_product_view * 100.0, 2) AS purchase_rate
FROM data_countπ‘ Observations:
In January, 25,787 people looked at products but only 7,342 added items to cart (28%). Even fewer actually bought - just 2,143 (8%).
By March, things got better - 37% added to cart and 13% bought. This shows something changed and worked.
The biggest problem is people looking at products but not adding them to cart. This could be because of bad product descriptions, high prices, or not enough trust.
The website should add customer reviews on product pages, make "Add to Cart" easier, show low stock warnings to create urgency, and offer discounts to people about to leave.
Google-Analytics-Session-Analysis-Using-SQL/
βββ Images/ # Screenshots of each query's result
β βββ banner.png
β βββ Query_1_Output.png
β βββ Query_2_Output.png
β βββ Query_3_Output.png
β βββ Query_4_Output.png
β βββ Query_5_Output.png
β βββ Query_6_Output.png
β βββ Query_7_Output.png
β βββ Query_8_Output.png
βββ SQL_Queries/ # SQL source files for each question
β βββ Google-Analytics-Session-Analysis.sql
βββ README.md
To run these queries in Google BigQuery:
- βοΈ Set up a Google Cloud Platform (GCP) account: Create one if you don't have it yet, and enable the BigQuery API.
- π₯ Access the public dataset: The
bigquery-public-data.google_analytics_sample.ga_sessions_2017*dataset is publicly available to all GCP users - no setup or data loading required. - π Open BigQuery Console:
- Go to Google Cloud Console - BigQuery
- Create a new Google Cloud Project if needed
βΆοΈ Run the queries: Open the BigQuery console, copy each.sqlfile's content from theSQL_Queries/folder, and make sure your project context matches the dataset path before running.
π Important Notes:
- The dataset uses table wildcards (
*) to query multiple daily shards at once. For example,ga_sessions_2017*matches all tables from 2017. - Use the
_table_suffix BETWEEN '0101' AND '0331'syntax to filter by date ranges without loading the entire year. - The
UNNEST()function is required to flatten nested arrays (hits, product) into a queryable format.








