Skip to content

TascoGitGud/Google-Analytics-Session-Analysis-Using-SQL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

18 Commits
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ›’ Google Analytics Session Analysis | SQL

SQL Google BigQuery Status


Analyze website traffic, user engagement, and purchase behavior to answer 8 business questions and turn raw analytics data into clear insights.

  • 🎯 Business Question: Which traffic sources drive the most revenue - and how do user engagement patterns differ between purchasers and non-purchasers?
  • 🏬 Domain: E-commerce & Digital Marketing
  • πŸ› οΈ Tools: SQL (Google BigQuery)

πŸ‘€ Author: BαΊ‘ch Minh Nam


πŸ“‘ Table of Contents


πŸ“Œ Overview

Objective:

  • This project uses SQL (Google BigQuery) to analyze Google Analytics 4 (GA4) data from the Google Merchandise Store e-commerce website
  • It answers 8 specific business questions covering Traffic Performance, User Engagement, Revenue Analysis, and Conversion Funnel Optimization
  • The goal is to turn raw session and event data into clear, actionable insights for marketing and product teams

Main business question:

This project uses SQL to analyze website traffic, engagement, and revenue data from Google Analytics to:

  • Track changes in visits, pageviews, and transactions over time
  • Evaluate which traffic sources generate the most revenue and engagement
  • Compare user behavior between purchasers and non-purchasers
  • Identify cross-selling opportunities and conversion funnel bottlenecks

πŸ‘€ Who is this project for?

  • Data analysts & business analysts who want a reference for writing analytical SQL (CTEs, window functions, cohort analysis, UNNEST operations)
  • Digital marketing teams who need insights into traffic source performance and ROI
  • E-commerce managers & stakeholders who need quick insights into revenue trends, user engagement, and conversion rates
  • Business intelligence teams building dashboards and reporting systems

πŸ“‚ Dataset

The analysis is based on Google Analytics 4 (GA4) data exported to Google BigQuery, representing the Google Merchandise Store, a real e-commerce website selling branded merchandise. It contains data on user sessions, page views, product interactions, transactions, and revenue across multiple months in 2017.

Data Dictionary

To answer the 8 business questions in this project, 6 core data structures from the GA4 export schema were used. The table below lists only the columns that were actually used in the queries.

Schema Table / Struct Columns Used Used In Purpose
Sessions ga_sessions_2017* date, fullVisitorId Q1, Q2, Q4, Q5, Q6, Q8 Base session table tracking unique users and session timestamps for all temporal analysis.
Sessions totals visits, pageviews, transactions, bounces Q1, Q2, Q4, Q5, Q6 Aggregate metrics per session - visits, pageviews, bounce count, transaction count for KPI calculations.
Sessions trafficSource source Q2, Q3 Identifies traffic channel origin (organic search, direct, referral, paid ads) to analyze channel performance.
Hits hits eCommerceAction Q8 Unnested to capture individual user actions within a session (product view, add to cart, purchase).
Hits eCommerceAction action_type Q8 Action type codes ('2'=View, '3'=Add to Cart, '6'=Purchase) to build conversion funnel analysis.
Product product v2ProductName, productRevenue, productQuantity Q3, Q4, Q6, Q7, Q8 Unnested product-level data to track revenue, quantities sold, and product-specific insights.

πŸ”— Full Documentation: For the complete explanation of all available fields in the GA4 BigQuery export schema, please refer to the Official Google Analytics BigQuery Export schema.


πŸ”Ž Query Repository

Query 1: Monthly Traffic Overview (Jan–Mar 2017)

Question: Calculate total visits, pageviews, and transactions for January, February, and March 2017.

Tracking monthly traffic metrics helps the business understand seasonal demand patterns and measure the impact of marketing campaigns across the first quarter.

SELECT 
  FORMAT_DATE('%Y%m', PARSE_DATE('%Y%m%d', date)) AS month,
  COUNT(totals.visits) AS visits,
  SUM(totals.pageviews) AS pageviews,
  SUM(totals.transactions) AS transactions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_2017*`
WHERE _table_suffix BETWEEN '0101' AND '0331'
GROUP BY month
ORDER BY month

πŸ“Š Actual Output: Query 1 Output

πŸ’‘ Observations:

The data from the first quarter of 2017 shows that the number of visitors and activity on the website increased over time. In January, there were around 64,000 visits and 257,000 pageviews. By March, the visits grew to nearly 70,000.

The most interesting thing is that the number of transactions went up quite a lot - from about 700 to nearly 1,000. This means not only did more people visit the website, but customers also bought more things.

This increase could be because of a promotion or sale in March, or maybe new products were added to the website. Whatever happened, it made customers more likely to make a purchase. The store should figure out what worked in March and repeat it.


Query 2: Bounce Rate by Traffic Source (July 2017)

Question: Calculate the bounce rate per traffic source in July 2017.

High bounce rates indicate poor landing page relevance or user experience issues. Identifying which traffic sources bounce most helps prioritize optimization efforts and reallocate budget from underperforming channels.

SELECT
  trafficSource.source AS source,
  SUM(totals.visits) AS total_visits,
  SUM(totals.bounces) AS total_no_of_bounces,
  ROUND(SUM(totals.bounces) / SUM(totals.visits) * 100, 3) AS bounce_rate
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`
GROUP BY source
ORDER BY total_visits DESC

πŸ“Š Actual Output: Query 2 Output

πŸ’‘ Observations:

Google sends the most traffic (38,000 visits) but 51% leave without doing anything. Direct traffic is better with only 43% bounce rate - these are loyal customers.

Direct traffic is different. These are people who come back to the website or type the URL directly. They have a lower bounce rate of about 43%, which means they stay longer and are more likely to buy something.

YouTube traffic has the worst bounce rate at 67%, meaning the landing page doesn't match what viewers expect.

The main issue is with Google and YouTube traffic. Even though they send a lot of people, too many of them leave without doing anything. The website should improve the landing pages to make it better for these visitors so more of them will stay and make a purchase.


Query 3: Revenue by Traffic Source (June 2017 – Weekly & Monthly)

Question: Calculate revenue by traffic source by week and by month in June 2017.

Breaking revenue down by traffic source and time period reveals which channels are most profitable and when peak revenue occurs. This guides budget allocation and campaign timing decisions.

WITH month_data AS (
  SELECT
    'Month' AS time_type,
    FORMAT_DATE('%Y%m', PARSE_DATE('%Y%m%d', date)) AS month,
    trafficSource.source AS source,
    SUM(p.productRevenue) / 1000000 AS revenue
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201706*`,
    UNNEST(hits) AS hits,
    UNNEST(product) AS p
  WHERE p.productRevenue IS NOT NULL
  GROUP BY 1, 2, 3
),

week_data AS (
  SELECT
    'Week' AS time_type,
    FORMAT_DATE('%Y%W', PARSE_DATE('%Y%m%d', date)) AS week,
    trafficSource.source AS source,
    SUM(p.productRevenue) / 1000000 AS revenue
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201706*`,
    UNNEST(hits) AS hits,
    UNNEST(product) AS p
  WHERE p.productRevenue IS NOT NULL
  GROUP BY 1, 2, 3
)

SELECT * FROM month_data
UNION ALL
SELECT * FROM week_data
ORDER BY time_type, revenue DESC

πŸ“Š Actual Output: Query 3 Output

πŸ’‘ Observations:

Direct traffic makes the most money - about 97,000 in June. These are loyal customers who keep coming back, so they spend more than other visitors.

Google is second with around 18,700 in revenue. This shows that Google ads are working and worth the investment.

But YouTube and mail.google.com bring traffic that doesn't make money. They send people to the website, but these visitors don't buy anything. So the website should focus on direct traffic and Google, and maybe stop or change YouTube ads because they're not profitable.


Query 4: Avg Pageviews - Purchasers vs Non-Purchasers (Jun–Jul 2017)

Question: Calculate average number of pageviews by purchaser type (purchasers vs non-purchasers) in June and July 2017.

Comparing engagement between buyers and non-buyers reveals the page view threshold needed to drive conversion. Higher pageview counts among purchasers signal deeper product exploration before purchase.

WITH
  base AS (
    SELECT
      FORMAT_DATE('%Y%m', PARSE_DATE('%Y%m%d', date)) AS month,
      totals.transactions,
      product.productRevenue,
      totals.pageviews,
      fullVisitorId
    FROM `bigquery-public-data.google_analytics_sample.ga_sessions_2017*`,
      UNNEST(hits) AS hits,
      UNNEST(product) AS product
    WHERE _table_suffix BETWEEN '0601' AND '0731'
  ),

  purchase AS (
    SELECT
      month,
      ROUND(SUM(pageviews) / COUNT(DISTINCT fullVisitorId), 8) AS avg_pageviews_purchase
    FROM base
    WHERE transactions >= 1 
      AND productRevenue IS NOT NULL
    GROUP BY month
  ),

  non_purchase AS (
    SELECT
      month,
      ROUND(SUM(pageviews) / COUNT(DISTINCT fullVisitorId), 8) AS avg_pageviews_non_purchase
    FROM base
    WHERE transactions IS NULL
      AND productRevenue IS NULL
    GROUP BY month
  )

SELECT *
FROM purchase
FULL JOIN non_purchase USING (month)
ORDER BY month

πŸ“Š Actual Output: Query 4 Output

πŸ’‘ Observations:

In June, people who bought something looked at about 94 pages, but people who didn't buy looked at 317 pages - 3 times more. July shows the same pattern.

This means high pageviews don't mean more sales. Some visitors know what they want and buy quickly. Others just browse without buying.

The website should focus on better signals like time spent on products or cart additions, not just pageview count.


Query 5: Avg Transactions per Purchasing User (July 2017)

Question: Calculate the average number of transactions per user that made a purchase in July 2017.

Understanding repeat purchase frequency within a month reveals customer loyalty and multi-purchase behavior. Higher repeat rates indicate strong product satisfaction and cross-sell success.

SELECT
  FORMAT_DATE('%Y%m', PARSE_DATE('%Y%m%d', date)) AS month,
  ROUND(SUM(totals.transactions) / COUNT(DISTINCT fullVisitorId), 4) AS avg_total_transactions_per_user
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`,
  UNNEST(hits) AS hits,
  UNNEST(product) AS product
WHERE totals.transactions >= 1
  AND product.productRevenue IS NOT NULL
GROUP BY month

πŸ“Š Actual Output: Query 5 Output

πŸ’‘ Observations:

In July, customers bought an average of 4 times each. This is really good - it means people who buy once come back and buy more.

This could be because of subscriptions, bulk orders, or good product recommendations. Either way, the website should focus on keeping these customers happy with loyalty programs and personalized emails to get them to buy more.


Query 6: Avg Revenue per Session (July 2017 – Purchasers Only)

Question: Calculate the average amount of money spent per session (purchasers only) in July 2017.

Revenue per session reveals the monetary value each visit generates. Higher values indicate strong product pricing, effective upselling, or high-value customer segments.

SELECT
  FORMAT_DATE('%Y%m', PARSE_DATE('%Y%m%d', date)) AS month,
  ROUND((SUM(product.productRevenue) / SUM(totals.visits)) / 1000000, 2) AS avg_revenue_by_user_per_visit
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`,
  UNNEST(hits) AS hits,
  UNNEST(product) AS product
WHERE totals.transactions >= 1
  AND product.productRevenue IS NOT NULL
GROUP BY month

πŸ“Š Actual Output: Query 6 Output

πŸ’‘ Observations:

In July, each customer visit made about $44 in revenue. That's pretty good for an e-commerce site - it means people who come to buy are spending real money.

The website should put more money into marketing channels where people already want to buy, like Google branded search, email, and direct traffic. These channels work better than trying to get random people to visit.


Query 7: Cross-Sell Analysis – "YouTube Men's Vintage Henley" (July 2017)

Question: Calculate other products purchased by customers who also bought "YouTube Men's Vintage Henley" in July 2017.

Market basket analysis identifies which products are frequently purchased together. This drives product bundling, upsell strategies, and personalized recommendation engine training.

WITH
  buyer_list AS (
    SELECT DISTINCT fullVisitorId
    FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`,
      UNNEST(hits) AS hits,
      UNNEST(product) AS product
    WHERE product.v2ProductName = "YouTube Men's Vintage Henley"
      AND totals.transactions >= 1
      AND product.productRevenue IS NOT NULL
  )

SELECT
  product.v2ProductName AS other_purchased_products,
  SUM(product.productQuantity) AS quantity
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`,
  UNNEST(hits) AS hits,
  UNNEST(product) AS product
JOIN buyer_list USING (fullVisitorId)
WHERE product.v2ProductName != "YouTube Men's Vintage Henley"
  AND product.productRevenue IS NOT NULL
  AND totals.transactions >= 1
GROUP BY other_purchased_products
ORDER BY quantity DESC

πŸ“Š Actual Output: Query 7 Output

πŸ’‘ Observations:

People who bought the YouTube Men's Vintage Henley also bought Google Sunglasses most often (20 times), then the Google Women's Vintage Hero Tee (7 times) and lip balm (6 times).

This shows that customers like Google branded products together. The website should create bundles like "Henley + Sunglasses" with a discount, or send emails to Henley buyers recommending sunglasses. This could make customers spend more per order.


Query 8: E-Commerce Conversion Funnel (Jan–Mar 2017)

Question: Generate a cohort map of the checkout funnel (Product View β†’ Add to Cart β†’ Purchase) for Jan–Mar 2017.

Conversion funnel analysis identifies where users drop off during the purchase journey. High drop-off rates at specific funnel stages highlight optimization priorities (e.g., cart abandonment recovery, checkout simplification).

WITH
  data_overview AS (
    SELECT
      FORMAT_DATE('%Y%m', PARSE_DATE('%Y%m%d', date)) AS month,
      eCommerceAction.action_type AS action_type,
      totals.transactions,
      product.productRevenue
    FROM `bigquery-public-data.google_analytics_sample.ga_sessions_2017*`,
      UNNEST(hits) AS hits,
      UNNEST(product) AS product
    WHERE _table_suffix BETWEEN '0101' AND '0331'
  ),

  data_count AS (
    SELECT
      month,
      COUNTIF(action_type = '2') AS num_product_view,
      COUNTIF(action_type = '3') AS num_addtocart,
      COUNTIF(action_type = '6' AND productRevenue IS NOT NULL) AS num_purchase
    FROM data_overview
    GROUP BY month
    ORDER BY month
  )

SELECT
  *,
  ROUND(num_addtocart / num_product_view * 100.0, 2) AS add_to_cart_rate,
  ROUND(num_purchase  / num_product_view * 100.0, 2) AS purchase_rate
FROM data_count

πŸ“Š Actual Output: Query 8 Output

πŸ’‘ Observations:

In January, 25,787 people looked at products but only 7,342 added items to cart (28%). Even fewer actually bought - just 2,143 (8%).

By March, things got better - 37% added to cart and 13% bought. This shows something changed and worked.

The biggest problem is people looking at products but not adding them to cart. This could be because of bad product descriptions, high prices, or not enough trust.

The website should add customer reviews on product pages, make "Add to Cart" easier, show low stock warnings to create urgency, and offer discounts to people about to leave.


πŸ—‚οΈ Project Structure

Google-Analytics-Session-Analysis-Using-SQL/
β”œβ”€β”€ Images/                             # Screenshots of each query's result
β”‚   β”œβ”€β”€ banner.png
β”‚   β”œβ”€β”€ Query_1_Output.png
β”‚   β”œβ”€β”€ Query_2_Output.png
β”‚   β”œβ”€β”€ Query_3_Output.png
β”‚   β”œβ”€β”€ Query_4_Output.png
β”‚   β”œβ”€β”€ Query_5_Output.png
β”‚   β”œβ”€β”€ Query_6_Output.png
β”‚   β”œβ”€β”€ Query_7_Output.png
β”‚   └── Query_8_Output.png
β”œβ”€β”€ SQL_Queries/                        # SQL source files for each question
β”‚   β”œβ”€β”€ Google-Analytics-Session-Analysis.sql
└── README.md

πŸš€ Setup Instructions

To run these queries in Google BigQuery:

  1. ☁️ Set up a Google Cloud Platform (GCP) account: Create one if you don't have it yet, and enable the BigQuery API.
  2. πŸ“₯ Access the public dataset: The bigquery-public-data.google_analytics_sample.ga_sessions_2017* dataset is publicly available to all GCP users - no setup or data loading required.
  3. πŸ“‚ Open BigQuery Console:
  4. ▢️ Run the queries: Open the BigQuery console, copy each .sql file's content from the SQL_Queries/ folder, and make sure your project context matches the dataset path before running.

πŸ“Œ Important Notes:

  • The dataset uses table wildcards (*) to query multiple daily shards at once. For example, ga_sessions_2017* matches all tables from 2017.
  • Use the _table_suffix BETWEEN '0101' AND '0331' syntax to filter by date ranges without loading the entire year.
  • The UNNEST() function is required to flatten nested arrays (hits, product) into a queryable format.

About

πŸ›’ An Advanced SQL project using the Google Analytics 2017 dataset to perform data analysis through 8 complex operational queries. | SQL

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors