Marketing Analytics Pipeline (GCP + dbt)
Data Engineering
Marketing Analytics
GCP
Modern data engineering architecture for marketing KPI tracking
Context & Problem
Business question: how do we build modern infrastructure to track marketing KPIs in near real time?
Marketing teams need quick access to their performance metrics to make informed decisions. This project demonstrates a complete ETL pipeline built with modern data engineering best practices.
Architecture
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Sources    │────▶│   Extract    │────▶│   BigQuery   │────▶│     dbt      │
│    (CSV)     │     │   (Python)   │     │    (Raw)     │     │ (Transform)  │
└──────────────┘     └──────────────┘     └──────────────┘     └──────────────┘
                                                                      │
                                                                      ▼
                               ┌──────────────┐     ┌──────────────┐
                               │  Streamlit   │◀────│   BigQuery   │
                               │ (Dashboard)  │     │ (Analytics)  │
                               └──────────────┘     └──────────────┘
```
Components
- Extract: Python script generating realistic simulated marketing data
- Load: loading into Google BigQuery (dataset `marketing_raw`)
- Transform: dbt models to calculate KPIs (dataset `marketing_analytics`)
- Visualize: interactive Streamlit dashboard
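The Load step has no code sample in this write-up; a minimal sketch of what `src/load_bigquery.py` could look like, assuming the official `google-cloud-bigquery` client (the project name is a placeholder you would supply):

```python
# src/load_bigquery.py -- illustrative sketch, not the project's exact script
import pandas as pd

DATASET_ID = "marketing_raw"    # raw dataset named in this document
TABLE_ID = "daily_performance"  # table referenced by the dbt source

def load_to_bigquery(df: pd.DataFrame, project: str) -> None:
    """Load a DataFrame into BigQuery, replacing the table on each run."""
    # Imported inside the function so the module can be inspected
    # without GCP credentials installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
        autodetect=True,  # infer the schema from the DataFrame
    )
    job = client.load_table_from_dataframe(
        df, f"{project}.{DATASET_ID}.{TABLE_ID}", job_config=job_config
    )
    job.result()  # block until the load job finishes
```

`WRITE_TRUNCATE` keeps the raw table idempotent: re-running the pipeline replaces yesterday's load instead of appending duplicates.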
Implementation
1. Data Extraction
```python
# src/extract_data.py
import pandas as pd
import numpy as np

def generate_marketing_data(start_date, end_date, seed=42):
    """Generate realistic simulated marketing data."""
    np.random.seed(seed)
    channels = {
        'Google Ads': {'sessions': 1500, 'conv_rate': 0.05, 'spend': 3000},
        'Facebook Ads': {'sessions': 1200, 'conv_rate': 0.04, 'spend': 2500},
        'LinkedIn': {'sessions': 400, 'conv_rate': 0.06, 'spend': 1500},
        'Email': {'sessions': 800, 'conv_rate': 0.08, 'spend': 500},
        'Direct': {'sessions': 2000, 'conv_rate': 0.03, 'spend': 0}
    }
    # Generation with realistic variability:
    # - Poisson distribution for sessions (no negative values)
    # - Weekend effect (-30% traffic on Saturday/Sunday)
    rows = []
    for date in pd.date_range(start_date, end_date):
        weekend_factor = 0.7 if date.dayofweek >= 5 else 1.0
        for channel, p in channels.items():
            sessions = np.random.poisson(p['sessions'] * weekend_factor)
            conversions = np.random.binomial(sessions, p['conv_rate'])
            rows.append({
                'date': date.date(),
                'source': channel,
                'sessions': int(sessions),
                'conversions': int(conversions),
                # Illustrative revenue model: ~100 currency units per conversion
                'revenue': round(conversions * np.random.uniform(80, 120), 2),
                'spend': round(p['spend'] * weekend_factor, 2),
            })
    return pd.DataFrame(rows)
```

2. dbt Transformation
```sql
-- dbt_marketing/models/staging/stg_marketing_daily.sql
WITH source AS (
    SELECT * FROM {{ source('marketing_raw', 'daily_performance') }}
),

transformed AS (
    SELECT
        date,
        DATE_TRUNC(date, WEEK) AS week_start_date,
        DATE_TRUNC(date, MONTH) AS month_start_date,
        source AS marketing_source,
        sessions,
        conversions,
        revenue,
        spend,
        -- Calculated KPIs
        SAFE_DIVIDE(revenue, spend) AS roas,
        SAFE_DIVIDE(spend, conversions) AS cost_per_conversion,
        SAFE_DIVIDE(conversions, sessions) AS conversion_rate,
        SAFE_DIVIDE(revenue, conversions) AS revenue_per_conversion,
        SAFE_DIVIDE(revenue, sessions) AS revenue_per_session,
        -- Flags
        spend = 0 AS is_organic_channel,
        EXTRACT(DAYOFWEEK FROM date) IN (1, 7) AS is_weekend
    FROM source
)

SELECT * FROM transformed
```

3. Streamlit Dashboard
The dashboard presents:
- Global KPIs: Total Spend, Revenue, Conversions, ROAS
- Time trend: Spend vs Revenue by day
- Performance by channel: Marketing source comparison
- ROAS by source: Return on ad spend
- Conversion funnel: Sessions → Conversions
Results
Calculated KPIs
| Metric | Formula | Usage |
|---|---|---|
| ROAS | Revenue / Spend | Overall effectiveness |
| CPA | Spend / Conversions | Acquisition cost |
| Conversion Rate | Conversions / Sessions | Traffic quality |
| Revenue/Session | Revenue / Sessions | Traffic value |
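These formulas rely on BigQuery's `SAFE_DIVIDE`, which returns NULL instead of raising an error on division by zero (important for organic channels with zero spend). A small Python equivalent, useful for sanity-checking the dbt output:

```python
def safe_divide(numerator, denominator):
    """Mimic BigQuery's SAFE_DIVIDE: None when the denominator is 0."""
    return numerator / denominator if denominator else None

def channel_kpis(sessions, conversions, revenue, spend):
    """Compute the four KPIs from the table for one channel/day row."""
    return {
        "roas": safe_divide(revenue, spend),
        "cpa": safe_divide(spend, conversions),
        "conversion_rate": safe_divide(conversions, sessions),
        "revenue_per_session": safe_divide(revenue, sessions),
    }

# Organic channel: zero spend yields None for ROAS rather than a crash
print(channel_kpis(sessions=2000, conversions=60, revenue=6000, spend=0))
```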
Generated Data
- Period: 84 days (Sept-Nov 2024)
- Channels: 5 marketing sources
- Volume: ~420 rows per execution (84 days × 5 channels)
Demo
The dashboard allows you to:
- Filter by date range
- Select channels to display
- Visualize trends
- Compare performance
Technologies
| Component | Technology | Version |
|---|---|---|
| Language | Python | 3.13+ |
| Data Warehouse | Google BigQuery | Cloud |
| Transformation | dbt-core + dbt-bigquery | 1.10.3 |
| Dashboard | Streamlit | 1.51.0 |
| Visualization | Plotly + Altair | - |
| Dependency Management | Poetry | - |
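An illustrative Poetry fragment matching the versions listed above (the exact pins and extras in the real `pyproject.toml` may differ; the `dbt-bigquery` pin is an assumption based on the shared version in the table):

```toml
# pyproject.toml -- illustrative fragment, not the project's exact file
[tool.poetry]
name = "marketing-data-pipeline"
version = "0.1.0"

[tool.poetry.dependencies]
python = "^3.13"
dbt-core = "1.10.3"
dbt-bigquery = "1.10.3"  # assumption: pinned to match dbt-core
streamlit = "1.51.0"
pandas = "*"
plotly = "*"
altair = "*"
google-cloud-bigquery = "*"
```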
Project Structure
```
marketing-data-pipeline/
├── src/
│   ├── extract_data.py          # Data generation
│   ├── load_bigquery.py         # BigQuery loading
│   └── dashboard.py             # Streamlit dashboard
├── dbt_marketing/
│   ├── dbt_project.yml          # dbt config
│   └── models/
│       └── staging/
│           ├── sources.yml
│           └── stg_marketing_daily.sql
├── data/
│   └── raw/
│       └── marketing_data.csv
├── pyproject.toml               # Poetry dependencies
└── README.md
```
Execution
```bash
# 1. Extraction
python src/extract_data.py

# 2. BigQuery loading
python src/load_bigquery.py

# 3. dbt transformation
cd dbt_marketing && dbt run

# 4. Dashboard
streamlit run src/dashboard.py
```

Learnings
This project illustrates Modern Data Stack best practices:
- Separation of concerns: Distinct Extract, Load, Transform
- Infrastructure as Code: dbt for versioned transformations
- Reproducibility: Fixed seed for simulated data
- Documentation: Documented dbt sources and models
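As an illustration of the documented-sources point, a `sources.yml` consistent with the `source('marketing_raw', 'daily_performance')` call in the staging model might look like this (descriptions are sketched, not copied from the project):

```yaml
# dbt_marketing/models/staging/sources.yml -- illustrative sketch
version: 2

sources:
  - name: marketing_raw
    description: "Raw marketing data loaded from CSV via load_bigquery.py"
    tables:
      - name: daily_performance
        description: "One row per marketing channel per day"
        columns:
          - name: date
            description: "Reporting date"
          - name: source
            description: "Marketing channel (e.g. Google Ads, Email)"
```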