Dynamic Pricing Engine

Tags: Reinforcement Learning · Pricing · Economics · Python
From econometrics to Reinforcement Learning for dynamic pricing

Context & Problem

Business question: How can prices be optimized in real time based on context and demand?

Surge pricing used by Uber and Lyft is a fascinating case where data science meets economics. This project progressively explores different approaches, from classical econometrics to reinforcement learning.

Dataset

Uber & Lyft Dataset (Kaggle):

  • ~700,000 rides in Boston
  • Period: November 2018
  • Variables: Price, distance, weather, hour, day, surge multiplier
  • Two companies: Uber and Lyft (for comparison)

Progressive Approach

This project is structured in 4 notebooks of increasing methodological complexity:

1. EDA & Storytelling

Data exploration to understand pricing patterns:

  • Prices are higher during peak hours (8-9am, 5-7pm)
  • Weather (rain, snow) significantly increases prices
  • Lyft and Uber have slightly different pricing strategies
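The peak-hour pattern above comes from a simple group-by. The snippet below sketches the idea on a toy DataFrame; the column names (`hour`, `price`) stand in for the real dataset's schema, which may differ:

```python
import pandas as pd

# Toy stand-in for the Kaggle rides data (real columns may be named differently).
rides = pd.DataFrame({
    "hour":  [8, 8, 17, 17, 3, 3],
    "price": [22.0, 24.0, 25.0, 27.0, 12.0, 14.0],
})

# Average price per hour of day: peak hours (8-9am, 5-7pm) stand out.
avg_price_by_hour = rides.groupby("hour")["price"].mean()
print(avg_price_by_hour)
```

The same pattern extends to weather and company by grouping on those columns instead.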

2. Bayesian Price Elasticity

Econometric demand modeling as a function of price:

  • Elasticity ≈ -0.8: a 10% price increase reduces demand by roughly 8%
  • Demand is relatively inelastic (< 1 in absolute value)
  • This economically justifies surge pricing
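The economic argument can be made concrete with a constant-elasticity demand curve (Q ∝ P^ε). With ε ≈ -0.8, demand falls less than proportionally when price rises, so revenue increases. A minimal sketch (the functions are illustrative, not the notebook's PyMC model):

```python
def demand_change(price_change, elasticity=-0.8):
    """Relative demand change under a constant-elasticity demand curve:
    Q is proportional to P**elasticity, so Q_new/Q_old = (P_new/P_old)**elasticity."""
    return (1 + price_change) ** elasticity - 1

def revenue_change(price_change, elasticity=-0.8):
    """Relative revenue change, since revenue R = P * Q."""
    return (1 + price_change) * (1 + demand_change(price_change, elasticity)) - 1

# A 10% surge with inelastic demand (|elasticity| < 1) still raises revenue:
dq = demand_change(0.10)   # ~ -0.073: demand drops ~7.3% (the ~8% figure is the linear approximation)
dr = revenue_change(0.10)  # ~ +0.019: revenue up ~1.9%
```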

3. Contextual Bandits (Thompson Sampling)

The explore/exploit problem: how to test new prices while maximizing revenue?

Thompson Sampling automatically balances exploration and exploitation, converges to the optimal price, and adapts when the context changes.
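A minimal Thompson Sampling sketch for price selection: each candidate price (arm) gets a Beta posterior over its purchase probability; at each round we sample from the posteriors and play the arm with the highest sampled expected revenue. The price grid and simulated purchase probabilities below are illustrative, not fitted to the Uber/Lyft data:

```python
import random

# Candidate price multipliers (the "arms") - hypothetical values.
PRICES = [0.8, 1.0, 1.2, 1.5]
TRUE_PROB = [0.60, 0.55, 0.30, 0.10]   # simulated purchase probs, unknown to the algorithm
# Beta(1, 1) prior over each arm's purchase probability.
alpha = [1.0] * len(PRICES)
beta = [1.0] * len(PRICES)

random.seed(42)
for _ in range(20000):
    # 1. Sample a plausible conversion rate from each arm's posterior.
    theta = [random.betavariate(alpha[i], beta[i]) for i in range(len(PRICES))]
    # 2. Play the arm with the highest sampled expected revenue.
    arm = max(range(len(PRICES)), key=lambda i: PRICES[i] * theta[i])
    # 3. Observe a (simulated) purchase and update that arm's posterior.
    if random.random() < TRUE_PROB[arm]:
        alpha[arm] += 1
    else:
        beta[arm] += 1

# Posterior-mean revenue estimate per arm; sampling concentrates on the best one.
est = [PRICES[i] * alpha[i] / (alpha[i] + beta[i]) for i in range(len(PRICES))]
best = max(range(len(PRICES)), key=lambda i: est[i])
```

The contextual version used in the notebook conditions these posteriors on features such as hour and weather; the explore/exploit mechanics stay the same.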

4. Reinforcement Learning (Q-Learning)

Complete approach with an agent learning a pricing policy:

  • State: (hour, day, weather, demand)
  • Actions: price levels (0.8x, 1.0x, 1.2x, 1.5x, 2.0x)
  • Reward: revenue = price × purchase_probability

Results

Approach Comparison

Approach            Advantages                       Disadvantages
Fixed elasticity    Simple, interpretable            Ignores context
Thompson Sampling   Adaptive, theoretically optimal  No generalization across contexts
Q-Learning          Learns a complete policy         Requires lots of data

Learned Policy

The Q-Learning agent learns a policy that:

  • Increases prices during peak hours (surge 1.5x-2.0x)
  • Maintains prices during normal periods (1.0x)
  • Slightly lowers during slow periods to stimulate demand (0.8x-1.0x)

Technologies

Component          Technology
Data processing    pandas, numpy
Visualization      matplotlib, seaborn, plotly
Bayesian modeling  PyMC, ArviZ
Machine learning   scikit-learn
Dashboard          Streamlit

Learnings

  1. Pricing economics: Elasticity, consumer surplus, price discrimination
  2. Bandit algorithms: The exploration/exploitation trade-off
  3. RL implementation: Q-Learning, states, actions, rewards
  4. Methodological progression: From simple to complex
