Building a Marketing Mix Model Platform: Architecture and Lessons Learned
Exploring the architectural considerations for building an MMM platform—async GPU training, budget optimization, and translating Bayesian posteriors into actionable marketing recommendations.
Marketing Mix Modeling (MMM) has been around since the 1960s, but the tooling has historically been challenging. You either shell out for expensive consultants who deliver a PDF three months later, or you cobble together something in R that only one person on your team understands.
Open-source Bayesian MMM frameworks have changed that calculus. They provide proper statistical foundations that actually work. But getting MMM into production—with real data pipelines, training infrastructure, and a UI that non-technical stakeholders can use—is where things get interesting.
This post shares architectural patterns and lessons from building such a platform.
The Problem
MMM sounds simple on paper: figure out which marketing channels are actually driving results so you can allocate budget intelligently. In practice, you're dealing with:
- Data scattered across a dozen platforms (ad networks, CRM, analytics tools)
- Carryover effects where today's ad spend influences next month's conversions
- Response curves that flatten at high spend levels
- Stakeholders who want answers now, not after a week of model training
The goal is to build something that handles all of this while remaining accessible to people who don't want to write Python.
Architecture Overview
The stack is straightforward:
┌─────────────────────────────────────────────────┐
│            Frontend (React/Next.js)             │
│      (TypeScript, visualization libraries)      │
└─────────────────────┬───────────────────────────┘
                      │ REST API
┌─────────────────────▼───────────────────────────┐
│             Backend (FastAPI/Flask)             │
│          (Python, async job handling)           │
└─────────────────────┬───────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────┐
│             Bayesian MMM Framework              │
│        (MCMC sampling, GPU acceleration)        │
└─────────────────────┬───────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────┐
│              Cloud Infrastructure               │
│    (Data warehouse, object storage, compute)    │
└─────────────────────────────────────────────────┘
Nothing exotic here—the complexity lives in how these pieces talk to each other and handle the async nature of model training.
Data Pipelines: The Foundation
MMM is garbage-in-garbage-out. Most of the code isn't the model itself—it's wrangling data into a format the model can consume.
Multi-Source Ingestion
Data can come from direct CSV uploads or data warehouse queries. The warehouse path is more interesting for production use:
- Analysts can write their own data preparation queries
- Data lineage is maintained
- Scheduled refreshes keep the model current
Key validation steps at ingestion (a code sketch follows this list):
- Check for required columns
- Validate data types and ranges
- Filter out channels with insufficient data (sparse channels produce unstable, poorly identified estimates)
- Verify date coverage and completeness
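A rough sketch of what that validation pass can look like, assuming a pandas-based pipeline with weekly data; the column names and thresholds here are illustrative, not a fixed schema:

```python
import pandas as pd

# Illustrative schema and thresholds; adjust to your own data contract
REQUIRED_COLUMNS = {"date", "channel", "spend", "conversions"}
MIN_ACTIVE_WEEKS = 10  # channels with fewer weeks of nonzero spend get dropped

def validate_ingested_data(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Required columns
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")

    # 2. Types and ranges
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])
    if (df["spend"] < 0).any():
        raise ValueError("Negative spend values found")

    # 3. Drop channels with too little signal for stable estimates
    active_weeks = df[df["spend"] > 0].groupby("channel")["date"].nunique()
    keep = active_weeks[active_weeks >= MIN_ACTIVE_WEEKS].index
    df = df[df["channel"].isin(keep)]

    # 4. Date coverage (assumes weekly granularity)
    n_expected = (df["date"].max() - df["date"].min()).days // 7 + 1
    n_observed = df["date"].nunique()
    if n_observed < n_expected:
        raise ValueError(f"Expected ~{n_expected} weekly periods, found {n_observed}")

    return df
```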
Data Lineage
Every transformation should be persisted with clear organization:
storage/
├── raw_data/         # Untouched source data
├── transformed_data/ # Post-validation
├── model_input/      # Ready for training
├── models/           # Trained model artifacts
├── scenarios/        # Versioned results
└── optimizations/    # Budget allocation outputs
This isn't just for debugging—it's an audit trail. When someone asks "why did the model say Facebook has a 3x ROI?" you need to trace back to the exact input data that produced that result.
Model Training Pipeline
Bayesian MMM frameworks use MCMC sampling for inference. This is computationally expensive—training can take 30-60 minutes for a typical model with multiple channels and years of weekly data.
Async Job Handling
Training can't block the API. The solution is a job queue:
- Submit training jobs to a background worker
- Return job ID immediately
- Frontend polls for status updates
- Stream logs via SSE or WebSocket for real-time feedback
Single-worker execution is often intentional—GPU memory is limited, and parallel training jobs can cause out-of-memory errors. Queue jobs and process them sequentially.
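A minimal sketch of this pattern with FastAPI and a single in-process worker; the endpoint paths, the in-memory job registry, and the run_training stub are illustrative stand-ins rather than a specific framework's API:

```python
import asyncio
import uuid
from fastapi import FastAPI

app = FastAPI()
jobs: dict[str, dict] = {}           # job_id -> {"status": ..., "result"/"error": ...}
queue: asyncio.Queue | None = None   # single consumer => strictly sequential training

def run_training(config: dict) -> dict:
    """Placeholder for the actual MCMC fit (e.g. a Meridian or PyMC-Marketing call)."""
    return {"summary": "trained", "config": config}

async def worker():
    while True:
        job_id, config = await queue.get()
        jobs[job_id]["status"] = "running"
        try:
            # Off-load the blocking fit so the event loop keeps serving requests
            result = await asyncio.to_thread(run_training, config)
            jobs[job_id].update(status="done", result=result)
        except Exception as exc:
            jobs[job_id].update(status="failed", error=str(exc))
        finally:
            queue.task_done()

@app.on_event("startup")  # newer FastAPI versions prefer a lifespan handler
async def start_worker():
    global queue
    queue = asyncio.Queue()
    asyncio.create_task(worker())

@app.post("/train")
async def submit_training(config: dict):
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued"}
    await queue.put((job_id, config))
    return {"job_id": job_id}        # return immediately; the frontend polls below

@app.get("/train/{job_id}")
async def training_status(job_id: str):
    return jobs.get(job_id, {"status": "unknown"})
```

The in-memory registry is the weak point here: it only survives as long as the process does, which is exactly the persistence caveat in the lessons at the end.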
Model Configuration
Bayesian MMM's power comes from priors. You can encode domain knowledge:
ROI priors: Expected return on investment for each channel. A well-chosen prior (say, log-normal with a median around 1.2x) prevents obviously wrong estimates while still letting the data speak.
Carryover effects: How long marketing effects persist. Brand campaigns might have effects lasting 8+ weeks. Performance ads might be 2 weeks. The model learns actual decay rates, but priors keep estimates reasonable.
Response curve shape: Diminishing returns parameters. Most channels saturate at high spend—priors encode this expectation.
Different channels need different configurations. The UI should make this accessible to marketing analysts, not just statisticians.
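One way to represent this per-channel configuration is a small dataclass that the UI edits and the training code later maps onto the framework's actual prior objects; the field names and numbers below are illustrative:

```python
from dataclasses import dataclass

@dataclass
class ChannelPriors:
    roi_median: float = 1.2           # median of a log-normal prior on ROI
    roi_sigma: float = 0.5            # spread of that prior (log scale)
    carryover_weeks: float = 2.0      # expected effect half-life, in weeks
    saturation_multiple: float = 1.5  # spend (as a multiple of mean weekly spend)
                                      # beyond which diminishing returns dominate

# Illustrative defaults per channel type; analysts adjust these in the UI
CHANNEL_CONFIG = {
    "brand_tv":    ChannelPriors(roi_median=0.9, roi_sigma=0.7, carryover_weeks=8.0),
    "paid_search": ChannelPriors(roi_median=1.5, roi_sigma=0.4, carryover_weeks=1.0),
    "paid_social": ChannelPriors(),   # falls back to the defaults above
}
```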
GPU Memory Management
When running MCMC on GPU:
- Configure memory growth to avoid pre-allocation
- Clean up sessions between training runs
- Run garbage collection explicitly
- Consider memory limits when sizing instances
Without explicit cleanup, memory fragments over multiple training runs, eventually causing failures.
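Assuming a TensorFlow-backed framework (Google's Meridian builds on TensorFlow Probability, for example), that housekeeping looks roughly like this:

```python
import gc
import tensorflow as tf

def configure_gpu_memory() -> None:
    # Allocate GPU memory on demand instead of reserving it all up front
    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)

def cleanup_after_training() -> None:
    # Drop references to the finished graph/session and force a collection pass
    tf.keras.backend.clear_session()
    gc.collect()
```

configure_gpu_memory has to run before the first model touches the GPU; cleanup_after_training runs between queued jobs.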
Budget Optimization
A trained model is interesting. A trained model that tells you where to move money is useful.
Optimization Modes
Three scenarios come up repeatedly:
Fixed budget: "We have $1M. How should we split it across channels?"
Target ROI: "What's the minimum spend to hit 2x ROI?"
Unconstrained: "Ignoring budget constraints, where are the opportunities?"
All modes benefit from constraints preventing unrealistic recommendations. Nobody's going to 10x their LinkedIn spend overnight, even if the model says they should. Bounds like "50% to 200% of current spend" keep recommendations actionable.
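A sketch of the fixed-budget mode using scipy, with a toy saturating response function standing in for the model's posterior response curves (every number below is made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize

def expected_revenue(spend: np.ndarray) -> float:
    """Toy stand-in: in production this evaluates the trained model's response curves."""
    roi = np.array([1.4, 1.1, 0.9])                # marginal return at low spend
    half_sat = np.array([40_000, 60_000, 80_000])  # spend at half saturation
    return float(np.sum(roi * spend * half_sat / (spend + half_sat)))

current = np.array([50_000.0, 70_000.0, 30_000.0])
total_budget = current.sum()

bounds = [(0.5 * s, 2.0 * s) for s in current]   # 50% to 200% of current spend
constraints = [{"type": "eq", "fun": lambda x: x.sum() - total_budget}]  # fixed budget

result = minimize(lambda x: -expected_revenue(x), x0=current,
                  bounds=bounds, constraints=constraints, method="SLSQP")
print(result.x)   # recommended per-channel allocation
```

The target-ROI and unconstrained modes reuse the same machinery with a different objective or constraint set.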
Response Curves
The response curves are the most important output. They show:
- Current position on the curve
- Where diminishing returns kick in
- Saturation points by channel
When a channel's curve is nearly flat at current spend levels, that's a clear signal to reallocate budget elsewhere.
Simulation and Forecasting
Models are only useful if people can explore them. Interactive simulation is essential:
Spend Scenario Testing
Users should be able to:
- Adjust channel spend with sliders or inputs
- See predicted outcomes update in real-time
- Compare scenarios side-by-side
- Save scenarios for later reference
The credible intervals (from the Bayesian posterior) communicate uncertainty—a model isn't very useful if people treat point estimates as gospel.
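One way to keep that uncertainty attached to every scenario is to push posterior draws, not point estimates, through the prediction. A sketch with illustrative array shapes; in practice the draws come from evaluating the model's response curves at the proposed spend:

```python
import numpy as np

def predict_scenario(posterior_draws: np.ndarray,
                     spend: np.ndarray,
                     ci: float = 0.90) -> dict:
    """
    posterior_draws: (n_draws, n_channels) incremental outcome per dollar at this
                     scenario's spend level (illustrative; real draws are nonlinear).
    spend:           (n_channels,) proposed spend per channel.
    """
    draws = posterior_draws @ spend                      # one total outcome per draw
    lo, hi = np.quantile(draws, [(1 - ci) / 2, 1 - (1 - ci) / 2])
    return {"mean": float(draws.mean()),
            "ci_low": float(lo), "ci_high": float(hi)}
```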
What-If Analysis
Beyond simple scenarios, enable questions like:
- "What if we cut TV by 20% and move it to digital?"
- "What's the optimal allocation for Q4 given seasonal patterns?"
- "How would results differ with a 10% larger budget?"
Versioning and Reproducibility
A scenario represents a trained model instance with all its artifacts:
- Model configuration used
- Training data snapshot
- Trained model weights
- Generated visualizations
- Optimization results
Each training run creates a new version. The UI shows version history and lets users compare outputs across retrains—useful for debugging when results shift unexpectedly.
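A lightweight way to tie those artifacts together is a manifest written alongside every version; the fields and paths below are illustrative and mirror the storage layout shown earlier:

```python
# Written as JSON next to each scenario version; fields and paths are illustrative
manifest = {
    "scenario_id": "q3-baseline",
    "version": 4,
    "created_at": "2025-07-01T12:00:00Z",
    "model_config": "model_input/q3-baseline/v4/config.json",
    "training_data_snapshot": "model_input/q3-baseline/v4/data.parquet",
    "model_artifact": "models/q3-baseline/v4/model.pkl",
    "visualizations": "scenarios/q3-baseline/v4/charts/",
    "optimization_results": "optimizations/q3-baseline/v4/allocation.json",
}
```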
Visualization Considerations
Bayesian models produce rich outputs that need appropriate visualization:
Posterior distributions: Show uncertainty, not just point estimates. Box plots, violin plots, or credible intervals work well.
Response curves: Interactive charts letting users explore the relationship between spend and outcome.
Attribution over time: How much did each channel contribute week by week?
Comparison views: How does the current allocation compare to optimal?
Consider extracting chart data from model outputs for custom rendering rather than using static images. Interactive charts with tooltips, zooming, and linked brushing improve user understanding.
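For example, a channel's response curve can be shipped to the frontend as plain JSON points with its credible band rather than as a rendered image; a sketch (array shapes are assumptions):

```python
import json
import numpy as np

def response_curve_payload(spend_grid: np.ndarray,
                           posterior_curves: np.ndarray) -> str:
    """
    spend_grid:       (n_points,) spend levels evaluated for one channel.
    posterior_curves: (n_draws, n_points) predicted outcome per posterior draw.
    Returns JSON the frontend can bind directly to an interactive chart.
    """
    mean = posterior_curves.mean(axis=0)
    lo, hi = np.quantile(posterior_curves, [0.05, 0.95], axis=0)
    return json.dumps([
        {"spend": float(s), "mean": float(m), "lo": float(l), "hi": float(h)}
        for s, m, l, h in zip(spend_grid, mean, lo, hi)
    ])
```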
Lessons Learned
A few things became obvious in hindsight:
Job queue needs persistence. In-memory queues work fine for single-instance deployment, but horizontal scaling requires shared state. Plan for Redis or a cloud task queue from the start.
Cache aggressively. Response curve calculations are deterministic given a trained model. Caching these cuts frontend latency significantly.
Prior elicitation is hard. Setting Bayesian priors is powerful but requires statistical intuition. A wizard that translates "I think Facebook usually returns $1.50 per dollar spent" into proper prior parameters makes this accessible to more users.
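A sketch of the kind of translation such a wizard performs, assuming a log-normal ROI prior and treating the user's "rarely above" answer as roughly a 95th percentile (both assumptions, not a standard recipe):

```python
import math

def elicit_lognormal_roi_prior(typical_roi: float, rarely_above: float) -> dict:
    """Map 'usually returns X per dollar, rarely more than Y' to log-normal params."""
    mu = math.log(typical_roi)                     # exp(mu) is the prior median
    sigma = (math.log(rarely_above) - mu) / 1.645  # 95th pct of N(0, 1) is about 1.645
    return {"mu": mu, "sigma": sigma}

# "Facebook usually returns $1.50 per dollar, rarely more than $3.00"
print(elicit_lognormal_roi_prior(1.5, 3.0))   # {'mu': 0.405..., 'sigma': 0.421...}
```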
Manage training-time expectations. Users expecting instant results need clear communication about why MCMC takes time. Progress indicators with estimated completion times help manage expectations.
Key Takeaways
The core insight: the model is the easy part. Bayesian MMM frameworks handle the hard statistics. The work is in everything around it:
- Data pipelines that don't break
- Async job handling that doesn't leak memory
- UIs that translate posterior distributions into actionable recommendations
- Versioning that enables comparison and debugging
If you're considering building something similar:
- Start with the data pipeline. Get that bulletproof before touching the model.
- Design for async training from day one.
- Budget twice as much time for the frontend as you think you'll need—stakeholders have opinions about charts.
- Make priors accessible through good UX rather than requiring statistical expertise.
This post discusses architectural patterns for MMM platforms. For MMM methodology, see resources on Bayesian Marketing Mix Modeling. Open-source frameworks like Google's Meridian, Meta's Robyn, and PyMC-Marketing provide implementations of these concepts.
Note: The patterns discussed here are intentionally generalized, drawn from industry experience but presented as transferable concepts rather than specific proprietary implementations.