Everything in one place: how the BTP ML model works for each league we cover, why predicting draws is the hardest bit, and the honest running scorecard that tells you whether any of this is any good.
WHY THIS PAGE EXISTS
Football-model writing tends to hide the methodology. We don’t. Every model version is documented, every limitation spelled out, and the live scorecard at the bottom updates automatically as each round’s results come in. Pick whichever explainer you care about, or read all four.
The Four Methodology Explainers
Championship
The flagship model. Platt-calibrated Logistic Regression trained on 3,266 Championship matches across six seasons. Uses rolling form, goals data, league position, and — for the 2025/26 season — player-quality features.
Model version: goals_logreg_v1_cal · Log loss: 1.036 (baseline 1.064)
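To make the calibration step concrete, here is a minimal sketch of Platt scaling on top of a logistic regression, scored with log loss. The feature count and the synthetic data are placeholders, not the real goals_logreg_v1_cal training set; only the general shape (multinomial LR, sigmoid calibration, log-loss evaluation) matches what the explainer describes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(3266, 6))         # stand-ins for rolling form, goals, position, etc.
y = rng.choice([0, 1, 2], size=3266)   # 0 = home win, 1 = draw, 2 = away win

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = LogisticRegression(max_iter=1000)
# Platt scaling: method="sigmoid" fits a logistic correction to the
# base model's scores on held-out cross-validation folds.
cal = CalibratedClassifierCV(base, method="sigmoid", cv=5)
cal.fit(X_tr, y_tr)

probs = cal.predict_proba(X_te)        # one calibrated row per fixture, summing to 1
print(f"log loss: {log_loss(y_te, probs):.3f}")
```

Lower log loss than the baseline means the calibrated probabilities are, on average, closer to what actually happened; on this random data the score sits near ln(3) ≈ 1.099, the cost of knowing nothing.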
League One
Our Random Forest model for League One. Structurally different from the Championship model because League One is a more draw-heavy division, and Random Forest handles that shape better than logistic regression. Tested against LR with player features in April 2026; RF still wins for League One.
Model version: leagueone_goals_v1 · Trained on: League One historical data
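A minimal sketch of the Random Forest setup, under loud assumptions: the hyperparameters, feature count, and the synthetic draw-heavy label mix below are illustrative, not the real leagueone_goals_v1 configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 6))
# Draw-heavy outcome mix (illustrative): ~30% draws rather than the
# ~25% more typical of higher divisions.
y = rng.choice([0, 1, 2], size=2000, p=[0.38, 0.30, 0.32])

# min_samples_leaf keeps the trees from memorising individual fixtures,
# so the averaged leaf frequencies behave like probabilities.
rf = RandomForestClassifier(n_estimators=500, min_samples_leaf=20, random_state=0)
rf.fit(X, y)

probs = rf.predict_proba(X[:5])   # home / draw / away probabilities per fixture
print(probs.round(3))
```

The practical difference from LR is that the forest's probabilities come from averaging outcome frequencies across many trees rather than from a single linear decision surface, which copes better when no outcome dominates.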
WSL
Our most recent addition and still in genuinely exploratory territory. Smaller historical dataset, different tactical shape, and the lowest-confidence of the three. We publish the numbers with the caveat up front.
Model version: wsl_goals_v1 · Status: exploratory
Why Models Get Draws Wrong
The known blind spot. Probability-argmax models — almost all of them — under-call draws. We explain why this happens, what we tried, what worked (Platt calibration), and what we can honestly offer as an improvement vs the naïve version.
Topic: calibration · Applies to: all three models, especially Championship
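The argmax blind spot is easy to show with numbers. The probabilities below are invented for illustration: across four fixtures the model expects roughly 1.3 draws, but because a draw is rarely the single most likely outcome, argmax calls only one.

```python
import numpy as np

# Columns: home / draw / away. Illustrative probabilities only.
probs = np.array([
    [0.45, 0.28, 0.27],
    [0.40, 0.31, 0.29],
    [0.34, 0.33, 0.33],
    [0.30, 0.36, 0.34],   # the only fixture where draw tops the column
])

calls = probs.argmax(axis=1)          # 0 = home, 1 = draw, 2 = away
expected_draws = probs[:, 1].sum()    # the model expects 1.28 draws
called_draws = int((calls == 1).sum())  # but argmax calls only 1
print(expected_draws, called_draws)
```

This is why calibration alone cannot fully fix the top-call hit rate for draws: even a perfectly calibrated model will under-call them whenever the draw probability is substantial but not the maximum.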
The Principles (Short Version)
What We Try To Do — And What We Don’t
We try to:
- Produce calibrated probability estimates, not confident-sounding score predictions.
- Treat the model’s top call and an EV selection as two different things (different filters, different questions).
- Publish the methodology, the log loss vs baseline, and the running hit rate in public.
- Write the accountability post whether the week went well or badly.
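The top-call / EV-selection distinction above can be sketched in a few lines. The odds and probabilities here are hypothetical; the point is only that the two selections answer different questions and can disagree on the same fixture.

```python
outcomes = ["home", "draw", "away"]
model_p = [0.42, 0.30, 0.28]        # model probabilities (illustrative)
decimal_odds = [2.10, 3.60, 3.40]   # hypothetical market prices

# Top call: the single most likely outcome.
top_call = outcomes[model_p.index(max(model_p))]

# EV selection: the outcome with the best expected value at these prices,
# where EV = probability * decimal odds - 1.
ev = [p * o - 1 for p, o in zip(model_p, decimal_odds)]
ev_pick = outcomes[ev.index(max(ev))]

print(top_call, ev_pick)   # here they disagree: 'home' vs 'draw'
```

In this example the home win is the most likely outcome but is priced too short to carry positive EV, while the draw is less likely yet priced generously enough that it comes out on top by the EV filter.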
We don’t claim:
- That the model is better than the market. It may be, on specific markets and specific fixtures. Only a large sample can tell.
- That EV-positive picks are “good bets” in any individual week. Variance dominates the short term.
- That probability estimates translate directly to stake sizing. They don’t.
Live Model Scorecard
Running Scorecard — Updates Automatically
How the model’s top call has actually landed across the 2025/26 season. Each cell: correct / total predictions (hit rate). Random baseline for a 3-way market is 33%.
CHAMPIONSHIP
67/321 (21%)
LEAGUE ONE
73/300 (24%)
WSL
—
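For transparency, each scorecard cell reduces to a simple count. The sketch below assumes the results are stored as (top call, actual result) pairs; the helper name and the sample data are made up for illustration.

```python
def hit_rate(results):
    """Return (correct, total, rate) for a list of (call, actual) pairs."""
    correct = sum(1 for call, actual in results if call == actual)
    return correct, len(results), correct / len(results)

# Hypothetical week: H = home win, D = draw, A = away win.
sample = [("H", "H"), ("H", "A"), ("D", "D"), ("A", "H"), ("H", "H")]
c, n, rate = hit_rate(sample)
print(f"{c}/{n} ({rate:.0%})")   # 3/5 (60%)
```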
ONE LAST THING
If you spot something wrong with any of the methodology, or think one of the explainers is misleading, we want to know. The model exists because we’re interested in getting it less wrong every season, not in being right about any single call.