Predicting the Championship: How the BTP Model Works
🤖 BTP Machine Learning Series
The Championship resumes after the international break on 3 April. Before it does, this is the story of how BeyondThePrem built a statistical model to generate probability estimates for every fixture — and what we learned about football predictability in the process.
Over the past few weeks, alongside the regular gameweek coverage, I’ve been building something new for BeyondThePrem: a machine learning model that generates pre-match win probabilities for Championship fixtures. From GW40 onwards, those predictions will appear on every gameweek preview. This post explains how it was built, what it found, and — crucially — what it cannot do.
📊 The Data
📁 Championship Dataset
Everything starts with the BTP database, which holds every Championship result since 2019/20. After filtering to finished matches only and excluding the current 2025/26 season (live data — never used for training), the modelling dataset covers:
- **Matches:** 3,338 finished results
- **Seasons:** 6 (2019/20 – 2024/25)
- **Feature rows:** 3,266 after null drops
The 72 rows dropped are early-season matches where teams had fewer than five prior games — there simply wasn’t enough history to compute rolling form. Rather than impute these, they were dropped cleanly. The remaining 3,266 rows form the full training and test set.
COVID flag: The 2019/20 and 2020/21 seasons were played entirely or partly behind closed doors. A crowd_present feature (0 or 1) was added to capture this structural difference — something a purely football-blind model would otherwise miss.
[Chart: xG coverage by season]
🔬 The xG Discovery
The Most Interesting Finding: Data Volume Beats Feature Quality
The original plan was straightforward: use xG (expected goals) as the core feature set. xG is a far better representation of a team’s underlying performance than raw goals — it’s less noisy, less dependent on individual moments of brilliance or goalkeeping howlers, and better at capturing what a team is actually doing rather than what they happened to score.
There was one problem: Championship xG data only exists from 2023/24 onwards.
That gives us two seasons of xG coverage — one for training, one for testing. A logistic regression model trained on 538 rows (one season) and tested on another 544 is working with severely limited data. Compare that to the goals-based model: six full seasons, 2,722 training rows, the same 544-row test.
When both models were evaluated head-to-head on the 2024/25 test season, the results were instructive:
- **Goals model** (6 seasons, 2,722 training rows): **1.0640** log loss on the 2024/25 test set — beats baseline by 0.009
- **xG model** (1 season, 538 training rows): **1.0831** log loss on the 2024/25 test set — beats baseline by 0.006
The verdict: xG is almost certainly the better feature. But with only one season to train on, the goals model wins on test performance — purely because it has seen 5× more data. As xG data accumulates across further seasons, this will reverse. For now, the production model uses goals-based rolling features for reliability.
The combined model (goals + xG features together) performed worst of all — adding 24 features to a 538-row training set causes overfitting that logistic regression’s regularisation can’t fully compensate for. More features are not always better when data is scarce.
⚙️ How the Model Works
Feature Engineering
For every match, the model looks backwards at completed results and computes rolling statistics for both the home and away team. The key principle: the model never sees the match it is predicting. Every feature is calculated from prior matches only.
Rolling form windows
For each team, the model calculates goals scored, goals conceded, and points earned across the last 5 games and last 10 games — home and away combined. This gives 12 rolling features per match (6 per team). The 10-game window consistently outranked the 5-game window in importance; form over a longer spell is more predictive than recent volatility.
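As a minimal sketch of how such leakage-free rolling windows can be computed (the team, column names, and numbers here are illustrative, not BTP's actual schema, and I'm assuming per-game averages — totals would work the same way), the key trick is `shift(1)` before `rolling()`, which drops the current match from every window:

```python
import pandas as pd

# Hypothetical long-format results: one row per team per match,
# in chronological order. Columns are illustrative, not BTP's schema.
df = pd.DataFrame({
    "team": ["Hull"] * 12,
    "goals_for":     [1, 0, 2, 3, 1, 0, 2, 1, 1, 4, 0, 2],
    "goals_against": [0, 1, 2, 1, 1, 0, 3, 1, 0, 2, 2, 1],
})
df["points"] = (df.goals_for > df.goals_against) * 3 + (df.goals_for == df.goals_against) * 1

for window in (5, 10):
    for col in ("goals_for", "goals_against", "points"):
        # shift(1) excludes the current match from the window, so every
        # feature is computed from prior results only -- no leakage.
        # min_periods=5 matches the rule of dropping only matches with
        # fewer than five prior games (an assumption for the 10-window).
        df[f"{col}_last{window}"] = (
            df.groupby("team")[col]
            .transform(lambda s, w=window: s.shift(1).rolling(w, min_periods=5).mean())
        )

# Rows without at least five prior games become NaN and are dropped,
# mirroring the 72 early-season rows removed from the dataset
df = df.dropna()
```

With twelve matches for one team, the first five rows have no complete five-game history and fall away, exactly as the 72 early-season rows did in the real dataset.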
League position at kickoff
Both teams’ positions in the Championship table at the time of the match — computed from all completed results up to that point in the season. A team in the top 6 (promotion zone), mid-table (7th–18th), or bottom 6 (relegation zone) gets a position band flag, one-hot encoded. Being in the promotion zone is the single strongest individual signal in the feature set.
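A sketch of that banding and encoding step (the function name, column names, and example positions are all hypothetical; the band boundaries follow the text, assuming a 24-team table):

```python
import pandas as pd

def position_band(pos: int) -> str:
    # Top 6 = promotion zone, 7th-18th = mid-table, 19th-24th = relegation zone
    if pos <= 6:
        return "promotion"
    if pos <= 18:
        return "mid_table"
    return "relegation"

# Hypothetical fixtures with each side's league position at kickoff
fixtures = pd.DataFrame({"home_pos": [1, 10, 22], "away_pos": [5, 19, 7]})
for side in ("home", "away"):
    fixtures[f"{side}_band"] = fixtures[f"{side}_pos"].map(position_band)

# One-hot encode the band flags for the logistic regression
encoded = pd.get_dummies(fixtures, columns=["home_band", "away_band"])
print(sorted(encoded.columns))
```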
Logistic regression — and why
The model is a multinomial logistic regression: it outputs three probabilities that sum to 100% (home win / draw / away win). It was chosen over fancier alternatives — Random Forest, XGBoost — because it is transparent, well-calibrated, and generalises better with limited data. A logistic regression can tell you not just what it predicted but roughly why. XGBoost, despite its reputation, underperformed the baseline on this dataset.
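In scikit-learn terms, the setup can be sketched like this on toy random data (the feature matrix and labels here are stand-ins, not BTP's); with three classes and the default lbfgs solver, `LogisticRegression` fits exactly the multinomial softmax form described above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))      # toy stand-in for the rolling-form features
y = rng.integers(0, 3, size=300)   # 0 = home win, 1 = draw, 2 = away win

# Scaling + L2-regularised logistic regression; with three classes and
# the lbfgs solver this is the multinomial (softmax) form
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

proba = model.predict_proba(X[:1])  # [[P(home win), P(draw), P(away win)]]
print(proba.round(3))
```

The three columns of `predict_proba` always sum to 1, which is what lets them be rendered directly as a probability bar.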
Top Feature Importances
From the production goals-based logistic regression model, the top predictors by mean absolute coefficient across all three outcome classes:
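The ranking itself is straightforward to reproduce on any fitted model: take the coefficient matrix, which has one row per outcome class, and average absolute values across the three rows. A sketch on synthetic data (feature names are placeholders; standardising first is what makes the magnitudes comparable):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=400, n_features=6, n_informative=3, n_classes=3, random_state=0
)
feature_names = [f"feat_{i}" for i in range(6)]  # placeholder names

# Standardise so coefficient magnitudes are comparable across features
X_std = StandardScaler().fit_transform(X)
clf = LogisticRegression(max_iter=1000).fit(X_std, y)

# coef_ has shape (n_classes, n_features); average the absolute
# coefficient over the three outcome classes, then rank
importance = np.abs(clf.coef_).mean(axis=0)
for name, score in sorted(zip(feature_names, importance), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```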
📈 How Did the Models Perform?
📊 Model Comparison — 2024/25 Test Season (544 matches)
All four models were evaluated on 2024/25, a season none of them had seen during training. The naive baseline predicts the training set class distribution for every match — the floor that any model must beat to add value.
Log loss measures how well a model’s predicted probabilities match what actually happened. Unlike accuracy — which just asks “did you pick the right winner?” — log loss rewards confident correct predictions and punishes confident wrong ones. A model that says “70% home win” and the home team wins scores better than one that says “40% home win” for the same result.
Lower is better. A completely uninformed model that assigns equal probability to all three outcomes (33%/33%/33%) every time would score around 1.099. Our baseline — which uses the actual historical frequency of each outcome — scores 1.073. The logistic regression model scores 1.064, a modest but consistent improvement.
For context, even sophisticated commercial football models rarely achieve log loss below 1.00 on three-outcome prediction. The irreducible randomness in football is real.
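Both reference points are easy to verify directly, here with illustrative outcome frequencies rather than the exact BTP training distribution:

```python
import numpy as np
from sklearn.metrics import log_loss

# Illustrative outcome frequencies (0 = home, 1 = draw, 2 = away),
# not the exact BTP training distribution
y_true = np.array([0] * 43 + [1] * 26 + [2] * 31)

# Uniform model: 1/3 each, every match -> log loss = ln(3) ~ 1.0986
uniform = np.full((len(y_true), 3), 1 / 3)
print(log_loss(y_true, uniform))

# Frequency baseline: predict the historical class distribution every time
freqs = np.bincount(y_true) / len(y_true)
baseline = np.tile(freqs, (len(y_true), 1))
print(log_loss(y_true, baseline))  # lower than ln(3), but no model skill
```

Any amount of home-advantage skew in the frequencies pulls the baseline below ln(3), which is why beating the frequency baseline, not the uniform one, is the meaningful bar.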
Lower log loss = better probability calibration. Logistic Regression wins by a small but consistent margin.
Honest framing: A 0.009 improvement in log loss sounds tiny because it is. The gap between the best model and a dumb baseline is real but modest — this is normal for football prediction. The sport is genuinely uncertain. A model that claimed to be dramatically better than chance should be treated with scepticism, not admiration.
⚠️ What the Model Can and Can’t Do
🔍 Limitations
Draws are almost unpredictable
Across all models, draw recall is near zero — the model rarely predicts a draw even though draws make up ~26% of results. This is the fundamental problem in football prediction: draws look almost identical to close home or away wins before kickoff. No model based on historical team statistics solves this reliably.
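The symptom shows up cleanly in per-class recall. On made-up outcomes, a model whose top pick is never the draw scores zero draw recall no matter how sensible its probabilities are:

```python
import numpy as np
from sklearn.metrics import recall_score

# Made-up outcomes: 0 = home win, 1 = draw, 2 = away win
y_true = np.array([0, 1, 2, 0, 1, 2, 0, 0, 1, 2])
# A model whose most likely outcome is never the draw -- typical of
# probability models on football data
y_pred = np.array([0, 0, 2, 0, 2, 2, 0, 2, 0, 0])

per_class = recall_score(y_true, y_pred, average=None, labels=[0, 1, 2])
print(per_class)  # draw recall (middle value) is 0.0
```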
No squad information
The model knows nothing about injuries, suspensions, or rotation. If a team’s first-choice striker is suspended for a fixture and the model rates them as strong favourites based on their last 10 games, that rating is stale. This is a known gap and a meaningful one.
No bookmaker comparison yet
The interesting question — whether the model identifies anything the market systematically misprices — hasn’t been tested. That would require comparing model probabilities against implied bookmaker odds over a large sample. It’s a possible future addition, not a current feature.
It will improve over time
As 2025/26 completes and xG data accumulates across more seasons, both the training volume and the quality of features will grow. The xG model’s advantage over the goals model should become clearer as the xG training set expands from one season to two, three, and beyond.
🔴 Live Predictions
⚡ How It Works on the Site
Each gameweek, before fixtures kick off, a Python script connects to the BTP database and computes rolling form and league position for every upcoming Championship fixture. Probabilities are generated by the logistic regression model and written to the database, and a WordPress shortcode renders them as a horizontal bar chart on the page.
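In outline, and with entirely hypothetical table and column names plus a stub in place of the fitted model, the weekly script amounts to this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the BTP database
conn.execute(
    "CREATE TABLE predictions (fixture_id INTEGER, p_home REAL, p_draw REAL, p_away REAL)"
)

class StubModel:
    """Stands in for the fitted logistic regression loaded in production."""
    def predict_proba(self, X):
        return [[0.345, 0.209, 0.446] for _ in X]

model = StubModel()
# (fixture_id, feature vector) pairs computed from rolling form and
# league position; the id and vector length here are illustrative
upcoming = [(4101, [0.0] * 14)]

for fixture_id, features in upcoming:
    p_home, p_draw, p_away = model.predict_proba([features])[0]
    conn.execute(
        "INSERT INTO predictions VALUES (?, ?, ?, ?)",
        (fixture_id, p_home, p_draw, p_away),
    )
conn.commit()
```

The shortcode then only has to read the three stored probabilities per fixture and draw the bar.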
Here’s an example from GW41 (6 April 2026) — Hull City hosting Coventry City:
Match Prediction
Hull City vs Coventry City — GW41, 6 April 2026. Coventry sit 1st (67 pts), Hull 5th (58 pts).
The model rates Coventry as away favourites despite Hull’s home advantage — reflecting the 9-point gap in points and Coventry’s superior form over the last 10 games. 23 predictions were generated for GW41 in total, covering every Championship fixture through 7 April.
These are probability estimates, not certainties. A 44.6% probability for Coventry means the model thinks they win roughly 4 times in 10 — not that they will definitely win. Hull winning would not be a surprise at 34.5%. Football doesn’t do certainties.
📅 Going Forward
📌 Predictions Every Gameweek
From GW40 onwards, ML probability predictions will be embedded in every BTP Championship gameweek preview. They sit alongside the usual form tables, xG charts, and fixture analysis — another data layer to inform how you think about each game.
The model will be retrained at the end of 2025/26 to incorporate the full current season’s data. The plan is to shift progressively to the xG model as coverage grows. The methodology will always be documented here.
The prediction system is entirely built on BTP’s own database — 3,338 finished Championship matches, six seasons, built and trained locally without any external data provider. If you have questions about the methodology or want to see specific numbers, the full technical write-up is available on request.
Model: multinomial logistic regression (scikit-learn 1.8, lbfgs solver). Training data: Championship 2019/20–2023/24 (2,722 matches). Test: 2024/25 (544 matches). Predictions for 2025/26 use completed fixtures up to point of generation. Not financial advice.
