Football has undergone a quiet revolution over the past decade. What was once a sport governed purely by intuition, experience, and the "eye test" is now deeply intertwined with data science. Clubs at every level — from grassroots academies to elite Champions League sides — use data to inform decisions on player recruitment, tactical setups, opponent analysis, and in-game adjustments.
This shift was popularized by the "Moneyball" approach in baseball, but football presents unique challenges: it's a low-scoring, fluid game where context matters enormously. A shot from 25 yards under heavy pressure is fundamentally different from a tap-in at the back post, even though both count as "1 shot" in basic statistics. Modern football analytics addresses these nuances through probabilistic models and event-level data.
Expected Goals (xG) is a probability (0-1) that a shot will result in a goal, based on shot location, body part, assist type, and defensive pressure. Example values:
- Penalty kick: 0.76
- Header from corner (10 yds): 0.08
- One-on-one (8 yds): 0.45
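To show how the inputs above become a probability, here is a minimal sketch of a logistic xG model using shot distance, the angle the goal mouth presents to the shooter, and a header flag. The coefficients are hypothetical placeholders chosen only so that closer, more central, non-headed shots score higher; real models fit them with logistic regression on large sets of labelled shots and will not reproduce the example values listed. Penalties are usually given a fixed value rather than modelled this way.

```python
import math

GOAL_WIDTH_M = 7.32  # distance between the posts in metres

def shot_angle(x: float, y: float) -> float:
    """Angle (radians) subtended by the goal mouth from a shot at (x, y).

    x: distance from the goal line in metres; y: lateral offset from the goal centre.
    """
    half = GOAL_WIDTH_M / 2
    # atan2 keeps the angle correct (above pi/2) even when the shooter is very close to goal
    return math.atan2(GOAL_WIDTH_M * x, x * x + y * y - half * half)

# Hypothetical coefficients for illustration only, not fitted to any real data.
INTERCEPT = -1.0
COEF_ANGLE = 1.5
COEF_DISTANCE = -0.1
COEF_HEADER = -1.0

def xg(x: float, y: float, header: bool = False) -> float:
    """Probability (0-1) that a shot from (x, y) is scored, under the toy model."""
    distance = math.hypot(x, y)
    z = (INTERCEPT
         + COEF_ANGLE * shot_angle(x, y)
         + COEF_DISTANCE * distance
         + COEF_HEADER * header)  # True counts as 1, False as 0
    return 1 / (1 + math.exp(-z))  # logistic link maps any z into (0, 1)
```

For example, `xg(6, 0)` (a central shot from 6 m) comes out higher than `xg(25, 0)` or `xg(6, 0, header=True)`, which is the ordering the listed values describe.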
Expected Assists (xA) is the likelihood that a pass will become an assist, calculated from the xG of the resulting shot. It isolates chance creation from the finishing quality of teammates. Example values:
- Through ball to 1v1: 0.40
- Square pass before long shot: 0.03
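Following the definition above (a pass's xA equals the xG of the shot it sets up), a player's xA total is a simple aggregation over shot events. This sketch assumes a hypothetical event format in which each shot records the player who made the final pass; some data providers instead use a separate pass-based model for xA.

```python
from collections import defaultdict

# Hypothetical shot events: each shot records who made the final pass (None if unassisted).
shots = [
    {"shooter": "Striker", "passer": "Winger", "xg": 0.40, "goal": False},
    {"shooter": "Striker", "passer": "Winger", "xg": 0.12, "goal": True},
    {"shooter": "Winger", "passer": "Midfielder", "xg": 0.03, "goal": False},
    {"shooter": "Midfielder", "passer": None, "xg": 0.05, "goal": False},
]

def expected_assists(shot_events):
    """Credit each passer with the xG of the shot their pass created."""
    xa = defaultdict(float)
    assists = defaultdict(int)
    for shot in shot_events:
        passer = shot["passer"]
        if passer is None:
            continue  # dribble, rebound, etc.: nobody earns xA
        xa[passer] += shot["xg"]
        if shot["goal"]:
            assists[passer] += 1
    return dict(xa), dict(assists)

xa, assists = expected_assists(shots)
```

Here the winger is credited with xA for both chances he created, but only one actual assist, because the striker missed the better chance.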
Expected Threat (xT) measures the probability that a possession will lead to a goal within the next few actions, based on where the ball is. Each pass or carry changes xT; a positive change means the ball has moved to a more dangerous area.
xG only measures shots. xT values every pass and carry: a midfielder playing a pass from the halfway line into the box generates +0.08 xT even if no shot follows.
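The mechanics of "each pass or carry changes xT" can be sketched as a grid lookup: split the pitch into zones, give each zone an xT value, and credit an action with the value of the end zone minus the value of the start zone. The 3 x 6 grid below and its values are hypothetical; published xT models learn a much finer grid from event data by combining shot and ball-movement probabilities.

```python
PITCH_LENGTH_M = 105.0
PITCH_WIDTH_M = 68.0

# Hypothetical xT values on a coarse 3 x 6 grid (rows = channels across the pitch,
# columns = from the team's own goal on the left to the opponent's goal on the right).
XT_GRID = [
    [0.005, 0.008, 0.012, 0.020, 0.040, 0.080],  # wide channel
    [0.006, 0.010, 0.015, 0.030, 0.070, 0.250],  # central channel
    [0.005, 0.008, 0.012, 0.020, 0.040, 0.080],  # other wide channel
]
N_ROWS, N_COLS = len(XT_GRID), len(XT_GRID[0])

def xt_at(x: float, y: float) -> float:
    """Look up the xT of the zone containing (x, y), both measured in metres."""
    col = min(int(x / (PITCH_LENGTH_M / N_COLS)), N_COLS - 1)
    row = min(int(y / (PITCH_WIDTH_M / N_ROWS)), N_ROWS - 1)
    return XT_GRID[row][col]

def xt_added(start, end) -> float:
    """xT gained (positive) or lost (negative) by moving the ball from start to end."""
    return xt_at(*end) - xt_at(*start)
```

A pass from near the halfway line into the box, such as `xt_added((55.0, 34.0), (95.0, 34.0))`, is credited with positive xT; the same pass played backwards is negative, and a short move within one zone adds nothing.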
Line-breaking passes bypass an entire line of opposition players (defensive or midfield). They are among the most dangerous passes in football because they take several defenders out of the game in a single action. Examples:
- Through ball splitting the CBs: breaks the defensive line
- Pass from a CB to a striker between the lines: breaks the midfield line
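One simple way to detect line-breaking passes, sketched here under a hypothetical simplification, is to approximate each opposition line by the average forward position of its players and check whether a pass starts behind that line and ends beyond it. Real implementations typically use tracking data and also consider lateral positions and whether the receiver is free; this sketch ignores both.

```python
from statistics import mean

def line_x(players_x):
    """Approximate an opposition line by the mean x-position of its players."""
    return mean(players_x)

def lines_broken(pass_start_x, pass_end_x, opposition_lines):
    """Return the names of opposition lines a forward pass travels past.

    x is measured towards the opponent's goal; a line counts as broken when
    the pass starts behind it and ends beyond it.
    """
    broken = []
    for name, players_x in opposition_lines.items():
        if pass_start_x < line_x(players_x) < pass_end_x:
            broken.append(name)
    return broken

# Hypothetical snapshot: x-positions (metres) of the opposition midfield and back four.
opposition = {"midfield": [60, 62, 58, 61], "defence": [80, 82, 81, 79]}
```

With this snapshot, `lines_broken(50, 70, opposition)` returns `["midfield"]`, while a pass from 50 to 90 breaks both lines.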
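Expected Goals Against (xGA) is the sum of the xG values of all shots a team concedes. It measures how many good chances a defence allows, independent of the opponent's finishing quality; comparing goals actually conceded with xGA then indicates how the goalkeeper is performing.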
A keeper conceding 25 goals from 30 xGA is overperforming (saving more than expected).
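The comparison in the goalkeeper example is a subtraction once xGA has been summed from the shots faced. This sketch assumes a hypothetical list of conceded shots, each with its xG and whether it went in. Note that analysts often prefer post-shot xG, which accounts for where each shot was placed, when judging goalkeepers specifically.

```python
def goals_prevented(shots_conceded):
    """Return (xGA, goals conceded, xGA minus goals).

    A positive final value means fewer goals were conceded than the chances
    allowed would suggest (e.g. 30 xGA vs 25 goals gives +5).
    """
    xga = sum(shot["xg"] for shot in shots_conceded)
    goals = sum(1 for shot in shots_conceded if shot["goal"])
    return xga, goals, xga - goals

# Hypothetical shots conceded in one match.
conceded = [
    {"xg": 0.5, "goal": True},
    {"xg": 0.3, "goal": False},
    {"xg": 0.7, "goal": False},
]
xga, goals, prevented = goals_prevented(conceded)
```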
Non-penalty xG (npxG) is xG calculated excluding penalty kicks. Penalties carry a fixed, high xG (~0.76) that can skew a player's overall numbers.
A striker with 12 goals from 10 xG looks clinical, but if 4 of those goals came from penalties, npxG separates his open-play finishing from his spot kicks.
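Working through the striker example, and assuming he took exactly four penalties and scored all of them: the penalties account for 4 x 0.76 = 3.04 xG, so his npxG is 6.96 against 8 non-penalty goals. His overperformance shrinks from +2 overall to about +1.04 in open play. A minimal sketch of that calculation:

```python
PENALTY_XG = 0.76  # the fixed ~0.76 penalty value quoted in the text

def non_penalty_split(goals, xg_total, penalty_goals, penalties_taken):
    """Return (non-penalty goals, npxG, non-penalty goals minus npxG)."""
    np_goals = goals - penalty_goals
    # Remove the xG of every penalty taken, whether or not it was scored.
    npxg = xg_total - penalties_taken * PENALTY_XG
    return np_goals, npxg, np_goals - npxg

# Striker example: 12 goals from 10 xG, assuming 4 penalties taken and all scored.
np_goals, npxg, gap = non_penalty_split(12, 10.0, penalty_goals=4, penalties_taken=4)
```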
Progressive passes and carries move the ball significantly closer to the opponent's goal (typically 10+ yards forward, or into the penalty area).
Having 70% possession with only sideways passes isn't dominance; progressive actions measure genuine forward momentum.
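Because definitions vary by provider, the following is only a sketch of one plausible rule based on the description above: an action is progressive if it moves the ball at least 10 yards closer to the centre of the opponent's goal, or if it enters the penalty area from outside it. The coordinate system is an assumption (a 120 x 80 yard pitch attacking towards x = 120, with the penalty area spanning x >= 102 and 18 <= y <= 62, as in StatsBomb-style data).

```python
import math

GOAL = (120.0, 40.0)  # centre of the opponent's goal on an assumed 120 x 80 yard pitch

def in_penalty_area(x: float, y: float) -> bool:
    return x >= 102 and 18 <= y <= 62

def is_progressive(start, end, min_gain_yards: float = 10.0) -> bool:
    """Hypothetical rule: the ball ends at least min_gain_yards closer to goal,
    or ends inside the penalty area having started outside it."""
    gain = math.dist(start, GOAL) - math.dist(end, GOAL)
    entered_box = in_penalty_area(*end) and not in_penalty_area(*start)
    return gain >= min_gain_yards or entered_box
```

A 15-yard forward pass through the middle qualifies, a square pass across the pitch does not, and a short pass that just reaches the box qualifies through the second condition.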
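PPDA (Passes Allowed Per Defensive Action) measures how aggressively a team presses in the attacking 60% of the pitch: the number of passes the opponent makes in that area for each defensive action the pressing team makes there. Lower values mean a more intense press. Typical values: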
- Aggressive press (Klopp's Liverpool): 8-10
- Conservative team: 14-16
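The ratio itself is straightforward once events are filtered to the pressing zone. This sketch assumes a hypothetical event format where every `x` is measured along the pressing team's attacking direction on a 120-unit pitch, so `x >= 48` is their attacking 60% (the opponent's own 60%), and it counts tackles, interceptions, challenges, and fouls as defensive actions; providers differ on both choices.

```python
DEFENSIVE_ACTIONS = {"tackle", "interception", "challenge", "foul"}

def ppda(events, pressing_team, zone_start_x: float = 48.0) -> float:
    """Passes Allowed Per Defensive Action for pressing_team (lower = more intense press)."""
    passes_allowed = sum(
        1 for e in events
        if e["team"] != pressing_team and e["type"] == "pass" and e["x"] >= zone_start_x
    )
    defensive_actions = sum(
        1 for e in events
        if e["team"] == pressing_team and e["type"] in DEFENSIVE_ACTIONS and e["x"] >= zone_start_x
    )
    # Avoid division by zero: a team that never engages has an effectively infinite PPDA.
    return passes_allowed / defensive_actions if defensive_actions else float("inf")
```

For instance, 18 opponent passes and 2 defensive actions inside the zone give a PPDA of 9, in the aggressive range quoted above; passes and tackles outside the zone are ignored.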
Goal difference (GD) is Goals For minus Goals Against. Compare it with xGD (xG minus xGA) to detect over- or underperformance.
Points: 3 for a win, 1 for a draw, 0 for a loss. The ultimate league measure.
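Points, GD, and xGD can all be computed from the same list of results. This is a minimal sketch with a hypothetical `Match` record; a GD well above xGD over a long run suggests a team is outscoring the quality of its chances.

```python
from dataclasses import dataclass

@dataclass
class Match:
    goals_for: int
    goals_against: int
    xg_for: float
    xg_against: float

def points(m: Match) -> int:
    """3 for a win, 1 for a draw, 0 for a loss."""
    if m.goals_for > m.goals_against:
        return 3
    return 1 if m.goals_for == m.goals_against else 0

def season_summary(matches):
    """Return (points, goal difference, expected goal difference)."""
    pts = sum(points(m) for m in matches)
    gd = sum(m.goals_for - m.goals_against for m in matches)
    xgd = sum(m.xg_for - m.xg_against for m in matches)
    return pts, gd, xgd

# Hypothetical three-match run.
results = [Match(2, 0, 1.1, 0.9), Match(1, 1, 0.8, 1.5), Match(0, 1, 2.5, 0.6)]
pts, gd, xgd = season_summary(results)
```

In this run the team has a GD of +1 but an xGD of +1.4, so its results slightly trail its chance quality.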
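Percentage of shots on target, i.e. shots that would have gone in had the goalkeeper not saved them. Shots that hit the woodwork are usually not counted as on target. Indicates shooting accuracy.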
Identify undervalued players using xG and xA. A winger with high xA but few assists might be playing alongside poor finishers; sign him and pair him with a clinical striker.
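A recruitment screen for the winger scenario can be sketched as a filter on the gap between xA and actual assists. The player records, the 900-minute floor, and the 2.0 gap threshold are all hypothetical; in practice scouts would normalise per 90 minutes and check several seasons before drawing conclusions.

```python
def undervalued_creators(players, min_gap: float = 2.0, min_minutes: int = 900):
    """Players whose xA exceeds their actual assists by at least min_gap,
    sorted by the size of that gap (largest first)."""
    candidates = [
        p for p in players
        if p["minutes"] >= min_minutes and p["xa"] - p["assists"] >= min_gap
    ]
    return sorted(candidates, key=lambda p: p["xa"] - p["assists"], reverse=True)

# Hypothetical season totals.
players = [
    {"name": "Player A", "minutes": 2000, "xa": 8.5, "assists": 3},
    {"name": "Player B", "minutes": 2500, "xa": 5.0, "assists": 5},
    {"name": "Player C", "minutes": 400, "xa": 4.0, "assists": 0},
    {"name": "Player D", "minutes": 1800, "xa": 6.0, "assists": 3},
]
shortlist = [p["name"] for p in undervalued_creators(players)]
```

Player C has the largest relative gap but too few minutes to trust, so the minutes floor removes him from the shortlist.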
If the opposition concedes most xGA from crosses, emphasize width. If they're vulnerable to through balls behind a high line, focus on direct passes.
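The opponent breakdown described above amounts to grouping the xG an opponent has conceded by the type of chance. This sketch assumes each conceded shot is tagged with a hypothetical `situation` label such as `"cross"` or `"through_ball"`; real event data would need its own mapping from pass and play types to these categories.

```python
from collections import defaultdict

def xga_by_situation(shots_conceded):
    """Share of an opponent's xGA that comes from each type of chance."""
    totals = defaultdict(float)
    for shot in shots_conceded:
        totals[shot["situation"]] += shot["xg"]
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {situation: xg / grand_total for situation, xg in totals.items()}

# Hypothetical shots the next opponent has conceded.
conceded = [
    {"situation": "cross", "xg": 0.3},
    {"situation": "cross", "xg": 0.2},
    {"situation": "through_ball", "xg": 0.4},
    {"situation": "set_piece", "xg": 0.1},
]
shares = xga_by_situation(conceded)
main_weakness = max(shares, key=shares.get)
```

Here half of the opponent's xGA comes from crosses, which under the logic above would argue for emphasising width.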
Some clubs feed real-time data to the bench. If an attacking pattern is producing shots worth only 0.02 xG each, it's time to change the approach.
Post-match analysis goes beyond the scoreline. Did we win despite being outplayed (xG 0.8 vs 1.5)? Or dominate but fail to convert (xG 2.5 vs 0.6)?
Trained analysts watch matches and log events (passes, shots, fouls) with timestamps and coordinates. Companies like Opta and StatsBomb use this method.
Cameras (8-12 per stadium) track every player and the ball 25 times per second, generating X/Y coordinates for advanced off-ball movement analysis.
GPS vests worn by players capture physical metrics: distance covered, sprint counts, acceleration/deceleration patterns.
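Physical metrics such as distance covered and sprint counts can be derived from position samples like those produced by 25 Hz tracking. In this sketch the sprint threshold (7 m/s, roughly 25 km/h) and the one-second minimum duration are assumptions, as providers use different cut-offs. Real pipelines also smooth raw positions before differentiating, which this sketch skips.

```python
import math

FRAME_RATE_HZ = 25       # positions sampled 25 times per second
SPRINT_SPEED_MS = 7.0    # assumed sprint threshold (~25 km/h); providers differ
MIN_SPRINT_FRAMES = 25   # assumed: speed must stay above the threshold for 1 second

def physical_summary(positions):
    """Distance covered (m) and sprint count from consecutive (x, y) positions in metres."""
    dt = 1 / FRAME_RATE_HZ
    distance = 0.0
    sprints = 0
    run_length = 0  # consecutive frames above the sprint threshold
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        distance += step
        if step / dt >= SPRINT_SPEED_MS:
            run_length += 1
            if run_length == MIN_SPRINT_FRAMES:
                sprints += 1  # count each sustained burst only once
        else:
            run_length = 0
    return distance, sprints
```

A two-second burst at 8 m/s counts as one sprint, while a half-second burst at the same speed does not reach the minimum duration.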
A 1.0 xG chance missed in the 90th minute of a cup final matters far more than the same miss in a mid-season game, yet xG values them identically.
5 games tells you little. Season-long samples are reliable; multi-season is better.
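One way to see why small samples mislead: if each shot is treated as an independent trial that is scored with probability equal to its xG (a simplifying assumption), the goals a team scores have mean equal to the summed xG and variance equal to the sum of xG x (1 - xG). The relative spread shrinks with the square root of the number of games, so it is about 2.8 times wider over 5 games than over 38. The shot profile below is hypothetical.

```python
import math

def goals_spread(shot_xgs):
    """Mean and standard deviation of goals scored, treating each shot as an
    independent Bernoulli trial whose success probability is its xG."""
    mean = sum(shot_xgs)
    sd = math.sqrt(sum(p * (1 - p) for p in shot_xgs))
    return mean, sd

# Hypothetical team profile: 12 shots per game at 0.11 xG each.
per_game = [0.11] * 12
for games in (5, 38):
    mean, sd = goals_spread(per_game * games)
    print(f"{games} games: {mean:.1f} expected goals, +/- {sd:.1f} ({sd / mean:.0%} of the mean)")
```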
xG models don't capture everything — goalkeeper positioning and psychological pressure are hard to quantify.
Top leagues have excellent coverage. Lower leagues and youth football often have limited data.
Free event-level data for selected matches and competitions — excellent for learning and experimentation.
Comprehensive statistics with xG data powered by StatsBomb for major leagues.
The Expected Goals Philosophy: a book by James Tippett that explains xG from first principles.
Soccermatics: David Sumpter's book on the mathematics behind football analysis.