Introduction to Football Data
Understanding how data analytics has transformed the beautiful game — from scouting to tactics to match predictions.
The Data Revolution in Football

Football has undergone a quiet revolution over the past decade. What was once a sport governed purely by intuition, experience, and the "eye test" is now deeply intertwined with data science. Clubs at every level — from grassroots academies to elite Champions League sides — use data to inform decisions on player recruitment, tactical setups, opponent analysis, and in-game adjustments.

This shift was popularized by the "Moneyball" approach in baseball, but football presents unique challenges: it's a low-scoring, fluid game where context matters enormously. A shot from 25 yards under heavy pressure is fundamentally different from a tap-in at the back post, even though both count as "1 shot" in basic statistics. Modern football analytics addresses these nuances through probabilistic models and event-level data.

Key Metrics Explained
The essential statistics used in modern football analysis
Expected Goals (xG)

A probability (0-1) representing the chance a shot will result in a goal, based on shot location, body part, assist type, and defensive pressure.

Examples:
  • Penalty kick: 0.76
  • Header from corner (10 yds): 0.08
  • One-on-one (8 yds): 0.45
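
Because each xG value is a probability, the values for a set of shots can be summed to give the expected number of goals. A minimal Python sketch using the three example values above; it assumes the shots are independent of one another, which is a simplification:

```python
from math import prod

# Example xG values from the list above.
shots = {"penalty": 0.76, "header_from_corner": 0.08, "one_on_one": 0.45}

# Expected goals for the set of shots: the sum of the probabilities.
total_xg = sum(shots.values())

# Chance of scoring at least once, assuming the shots are independent.
p_at_least_one = 1 - prod(1 - p for p in shots.values())

print(f"total xG: {total_xg:.2f}")
print(f"P(at least one goal): {p_at_least_one:.3f}")
```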
Expected Assists (xA)

The likelihood that a pass will become an assist, calculated from the xG of the resulting shot. Isolates chance creation from finishing quality.

Examples:
  • Through ball to 1v1: 0.40
  • Square pass before long shot: 0.03
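
Under the simple definition used here, a passer is credited with the xG of the shot that follows the pass, whether or not it is scored. A rough sketch of how that might be totalled per player; the player names and the list of passes are hypothetical:

```python
from collections import defaultdict

# Hypothetical key passes: (passer, xG of the shot that followed).
key_passes = [
    ("Player A", 0.40),  # through ball to a 1v1
    ("Player A", 0.03),  # square pass before a long shot
    ("Player B", 0.03),
]

xa_by_player = defaultdict(float)
for passer, shot_xg in key_passes:
    # Credit the passer with the shot's xG regardless of the outcome.
    xa_by_player[passer] += shot_xg

for passer, xa in xa_by_player.items():
    print(f"{passer}: {xa:.2f} xA")
```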
Expected Threat (xT)

Measures the probability that a possession will lead to a goal within the next few actions, based on ball location. Each pass or carry changes xT — positive change means moving to a more dangerous area.

How it differs from xG:

xG only measures shots. xT values every action — a midfielder playing a pass from the halfway line into the box generates +0.08 xT even without a shot.
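
The idea can be sketched as a lookup: each pitch zone has a threat value, and an action is worth the difference between its end zone and its start zone. The zone names and values below are hypothetical, chosen only so that the halfway-line-to-box pass matches the +0.08 example; a real model would learn values for a much finer grid.

```python
# Hypothetical threat values per pitch zone (not taken from a fitted model).
ZONE_XT = {
    "own_third": 0.002,
    "halfway_line": 0.01,
    "final_third": 0.03,
    "penalty_box": 0.09,
}

def xt_added(start_zone: str, end_zone: str) -> float:
    """Change in threat from moving the ball between two zones."""
    return ZONE_XT[end_zone] - ZONE_XT[start_zone]

# A pass from the halfway line into the box adds threat.
print(f"{xt_added('halfway_line', 'penalty_box'):+.2f}")
# A pass backwards out of the final third removes it.
print(f"{xt_added('final_third', 'own_third'):+.3f}")
```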

Line-Breaking Passes

Passes that bypass an entire line of opposition players (defensive or midfield). They are among the most dangerous passes in football because a single action takes several defenders out of the play.

Examples:
  • Through ball splitting CBs: Breaks defensive line
  • Pass from CB to striker between lines: Breaks midfield
Expected Goals Against (xGA)

Sum of xG values for all shots conceded. Evaluates defensive and goalkeeping performance independent of opponent finishing quality.

Interpretation:

A keeper conceding 25 goals from 30 xGA is overperforming (saving more than expected).
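
The arithmetic behind that reading is a simple difference. The helper name `goals_prevented` is only an illustrative label, not a standard API:

```python
def goals_prevented(xga: float, goals_conceded: int) -> float:
    """Positive means fewer goals were conceded than the shots faced implied."""
    return xga - goals_conceded

# The example above: 25 goals conceded from 30 xGA.
print(goals_prevented(30.0, 25))
```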

Non-Penalty xG (npxG)

xG calculated excluding penalty kicks. Penalties have a fixed high xG (~0.76) that can skew a player's overall numbers.

Why it matters:

A striker with 12 goals from 10 xG looks clinical — but if 4 came from penalties, npxG tells the real story.
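
The striker example can be worked through directly. This sketch assumes the player took exactly four penalties and scored all of them, which the example does not state, and uses the approximate 0.76 value quoted above:

```python
PENALTY_XG = 0.76  # approximate penalty xG quoted above

goals, xg = 12, 10.0
penalties_taken = 4   # assumption: every penalty taken was scored
penalties_scored = 4

np_goals = goals - penalties_scored
npxg = xg - penalties_taken * PENALTY_XG

print(f"all shots:   {goals} goals from {xg:.2f} xG ({goals - xg:+.2f})")
print(f"non-penalty: {np_goals} goals from {npxg:.2f} npxG ({np_goals - npxg:+.2f})")
```

With penalties removed, the gap between goals and expected goals shrinks from about two goals to about one.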

Progressive Passes & Carries

Passes or dribbles that move the ball significantly closer to the opponent's goal (typically 10+ yards forward or into the penalty area).

Why it matters:

70% possession with only sideways passes isn't dominating — progressive actions measure real forward momentum.
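
One possible way to flag a progressive action from event coordinates is sketched below. The 120 x 80 pitch, the goal position, and the penalty-area bounds are assumptions for illustration, and data providers each use their own precise definitions.

```python
from math import dist

GOAL = (120.0, 40.0)  # assumed centre of the opponent's goal on a 120 x 80 pitch

def in_penalty_area(x: float, y: float) -> bool:
    """Assumed penalty-area bounds in the same coordinate system."""
    return x >= 102.0 and 18.0 <= y <= 62.0

def is_progressive(start, end, min_gain: float = 10.0) -> bool:
    """True if the ball ends min_gain units closer to goal or enters the box."""
    gain = dist(start, GOAL) - dist(end, GOAL)
    entered_box = in_penalty_area(*end) and not in_penalty_area(*start)
    return gain >= min_gain or entered_box

print(is_progressive((60, 40), (75, 40)))   # 15 units closer to goal
print(is_progressive((60, 10), (60, 70)))   # sideways switch, no gain
print(is_progressive((98, 40), (104, 40)))  # short pass into the box
```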

PPDA (Pressing Intensity)

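Passes Allowed Per Defensive Action: the number of passes the opposition completes for each defensive action (tackle, interception, challenge, or foul) by the pressing team, counted in the attacking 60% of the pitch. Lower values mean a more aggressive press.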

Benchmarks:
  • Aggressive press (Klopp's Liverpool): 8-10
  • Conservative team: 14-16
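
PPDA is a ratio of two counts taken in the pressing zone, so it can be sketched in a few lines. The match counts below are hypothetical and were picked to land inside the two benchmark ranges:

```python
def ppda(opponent_passes: int, defensive_actions: int) -> float:
    """Opponent passes allowed per defensive action in the pressing zone."""
    if defensive_actions == 0:
        raise ValueError("no defensive actions recorded in the pressing zone")
    return opponent_passes / defensive_actions

print(ppda(270, 30))  # hypothetical high-pressing side
print(ppda(300, 20))  # hypothetical conservative side
```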
Traditional Metrics
Goal Difference

Goals For - Goals Against. Compare to xGD (xG - xGA) to detect over/underperformance.

Points

3 for a win, 1 for a draw, 0 for a loss. The ultimate league measure.

Shots on Target %

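Percentage of shots that are on target, meaning they either go in or would have gone in without a save. Shots that hit the post or crossbar and stay out do not count. Indicates shooting accuracy.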

Real-World Applications
How clubs and analysts use these metrics
Player Recruitment

Identify undervalued players using xG and xA. A winger with high xA but few assists might be playing with poor finishers: sign him and pair him with a clinical striker.
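
A first-pass screen for such players could rank candidates by the gap between xA and actual assists. The names and season totals here are invented purely for illustration:

```python
# Hypothetical season totals: (player, assists, xA).
wingers = [
    ("Winger A", 3, 8.5),
    ("Winger B", 9, 6.0),
    ("Winger C", 5, 5.2),
]

# Largest positive gap first: chances created that team-mates did not finish.
ranked = sorted(wingers, key=lambda w: w[2] - w[1], reverse=True)

for name, assists, xa in ranked:
    print(f"{name}: assists={assists}, xA={xa:.1f}, gap={xa - assists:+.1f}")
```

A large positive gap is only a prompt for further scouting, since it can also reflect a small sample.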

Tactical Analysis

If the opposition concedes most xGA from crosses, emphasize width. If they're vulnerable to through balls behind a high line, focus on direct passes.

In-Game Decisions

Some clubs feed real-time data to the bench. If a pattern generates only 0.02 xG per shot, it's time to change the approach.

Performance Evaluation

Post-match analysis goes beyond the scoreline. Did we win despite being outplayed (xG 0.8 vs 1.5)? Or dominate but fail to convert (xG 2.5 vs 0.6)?
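
One way to put a number on "dominated but failed to convert" is to turn each side's shot xG values into outcome probabilities. The sketch below treats every shot as an independent event; the shot lists are hypothetical and only chosen to sum to the 2.5 vs 0.6 example.

```python
def goal_distribution(shot_xgs):
    """Probability of scoring 0, 1, 2, ... goals from independent shots."""
    probs = [1.0]
    for p in shot_xgs:
        nxt = [0.0] * (len(probs) + 1)
        for goals, prob in enumerate(probs):
            nxt[goals] += prob * (1 - p)   # this shot is missed
            nxt[goals + 1] += prob * p     # this shot is scored
        probs = nxt
    return probs

def outcome_probs(our_shots, their_shots):
    """Return (win, draw, loss) probabilities from our point of view."""
    ours, theirs = goal_distribution(our_shots), goal_distribution(their_shots)
    win = sum(a * b for i, a in enumerate(ours)
              for j, b in enumerate(theirs) if i > j)
    draw = sum(a * b for i, a in enumerate(ours)
               for j, b in enumerate(theirs) if i == j)
    return win, draw, 1 - win - draw

our_shots = [0.5, 0.5, 0.5, 0.5, 0.5]  # hypothetical, sums to 2.5 xG
their_shots = [0.2, 0.2, 0.2]          # hypothetical, sums to 0.6 xG

win, draw, loss = outcome_probs(our_shots, their_shots)
print(f"win {win:.0%}, draw {draw:.0%}, loss {loss:.0%}")
```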

Where Does Football Data Come From?
Manual Tagging

Trained analysts watch matches and log events (passes, shots, fouls) with timestamps and coordinates. Companies like Opta and StatsBomb use this method.

Tracking Data

Cameras (8-12 per stadium) track every player and the ball 25 times per second, generating X/Y coordinates for advanced off-ball movement analysis.
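
As a small illustration of what those coordinates make possible, a player's speed can be estimated from consecutive samples. This sketch assumes positions are in metres and uses the 25 samples per second quoted above; the track itself is made up.

```python
from math import dist

FRAME_RATE_HZ = 25  # samples per second, as quoted above

def speeds(positions):
    """Speed in m/s between consecutive (x, y) samples given in metres."""
    return [dist(a, b) * FRAME_RATE_HZ for a, b in zip(positions, positions[1:])]

# Hypothetical four-frame track for one player.
track = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.0), (0.6, 0.4)]
print([round(v, 1) for v in speeds(track)])
```

Real tracking feeds are noisy, so speeds are usually smoothed over several frames before use.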

Wearables

GPS vests worn by players capture physical metrics: distance covered, sprint counts, acceleration/deceleration patterns.
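
Metrics like these are derived from a stream of speed samples. The sketch below is one simple way to do it; the 7.0 m/s sprint threshold, the 10 Hz sample rate, and the sample values are all assumptions, and vendors apply their own cut-offs.

```python
SPRINT_THRESHOLD = 7.0  # m/s, assumed; providers use different cut-offs

def sprint_count(speeds_mps):
    """Count separate runs of consecutive samples at or above the threshold."""
    count, sprinting = 0, False
    for v in speeds_mps:
        if v >= SPRINT_THRESHOLD and not sprinting:
            count += 1  # a new sprint starts here
        sprinting = v >= SPRINT_THRESHOLD
    return count

def distance_covered(speeds_mps, sample_rate_hz):
    """Approximate distance in metres: each sample covers 1 / rate seconds."""
    return sum(speeds_mps) / sample_rate_hz

samples = [3.0, 7.2, 7.5, 4.0, 2.0, 7.1, 8.0, 6.0]  # hypothetical speeds in m/s
print(sprint_count(samples))
print(round(distance_covered(samples, sample_rate_hz=10), 2))
```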

Limitations & Caveats
Data is powerful, but it's not everything
Context is king

A high-xG chance missed in the 90th minute of a cup final carries far more weight than the same miss in a routine mid-season match, yet xG values both identically.

Sample size matters

Five games tell you little. Season-long samples are far more reliable; multi-season samples are better still.
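
A rough way to see why: if every shot is treated as an independent event with the same xG, the spread of goals around the expected total shrinks, relative to that total, as the number of shots grows. The shots-per-game and xG-per-shot figures below are hypothetical round numbers.

```python
from math import sqrt

def goal_spread(n_shots: int, xg_per_shot: float):
    """Expected goals and standard deviation for independent, equal-xG shots."""
    expected = n_shots * xg_per_shot
    std_dev = sqrt(n_shots * xg_per_shot * (1 - xg_per_shot))
    return expected, std_dev

SHOTS_PER_GAME = 12  # hypothetical
XG_PER_SHOT = 0.10   # hypothetical

for games in (5, 38, 114):
    expected, std_dev = goal_spread(games * SHOTS_PER_GAME, XG_PER_SHOT)
    print(f"{games:>3} games: {expected:.1f} expected goals, "
          f"std dev {std_dev:.1f} ({std_dev / expected:.0%} of expected)")
```

Under these assumptions the relative noise over five games is several times larger than over a full season.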

Models have assumptions

xG models don't capture everything — goalkeeper positioning and psychological pressure are hard to quantify.

Data quality varies

Top leagues have excellent coverage. Lower leagues and youth football often have limited data.

Interactive Demo
An example of a dashboard a data scientist or analyst might build at a football club to communicate performance insights to coaches and recruitment staff
This dashboard uses fictional data for demonstration purposes only
Further Learning
Resources to dive deeper into football analytics
StatsBomb Open Data

Free event-level data for selected matches and competitions — excellent for learning and experimentation.

FBref

Comprehensive statistics for major leagues, including xG data (originally supplied by StatsBomb, later by Opta).

The Expected Goals Philosophy

A book by James Tippett that explains xG from first principles.

Soccermatics

David Sumpter's book on the mathematics behind football analysis.