The Pitch Grading Pipeline

How raw Statcast data becomes Location+, Stuff+, and Pitching+ grades

Contents
1. The Full Pipeline
2. Each Stage in Detail
3. Why Residuals? The Decorrelation Insight
4. The 100 Scale
5. With vs Without Residual Training
6. Which Grade to Use When

1. The Full Pipeline

Every pitch in MLB generates dozens of measurements from Statcast: velocity, spin, movement, location, extension, and more. Our pipeline transforms this raw data into independent, interpretable grades through a carefully ordered sequence of models.

Data Flow: Statcast Pitch Data → Grades
Input: Statcast pitch data, ~8.9M pitches, 2015-2025. Target: xRV (expected run value per pitch).
01 Location+: 11 location + count features | XGBoost, depth 4 | 400 trees | 6 splits
02 Residual: raw xRV - Location+ prediction
03 Stuff+: 24 physics-only features | XGBoost, depth 5 | early stop at 50-200 trees | target: residual xRV (pure nastiness signal)
04 Pitching+: 40 combined features | XGBoost, depth 6 | 600 max trees | target: raw xRV (full features)
05 Arsenal Synergy: 26 arsenal features | RidgeCV | target: residual CSW | how pitches work together
06 Projections: ElasticNet | 231 features (grades + stats) → K%, BB%, kwERA

The key ordering constraint: Location+ must train first so we can compute residuals before Stuff+ trains. Pitching+ trains independently on raw xRV with the full feature set.

2. Each Stage in Detail

The pipeline stages build on each other, but each grade answers a different question about a pitch.

01

Location+

Pure command — where was this pitch thrown?

Location+ measures how valuable the pitch location is, completely independent of pitch movement or velocity. A 95 mph fastball and an 82 mph changeup thrown to the exact same spot get the same Location+ grade.

11 features | 400 max trees | tree depth 4 | 6 model splits

Feature categories:

plate_x, plate_z, zone_z_normalized, arm-plane decomposition, against_break_location, vert_location, count features (balls, strikes, is_ahead)

The 6 model splits: fastball/breaking/offspeed crossed with batter hand (L/R). Optimal locations differ by pitch category and batter stance, so separate models capture this naturally.
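
A minimal sketch of how pitches might be routed to the six split models. The text only states that the splits are fastball/breaking/offspeed crossed with batter hand; the pitch-type mapping, the `stand` field name, and the helper names `split_key` and `group_by_split` below are assumptions for illustration.

```python
# Hypothetical mapping from Statcast pitch_type codes to the three categories.
# The exact assignment used by the pipeline is not given in the text.
PITCH_CATEGORY = {
    "FF": "fastball", "SI": "fastball", "FC": "fastball",
    "SL": "breaking", "CU": "breaking", "ST": "breaking", "KC": "breaking",
    "CH": "offspeed", "FS": "offspeed",
}


def split_key(pitch_type: str, batter_hand: str) -> tuple:
    """Return the (category, batter hand) key that selects one of the 6 models."""
    if batter_hand not in ("L", "R"):
        raise ValueError(f"unexpected batter hand: {batter_hand!r}")
    return PITCH_CATEGORY[pitch_type], batter_hand


def group_by_split(pitches):
    """Group pitch dicts by split key so each group can train its own model."""
    groups = {}
    for pitch in pitches:
        key = split_key(pitch["pitch_type"], pitch["stand"])
        groups.setdefault(key, []).append(pitch)
    return groups
```

Each group would then get its own Location+ model, so a fastball to a left-handed batter never shares trees with a changeup to a right-handed batter.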

02

Residual Computation

Strip command to isolate pure pitch quality

This is a subtraction, not a model. For every pitch:

residual = raw xRV − Location+ prediction

If Location+ predicts this pitch's location is worth -0.008 RV (good spot), but the actual outcome was -0.015 RV (even better), the residual is -0.007. That leftover value came from the pitch's physical characteristics — its movement, velocity, deception — not where it was thrown.

This residual becomes the target for Stuff+. See Section 3 for why this decomposition matters.
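
The subtraction can be sketched in a few lines. The function name `residual_xrv` is hypothetical; the numbers are the worked example from the paragraph above.

```python
def residual_xrv(raw_xrv: float, location_pred: float) -> float:
    """Residual = raw xRV minus the Location+ model's predicted xRV."""
    return raw_xrv - location_pred


# Worked example from the text: location predicted -0.008, raw xRV was -0.015.
example = residual_xrv(raw_xrv=-0.015, location_pred=-0.008)
print(round(example, 3))  # -0.007
```

A negative residual means the pitch did better than its location alone would suggest; a positive residual means it did worse.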

03

Stuff+

Pure nastiness — pitch physics only

Stuff+ answers: "Given this pitch's physical properties, how hard is it to hit?" No location information. A cement mixer slider thrown down the middle still gets a bad Stuff+ grade. A wipeout slider thrown in the dirt gets a great one.

24 features | 600 max trees | tree depth 5 | 50-200 actual trees (early stop)

Feature categories:

effective_speed, plate_speed, spin_rate, spin_efficiency, pfx_x / pfx_z, IVB, vmov_diff_from_ff, extension, release_pos, tunneling metrics, approach_angle

Early stopping is critical here. Because the target is a noisy residual, the model extracts the real physics signal in 50-200 trees; additional trees would mostly memorize noise. A patience window of 50 rounds stops training once validation performance has gone 50 rounds without improving.
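
The patience rule can be sketched independently of any library. This is an illustration of the general technique, not the pipeline's code: `early_stop_round` and the `val_losses` input are hypothetical names, and the sketch assumes a validation loss is recorded after each boosting round. XGBoost offers comparable behaviour through its `early_stopping_rounds` setting.

```python
def early_stop_round(val_losses, patience=50):
    """Return the 1-based round with the best validation loss.

    Scanning stops once `patience` consecutive rounds pass without
    a new best loss, mirroring a patience-based early-stopping rule.
    """
    best_loss = float("inf")
    best_round = 0
    for round_idx, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_round = loss, round_idx
        elif round_idx - best_round >= patience:
            break  # no improvement for `patience` rounds: stop adding trees
    return best_round


# Toy loss curve: improves for 120 rounds, then slowly degrades.
toy_losses = [1.0 - 0.001 * i for i in range(1, 121)]
toy_losses += [0.88 + 0.0001 * i for i in range(1, 481)]
print(early_stop_round(toy_losses))  # 120
```

With the toy curve, scanning ends at round 170 (50 rounds after the best one) and the model keeps only the first 120 trees, well short of the 600-tree cap.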

04

Pitching+

The complete picture — everything that matters

Pitching+ is the most comprehensive single grade. It trains on raw xRV (not residuals) with the full feature set: physics, location, sequencing, batter quality, and game context. It answers: "How effective is this pitch, considering everything?"

40 features | 600 max trees | tree depth 6 | learning rate 0.08

Feature categories (beyond Stuff+ and Location+):

decomposed inning haa (horiz approach angle) tunnel rotation axis deviation batter quality pitch_type_count_to_batter fastball_movement_spread hb_x_plate_x interaction

Because Pitching+ sees everything, it has the highest raw predictive power. But it can't tell you why a pitch is good. That's what Location+ and Stuff+ are for — they isolate the two independent dimensions of pitch quality.

05

Arsenal Synergy

How pitches work together as a repertoire

A pitcher's slider might grade as average in isolation, but devastating when paired with a fastball that tunnels off the same release point. Arsenal Synergy measures this interaction effect — the extra value that emerges from the combination of pitches.

26 features | model type: RidgeCV | test R²: 0.147

The target is residual CSW (called strikes + whiffs) after removing individual pitch grades. What's left is the synergy effect. A linear model (RidgeCV) is deliberate: arsenal-level features have limited training data per pitcher, so a simpler model avoids overfitting.

Features include: velocity differentials between pitches, movement tunneling metrics, pitch-type usage entropy, and similar arsenal-composition statistics.

06

Projections

Next-season outcome forecasts

Projections move from per-pitch grades to season-level outcomes. Using all the pitch grades plus historical stats, they forecast what a pitcher will do next year.

231 features | model type: ElasticNet | 3 targets

Targets: K%, BB%, and kwERA (strikeout-walk ERA). ElasticNet is a regularized linear model that balances L1 and L2 penalties — ideal for high-dimensional inputs (231 features) where many are correlated.

Features include per-pitch-type grades, usage rates, velocity trends, age, innings history, and platoon splits.

3. Why Residuals? The Decorrelation Insight

This is the most important design decision in the pipeline. Without the residual step, Stuff+ and Location+ would be correlated — a good fastball location would inflate Stuff+ because the model can't separate the two signals. With residuals, the grades become independent dimensions.

Decomposing Pitch Outcome
[Figure: total pitch outcome (xRV) shown as two overlapping regions. Command (Location+): hitting the corners. Nastiness (Stuff+): wipeout movement. Overlap: without residuals, Stuff+ absorbs this. Remainder: sequencing, batter quality, noise.]
The residual step subtracts the command signal first, so Stuff+ learns only the non-overlapping part.
Without Residuals

Both models trained on raw xRV

  • Stuff+ sees that well-located pitches get better outcomes
  • Stuff+ implicitly learns location patterns
  • High Location+ correlates with high Stuff+
  • Can't tell if a pitcher is good because of command or stuff
With Residuals

Stuff+ trained on residual xRV

  • Command signal already removed from target
  • Stuff+ can ONLY learn physics patterns
  • Location+ and Stuff+ have near-zero correlation
  • Each grade measures an independent skill dimension
Analogy: Imagine evaluating a dart player. You want to grade their throw mechanics (arm speed, release point, follow-through) separately from their aim (accuracy to the bullseye). If you just grade both on "points scored," a player who aims well will get inflated mechanics grades because good aim also scores points. The residual step is like first crediting the aim, then only grading mechanics on the leftover performance.

4. The 100 Scale

All grades are expressed on a scale centered at 100. This familiar framing (like the baseball scouting 20-80 scale, or IQ) makes grades immediately interpretable.

From Run Value to Grade
Raw model output (xRV), from good to bad: -0.020 (elite prevention), -0.008 (above average), 0.000 (league average), +0.008 (below average), +0.020 (poor prevention).
Negate + scale.
Displayed grade, from good to bad: 140 (elite), 120 (great), 110 (above avg), 100 (average), 90, 80 (poor), 60.
The sign flip: negative xRV (good for the pitcher) → grade above 100 (good).

In Statcast, negative run value = runs prevented = good for the pitcher. Our scaling negates the xRV prediction, then shifts and scales so the league average maps to 100 and one standard deviation maps to roughly 15 points. A grade of 115 means the pitch is one SD better than average.
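
A minimal sketch of the negate, shift, and scale step described above. The function name `to_grade` is hypothetical, and the league mean and standard deviation in the example are placeholder values, not figures from the pipeline.

```python
def to_grade(xrv_pred, league_mean, league_sd, points_per_sd=15.0):
    """Map a predicted xRV onto the 100 scale.

    The z-score is negated so that negative xRV (runs prevented)
    produces a grade above 100.
    """
    z = (xrv_pred - league_mean) / league_sd
    return 100.0 - points_per_sd * z


# Placeholder league parameters, chosen only to make the arithmetic easy.
print(to_grade(-0.01, league_mean=0.0, league_sd=0.01))  # 115.0
print(to_grade(0.0, league_mean=0.0, league_sd=0.01))    # 100.0
```

Under these assumed parameters, a pitch predicted one standard deviation better than average lands at 115 and a league-average pitch lands at 100.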

5. With vs Without Residual Training

This comparison shows why the residual step matters in practice: the correlation between Location+ and Stuff+ grades differs sharply between the two training approaches.

Without Residuals (both models trained on raw xRV)
[Scatter plot: Location+ grade vs Stuff+ grade, both axes 70 to 130. r = 0.45, strong positive correlation.]
Pitchers with good location also show inflated Stuff+ grades.
The problem: a control artist like Greg Maddux gets a high Stuff+ just from location. Grades don't tell you why he's good.

With Residuals (Stuff+ trained on residual xRV)
[Scatter plot: Location+ grade vs Stuff+ grade, both axes 70 to 130. r = 0.02, near-zero correlation.]
Location+ and Stuff+ are effectively independent measures.
The benefit: Greg Maddux shows elite Location+ and average Stuff+; Aroldis Chapman shows average Location+ and elite Stuff+. Grades tell you exactly why each pitcher succeeds, along two independent skill axes.
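
The r values quoted here are Pearson correlations between the two grade columns. A minimal sketch of that check follows; the helper name `pearson_r` is hypothetical and the grade lists are made-up toy numbers, not pipeline output.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of grades."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy grades only: four pitchers whose Stuff+ is unrelated to their Location+.
location_plus = [85.0, 95.0, 105.0, 115.0]
stuff_plus = [110.0, 90.0, 90.0, 110.0]
r = pearson_r(location_plus, stuff_plus)
```

Running the same check on real grade columns before and after the residual step is one way to confirm that the two grades have actually been separated.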

6. Which Grade to Use When

Each grade answers a different question. Here's a guide for when to use each one.

"Does this pitcher hit his spots?"
Command and control, independent of pitch movement. Useful for evaluating pitch-calling strategy and pitcher accuracy. A high Location+ with low Stuff+ is the Greg Maddux archetype — precision over power.
USE: Location+
"How nasty is this pitch, independent of where it's thrown?"
Pure pitch quality based on physics: velocity, spin, movement, tunneling. A wipeout slider gets a great Stuff+ grade even if the pitcher can't locate it. The best grade for evaluating raw pitch design, development changes, and stuff potential.
USE: Stuff+
"How effective is this pitcher overall, right now?"
The most comprehensive grade — accounts for stuff, command, sequencing, and batter quality. The best single number for evaluating current real-world effectiveness. Highest raw predictive power for game outcomes.
USE: Pitching+
"How well do this pitcher's pitches complement each other?"
Arsenal-level interaction effects. A pitcher might have average individual pitches but elite Synergy because they tunnel off the same release point. Useful for trade evaluations and understanding why some pitchers outperform their individual pitch grades.
USE: Arsenal Synergy
"What will this pitcher do next season?"
Forward-looking K%, BB%, and kwERA projections. Combines current grades with historical trends, age, and workload. The right choice for fantasy baseball, contract evaluations, and preseason outlook.
USE: Projections
Quick Reference: Model Specs
Grade           | Model      | Features | Target          | Trees / Config
Location+       | XGBoost    | 11       | raw xRV         | 400 / depth 4
Stuff+          | XGBoost    | 24       | residual xRV    | 50-200 / depth 5
Pitching+       | XGBoost    | 40       | raw xRV         | 600 max / depth 6
Arsenal Synergy | RidgeCV    | 26       | residual CSW    | linear (L2)
Projections     | ElasticNet | 231      | K%, BB%, kwERA  | linear (L1+L2)

Stockyard Baseball — Model Explainers