The Pitch Grading Pipeline

How raw Statcast data becomes Location+, Stuff+, and Pitching+ grades

Contents

1. The Full Pipeline
2. Each Stage in Detail
3. Why Residuals? The Decorrelation Insight
4. The 100 Scale
5. With vs Without Residual Training
6. Which Grade to Use When

1. The Full Pipeline

Every pitch in MLB generates dozens of measurements from Statcast: velocity, spin, movement, location, extension, and more. Our pipeline transforms this raw data into independent, interpretable grades through a carefully ordered sequence of models.

Data Flow: Statcast Pitch Data → Grades

The key ordering constraint: Location+ must train first so we can compute residuals before Stuff+ trains. Pitching+ trains independently on raw xRV with the full feature set.
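The ordering constraint can be written down as a small dependency graph. The sketch below is illustrative only: the stage names are placeholders chosen for this example, not identifiers from the actual codebase, and it covers just the four stages named above.

```python
from graphlib import TopologicalSorter

# Each stage maps to the stages it must wait for.
# Stage names are hypothetical labels for this sketch.
DEPENDENCIES = {
    "location_plus": set(),          # trains on raw xRV
    "residuals": {"location_plus"},  # raw xRV minus the Location+ prediction
    "stuff_plus": {"residuals"},     # trains on the residual target
    "pitching_plus": set(),          # trains independently on raw xRV
}


def training_order(deps):
    """Return one valid order in which the stages can be run."""
    return list(TopologicalSorter(deps).static_order())


order = training_order(DEPENDENCIES)
print(order)
```

Pitching+ has no prerequisites, so it can appear anywhere in the returned order; only the Location+ → residuals → Stuff+ chain is fixed.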

2. Each Stage in Detail

The pipeline stages build on each other, but each grade answers a different question about a pitch.

Location+

Pure command — where was this pitch thrown?

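Location+ measures how valuable the pitch location is, with no velocity or spin features in the model. Because the models are split by pitch category and batter hand (described below), the comparison holds within a split: a 95 mph fastball and a 90 mph fastball thrown to the exact same spot, in the same count, to a same-handed batter get approximately the same Location+ grade.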

features: 11
max trees: 400
tree depth: 4
model splits: 6

Feature categories:

plate_x, plate_z, zone_z_normalized, arm-plane decomposition, against_break_location, vert_location, count features (balls, strikes, is_ahead)

The 6 model splits: fastball/breaking/offspeed crossed with batter hand (L/R). Optimal locations differ by pitch category and batter stance, so separate models capture this naturally.
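As a rough illustration of how a pitch could be routed to one of the six models, the sketch below builds a (category, batter hand) key. The mapping from Statcast pitch-type codes to the three categories is an assumption made for this example; the source does not say, for instance, whether cutters count as fastballs.

```python
# Assumed mapping from Statcast pitch-type codes to the three categories.
# The real pipeline's grouping is not specified in this explainer.
PITCH_CATEGORY = {
    "FF": "fastball", "SI": "fastball", "FC": "fastball",
    "SL": "breaking", "CU": "breaking", "ST": "breaking",
    "CH": "offspeed", "FS": "offspeed",
}


def split_key(pitch_type, batter_hand):
    """Return the (category, batter_hand) key that selects a Location+ model."""
    if batter_hand not in ("L", "R"):
        raise ValueError(f"unexpected batter hand: {batter_hand!r}")
    return (PITCH_CATEGORY[pitch_type], batter_hand)


# Three categories crossed with two batter hands gives six splits.
ALL_SPLITS = sorted({(c, h) for c in PITCH_CATEGORY.values() for h in "LR"})

print(split_key("FF", "L"))
print(len(ALL_SPLITS))
```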

Residual Computation

Strip command to isolate pure pitch quality

This is a subtraction, not a model. For every pitch:

      residual = raw xRV − Location+ prediction
    

If Location+ predicts this pitch's location is worth -0.008 RV (good spot), but the actual outcome was -0.015 RV (even better), the residual is -0.007. That leftover value came from the pitch's physical characteristics — its movement, velocity, deception — not where it was thrown.
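The subtraction is simple enough to show directly. This minimal sketch reproduces the worked example above; the function names are placeholders for illustration.

```python
def residual_xrv(raw_xrv, location_pred):
    """Residual target for Stuff+: raw xRV minus the Location+ prediction."""
    return raw_xrv - location_pred


def residuals_for(raw_values, location_preds):
    """Apply the same subtraction pitch by pitch."""
    return [residual_xrv(r, p) for r, p in zip(raw_values, location_preds)]


# Worked example from the text: -0.015 actual, -0.008 predicted from location.
example = residual_xrv(-0.015, -0.008)
print(round(example, 3))
```

A negative residual means the pitch did better than its location alone would suggest, since negative run value is good for the pitcher.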

This residual becomes the target for Stuff+. See Section 3 for why this decomposition matters.

Stuff+

Pure nastiness — pitch physics only

Stuff+ answers: "Given this pitch's physical properties, how hard is it to hit?" No location information. A cement mixer slider thrown down the middle still gets a bad Stuff+ grade. A wipeout slider thrown in the dirt gets a great one.

features: 24
max trees: 600
tree depth: 5
actual trees (early stop): 50-200

Feature categories:

effective_speed, plate_speed, spin_rate, spin_efficiency, pfx_x / pfx_z, IVB, vmov_diff_from_ff, extension, release_pos, tunneling metrics, approach_angle

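Early stopping is critical here. Because the target is a noisy residual, the model typically extracts the real physics signal within 50-200 trees; additional trees mostly memorize noise. With a patience window of 50 rounds, training halts once the validation metric has gone 50 consecutive rounds without improving, which stops the model close to the point where the signal runs out.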
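The patience rule itself is independent of any particular library. The sketch below applies it to a plain list of per-round validation losses; it is a generic illustration of the stopping logic, not the pipeline's training code, and the function name and return convention are assumptions.

```python
def early_stop_round(val_losses, patience=50):
    """Apply a patience rule to a sequence of per-round validation losses.

    Returns (best_round, stop_round), both 0-based. Training stops at the
    first round that is `patience` rounds past the best loss seen so far.
    If the rule never triggers, stop_round is the last round.
    """
    best_loss = float("inf")
    best_round = -1
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_round = i
        elif i - best_round >= patience:
            return best_round, i
    return best_round, len(val_losses) - 1


# Small example with patience=3: the loss bottoms out at round 2.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.6]
print(early_stop_round(losses, patience=3))
```

Note that the later dip to 0.6 is never seen: a patience rule trades a small chance of missing a late improvement for protection against fitting noise.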

Pitching+

The complete picture — everything that matters

Pitching+ is the most comprehensive single grade. It trains on raw xRV (not residuals) with the full feature set: physics, location, sequencing, batter quality, and game context. It answers: "How effective is this pitch, considering everything?"

features: 40
max trees: 600
tree depth: 6
learning rate: 0.08

Feature categories (beyond Stuff+ and Location+):

decomposed inning, haa (horiz approach angle), tunnel rotation axis deviation, batter quality, pitch_type_count_to_batter, fastball_movement_spread, hb_x_plate_x interaction

Because Pitching+ sees everything, it has the highest raw predictive power. But it can't tell you why a pitch is good. That's what Location+ and Stuff+ are for — they isolate the two independent dimensions of pitch quality.

Arsenal Synergy

How pitches work together as a repertoire

A pitcher's slider might grade as average in isolation, but devastating when paired with a fastball that tunnels off the same release point. Arsenal Synergy measures this interaction effect — the extra value that emerges from the combination of pitches.

features: 26
model type: RidgeCV
test R²: 0.147

The target is residual CSW (called strikes + whiffs) after removing individual pitch grades. What's left is the synergy effect. A linear model (RidgeCV) is deliberate: arsenal-level features have limited training data per pitcher, so a simpler model avoids overfitting.

Features include: velocity differentials between pitches, movement tunneling metrics, pitch-type usage entropy, and similar arsenal-composition statistics.
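To make one of these features concrete, here is a sketch of pitch-type usage entropy. The explainer does not give the exact definition used, so this assumes plain Shannon entropy in bits over usage shares; the function name and input format are likewise illustrative.

```python
import math


def usage_entropy(usage):
    """Shannon entropy (in bits) of a pitcher's pitch-type usage.

    `usage` maps pitch type to a count or share; values are normalized
    to sum to 1 before the entropy is computed.
    """
    total = sum(usage.values())
    if total <= 0:
        raise ValueError("usage must contain at least one positive value")
    entropy = 0.0
    for amount in usage.values():
        p = amount / total
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


print(usage_entropy({"FF": 0.5, "SL": 0.5}))  # even two-pitch mix
print(usage_entropy({"FF": 0.7, "SL": 0.2, "CH": 0.1}))
```

Higher entropy means a more evenly mixed arsenal; a pitcher who throws one pitch exclusively scores zero.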

Projections

Next-season outcome forecasts

Projections move from per-pitch grades to season-level outcomes. Using all the pitch grades plus historical stats, they forecast what a pitcher will do next year.

features: 231
model type: ElasticNet
targets: 3

Targets: K%, BB%, and kwERA (strikeout-walk ERA). ElasticNet is a regularized linear model that balances L1 and L2 penalties — ideal for high-dimensional inputs (231 features) where many are correlated.

Features include per-pitch-type grades, usage rates, velocity trends, age, innings history, and platoon splits.
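The balance between L1 and L2 penalties can be shown with the penalty term alone. The sketch below uses the parameterization found in scikit-learn's ElasticNet (an overall strength `alpha` and a mixing weight `l1_ratio`); the values the projection models actually use are not stated here, so the defaults are placeholders.

```python
def elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.5):
    """Penalty added to the squared-error loss of an elastic net.

    l1_ratio=1.0 is a pure L1 (lasso) penalty, l1_ratio=0.0 is pure L2 (ridge).
    alpha and l1_ratio here are placeholder values, not the pipeline's settings.
    """
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)


coefs = [1.0, -2.0]
print(elastic_net_penalty(coefs, l1_ratio=1.0))  # L1 only
print(elastic_net_penalty(coefs, l1_ratio=0.0))  # L2 only
print(elastic_net_penalty(coefs, l1_ratio=0.5))  # even mix
```

The L1 part can push the coefficients of redundant features to exactly zero, while the L2 part spreads weight across groups of correlated features instead of picking one arbitrarily.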

3. Why Residuals? The Decorrelation Insight

This is the most important design decision in the pipeline. Without the residual step, Stuff+ and Location+ would be correlated: a well-located fastball would inflate Stuff+ because a model trained on raw xRV cannot separate the two signals. With residuals, the two grades become nearly uncorrelated dimensions.

Decomposing Pitch Outcome

Without Residuals

Both models trained on raw xRV

Stuff+ sees that well-located pitches get better outcomes
Stuff+ implicitly learns location patterns
High Location+ correlates with high Stuff+
Can't tell if a pitcher is good because of command or stuff

With Residuals

Stuff+ trained on residual xRV

Command signal already removed from target
Stuff+ can ONLY learn physics patterns
Location+ and Stuff+ have near-zero correlation
Each grade measures an independent skill dimension

Analogy: Imagine evaluating a dart player. You want to grade their throw mechanics (arm speed, release point, follow-through) separately from their aim (accuracy to the bullseye). If you just grade both on "points scored," a player who aims well will get inflated mechanics grades because good aim also scores points. The residual step is like first crediting the aim, then only grading mechanics on the leftover performance.
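The effect can be reproduced on synthetic data. Everything in the sketch below is an assumption made for illustration: the data is simulated, ordinary least squares stands in for the XGBoost models, higher "value" is treated as better (the opposite sign convention from xRV), and `proxy_feature` is a made-up physical measurement that happens to track location.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def corr(xs, ys):
    return cov(xs, ys) / (cov(xs, xs) * cov(ys, ys)) ** 0.5


def ols_predict_1(x, y):
    """Fit y ~ x by least squares and return in-sample predictions."""
    slope = cov(x, y) / cov(x, x)
    intercept = mean(y) - slope * mean(x)
    return [intercept + slope * xi for xi in x]


def ols_predict_2(x1, x2, y):
    """Fit y ~ x1 + x2 by solving the 2x2 normal equations."""
    a, b, c = cov(x1, x1), cov(x1, x2), cov(x2, x2)
    d, e = cov(x1, y), cov(x2, y)
    det = a * c - b * b
    w1 = (d * c - b * e) / det
    w2 = (a * e - b * d) / det
    intercept = mean(y) - w1 * mean(x1) - w2 * mean(x2)
    return [intercept + w1 * u + w2 * v for u, v in zip(x1, x2)]


rng = random.Random(0)
n = 5000
location_quality = [rng.gauss(0, 1) for _ in range(n)]
true_stuff = [rng.gauss(0, 1) for _ in range(n)]
# A physical feature that carries no value of its own but tracks location.
proxy_feature = [q + rng.gauss(0, 1) for q in location_quality]
value = [q + s + rng.gauss(0, 1) for q, s in zip(location_quality, true_stuff)]

location_pred = ols_predict_1(location_quality, value)
residual = [v - p for v, p in zip(value, location_pred)]

# "Stuff" model trained on the raw target vs. on the residual target.
stuff_raw = ols_predict_2(true_stuff, proxy_feature, value)
stuff_resid = ols_predict_2(true_stuff, proxy_feature, residual)

corr_without = corr(location_pred, stuff_raw)
corr_with = corr(location_pred, stuff_resid)
print(f"without residuals: {corr_without:.2f}")
print(f"with residuals:    {corr_with:.2f}")
```

Trained on the raw target, the stuff model leans on `proxy_feature` to pick up location value, so its output correlates with the location grade. Trained on the residual, that value has already been credited to location, and the correlation falls to roughly zero.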

4. The 100 Scale

All grades are expressed on a scale centered at 100. This familiar framing (like the baseball scouting 20-80 scale, or IQ) makes grades immediately interpretable.

From Run Value to Grade

In Statcast, negative run value = runs prevented = good for the pitcher. Our scaling negates the xRV prediction, then shifts and scales so the league average maps to 100 and one standard deviation maps to roughly 15 points. A grade of 115 means the pitch is one SD better than average.
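A minimal version of that transformation is sketched below. It assumes the center and spread are taken from the same set of predictions being graded and that one SD maps to exactly 15 points; the real pipeline may compute league baselines differently (for example per season or per pitch type), which this explainer does not specify.

```python
from statistics import mean, pstdev


def to_plus_scale(xrv_preds, center=100.0, points_per_sd=15.0):
    """Convert xRV predictions to a 100-centered grade.

    Negates xRV so that runs prevented score higher, then standardizes.
    The baseline (mean and SD of the inputs) is an assumption of this sketch.
    """
    negated = [-p for p in xrv_preds]
    mu = mean(negated)
    sd = pstdev(negated)
    if sd == 0:
        raise ValueError("predictions have zero spread; cannot scale")
    return [center + points_per_sd * (v - mu) / sd for v in negated]


grades = to_plus_scale([-0.01, 0.0, 0.01])
print([round(g, 1) for g in grades])
```

The pitch with the most negative predicted run value receives the highest grade, and the average prediction lands on 100.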

5. With vs Without Residual Training

This comparison shows why the residual step matters in practice: the correlation between the Location+ and Stuff+ grades depends on whether Stuff+ trains on raw xRV or on residual xRV.

[Interactive comparison: Without Residuals vs With Residuals]

6. Which Grade to Use When

Each grade answers a different question. Here's a guide for when to use each one.

"Does this pitcher hit his spots?"

Command and control, independent of pitch movement. Useful for evaluating pitch-calling strategy and pitcher accuracy. A high Location+ with low Stuff+ is the Greg Maddux archetype — precision over power.

USE: Location+

"How nasty is this pitch, independent of where it's thrown?"

Pure pitch quality based on physics: velocity, spin, movement, tunneling. A wipeout slider gets a great Stuff+ grade even if the pitcher can't locate it. The best grade for evaluating raw pitch design, development changes, and stuff potential.

USE: Stuff+

"How effective is this pitcher overall, right now?"

The most comprehensive grade — accounts for stuff, command, sequencing, and batter quality. The best single number for evaluating current real-world effectiveness. Highest raw predictive power for game outcomes.

USE: Pitching+

"How well do this pitcher's pitches complement each other?"

Arsenal-level interaction effects. A pitcher might have average individual pitches but elite Synergy because they tunnel off the same release point. Useful for trade evaluations and understanding why some pitchers outperform their individual pitch grades.

USE: Arsenal Synergy

"What will this pitcher do next season?"

Forward-looking K%, BB%, and kwERA projections. Combines current grades with historical trends, age, and workload. The right choice for fantasy baseball, contract evaluations, and preseason outlook.

USE: Projections

Quick Reference: Model Specs

Grade	Model	Features	Target	Trees / Config
Location+	XGBoost	11	raw xRV	400 max / depth 4
Stuff+	XGBoost	24	residual xRV	600 max (50-200 after early stop) / depth 5
Pitching+	XGBoost	40	raw xRV	600 max / depth 6
Arsenal Synergy	RidgeCV	26	residual CSW	linear (L2)
Projections	ElasticNet	231	K%, BB%, kwERA	linear (L1+L2)

Stockyard Baseball — Model Explainers