How raw Statcast data becomes Location+, Stuff+, and Pitching+ grades
Every pitch in MLB generates dozens of measurements from Statcast: velocity, spin, movement, location, extension, and more. Our pipeline transforms this raw data into independent, interpretable grades through a carefully ordered sequence of models.
The key ordering constraint: Location+ must train first so we can compute residuals before Stuff+ trains. Pitching+ trains independently on raw xRV with the full feature set.
The pipeline stages build on each other, but each grade answers a different question about a pitch.
Location+ measures how valuable the pitch location is, completely independent of pitch movement or velocity. A 95 mph fastball and an 82 mph changeup thrown to the exact same spot get the same Location+ grade.
Feature categories:
Location+ uses 6 model splits: pitch category (fastball, breaking, offspeed) crossed with batter handedness (L/R). Optimal locations differ by pitch category and batter handedness, so separate models capture these differences directly.
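As a rough illustration of how a pitch could be routed to one of the six splits, here is a minimal sketch. The `pitch_type` codes and the `stand` value (batter side) follow Statcast naming, but the grouping of codes into categories and the `split_key` helper are assumptions made for this example, not the pipeline's actual mapping.

```python
# Hypothetical grouping of Statcast pitch_type codes into the three categories.
# The pipeline's real mapping is not shown in this explainer.
PITCH_CATEGORY = {
    "FF": "fastball", "SI": "fastball", "FC": "fastball",
    "SL": "breaking", "ST": "breaking", "CU": "breaking", "KC": "breaking",
    "CH": "offspeed", "FS": "offspeed",
}


def split_key(pitch_type: str, stand: str) -> str:
    """Return the model split for a pitch, e.g. 'fastball_vs_R'."""
    category = PITCH_CATEGORY[pitch_type]  # raises KeyError for unmapped codes
    if stand not in ("L", "R"):
        raise ValueError(f"unexpected batter hand: {stand!r}")
    return f"{category}_vs_{stand}"


# Three categories crossed with two batter hands gives six splits.
ALL_SPLITS = sorted({split_key(p, s) for p in PITCH_CATEGORY for s in "LR"})
```

Each key would then select its own fitted Location+ model, so a changeup to a left-handed batter is never scored by the fastball-vs-righty model.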
This is a subtraction, not a model. For every pitch: residual xRV = actual xRV - Location+ predicted xRV.
If Location+ predicts this pitch's location is worth -0.008 RV (good spot), but the actual outcome was -0.015 RV (even better), the residual is -0.007. That leftover value came from the pitch's physical characteristics — its movement, velocity, deception — not where it was thrown.
This residual becomes the target for Stuff+. See Section 3 for why this decomposition matters.
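The residual step can be sketched in a few lines. The function and argument names here are hypothetical; the numbers are the ones from the example above.

```python
import math


def location_residual(actual_xrv: float, location_pred_xrv: float) -> float:
    """Run value left over after removing what location alone explains."""
    return actual_xrv - location_pred_xrv


# Worked example from the text: predicted -0.008 RV, actual -0.015 RV.
residual = location_residual(actual_xrv=-0.015, location_pred_xrv=-0.008)

# Compare with a tolerance because of floating-point rounding.
is_expected = math.isclose(residual, -0.007)
```

A negative residual means the pitch did better than its location alone would suggest, which is the signal Stuff+ is trained to explain.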
Stuff+ answers: "Given this pitch's physical properties, how hard is it to hit?" No location information. A cement mixer slider thrown down the middle still gets a bad Stuff+ grade. A wipeout slider thrown in the dirt gets a great one.
Feature categories:
Early stopping is critical here. Because the target is a noisy residual, the model extracts the real physics signal in 50-200 trees, after which additional trees mostly memorize noise. With a patience window of 50 rounds, training halts once held-out error has gone 50 consecutive rounds without improving.
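The patience rule can be illustrated without any boosting library. The sketch below is not the pipeline's training code; it assumes a list of per-round validation losses is already available, and `stop_round` is a hypothetical helper that reports where patience-based stopping would halt.

```python
def stop_round(val_losses, patience=50):
    """Return (best_round, rounds_trained) under patience-based early stopping.

    val_losses[i] is the validation loss after round i + 1 (one tree per round).
    """
    best_loss = float("inf")
    best_round = 0
    for i, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_round = loss, i
        elif i - best_round >= patience:
            # No improvement for `patience` consecutive rounds: stop here.
            return best_round, i
    return best_round, len(val_losses)


# Synthetic curve: loss improves for 120 rounds, then drifts upward.
losses = [1.0 - 0.001 * i for i in range(120)] + [0.9 + 0.001 * i for i in range(200)]
best, trained = stop_round(losses, patience=50)
```

On this synthetic curve the best round is 120 and training stops at round 170, so the 50 extra rounds only serve to confirm that the signal has run out.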
Pitching+ is the most comprehensive single grade. It trains on raw xRV (not residuals) with the full feature set: physics, location, sequencing, batter quality, and game context. It answers: "How effective is this pitch, considering everything?"
Feature categories (beyond Stuff+ and Location+): sequencing, batter quality, and game context.
Because Pitching+ sees everything, it has the highest raw predictive power. But it can't tell you why a pitch is good. That's what Location+ and Stuff+ are for — they isolate the two independent dimensions of pitch quality.
A pitcher's slider might grade as average in isolation, but devastating when paired with a fastball that tunnels off the same release point. Arsenal Synergy measures this interaction effect — the extra value that emerges from the combination of pitches.
The target is residual CSW (called strikes + whiffs) after removing individual pitch grades. What's left is the synergy effect. A linear model (RidgeCV) is deliberate: arsenal-level features have limited training data per pitcher, so a simpler model avoids overfitting.
Features include: velocity differentials between pitches, movement tunneling metrics, pitch-type usage entropy, and similar arsenal-composition statistics.
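Two of the arsenal-composition features named above are simple enough to sketch. The exact definitions used in the pipeline are not given here, so treat these as plausible assumptions: usage entropy as the Shannon entropy of the pitch mix, and a velocity differential as the gap between two pitch types' mean velocities. The function names are hypothetical.

```python
import math


def usage_entropy(pitch_counts: dict) -> float:
    """Shannon entropy (in bits) of a pitcher's pitch-type mix."""
    total = sum(pitch_counts.values())
    shares = [n / total for n in pitch_counts.values() if n > 0]
    return sum(-p * math.log2(p) for p in shares)


def velo_gap(mean_velo: dict, fast: str = "FF", slow: str = "CH") -> float:
    """Velocity differential (mph) between two pitch types."""
    return mean_velo[fast] - mean_velo[slow]


# A two-pitch pitcher with an even mix has exactly 1 bit of entropy.
even_mix = usage_entropy({"FF": 50, "SL": 50})
gap = velo_gap({"FF": 95.0, "CH": 86.5})
```

Higher entropy means a less predictable mix; a one-pitch pitcher scores 0 and an even four-pitch mix scores 2 bits.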
Projections move from per-pitch grades to season-level outcomes. Using all the pitch grades plus historical stats, they forecast what a pitcher will do next year.
Targets: K%, BB%, and kwERA (strikeout-walk ERA). ElasticNet is a regularized linear model that combines L1 and L2 penalties, which suits high-dimensional inputs (231 features) where many features are correlated.
Features include per-pitch-type grades, usage rates, velocity trends, age, innings history, and platoon splits.
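For readers unfamiliar with the kwERA target mentioned above, the commonly cited form is a linear function of strikeout rate minus walk rate. The sketch below uses that public formula with a constant of 5.40; the constant is a league-level calibration that shifts over time, and whether this pipeline uses the same constant is an assumption.

```python
import math


def kwera(k_rate: float, bb_rate: float, constant: float = 5.40) -> float:
    """Commonly cited kwERA form: constant - 12 * (K% - BB%).

    Rates are fractions of plate appearances (0.25 means 25%).
    The default constant is an assumption, not taken from the pipeline.
    """
    return constant - 12.0 * (k_rate - bb_rate)


# A pitcher with a 25% strikeout rate and an 8% walk rate.
example = kwera(k_rate=0.25, bb_rate=0.08)
```

Because kwERA depends only on K% and BB%, projecting those two rates well is enough to project it.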
This is the most important design decision in the pipeline. Without the residual step, Stuff+ and Location+ would be correlated — a good fastball location would inflate Stuff+ because the model can't separate the two signals. With residuals, the grades become independent dimensions.
Without the residual step: both models trained on raw xRV.
With the residual step: Stuff+ trained on residual xRV.
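The effect of the two training targets can be shown on synthetic data. This is a toy simulation, not the pipeline: it assumes the location model recovers the location value perfectly, uses arbitrary units, and all variable names are hypothetical.

```python
import random
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
loc_value = [rng.gauss(0, 1) for _ in range(n)]    # what location explains
stuff_value = [rng.gauss(0, 1) for _ in range(n)]  # what physics explains
xrv = [l + s + rng.gauss(0, 1) for l, s in zip(loc_value, stuff_value)]

# Residual target: subtract the (assumed perfect) location prediction.
residual = [y - l for y, l in zip(xrv, loc_value)]

corr_raw = pearson(loc_value, xrv)         # raw target still carries location
corr_resid = pearson(loc_value, residual)  # residual target does not
```

In this setup the raw target is clearly correlated with location value while the residual target is close to uncorrelated with it, which is why a model trained on the residual has no location signal left to absorb.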
All grades are expressed on a scale centered at 100. This familiar framing (like IQ, which is also centered at 100 with 15 points per standard deviation, or the scouting 20-80 scale, which is centered at 50) makes grades immediately interpretable.
In Statcast, negative run value = runs prevented = good for the pitcher. Our scaling negates the xRV prediction, then shifts and scales so the league average maps to 100 and one standard deviation maps to roughly 15 points. A grade of 115 means the pitch is one SD better than average.
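The scaling described above amounts to a negated z-score. A minimal sketch, assuming the league mean and standard deviation are taken over the same list of predictions being graded (the real pipeline may use a separate reference population); `to_plus_scale` is a hypothetical name.

```python
import math
from statistics import fmean, pstdev


def to_plus_scale(pred_xrv, points_per_sd=15.0, center=100.0):
    """Map predicted xRV to a 100-centered grade where higher is better.

    Lower (more negative) xRV is better for the pitcher, so the z-score
    is negated by using (mean - x) instead of (x - mean).
    """
    mean = fmean(pred_xrv)
    sd = pstdev(pred_xrv)
    if sd == 0:
        return [center for _ in pred_xrv]  # no spread: everyone is average
    return [center + points_per_sd * (mean - x) / sd for x in pred_xrv]


# Three pitches: better than average, average, worse than average.
grades = to_plus_scale([-0.02, 0.0, 0.02])
```

The average pitch lands at 100, the pitch with the lowest xRV gets the highest grade, and the resulting grades have a standard deviation of 15 by construction.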
[Interactive comparison: correlation between Location+ and Stuff+ grades with and without the residual step.]
Each grade answers a different question. Here's a guide for when to use each one.
| Grade | Model | Features | Target | Trees / Config |
|---|---|---|---|---|
| Location+ | XGBoost | 11 | raw xRV | 400 / depth 4 |
| Stuff+ | XGBoost | 24 | residual xRV | 50-200 / depth 5 |
| Pitching+ | XGBoost | 40 | raw xRV | 600 max / depth 6 |
| Arsenal Synergy | RidgeCV | 26 | residual CSW | linear (L2) |
| Projections | ElasticNet | 231 | K%, BB%, kwERA | linear (L1+L2) |
Stockyard Baseball — Model Explainers