Heads up: All models on this page are in their early stages and should be considered placeholders. The underlying methodology is still being refined — take the specific numbers with a grain of salt.
Where the models are strong, and where they aren't
Every model has boundaries. Knowing where the grades are reliable, and where they deserve extra skepticism, makes you a better consumer of the data.
With 100+ pitches, noise largely averages out and grades rank pitchers by skill with good reliability.
Stuff+ isolates physical properties. Compare movement profiles, velocity bands, and spin characteristics to find what works.
Location+ and Stuff+ are decorrelated by design. You can identify whether a pitcher succeeds via command, stuff, or both.
The 100-centered scale adjusts for league context: a 115 Stuff+ in 2018 and a 115 Stuff+ in 2025 mean the same thing relative to peers.
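As a sketch of how a league-adjusted, 100-centered scale typically works: compute a z-score within each season's population, then rescale to mean 100 and SD 10. The function name and data here are illustrative assumptions, not the site's actual code.

```python
import statistics

def to_plus_scale(raw_scores):
    """Rescale one season's raw scores to mean 100, SD 10.

    Hypothetical sketch: a real implementation normalizes within each
    league-season and may flip the sign (e.g. lower xRV is better).
    """
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)
    return [100 + 10 * (s - mean) / sd for s in raw_scores]

# Made-up raw scores for one season; the output is centered on 100
grades = to_plus_scale([0.2, 0.5, 0.9, 1.4, 2.0])
```

Because each season is normalized against its own population, the scale automatically absorbs league-wide drifts like rising velocity.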
A pitcher throws ~90 pitches per start. Even good models can’t separate skill from luck in that sample. Use game grades as directional, not definitive.
A reliever with 40 pitches in a month can look elite or terrible by chance. Wait for 100+ pitches before drawing conclusions.
Projections assume typical health, workload, and role. A starter-to-reliever conversion, injury, or pitch design change invalidates the baseline.
The models don’t account for catcher framing skill or defensive alignment. Pitchers with elite framers may have slightly inflated Location+ grades.
The models are trained on regular-season data. Postseason grades are scored with those same models: valid, but without any postseason-specific adjustments.
The models are designed for season-level evaluation. As sample size shrinks, noise dominates. A full-season Stuff+ of 115 is meaningful. A single-start Stuff+ of 115 might just be luck.
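To see why sample size matters, here is a toy simulation (not the actual model; the per-pitch noise level is invented) comparing the spread of 90-pitch "starts" against 2,500-pitch "seasons" for a league-average pitcher:

```python
import random
import statistics

random.seed(42)  # deterministic for illustration

def sample_grade(true_skill, n_pitches, noise_sd=1.0):
    """Average n noisy per-pitch values around the pitcher's true skill."""
    return statistics.mean(
        random.gauss(true_skill, noise_sd) for _ in range(n_pitches)
    )

# 500 simulated single starts vs. 500 full seasons for a skill-0 pitcher
starts = [sample_grade(0.0, 90) for _ in range(500)]
seasons = [sample_grade(0.0, 2500) for _ in range(500)]

print(statistics.pstdev(starts))   # wide spread: a lucky start can look elite
print(statistics.pstdev(seasons))  # much tighter: season grades track skill
```

The spread of sample averages shrinks with the square root of sample size, so a 90-pitch grade is roughly five times noisier than a 2,500-pitch one.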
Stuff+ only measures pitch physics — how nasty the ball moves. If the pitcher throws those nasty pitches down the middle, Location+ will be low and results will suffer. The pitch is good; the execution isn’t.
Location+ is intentionally blind to velocity. It only sees where the pitch crosses the plate and the count. A 100-mph fastball and an 85-mph changeup in the same location get the same Location+ grade. That’s the point — it isolates command.
Raw xRV is hard to interpret (is -0.008 good?). The 100 scale is intuitive: 100 is average, 110 is good, 120 is elite. One standard deviation is roughly 10 points.
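One way to make the scale concrete: if grades are roughly normal with mean 100 and SD 10 (the normality assumption is mine, not the site's), any grade translates into an approximate percentile.

```python
from statistics import NormalDist

# Assumed distribution: mean 100, one SD = 10 points
scale = NormalDist(mu=100, sigma=10)

for grade in (100, 110, 120):
    pct = scale.cdf(grade) * 100
    print(f"Stuff+ {grade} ≈ {pct:.0f}th percentile")
# prints roughly 50th, 84th, and 98th percentile for 100, 110, 120
```

Under this assumption, "110 is good" means better than about five of every six peers, and "120 is elite" means roughly top 2%.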
Stuff+ stabilizes fastest (physics are consistent pitch-to-pitch). Location+ takes longer because command is more variable. For reliable season grades, wait for 100+ pitches of that type.
Yes, but signal strength varies. Fastballs have the strongest signal (most data, clearest physics). Rare pitch types like knuckleballs or eephus pitches have less data and noisier grades.
Pitching+ captures interaction effects that neither model alone can see — like sequencing, batter quality, and the extra value of a great pitch in the right count. It’s not a simple average of the other two.
