Heads up: All models on this page are in their early stages and should be considered placeholders. The underlying methodology is still being refined — take the specific numbers with a grain of salt.

Limitations & FAQ

Where the models are strong, and where they aren't

Every model has boundaries. Knowing where the grades are reliable, and where they deserve skepticism, makes you a better consumer of the data.

Where the models are strong
Season-level pitcher evaluation

With 100+ pitches, noise cancels out and grades reliably rank pitchers by skill.

Pitch design feedback

Stuff+ isolates physical properties. Compare movement profiles, velocity bands, and spin characteristics to find what works.

Command vs stuff diagnosis

Location+ and Stuff+ are decorrelated by design. You can identify whether a pitcher succeeds via command, stuff, or both.

Cross-era comparison

The 100-centered scale adjusts for league context, so a 115 Stuff+ means the same thing relative to peers whether it was posted in 2018 or 2025.
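One way this kind of league adjustment can work is a z-score within each season, mapped onto the 100-centered scale. This is a minimal sketch, not the site's actual pipeline; the xRV numbers are invented:

```python
# Toy raw model outputs by season (lower xRV = better for the pitcher)
seasons = {
    2018: [-0.010, 0.000, 0.010],
    2025: [-0.014, -0.004, 0.006],  # league-wide raw levels shifted
}

def to_plus(values):
    """Z-score within a season, then map onto the 100-centered scale.
    The sign flips because lower xRV is better for the pitcher."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [100 - 10 * (v - mean) / sd for v in values]

plus = {year: to_plus(vals) for year, vals in seasons.items()}
# The best pitcher in each season lands at the same grade,
# even though the raw xRV levels moved between eras.
```

Because each season is normalized against its own league, the top pitcher in 2018 and the top pitcher in 2025 end up with the same grade relative to their peers.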

Where to be cautious
Single-game grades are noisy

A pitcher throws ~90 pitches per start. Even good models can’t separate skill from luck in that sample. Use game grades as directional, not definitive.

Small samples mislead

A reliever with 40 pitches in a month can look elite or terrible by chance. Wait for 100+ pitches before drawing conclusions.

Context changes break projections

Projections assume typical health, workload, and role. A starter-to-reliever conversion, injury, or pitch design change invalidates the baseline.

No catcher or defense adjustment

The models don’t account for catcher framing skill or defensive alignment. Pitchers with elite framers may have slightly inflated Location+ grades.

Postseason data is limited

Most model training uses regular-season data. Postseason grades are scored using regular-season models — they’re valid but have no postseason-specific adjustments.

Rule of thumb
Trust season grades. Question monthly grades. Ignore single-game grades.

The models are designed for season-level evaluation. As sample size shrinks, noise dominates. A full-season Stuff+ of 115 is meaningful. A single-start Stuff+ of 115 might just be luck.
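The sample-size effect is easy to simulate. The sketch below gives a hypothetical pitcher a true grade of 100 and adds random pitch-to-pitch noise; the noise level is an illustrative assumption, not a figure from the models:

```python
import random

random.seed(42)

TRUE_MEAN = 100  # the pitcher's actual skill on the plus scale
PITCH_SD = 80    # assumed pitch-to-pitch noise (illustrative)

def observed_grade(n_pitches):
    """Average of n noisy per-pitch grades."""
    return sum(random.gauss(TRUE_MEAN, PITCH_SD) for _ in range(n_pitches)) / n_pitches

def spread(n_pitches, trials=500):
    """Standard deviation of the observed grade across simulated samples."""
    grades = [observed_grade(n_pitches) for _ in range(trials)]
    mean = sum(grades) / trials
    return (sum((g - mean) ** 2 for g in grades) / trials) ** 0.5

print(spread(90))    # one start: the observed grade swings widely
print(spread(2000))  # a full season: noise mostly cancels out
```

The single-start spread is several times larger than the full-season spread, which is exactly why a single-start Stuff+ of 115 can be luck while a full-season 115 is signal.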

Frequently asked questions
Why can a pitcher have elite Stuff+ but mediocre results?

Stuff+ only measures pitch physics — how nasty the ball moves. If the pitcher throws those nasty pitches down the middle, Location+ will be low and results will suffer. The pitch is good; the execution isn’t.

Why does Location+ look average for a 100-mph pitcher?

Location+ is intentionally blind to velocity. It only sees where the pitch crosses the plate and the count. A 100-mph fastball and an 85-mph changeup in the same location get the same Location+ grade. That’s the point — it isolates command.

Why are grades scaled to 100 instead of showing raw run values?

Raw xRV is hard to interpret (is -0.008 good?). The 100 scale is intuitive: 100 is average, 110 is good, 120 is elite. One standard deviation is roughly 10 points.
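Concretely, the conversion is a z-score transform: subtract the league mean, divide by the league spread, and scale so one standard deviation equals 10 points. The league mean and SD below are invented for illustration:

```python
LEAGUE_MEAN_XRV = 0.000  # assumed league-average run value per pitch
LEAGUE_SD_XRV = 0.008    # assumed spread (illustrative, not the real figure)

def xrv_to_plus(xrv):
    """Map raw xRV onto the 100-centered scale, where 1 SD = 10 points.
    Lower xRV is better for the pitcher, hence the sign flip."""
    return 100 - 10 * (xrv - LEAGUE_MEAN_XRV) / LEAGUE_SD_XRV

print(xrv_to_plus(-0.008))  # one SD better than average -> 110.0
print(xrv_to_plus(0.000))   # exactly average -> 100.0
```

Under these assumptions, that hard-to-read -0.008 per pitch becomes a 110: comfortably above average, one standard deviation from the mean.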

How quickly do grades stabilize?

Stuff+ stabilizes fastest (physics are consistent pitch-to-pitch). Location+ takes longer because command is more variable. For reliable season grades, wait for 100+ pitches of that type.

Do the models work for all pitch types?

Yes, but signal strength varies. Fastballs have the strongest signal (most data, clearest physics). Rare pitch types like knuckleballs or eephus pitches have less data and noisier grades.

Why isn’t Pitching+ just Location+ plus Stuff+?

Pitching+ captures interaction effects that neither model alone can see — like sequencing, batter quality, and the extra value of a great pitch in the right count. It’s not a simple average of the other two.
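A toy illustration of non-additivity, with invented run values (negative = good for the pitcher):

```python
STUFF_VALUE = -0.004     # a nasty pitch, graded in isolation
LOCATION_VALUE = -0.003  # a well-spotted pitch, graded in isolation

# Extra value when the nasty pitch hits the right spot in the right
# count -- an interaction neither single-factor model sees on its own
INTERACTION = -0.002

additive = STUFF_VALUE + LOCATION_VALUE
joint = additive + INTERACTION
# joint is better (more negative) than the two parts combined
```

Because of that interaction term, a model that sees stuff and location together can grade the pitch higher than any blend of the two standalone grades.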