How one extra level of splits fixes the slider problem and lets Location+ properly rank pitcher command skill.
The Problem
Sliders vs RHB: A Broken Sub-Model
Five of six Location+ sub-models predict run value well. But breaking balls vs right-handed batters has a seasonal correlation near zero — the model can't rank slider command at all.
Fastball R (best): 0.37
Offspeed L: 0.31
Breaking L: 0.27
Breaking R: 0.035
The breaking_R correlation of 0.035 means that knowing a pitcher's average Location+ grade for their slider tells you almost nothing about how much run value that slider actually prevented. Sliders specifically show r = −0.050: the grades are, if anything, slightly inversely related to outcomes.
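For readers who want to see what a "seasonal correlation" involves, here is a minimal sketch, assuming it is a Pearson correlation between each pitcher's mean grade and mean run value prevented over a season. The tuple layout and the function names are hypothetical, not taken from the Location+ codebase.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def seasonal_correlation(pitches):
    """pitches: iterable of (pitcher_id, grade, run_value_prevented).

    Averages both quantities per pitcher, then correlates the averages.
    The field layout is an assumption made for this illustration.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # grade sum, run value sum, count
    for pitcher_id, grade, run_value in pitches:
        entry = totals[pitcher_id]
        entry[0] += grade
        entry[1] += run_value
        entry[2] += 1
    mean_grades = [t[0] / t[2] for t in totals.values()]
    mean_values = [t[1] / t[2] for t in totals.values()]
    return pearson_r(mean_grades, mean_values)


# Made-up example rows, two pitches for each of three pitchers.
example = [
    ("a", 90, -0.01), ("a", 110, 0.01),
    ("b", 100, 0.00), ("b", 120, 0.02),
    ("c", 80, -0.02), ("c", 100, 0.00),
]
print(seasonal_correlation(example))
```

A value near zero, like the 0.035 above, would mean the per-pitcher averages carry almost no ranking information.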
Root Cause
The Slider Value Surface Is Non-Convex
Unlike fastballs (good up, bad down center) or changeups (good low), sliders vs same-side batters have two distinct "good" zones separated by the worst possible location.
Fastball / Changeup: Convex
One good region, one bad region. A single split on plate_z gets you 80% of the way there. The value surface is a smooth gradient — easy for shallow trees.
Slider vs RHB: Non-Convex
Two separated good regions with the worst zone in between. No single split on any feature isolates "good." The tree must carve two disconnected pockets in 2D space — an XOR-like pattern.
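The XOR analogy can be checked on a toy grid. The sketch below is an idealized stand-in, not the real slider surface: it assumes a hypothetical value of +1 in two diagonal pockets and -1 elsewhere, then compares the best single axis-aligned split with a two-level partition.

```python
from itertools import product


def toy_value(x, z):
    """Hypothetical XOR-style surface: good (+1) in two diagonal pockets."""
    return 1.0 if (x > 0) == (z > 0) else -1.0


grid = [-1.5, -0.5, 0.5, 1.5]
points = [(x, z, toy_value(x, z)) for x, z in product(grid, grid)]


def sse(values):
    """Sum of squared errors around the mean, i.e. the error of one leaf."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_single_split(pts):
    """Try every axis-aligned threshold; return (error, feature, threshold)."""
    best = None
    for feature in (0, 1):  # 0 = horizontal location, 1 = vertical location
        for threshold in (-1.0, 0.0, 1.0):
            left = [p[2] for p in pts if p[feature] <= threshold]
            right = [p[2] for p in pts if p[feature] > threshold]
            error = sse(left) + sse(right)
            if best is None or error < best[0]:
                best = (error, feature, threshold)
    return best


def two_level_error(pts):
    """Error after splitting on x at 0, then on z at 0 inside each half."""
    total = 0.0
    for x_side in (True, False):
        for z_side in (True, False):
            leaf = [p[2] for p in pts
                    if (p[0] > 0) == x_side and (p[1] > 0) == z_side]
            total += sse(leaf)
    return total


baseline = sse([p[2] for p in points])
print("no split:", baseline)
print("best single split:", best_single_split(points)[0])
print("two levels:", two_level_error(points))
```

In this toy case every candidate single split leaves the squared error at its baseline, while the two-level partition removes it entirely. A greedy tree therefore sees no gain from the first spatial split on its own, which is the sense in which the pattern is hard for shallow trees.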
The Mechanism
Split Budget: Where Do 4 Levels Go?
Each root-to-leaf path in a tree contains at most as many splits as the tree's depth (4 here). With 11 features competing for those splits, the budget is tight. Here's how a single tree must allocate splits to capture slider value.
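The budget arithmetic can be made concrete with a small sketch. It assumes binary splits, so a tree of a given depth has at most 2^depth leaves and any one path can touch at most depth distinct features. The helper name split_budget is hypothetical.

```python
def split_budget(depth, n_features):
    """Upper bounds on what a single binary tree of this depth can express."""
    max_leaves = 2 ** depth
    max_features_per_path = min(depth, n_features)
    return {
        "max_leaves": max_leaves,
        "max_features_per_path": max_features_per_path,
        "share_of_features": max_features_per_path / n_features,
    }


# Compare the two depths discussed here, with the 11 Location+ features.
for depth in (4, 5):
    print(depth, split_budget(depth, n_features=11))
```

These are ceilings only. A real tree may reuse the same feature several times along a path, which is exactly what happens when several splits go to carving out a spatial pattern.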
Visualization
What the Tree Actually Looks Like
[Interactive figure: the tree at depth 4 versus depth 5. With one extra level, the leaf nodes (colored boxes at the bottom) go from blurry to precise.]
The Effect
How Blurry Grades Kill Seasonal Correlation
Per-pitch noise washes out over a season. Systematic bias doesn't. When the model assigns similar grades to "chase zone slider" and "hanging slider," averaging over 100+ pitches preserves the error.
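The difference between noise and bias under averaging can be illustrated with a small simulation. This is a sketch under simple assumptions: each pitch's grading error is a constant bias plus independent Gaussian noise, and the parameter values are made up for illustration.

```python
import random


def season_mean_error(n_pitches, noise_sd, bias, seed=0):
    """Average grading error over a season of pitches.

    Each pitch's error is modeled as bias + Gaussian noise (an assumption).
    """
    rng = random.Random(seed)
    errors = [bias + rng.gauss(0.0, noise_sd) for _ in range(n_pitches)]
    return sum(errors) / n_pitches


# Noise only: the average error shrinks toward zero as pitches accumulate.
print(season_mean_error(n_pitches=100, noise_sd=1.0, bias=0.0))

# Noise plus a systematic bias: the average stays close to the bias.
print(season_mean_error(n_pitches=100, noise_sd=1.0, bias=0.5))
```

The standard error of the mean falls roughly as noise_sd divided by the square root of n_pitches, so random error fades over a season, while a constant bias passes through the average unchanged.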
The Verdict
This Is Underfit, Not Overfit
Four signs that depth 4 was too shallow — and depth 5 is the right fix.
1. Systematic failure, not random
Overfitting = great training, bad test. Here the model fails everywhere — it never captures the pattern. That's textbook underfit.
2. The fix generalizes
Depth 5 improves held-out 2024 test seasonal correlations. Overfit would collapse on test data; this doesn't.
3. Depth hierarchy makes sense
Location+ (11 features) at depth 4 was the shallowest of the three models. Stuff+ (24 features, depth 5) and Pitching+ (40 features, depth 6) were already deeper.
4. Still conservative capacity
Depth 5 with 11 features means each path can use at most 5 of the 11 features, roughly half. That is nowhere near memorizing individual pitches. 32 leaf nodes per tree for a continuous 2D surface is modest.
Depth Change: 4 → 5
Max Leaf Nodes: 16 → 32
Pitch-Type r Gain: +0.003
Pitcher r Gain: +0.007
Summary
One Extra Split Changes Everything
● The Problem: Sliders have a non-convex value surface. Two good zones separated by the worst zone. Depth 4 can't carve both.
● The Mechanism: 3 of 4 splits consumed by spatial interaction → no budget for count context. Trees face impossible tradeoff.
● The Fix: Depth 5 gives one extra split per path. Now each tree resolves spatial pattern AND conditions on count.