Model Deep-Dive

Why Tree Depth Matters for Location+

How one extra level of splits fixes the slider problem and lets Location+ properly rank pitcher command skill.

Sliders vs RHB: A Broken Sub-Model

Five of six Location+ sub-models predict run value well. But breaking balls vs right-handed batters has a seasonal correlation near zero — the model can't rank slider command at all.

Seasonal correlation by sub-model:
Fastball R (best): 0.37
Offspeed L: 0.31
Breaking L: 0.27
Breaking R: 0.035

The breaking_R correlation of 0.035 means that knowing a pitcher's average Location+ grade for their slider tells you almost nothing about how much run value that slider actually prevented. Sliders specifically show r = −0.050: effectively zero, and if anything the grades point slightly the wrong way.
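To make "seasonal correlation" concrete, here is a minimal sketch of one way such a number could be computed: average the per-pitch grade and the per-pitch run value for each pitcher, then take the Pearson correlation across pitchers. The record layout, the function names, and the sample numbers are illustrative assumptions, not the actual Location+ pipeline.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def seasonal_correlation(pitches):
    """pitches: iterable of (pitcher_id, grade, runs_prevented) tuples.

    Averages both columns per pitcher, then correlates the averages.
    """
    grades = defaultdict(list)
    values = defaultdict(list)
    for pitcher_id, grade, runs_prevented in pitches:
        grades[pitcher_id].append(grade)
        values[pitcher_id].append(runs_prevented)
    pitchers = sorted(grades)
    avg_grade = [sum(grades[p]) / len(grades[p]) for p in pitchers]
    avg_value = [sum(values[p]) / len(values[p]) for p in pitchers]
    return pearson_r(avg_grade, avg_value)


# Made-up sample: three pitchers, two sliders each.
sample = [
    ("A", 110, 0.020), ("A", 106, 0.010),
    ("B", 100, 0.000), ("B", 102, 0.004),
    ("C", 92, -0.012), ("C", 96, -0.006),
]
r = seasonal_correlation(sample)
print(f"seasonal r = {r:.3f}")
```

In this toy sample the grades track outcomes closely, so r lands near 1. A value near zero, as with breaking_R, would mean the per-pitcher averages carry no ranking information.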

The Slider Value Surface Is Non-Convex

Unlike fastballs (good up, bad down center) or changeups (good low), sliders vs same-side batters have two distinct "good" zones separated by the worst possible location.

[Figure: catcher's view of the strike zone against an RHB, with the arm side (RHP) on the left and the glove side on the right. Good zones: the chase zone (low and away) and the backdoor zone (arm-side edge). Bad zone: the hanging slider (middle-middle), which sits between the two good zones.]

Fastball / Changeup: Convex

One good region, one bad region. A single split on plate_z gets you 80% of the way there. The value surface is a smooth gradient — easy for shallow trees.

Slider vs RHB: Non-Convex

Two separated good regions with the worst zone in between. No single split on any feature isolates "good." The tree must carve two disconnected pockets in 2D space — an XOR-like pattern.
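The "no single split isolates good" claim can be checked on a toy example. The sketch below simplifies the 2D surface to a 1D walk through the zone (backdoor, then middle, then chase) with made-up good/bad labels, and scores threshold rules by classification accuracy. A real regression tree splits on a loss such as squared error, so treat this as an illustration of the geometry only.

```python
# Toy 1D walk through the zone: good (backdoor), bad (middle), good (chase).
# Positions and labels are hypothetical.
positions = [0, 1, 2, 3, 4, 5]
is_good = [True, True, False, False, True, True]


def best_single_split_accuracy(xs, labels):
    """Best accuracy of any rule 'x < t is good' or 'x >= t is good'."""
    best = 0.0
    thresholds = [x + 0.5 for x in xs[:-1]]
    for t in thresholds:
        for left_is_good in (True, False):
            preds = [(x < t) == left_is_good for x in xs]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(xs)
            best = max(best, acc)
    return best


def two_split_accuracy(xs, labels, lo, hi):
    """Accuracy of the rule 'good unless lo <= x < hi' (two thresholds)."""
    preds = [not (lo <= x < hi) for x in xs]
    return sum(p == y for p, y in zip(preds, labels)) / len(xs)


single = best_single_split_accuracy(positions, is_good)
double = two_split_accuracy(positions, is_good, 1.5, 3.5)
print(f"best single split: {single:.2f}, two splits: {double:.2f}")
```

No single threshold beats the trivial "everything is good" rule here, while two thresholds separate the labels perfectly. That is the extra split cost the slider surface imposes.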

Split Budget: Where Do 4 Levels Go?

Each root-to-leaf path in a tree contains at most as many splits as the tree's depth. With 11 features competing for those splits, the budget is tight. Here's how a single tree must allocate splits to capture slider value.

Depth 4: split budget per path (4 total splits to represent spatial interaction + count context)
Split 1: plate_x > -0.3?
Split 2: plate_z < 2.0?
Split 3: plate_z > 2.8?
Split 4 (only one left): strikes? balls? outs?
⚠ 3 splits consumed for spatial interaction → only 1 remains for all count features

Depth 5: split budget per path (5 total splits, enough for spatial + count context)
Split 1: plate_x
Split 2: plate_z
Split 3: plate_z
Split 4: strikes
Split 5: balls
✓ Spatial pattern resolved AND count context preserved; each tree can do both jobs
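The budget arithmetic is simple enough to write down. This sketch assumes, as in the allocation above, that three splits per path go to the spatial pattern (one on plate_x, two on plate_z); the function name and its default are illustrative.

```python
def split_budget(depth, spatial_splits=3):
    """Return (max_leaves, splits_left_for_count_context) for one tree.

    spatial_splits=3 mirrors the allocation described above: one split on
    plate_x and two on plate_z. It is an assumption, not a fitted value.
    """
    if depth < spatial_splits:
        raise ValueError("depth too shallow to fit the spatial splits")
    return 2 ** depth, depth - spatial_splits


for depth in (4, 5):
    leaves, context = split_budget(depth)
    print(f"depth {depth}: up to {leaves} leaves, "
          f"{context} split(s) left for count context")
```

Going from depth 4 to depth 5 doubles the maximum leaf count from 16 to 32 and doubles the per-path count budget from one split to two.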

What the Tree Actually Looks Like

Comparing a depth-4 tree with a depth-5 tree shows how one extra level changes what the tree can represent: the leaf nodes go from blurry to precise.

[Tree diagram: plate_x at the root (arm-side vs glove-side), a plate_z split on each side, then a strikes split on each of the four branches. Leaves, left to right: Good?, Meh, Meh, Bad, Meh, Good?, Meh, Meh.]
Problem: no splits left for spatial refinement.
Backdoor zone (arm-side, mid-height) is lumped with "arm-side, low" → blurry.
Chase zone (glove-side, very low) is lumped with "glove-side, low" → imprecise.
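The lumping can be mimicked with two hand-written trees. This is a sketch, not the fitted model: the plate_x and first plate_z thresholds echo the split budget section above, the refinement cuts (1.5 and 2.0) are made up, and treating plate_x > -0.3 as the glove side is an assumption for illustration.

```python
def coarse_leaf(plate_x, plate_z):
    """One plate_x split plus one plate_z split per side (no refinement)."""
    if plate_x > -0.3:  # assumed glove side
        return "glove-side, low" if plate_z < 2.0 else "glove-side, high"
    return "arm-side, high" if plate_z > 2.8 else "arm-side, low-to-mid"


def refined_leaf(plate_x, plate_z):
    """Same tree with one extra (hypothetical) plate_z split where it helps."""
    if plate_x > -0.3:
        if plate_z < 2.0:
            return "chase zone" if plate_z < 1.5 else "glove-side, low strike"
        return "glove-side, high"
    if plate_z > 2.8:
        return "arm-side, high"
    return "backdoor" if plate_z >= 2.0 else "arm-side, low"


# Four illustrative pitch locations as (plate_x, plate_z).
chase, low_strike = (0.9, 1.2), (0.3, 1.8)
backdoor, arm_low = (-0.8, 2.4), (-0.8, 1.6)

for name, pitch in [("chase", chase), ("low strike", low_strike),
                    ("backdoor", backdoor), ("arm-side low", arm_low)]:
    print(f"{name:>12}: coarse={coarse_leaf(*pitch)!r}, "
          f"refined={refined_leaf(*pitch)!r}")
```

In the coarse tree the chase pitch shares a leaf with the low strike and the backdoor pitch shares a leaf with the arm-side low pitch, so each pair gets the same grade. One extra split per path pulls them apart.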

How Blurry Grades Kill Seasonal Correlation

Per-pitch noise washes out over a season. Systematic bias doesn't. When the model assigns similar grades to "chase zone slider" and "hanging slider," averaging over 100+ pitches preserves the error.

Depth 4: Grades Are Blurry
Pitcher A (great slider command): lives in chase zone, avoids middle. Actual outcome: -0.015 avg xRV (saves runs). Loc+ = 102
Pitcher B (hangs sliders): catches too much plate, elevated. Actual outcome: +0.012 avg xRV (gives up runs). Loc+ = 99
Only 3 pts apart; can't rank them. Seasonal r = 0.035 (no signal)

Depth 5: Grades Are Sharp
Pitcher A (great slider command): chase zone pitches correctly valued as elite. Actual outcome: -0.015 avg xRV (saves runs). Loc+ = 118
Pitcher B (hangs sliders): middle-plate pitches correctly penalized. Actual outcome: +0.012 avg xRV (gives up runs). Loc+ = 84
34 pts apart; clear ranking. Seasonal r = 0.27+ (real signal)
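A small simulation illustrates why averaging removes noise but not lumping. Every number here is made up: the per-zone grades, the noise level, the season length, and each pitcher's share of chase-zone sliders. The only point is the contrast between a model that grades the two zones separately and one that gives them the same grade.

```python
import random

# Hypothetical per-pitch grades on a runs-prevented scale (higher is better).
SHARP_GRADE = {"chase": 0.03, "hanging": -0.03}   # zones graded separately
BLURRY_GRADE = {"chase": 0.0, "hanging": 0.0}     # zones lumped into one leaf


def season_average(grades, chase_share, rng, n_pitches=400, noise_sd=0.02):
    """Mean of noisy per-pitch grades over one simulated season."""
    total = 0.0
    for _ in range(n_pitches):
        zone = "chase" if rng.random() < chase_share else "hanging"
        total += grades[zone] + rng.gauss(0.0, noise_sd)
    return total / n_pitches


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Pitcher A throws 80% of sliders to the chase zone, Pitcher B only 30%.
sharp_gap = (season_average(SHARP_GRADE, 0.8, rng)
             - season_average(SHARP_GRADE, 0.3, rng))
blurry_gap = (season_average(BLURRY_GRADE, 0.8, rng)
              - season_average(BLURRY_GRADE, 0.3, rng))

print(f"sharp grades:  season gap = {sharp_gap:+.4f}")
print(f"blurry grades: season gap = {blurry_gap:+.4f}")
```

With sharp grades the season averages separate the two pitchers by roughly the expected 0.03. With blurry grades the gap is only leftover noise, however many pitches are averaged.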

This Is Underfit, Not Overfit

Four signs that depth 4 was too shallow — and depth 5 is the right fix.

1. Systematic failure, not random

Overfitting = great training, bad test. Here the model fails everywhere — it never captures the pattern. That's textbook underfit.

2. The fix generalizes

Depth 5 improves held-out 2024 test seasonal correlations. Overfit would collapse on test data; this doesn't.

3. Depth hierarchy makes sense

Location+ (11 features) at depth 4 was the shallowest of the three models. Stuff+ (24 features, depth 5) and Pitching+ (40 features, depth 6) were already deeper, so moving Location+ to depth 5 still leaves it no deeper than either.

4. Still conservative capacity

Depth 5 with 11 features means each path uses at most 5 of the 11 features, a little under half. That is nowhere near memorizing individual pitches. 32 leaf nodes for a continuous 2D surface is modest.

Depth change: 4 → 5
Max leaf nodes: 16 → 32
Pitch-type r gain: +0.003
Pitcher r gain: +0.007

One Extra Split Changes Everything

The Problem

Sliders have a non-convex value surface. Two good zones separated by the worst zone. Depth 4 can't carve both.

The Mechanism

3 of 4 splits are consumed by the spatial interaction, leaving only one for all count context. Each tree faces an impossible tradeoff.

The Fix

Depth 5 gives one extra split per path. Now each tree resolves the spatial pattern AND conditions on count.