A record of every significant upgrade to the Stockyard pitch grading models, listed newest first.
Stuff+ now trains against a pitcher-facing xwOBA target that strips batter-controlled variance (spray angle, sprint speed), and sees each pitch in the context of the pitcher’s full arsenal. The result is a purer measure of pitch quality.
The first coordinated batch retrain: four confirmed improvements deployed together across all three models. Location+ gains count awareness and edge+lane features (13 features, up from 9). Stuff+ inherits the improved residuals. Pitching+ switches from 6 category models to 18 movement-routed models that specialize by pitch movement profile, and adds bat speed and attack angle as features.
Pitching+ now uses a Residual-Corrected Multi-Target Ensemble (RC-MTE) architecture that trains decomposition heads for swing probability, contact probability, and expected run value by outcome type. This is the largest single model improvement in project history, nearly tripling the model's explanatory power.
Location+ and Pitching+ grades are now rescaled separately for starters and relievers, rather than pooling both populations together. This fixes a systematic distortion where relievers' command grades were compressed.
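A minimal sketch of role-specific rescaling. It assumes a plus scale centered at 100 with 10 points per standard deviation and a raw score where higher is better; the changelog states neither, and `rescale_by_role` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean, pstdev

def rescale_by_role(rows, center=100.0, spread=10.0):
    """rows: list of (pitcher, role, raw_score). Returns {pitcher: grade}.

    Each role (e.g. "SP", "RP") gets its own mean and standard deviation,
    so one population cannot compress the other's grades.
    """
    by_role = defaultdict(list)
    for _, role, raw in rows:
        by_role[role].append(raw)
    stats = {role: (mean(v), pstdev(v)) for role, v in by_role.items()}

    grades = {}
    for pitcher, role, raw in rows:
        mu, sd = stats[role]
        z = (raw - mu) / sd if sd > 0 else 0.0
        grades[pitcher] = center + spread * z  # assumed 100/10 plus scale
    return grades

rows = [("A", "SP", 0.2), ("B", "SP", 0.4), ("C", "RP", 1.0), ("D", "RP", 3.0)]
grades = rescale_by_role(rows)
```

In this toy data the relievers' raw scores are far more spread out than the starters', yet both groups land on the same 90 to 110 range after rescaling.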
Training targets are now adjusted by park factor, reducing venue bias in pitch grades. A pitcher's stuff and command are measured against what's expected at that ballpark, not league-wide averages.
Location+ now trains on a pooled expected-wOBA xRV target instead of per-year run-environment tables. This removes year-to-year target drift while preserving pitch-level evaluation on canonical xRV.
Replaced the monolithic Pitching+ model with a two-tier stacked ensemble. Level-0 trains separate Stuff+ and Location+ models with leakage-free out-of-fold predictions, then a Level-1 Ridge meta-model combines them with arsenal synergy scores. Pitching+ now builds on top of the component models rather than duplicating their work.
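The out-of-fold step can be sketched as follows. This is an illustration only: `out_of_fold` and `fit_mean` are hypothetical names, the mean predictor stands in for the real Level-0 models, and the folds are assigned by row index, whereas a leakage-free split in practice would more likely group rows by pitcher or game. The Level-1 Ridge step is not shown.

```python
def out_of_fold(xs, ys, fit, n_folds=5):
    """Return one prediction per row, each made by a model that never saw that row."""
    oof = [None] * len(xs)
    for k in range(n_folds):
        # Train on every row outside fold k.
        train_idx = [i for i in range(len(xs)) if i % n_folds != k]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        # Predict only the held-out rows of fold k.
        for i in range(k, len(xs), n_folds):
            oof[i] = model(xs[i])
    return oof

def fit_mean(train_x, train_y):
    """Stand-in Level-0 model: always predicts the training-set mean."""
    avg = sum(train_y) / len(train_y)
    return lambda x: avg

xs = [0, 1, 2, 3]
ys = [1.0, 2.0, 3.0, 4.0]
oof = out_of_fold(xs, ys, fit_mean, n_folds=2)
```

Rows 0 and 2 are predicted from rows 1 and 3 only, and vice versa, so the meta-model never sees a prediction that was fitted on its own target.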
All strikeouts and walks are now valued at their cross-count average run value during training, removing count-context noise from the target variable. This is the single largest model improvement in the project’s history, with Stuff+ FIP correlation improving 41% and Pitching+ FIP improving 16%.
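A rough sketch of the cross-count averaging, assuming a pitch-weighted mean over all counts (the changelog does not say whether the average is weighted) and hypothetical field names `event`, `count`, and `run_value`.

```python
from collections import defaultdict

def cross_count_average(pitches, events=("strikeout", "walk")):
    """Replace each strikeout/walk run value with that event's mean over all counts."""
    totals = defaultdict(lambda: [0.0, 0])  # event -> [sum of run values, n]
    for p in pitches:
        if p["event"] in events:
            totals[p["event"]][0] += p["run_value"]
            totals[p["event"]][1] += 1
    avg = {event: s / n for event, (s, n) in totals.items()}

    # Other events keep their original, count-specific run value.
    return [
        {**p, "run_value": avg[p["event"]]} if p["event"] in avg else dict(p)
        for p in pitches
    ]

pitches = [
    {"event": "strikeout", "count": "0-2", "run_value": -0.25},
    {"event": "strikeout", "count": "3-2", "run_value": -0.50},
    {"event": "single", "count": "1-1", "run_value": 0.45},
]
adjusted = cross_count_average(pitches)
```

Both strikeouts end up with the same target, regardless of the count in which they occurred.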
After 31 incremental upgrades, the low-hanging fruit was gone. We’d tested every promising feature idea, tuned every knob, and tried four different weighting schemes — all dead ends. Instead of continuing to search for the next thing to add, we stopped and asked a different question: what if the problem isn’t what the model knows, but what we’re asking it to do?
Added XGBoost monotonic constraints on effective_speed and plate_speed for fastball models (FF/SI/FC), guaranteeing that higher velocity always produces a better Stuff+ grade when all other pitch characteristics are equal. This eliminates counterintuitive grades at sub-MLB velocities while shifting grades within the MLB velocity range only minimally.
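One way such a constraint could be expressed is through XGBoost's `monotone_constraints` parameter, which accepts a string such as `"(1,0,-1)"` with one sign per feature. The sketch below only builds the parameter dictionary. The feature list beyond the two velocity columns is hypothetical, and the `+1` direction assumes a target where higher means better for the pitcher; with a runs-allowed target the sign would flip.

```python
def monotone_constraints(features, increasing=(), decreasing=()):
    """Build the '(1,0,-1)' string form of XGBoost's monotone_constraints."""
    signs = []
    for name in features:
        if name in increasing:
            signs.append(1)    # prediction may only rise with this feature
        elif name in decreasing:
            signs.append(-1)   # prediction may only fall with this feature
        else:
            signs.append(0)    # unconstrained
    return "(" + ",".join(str(s) for s in signs) + ")"

# pfx_x and pfx_z are placeholder columns, not the model's actual feature set.
FEATURES = ["effective_speed", "plate_speed", "pfx_x", "pfx_z"]

params = {
    "tree_method": "hist",
    "monotone_constraints": monotone_constraints(
        FEATURES, increasing=("effective_speed", "plate_speed")
    ),
}
```

The order of signs must match the column order of the training matrix, which is why the string is derived from the feature list rather than typed by hand.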
Arsenal Synergy is now decomposed into two interpretable sub-scores: Mix+ (arsenal construction) and Match+ (arsenal deployment), giving pitchers actionable insight into what drives their synergy grade.
Added XGBoost early stopping (50-round patience) with a leak-safe validation split to all three pitch grade models. Stuff+ trees cut by 60-95%, reducing overfitting on the noisy residual target. No regression in seasonal correlations.
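The patience rule itself is simple enough to sketch without a booster. Given a validation loss per boosting round, training stops once the loss has gone `patience` rounds without a new best, and the trees up to the best round are kept. `best_round` is a hypothetical helper, not the project's code.

```python
def best_round(val_losses, patience=50):
    """Return how many trees to keep under patience-based early stopping."""
    best_loss = float("inf")
    best_idx = -1
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_idx = loss, i
        elif i - best_idx >= patience:
            break  # no improvement for `patience` consecutive rounds
    return best_idx + 1

losses = [1.0, 0.8, 0.7, 0.75, 0.76, 0.77, 0.6]
kept_short = best_round(losses, patience=3)
kept_long = best_round(losses, patience=50)
```

With a patience of 3 the run stops before reaching the late improvement in the last round; with a patience of 50 it sees every round and keeps all seven trees. That trade-off is why the patience value matters on a noisy target.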
Excluded all Coors Field pitches from model training and scoring. Coors' extreme altitude distorts pitch physics (reduced movement, inflated velocity readings), which was biasing grades for pitchers who happened to pitch there. All three models now train on non-Coors data only, and Coors appearances are excluded from pitcher grades.
Removed release_speed from Stuff+ after full validation confirmed it was redundant with effective_speed and plate_speed. Addition by subtraction: all seasonal RV metrics improved, and per-pitch R² jumped 17.8%.
Added against_break_location and against_break_vert to Location+, measuring whether a pitch is located opposite its movement direction. A sinker placed glove-side (against its natural arm-side run) or a fastball located low (against its natural rise) now gets credit for elite command.
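A hypothetical definition of the two features, to make the idea concrete. The actual formulas are not given in this changelog. The sketch assumes plate and movement values share one coordinate frame, that the sign of the movement is all that matters, and that the vertical reference is a fixed zone center of 2.5 ft.

```python
def sign(v):
    """Return -1, 0, or 1."""
    return (v > 0) - (v < 0)

def against_break(plate_x, plate_z, pfx_x, pfx_z, zone_center_z=2.5):
    """Positive values mean the pitch is located opposite its movement direction."""
    location = -sign(pfx_x) * plate_x                 # horizontal: opposite the run
    vert = -sign(pfx_z) * (plate_z - zone_center_z)   # vertical: opposite the rise/drop
    return location, vert

# Sinker running toward negative x, located on the positive-x side of the plate.
sinker_loc, _ = against_break(plate_x=0.5, plate_z=2.5, pfx_x=-1.2, pfx_z=0.4)
# Rising fastball (positive pfx_z) located below the zone center.
_, fastball_vert = against_break(plate_x=0.0, plate_z=1.8, pfx_x=-0.3, pfx_z=1.4)
```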
Added location_along_arm and location_across_arm to Location+, rotating plate coordinates into each pitcher's arm-plane frame. A sweeper from a low-slot pitcher and a curveball from an over-the-top pitcher now get evaluated in their natural coordinate systems.
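The rotation can be sketched as a standard 2D change of basis. The arm angle convention (degrees above horizontal, so 0 is sidearm and 90 is over the top), the fixed zone center, and the lack of any handedness mirroring are all assumptions for illustration.

```python
import math

def arm_frame_location(plate_x, plate_z, arm_angle_deg, zone_center_z=2.5):
    """Rotate plate location into a frame aligned with the pitcher's arm plane.

    Returns (along_arm, across_arm) in the same units as the inputs.
    """
    theta = math.radians(arm_angle_deg)
    dx, dz = plate_x, plate_z - zone_center_z
    along = dx * math.cos(theta) + dz * math.sin(theta)    # component along the arm plane
    across = -dx * math.sin(theta) + dz * math.cos(theta)  # component perpendicular to it
    return along, across

sidearm = arm_frame_location(1.0, 2.5, arm_angle_deg=0.0)
over_the_top = arm_frame_location(0.0, 3.5, arm_angle_deg=90.0)
```

For a sidearm pitcher a miss off the plate horizontally counts as "along arm"; for an over-the-top pitcher a miss up in the zone plays the same role. The rotation preserves distance from the zone center, so only the direction of the miss is re-expressed.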
Added vmov_diff_from_ff (vertical movement relative to pitcher's fastball) to Stuff+. The feature previously failed on the old raw xRV target but succeeded once the residual Stuff+ redesign isolated the pure physics signal.
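A sketch of how such a feature might be computed, assuming "fastball" means the four-seamer (`FF`), the baseline is the pitcher's mean over the whole dataset, and vertical movement lives in a field called `pfx_z`. Only the name `vmov_diff_from_ff` comes from the changelog.

```python
from collections import defaultdict
from statistics import mean

def add_vmov_diff_from_ff(pitches):
    """Attach each pitch's vertical movement minus its pitcher's mean FF movement."""
    ff_moves = defaultdict(list)
    for p in pitches:
        if p["pitch_type"] == "FF":
            ff_moves[p["pitcher"]].append(p["pfx_z"])
    ff_mean = {pitcher: mean(v) for pitcher, v in ff_moves.items()}

    out = []
    for p in pitches:
        base = ff_mean.get(p["pitcher"])
        # Pitchers with no fastball in the data get None rather than a guess.
        diff = None if base is None else p["pfx_z"] - base
        out.append({**p, "vmov_diff_from_ff": diff})
    return out

pitches = [
    {"pitcher": "A", "pitch_type": "FF", "pfx_z": 16.0},
    {"pitcher": "A", "pitch_type": "FF", "pfx_z": 18.0},
    {"pitcher": "A", "pitch_type": "CU", "pfx_z": -9.0},
    {"pitcher": "B", "pitch_type": "SL", "pfx_z": 2.0},
]
with_diff = add_vmov_diff_from_ff(pitches)
```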
Added three new features to Pitching+, improving predictive accuracy by 0.76%. The model now tracks how many times a pitcher has thrown each pitch type to a batter in the game, how varied a pitcher’s fastball arsenal is, and an explicit arm-side synergy interaction between horizontal break and plate location.
Fundamentally redefined Stuff+ to measure pitch nastiness independent of location. Stuff+ now trains on xRV residuals after removing Location+ predictions, decorrelating it from command quality. Year-over-year stability improved 17%, small-sample reliability jumped 38%, and Stuff+ vs Location+ correlation dropped from r=0.55 to near zero.
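The residual idea can be illustrated with a toy example. Here a one-variable least-squares line stands in for Location+ (a simplification chosen for the sketch, not the production model), and all numbers are made up. The Stuff+ target is what remains of xRV after subtracting the location prediction.

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

location_feature = [0.0, 0.5, 1.0, 1.5, 2.0]   # toy location input
xrv = [-0.02, 0.01, 0.00, 0.05, 0.04]          # toy per-pitch xRV

slope, intercept = fit_line(location_feature, xrv)
location_pred = [intercept + slope * v for v in location_feature]
stuff_target = [y - p for y, p in zip(xrv, location_pred)]  # residual target

corr = pearson(stuff_target, location_pred)
```

For a least-squares fit the residuals are uncorrelated with the fitted values by construction, so `corr` is zero up to floating-point error. With a nonlinear model such as gradient-boosted trees the decorrelation is approximate rather than exact.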
Reduced model overfitting by 14–29% through hyperparameter re-optimization. Both Stuff+ and Pitching+ now use shallower trees with faster learning rates, improving generalization to unseen data while maintaining or improving FIP correlation.
Completely redesigned Location+ to match FanGraphs' definition: pure location and count features only, with zero physics. Stripped 16 physics features (38% of old model importance), reducing from 23 to 7 features. BB% prediction improved 154%, and correlation with Stuff+ dropped 60%.
Re-ran hyperparameter grid search on Stuff+ and Pitching+ using the full 2016-2024 training set. The xRV target benefits from different hyperparameters than CSW did. Both models updated with optimized tree count, depth, and learning rate settings.
Added two new features to Arsenal Synergy capturing pitch sequencing unpredictability and extension-adjusted velocity variation. Combined improvement of +12.9% across three independent validation methods (hold-out, leave-one-year-out CV, and LOOCV).
Systematic post-rebuild optimization of Location+ and Pitching+. Ran six overnight experiments covering feature ablation, new feature screening, regularization tuning, and hyperparameter grid search. Location+ gained +3.24% FIP from three combined changes; Pitching+ gained +0.63% from regularization. Also corrected the 2025 xRV target using the updated wOBA scale.
Introduced Deception Score — a new metric that measures how much harder a pitcher is to hit than expected based on raw pitch characteristics alone.
Scrapped the entire CSW-based pitch grading system and rebuilt from the ground up using expected run value (xRV) as the training target. Every pitch grade on the site — Stuff+, Location+, and Pitching+ — has been retrained. This is the single largest change in Stockyard’s history.
Added axis deviation to Stuff+, trajectory break to Location+, and synced four missing features to Pitching+ that were accidentally left out after earlier upgrades.
Decomposed the raw “inning” feature into three independent signals — role (reliever vs starter), fatigue (pitch count), and familiarity (innings in game) — after discovering that a prior “failed” test was actually caused by a data-corruption bug.
Added induced vertical break (IVB) as a physics context feature for Location+, helping the model understand which zone heights are effective for different pitch movement profiles.
Added three context features that help Location+ and Pitching+ understand where a pitch lands relative to each batter's unique strike zone, what inning it is, and whether runners are on base.
Retired the Sequence+ model after it failed to meet quality standards. Its core sequencing features were already captured by Pitching+.
Completely rebuilt the Arsenal Synergy model with pairwise contrast features and RidgeCV, achieving 10x better accuracy than v1.
Created count-normalized versions of Stuff+ and Pitching+ that strip out the bias from the ball-strike count (a 3-0 fastball isn't worse than an 0-2 fastball, it's just thrown in a different context).
Added horizontal approach angle (HAA) — the left/right angle at which a pitch arrives at the plate, complementing the existing vertical approach angle.
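For reference, approach angles are commonly derived from Statcast-style initial velocity and acceleration components with constant-acceleration kinematics. The sketch below follows that public convention. It assumes the components are measured at y = 50 ft, the front of the plate is at 17/12 ft, and `ay` is nonzero; it is not necessarily the exact formula used here.

```python
import math

def approach_angles(vx0, vy0, vz0, ax, ay, az, y0=50.0, y_plate=17.0 / 12.0):
    """Return (HAA, VAA) in degrees at the front of the plate."""
    # Velocity toward the plate is negative in y; solve v^2 = v0^2 + 2*a*dy.
    vy_f = -math.sqrt(vy0 ** 2 - 2.0 * ay * (y0 - y_plate))
    t = (vy_f - vy0) / ay          # flight time from y0 to the plate
    vx_f = vx0 + ax * t            # lateral velocity at the plate
    vz_f = vz0 + az * t            # vertical velocity at the plate
    haa = -math.degrees(math.atan(vx_f / vy_f))
    vaa = -math.degrees(math.atan(vz_f / vy_f))
    return haa, vaa

# Illustrative values only (ft/s and ft/s^2), not real pitch data.
haa, vaa = approach_angles(vx0=5.0, vy0=-130.0, vz0=-5.0, ax=0.0, ay=25.0, az=-15.0)
straight_haa, _ = approach_angles(vx0=0.0, vy0=-130.0, vz0=-5.0, ax=0.0, ay=25.0, az=-15.0)
```

A pitch with no lateral velocity or acceleration arrives with an HAA of zero, while the downward-travelling example pitch has a negative VAA, matching the usual sign convention.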
Added seam-shifted wake (SSW) axis deviation to Location+, capturing the gap between a pitch's spin axis and its actual movement direction.
Decomposed tunnel distance into approach-axis components, measuring how much a pitch deviates from the fastball both laterally and along the approach path.
Replaced flat per-pitch-type contact adjustments with a dedicated XGBoost model trained on 982K batted balls to predict expected damage from pitch physics.
Added three prior-year batter stats (whiff rate, chase rate, CSW rate) so the models account for how tough the opposing batter is.
Added arm slot angle as a feature for all models, and spin efficiency for Location+ only.
Ran a 256-experiment grid search to find the best algorithm and hyperparameters for each model type, replacing the initial defaults.