About This Project
A full-stack web app for analyzing MLB pitching data, built on pitch-level Statcast data with custom-computed statistics and machine learning pitch grades.
7.7M Pitches Tracked
2,500+ Pitchers
11 Seasons
180+ Statistics
What You Can Do
Analysis
Explore correlations, stickiness (year-to-year stability), predictive power, and trends across 180+ pitching stats. Includes team-level analysis against runs allowed.
Heatmaps
Strike zone heatmaps for any pitcher, with 6 metrics. Compare two pitchers side-by-side or view league-wide patterns.
Pitcher Profiles
Detailed pages with season stats, pitch arsenal cards, game logs, strike zone scatter plots, and trend charts.
Leaderboards
Sortable rankings across Results, Stuff, Velocity, and Contact categories with adjustable filters.
Stockyard Grades
Machine learning pitch quality grades trained on 7.7M pitches. Three models: Stuff+, Location+, and Pitching+.
Projections
Next-season K%, BB%, and kwERA projections powered by ElasticNet models trained on 200+ pitcher features.
Compare
Side-by-side pitcher comparison with overlaid velocity trends, arsenal breakdowns, and heatmaps.
Stockyard Pitch Grades
Three machine learning models grade every pitch thrown since 2015. All grades use a 100-based scale where 100 is league average and higher is better. Each pitch is scored separately against left-handed and right-handed batters.
Stuff+
How nasty is the pitch? Grades based on physical characteristics: velocity, movement, spin, release point, extension, and trajectory.
XGBoost · 23 features · Includes per-pitch contact quality adjustment
Location+
How well is the pitch located? Grades based on plate location, distance from the zone, count awareness, and sequencing context.
LightGBM · 13 features · Most temporally stable of the three models
Pitching+
The full picture. Combines stuff and location features with sequencing (pitch mix, fastball differentials) for an overall pitch effectiveness grade.
XGBoost · 27 features · Highest predictive power (R² = 0.25)
All models are trained with Called Strikes + Whiffs (CSW) as the target variable, across 7.7M+ pitches. Each model is fit separately per pitch type and batter handedness, and includes batter quality features (prior-year whiff rate, chase rate, CSW rate) to account for opponent strength.
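The labeling and grouping step described above can be sketched as follows. This is a minimal illustration, not the site's actual training code: it assumes the standard Statcast column names `description`, `pitch_type`, and `stand`, and it assumes foul tips are not counted as whiffs (some CSW definitions include them). The helper names `csw_label` and `split_training_sets` are hypothetical.

```python
from collections import defaultdict

# Statcast `description` values treated as called strikes or whiffs.
# Assumption: foul tips are excluded; some CSW definitions count them.
CSW_DESCRIPTIONS = {"called_strike", "swinging_strike", "swinging_strike_blocked"}


def csw_label(description):
    """Return 1 if the pitch was a called strike or whiff, else 0."""
    return 1 if description in CSW_DESCRIPTIONS else 0


def split_training_sets(pitches):
    """Group (pitch, label) pairs by (pitch type, batter handedness)."""
    groups = defaultdict(list)
    for pitch in pitches:
        key = (pitch["pitch_type"], pitch["stand"])
        groups[key].append((pitch, csw_label(pitch["description"])))
    return dict(groups)


# Toy rows standing in for real Statcast pitches.
pitches = [
    {"pitch_type": "FF", "stand": "R", "description": "called_strike"},
    {"pitch_type": "FF", "stand": "R", "description": "ball"},
    {"pitch_type": "SL", "stand": "L", "description": "swinging_strike"},
    {"pitch_type": "FF", "stand": "L", "description": "foul"},
]
training_sets = split_training_sets(pitches)
```

Each key in `training_sets`, such as `("FF", "R")`, would then feed its own model, which is one way to realize "per pitch type and batter handedness".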
Stockyard Projections
Next-season predictions for pitcher performance. ElasticNet models trained on 10 years of pitcher-season transitions (1,700+ data points) with 200+ features from Statcast, FanGraphs, and Stockyard grades.
K% & BB%
Direct predictions of strikeout rate and walk rate for the following season, the two stats most within a pitcher's control.
K% R² = 0.52 · BB% R² = 0.34 · Beats Marcel-style baselines
kwERA
Derived from projected K% and BB% using the formula 5.40 − 12 × (K% − BB%), with both rates expressed as fractions of batters faced (for example, 0.28 rather than 28). A simple ERA estimator that's nearly as predictive as SIERA with no fitting required.
Public formula · r = 0.60 year-over-year · r = 0.25 vs next-year ERA
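As a worked example of the kwERA formula, the sketch below applies it to a hypothetical pitcher projected for a 28% strikeout rate and a 7% walk rate. The function name `kwera` and the input values are illustrative only.

```python
def kwera(k_pct, bb_pct):
    """kwERA = 5.40 - 12 * (K% - BB%), with rates given as fractions (0.28, not 28)."""
    return 5.40 - 12.0 * (k_pct - bb_pct)


# Hypothetical projection: 28% strikeouts, 7% walks.
projected = kwera(k_pct=0.28, bb_pct=0.07)
print(round(projected, 2))  # 2.88
```

A pitcher whose strikeout and walk rates are equal lands exactly on the 5.40 intercept, and every additional percentage point of K% minus BB% lowers the estimate by 0.12 runs.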
Features include 5 time windows (full season, halves, last 60/30 days), 2-year history with decay weighting, Stockyard grade trajectories, pitch mix changes, and postseason data when available.
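One plausible reading of "2-year history with decay weighting" is a weighted mean in which the older season counts for less. The sketch below assumes a geometric decay with a factor of 0.5; both the scheme and the factor are illustrative assumptions, since the actual weights are not stated here, and `decay_weighted` is a hypothetical helper.

```python
def decay_weighted(values_by_age, decay=0.5):
    """Weighted mean where a season `age` years back gets weight decay ** age.

    `values_by_age` maps 0 -> most recent season, 1 -> the season before it.
    The default decay of 0.5 is an illustrative assumption.
    """
    weights = {age: decay ** age for age in values_by_age}
    total = sum(weights.values())
    return sum(values_by_age[age] * weights[age] for age in values_by_age) / total


# Hypothetical pitcher: 27% K rate last season, 24% the season before.
k_pct_feature = decay_weighted({0: 0.27, 1: 0.24}, decay=0.5)
```

With these inputs the recent season carries twice the weight of the prior one, so the feature lands closer to 0.27 than a plain average would.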
Data Pipeline
Statcast (Primary Source)
Every pitch thrown since 2015 — velocity, spin, movement, location, outcomes, and extended tracking data (arm angle, bat speed, game situation). The majority of stats on this site are computed directly from this pitch-level data, including K%, BB%, FIP, xFIP, BABIP, batted ball stats, plate discipline, contact quality, pitch movement, and more.
MLB Stats API
Official scorer data that can't be derived from pitch tracking alone — ERA (requires earned/unearned run distinction), games started, wins, losses, saves, and holds.
FanGraphs
A small set of advanced metrics that require park factors or complex modeling beyond what pitch data alone can produce: WAR, SIERA, xERA, park-adjusted stats (ERA-, FIP-), leverage/clutch metrics (WPA, RE24, Clutch), and run values per pitch type.
Three-tier merge priority
FanGraphs provides the baseline, self-computed Statcast aggregations override it where available, and official MLB stats take the highest priority. Each stat therefore comes from the highest-priority source that supplies it.
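The merge order can be sketched with plain dictionaries. This is a simplified illustration for a single pitcher-season, not the site's actual pipeline: the function name `merge_stats`, the stat keys, and the rule that missing (`None`) values never overwrite an existing value are all assumptions.

```python
def merge_stats(fangraphs, statcast, mlb_official):
    """Merge one pitcher-season; later sources override earlier ones.

    Assumption: a None value means "not available" and never overwrites.
    """
    merged = {}
    # Iterate from lowest to highest priority so later sources win.
    for source in (fangraphs, statcast, mlb_official):
        for stat, value in source.items():
            if value is not None:
                merged[stat] = value
    return merged


# Hypothetical inputs for one pitcher-season.
row = merge_stats(
    fangraphs={"k_pct": 0.251, "war": 3.1, "era": 3.42},
    statcast={"k_pct": 0.254, "barrel_pct": None},
    mlb_official={"era": 3.45, "wins": 12},
)
```

Here `k_pct` ends up with the Statcast value, `era` with the official MLB value, and `war` falls through from FanGraphs because no higher-priority source supplies it.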
Tech Stack
Frontend
Next.js 14, React, Tailwind CSS, TanStack Query, Recharts
Backend
Python FastAPI, SQLAlchemy, SQLite
ML Models
XGBoost + LightGBM for pitch grading, with a separate contact quality model for Stuff+ adjustment
Data Ingestion
pybaseball for Statcast, MLB Stats API for official records, FanGraphs API for advanced metrics
What We Compute Ourselves
Rather than relying on third-party leaderboards, the majority of statistics on this site are aggregated directly from raw Statcast pitch data. This gives us full control over methodology and lets us offer stats that aren't available elsewhere.
Rate Stats
K%, BB%, K-BB%, HR%, TTO%, BABIP, AVG, OBP, SLG, WHIP, K/9, BB/9, HR/9
ERA Estimators
FIP and xFIP computed with year-specific constants and league HR/FB rates
Batted Ball
GB%, FB%, LD%, IFFB%, Pull%, Cent%, Oppo%, FB Pull% — spray angles from hit coordinates
Contact Quality
Hard Hit%, Soft%, Barrel%, Sweet Spot%, Avg/Max Exit Velo, Avg Launch Angle
Plate Discipline
O-Swing%, Z-Swing%, Swing%, O-Contact%, Z-Contact%, Contact%, SwStr%, CStr%, CSW%
Pitch Movement
iVB and iHB from raw Statcast pfx data — more accurate than FanGraphs, which measures at a shorter distance
Arm-Slot Adjusted
Movement and acceleration adjusted for arm angle via regression, with submarine outlier detection
Per-Pitch-Type
Velocity, usage, CSW%, Barrel%, and xRV per 100 pitches for each of 7 pitch types
Stockyard Originals
xRV (BABIP-neutral expected run value), plus Stuff+, Location+, and Pitching+ grades for every pitch
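To make the ERA Estimators entry above concrete, here is a minimal sketch of the standard public FIP and xFIP formulas. The season constant (3.10) and league HR/FB rate (0.12) are placeholder values chosen for illustration, not the figures this site uses for any particular year, and innings pitched is assumed to be a true decimal (180 1/3 innings is 180.333, not the box-score notation 180.1).

```python
def fip(hr, bb, hbp, k, ip, fip_constant):
    """FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + season constant."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + fip_constant


def xfip(fb, bb, hbp, k, ip, fip_constant, lg_hr_per_fb):
    """xFIP swaps actual home runs for fly balls times the league HR/FB rate."""
    expected_hr = fb * lg_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + fip_constant


# Hypothetical season line; the constant and league rate are placeholders.
season_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=180.0, fip_constant=3.10)
season_xfip = xfip(fb=180, bb=50, hbp=5, k=200, ip=180.0,
                   fip_constant=3.10, lg_hr_per_fb=0.12)
```

Because this pitcher allowed 20 home runs on 180 fly balls, slightly below the placeholder league rate, xFIP comes out a little higher than FIP.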
