About This Project

A full-stack web app for analyzing MLB pitching data. Built on pitch-level Statcast data with custom-computed statistics and machine learning pitch grades.

7.7M pitches tracked · 2,500+ pitchers · every season since 2015 · 180+ statistics

What You Can Do

Analysis

Explore correlations, stickiness (year-over-year reliability), predictive power, and trends across 180+ pitching stats, plus team-level analysis of how each stat relates to runs allowed.
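Stickiness is usually measured as a year-over-year correlation: how closely a pitcher's value in one season tracks the same stat in the next. The sketch below is a minimal version that assumes a hypothetical `{pitcher: {season: value}}` input. The site's sample-size filters are not described, so they are omitted here.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def stickiness(stat_by_pitcher: dict[str, dict[int, float]]) -> float:
    """Year-over-year r: pair each pitcher's season N value with season N + 1."""
    this_year, next_year = [], []
    for seasons in stat_by_pitcher.values():
        for season, value in seasons.items():
            if season + 1 in seasons:
                this_year.append(value)
                next_year.append(seasons[season + 1])
    return pearson_r(this_year, next_year)


# Hypothetical K% values for three pitchers.
k_pct = {
    "pitcher_a": {2022: 0.20, 2023: 0.22},
    "pitcher_b": {2022: 0.25, 2023: 0.24},
    "pitcher_c": {2022: 0.30, 2023: 0.31},
}
print(round(stickiness(k_pct), 3))
```

Values near 1 mean a stat carries over well from season to season. Values near 0 mean it is mostly noise.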

Heatmaps

Strike zone heatmaps for any pitcher, with 6 metrics. Compare two pitchers side-by-side or view league-wide patterns.

Pitcher Profiles

Detailed pages with season stats, pitch arsenal cards, game logs, strike zone scatter plots, and trend charts.

Leaderboards

Sortable rankings across Results, Stuff, Velocity, and Contact categories with adjustable filters.

Stockyard Grades

Machine learning pitch quality grades trained on 7.7M pitches. Three models: Stuff+, Location+, and Pitching+.

Projections

Next-season K% and BB% projections from ElasticNet models trained on 200+ pitcher features, plus a kwERA projection derived from them.

Compare

Side-by-side pitcher comparison with overlaid velocity trends, arsenal breakdowns, and heatmaps.

Stockyard Pitch Grades

Three machine learning models grade every pitch thrown since 2015. All grades use a 100-based scale where 100 is league average and higher is better. Each pitch is scored separately against left-handed and right-handed batters.
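One common way to put a model output on a 100-based scale is to divide by the league mean and multiply by 100. The page does not say which scaling Stockyard uses (some plus stats instead map standard deviations to a fixed number of points), so ratio-to-mean scaling is an assumption here, and `to_plus` is a hypothetical name.

```python
def to_plus(raw: list[float]) -> list[float]:
    """Rescale raw model outputs (e.g. predicted CSW probability) so the mean maps to 100.

    Assumption: ratio-to-mean scaling. A higher raw value gives a higher grade.
    """
    league_mean = sum(raw) / len(raw)
    return [100 * r / league_mean for r in raw]


grades = to_plus([0.25, 0.30, 0.35])
print([round(g, 1) for g in grades])  # [83.3, 100.0, 116.7]
```

Because each pitch is graded separately against left- and right-handed batters, the league mean would presumably be computed within each handedness split.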

Stuff+

How nasty is the pitch? Grades based on physical characteristics: velocity, movement, spin, release point, extension, and trajectory.

XGBoost · 23 features · Includes per-pitch contact quality adjustment

Location+

How well is the pitch located? Grades based on plate location, distance from the zone, count awareness, and sequencing context.

LightGBM · 13 features · Most temporally stable of the three models

Pitching+

The full picture. Combines stuff and location features with sequencing (pitch mix, fastball differentials) for an overall pitch effectiveness grade.

XGBoost · 27 features · Highest predictive power (R² = 0.25)

All three models use Called Strikes + Whiffs (CSW) as the target variable and are trained on 7.7M+ pitches. A separate model is trained for each pitch type and batter handedness, with batter quality features (prior-year whiff rate, chase rate, CSW rate) to account for opponent strength.
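The CSW target can be built from the Statcast `description` column. In the sketch below, the set of outcomes that count as whiffs is an assumption: foul tips and missed bunts are excluded, and public definitions vary on these. The helper names are hypothetical.

```python
# Statcast `description` values treated as CSW events.
# Assumption: foul tips and missed bunts are NOT counted as whiffs here.
CALLED_STRIKE = {"called_strike"}
WHIFFS = {"swinging_strike", "swinging_strike_blocked"}


def csw_label(description: str) -> int:
    """Binary training target: 1 for a called strike or whiff, else 0."""
    return int(description in CALLED_STRIKE or description in WHIFFS)


def csw_rate(descriptions: list[str]) -> float:
    """CSW% = (called strikes + whiffs) / total pitches."""
    return sum(csw_label(d) for d in descriptions) / len(descriptions)


pitches = ["ball", "called_strike", "foul", "swinging_strike", "hit_into_play"]
print(csw_rate(pitches))  # 0.4
```

Training one model per pitch type and batter hand, as described above, corresponds to grouping on the Statcast `pitch_type` and `stand` columns before fitting.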

Stockyard Projections

Next-season predictions for pitcher performance. ElasticNet models trained on 10 years of pitcher-season transitions (1,700+ data points) with 200+ features from Statcast, FanGraphs, and Stockyard grades.
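A "pitcher-season transition" pairs one season's features with the same pitcher's stat in the following season. The sketch below shows that pairing step only, using a hypothetical `(pitcher_id, season) -> features` mapping. The ElasticNet fit itself is not shown. scikit-learn's `ElasticNet` is one option for that step, but the page does not name a library.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    pitcher_id: int
    season: int           # feature season N
    features: dict        # features observed in season N
    target_k_pct: float   # K% observed in season N + 1


def build_transitions(rows: dict[tuple[int, int], dict]) -> list[Transition]:
    """Pair each pitcher-season with the same pitcher's next season.

    `rows` maps (pitcher_id, season) to a feature dict that includes "k_pct".
    Seasons with no following season are dropped because there is no target.
    """
    out = []
    for (pid, season), feats in sorted(rows.items()):
        nxt = rows.get((pid, season + 1))
        if nxt is not None:
            out.append(Transition(pid, season, feats, nxt["k_pct"]))
    return out


rows = {
    (1, 2022): {"k_pct": 0.20},
    (1, 2023): {"k_pct": 0.24},
    (2, 2023): {"k_pct": 0.30},  # no 2024 row, so no transition
}
print(build_transitions(rows))
```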

K% & BB%

Direct predictions of strikeout rate and walk rate for the following season, the two stats most within a pitcher's control.

K% R² = 0.52 · BB% R² = 0.34 · Beats Marcel-style baselines
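Marcel, Tom Tango's deliberately simple projection system, weights recent seasons, regresses toward the league mean, and adjusts for age. The sketch below is a Marcel-style baseline of that shape. The weights and regression amount are illustrative guesses, not Marcel's published pitcher parameters or the baseline this site compared against, and the age adjustment is omitted.

```python
def marcel_style(
    rates: list[float],
    batters_faced: list[int],
    league_rate: float,
    weights: tuple[float, ...] = (5.0, 4.0, 3.0),
    regress_bf: float = 1200.0,
) -> float:
    """Projected rate from recent seasons, ordered most recent first.

    Each season counts `weight * batters_faced`; then `regress_bf` batters of
    league-average performance are mixed in, pulling small samples toward the mean.
    Both parameters are illustrative assumptions.
    """
    weighted_sum = sum(w * r * bf for w, r, bf in zip(weights, rates, batters_faced))
    weighted_bf = sum(w * bf for w, bf in zip(weights, batters_faced))
    return (weighted_sum + league_rate * regress_bf) / (weighted_bf + regress_bf)


# One 600-BF season at a 30% K rate, league average 22%:
print(round(marcel_style([0.30], [600], league_rate=0.22), 4))  # 0.2771
```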

kwERA

Derived from projected K% and BB% using the formula 5.40 − 12 × (K% − BB%), with K% and BB% expressed as fractions of batters faced (0.25, not 25). A simple ERA estimator that's nearly as predictive as SIERA with no fitting required.

Public formula · r = 0.60 year-over-year · r = 0.25 vs next-year ERA
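The kwERA formula above translates directly into code. The function name is hypothetical. Note that the rates must be fractions, not percentages.

```python
def kwera(k_pct: float, bb_pct: float) -> float:
    """kwERA = 5.40 - 12 * (K% - BB%), with both rates as fractions of batters faced."""
    return 5.40 - 12 * (k_pct - bb_pct)


# Plugging in a projected 25% K rate and 8% BB rate:
print(round(kwera(0.25, 0.08), 2))  # 3.36
```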

Features include 5 time windows (full season, halves, last 60/30 days), 2-year history with decay weighting, Stockyard grade trajectories, pitch mix changes, and postseason data when available.
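The five time windows can be expressed as date ranges. The sketch below assumes the halves are split at a supplied All-Star break date and that "last 60/30 days" count back from the season's final date. The page does not define either, and the function name is hypothetical.

```python
from datetime import date, timedelta


def window_bounds(
    season_start: date, season_end: date, all_star_break: date
) -> dict[str, tuple[date, date]]:
    """Five feature windows as inclusive (start, end) date ranges.

    Assumption: halves split at `all_star_break`; trailing windows end on `season_end`.
    """
    return {
        "full": (season_start, season_end),
        "first_half": (season_start, all_star_break - timedelta(days=1)),
        "second_half": (all_star_break, season_end),
        "last_60": (season_end - timedelta(days=59), season_end),
        "last_30": (season_end - timedelta(days=29), season_end),
    }


def in_window(game_date: date, bounds: tuple[date, date]) -> bool:
    """True if a pitch's game date falls inside an inclusive window."""
    start, end = bounds
    return start <= game_date <= end


windows = window_bounds(date(2024, 3, 28), date(2024, 9, 29), date(2024, 7, 15))
print(in_window(date(2024, 9, 15), windows["last_30"]))  # True
```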

Data Pipeline

Statcast (Primary Source)

Every pitch thrown since 2015 — velocity, spin, movement, location, outcomes, and extended tracking data (arm angle, bat speed, game situation). The majority of stats on this site are computed directly from this pitch-level data, including K%, BB%, FIP, xFIP, BABIP, batted ball stats, plate discipline, contact quality, pitch movement, and more.

MLB Stats API

Official scorer data that can't be derived from pitch tracking alone — ERA (requires earned/unearned run distinction), games started, wins, losses, saves, and holds.

FanGraphs

A small set of advanced metrics that require park factors or complex modeling beyond what pitch data alone can produce: WAR, SIERA, xERA, park-adjusted stats (ERA-, FIP-), leverage/clutch metrics (WPA, RE24, Clutch), and run values per pitch type.

Three-tier merge priority

FanGraphs provides the baseline, self-computed Statcast aggregations override it wherever they are available, and MLB official stats take highest priority, so each stat comes from the most authoritative source that provides it.
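The three-tier merge amounts to applying sources in increasing priority, with later sources overwriting earlier ones key by key. In the sketch below, skipping `None` values (so that a missing stat never erases a lower-tier value) is an assumption, and the stat keys are hypothetical.

```python
def merge_stats(fangraphs: dict, statcast: dict, mlb: dict) -> dict:
    """Merge per-pitcher stats, lowest priority first: FanGraphs, Statcast, MLB.

    Assumption: None means "not available" and does not overwrite a lower tier.
    """
    merged: dict = {}
    for source in (fangraphs, statcast, mlb):
        merged.update({k: v for k, v in source.items() if v is not None})
    return merged


print(merge_stats(
    {"era": 3.90, "fip": 3.80, "war": 2.1},
    {"fip": 3.75, "k_pct": 0.25},
    {"era": 3.85, "wins": None},
))
```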

Tech Stack

Frontend

Next.js 14, React, Tailwind CSS, TanStack Query, Recharts

Backend

Python FastAPI, SQLAlchemy, SQLite

ML Models

XGBoost + LightGBM for pitch grading, with a separate contact quality model for Stuff+ adjustment

Data Ingestion

pybaseball for Statcast, MLB Stats API for official records, FanGraphs API for advanced metrics

What We Compute Ourselves

Rather than relying on third-party leaderboards, the majority of statistics on this site are aggregated directly from raw Statcast pitch data. This gives us full control over methodology and lets us offer stats that aren't available elsewhere.

Rate Stats

K%, BB%, K-BB%, HR%, TTO%, BABIP, AVG, OBP, SLG, WHIP, K/9, BB/9, HR/9
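Most of these rate stats are simple ratios over batters faced or innings pitched. The sketch below covers the per-BF and per-inning stats. BABIP and AVG/OBP/SLG need more inputs (at-bats, sacrifice flies, hit-by-pitches) and are left out. The field names are hypothetical, and TTO% here counts K + BB + HR (some definitions also add HBP).

```python
from dataclasses import dataclass


@dataclass
class PitchingLine:
    batters_faced: int
    outs: int  # outs recorded; innings pitched = outs / 3
    k: int
    bb: int
    h: int
    hr: int


def rate_stats(s: PitchingLine) -> dict[str, float]:
    """Rate stats per batter faced and per nine innings."""
    ip = s.outs / 3
    return {
        "K%": s.k / s.batters_faced,
        "BB%": s.bb / s.batters_faced,
        "K-BB%": (s.k - s.bb) / s.batters_faced,
        "HR%": s.hr / s.batters_faced,
        "TTO%": (s.k + s.bb + s.hr) / s.batters_faced,  # three true outcomes
        "WHIP": (s.bb + s.h) / ip,
        "K/9": 9 * s.k / ip,
        "BB/9": 9 * s.bb / ip,
        "HR/9": 9 * s.hr / ip,
    }


line = PitchingLine(batters_faced=100, outs=72, k=25, bb=8, h=20, hr=3)
print(rate_stats(line)["K/9"])  # 9.375
```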

ERA Estimators

FIP and xFIP computed with year-specific constants and league HR/FB rates
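The standard formulas are FIP = (13·HR + 3·(BB + HBP) − 2·K) / IP + C, where the constant C is set each season so that league-wide FIP equals league ERA. xFIP replaces actual home runs with fly balls times the league HR/FB rate. The sketch below implements those formulas. Whether intentional walks are removed from BB is a choice the page does not specify, and the function names are hypothetical.

```python
def fip_constant(lg_era: float, lg_hr: float, lg_bb: float,
                 lg_hbp: float, lg_k: float, lg_ip: float) -> float:
    """Year-specific constant that makes league FIP equal league ERA."""
    return lg_era - (13 * lg_hr + 3 * (lg_bb + lg_hbp) - 2 * lg_k) / lg_ip


def fip(hr: float, bb: float, hbp: float, k: float, ip: float, c_fip: float) -> float:
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + c_fip


def xfip(fb: float, lg_hr_per_fb: float, bb: float, hbp: float,
         k: float, ip: float, c_fip: float) -> float:
    """FIP with actual HR replaced by expected HR from the league HR/FB rate."""
    return fip(fb * lg_hr_per_fb, bb, hbp, k, ip, c_fip)


# Hypothetical league totals used only to illustrate the constant.
c = fip_constant(lg_era=4.00, lg_hr=5000, lg_bb=14000, lg_hbp=1800, lg_k=40000, lg_ip=43000)
print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=180, c_fip=c), 2))
```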

Batted Ball

GB%, FB%, LD%, IFFB%, Pull%, Cent%, Oppo%, FB Pull% — spray angles from hit coordinates
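Spray angle can be estimated from the Statcast `hc_x` / `hc_y` hit coordinates by measuring the angle from an approximate home-plate location. The home-plate constants below are a convention used in public baseball research. Exact constants (and any extra scaling factor) differ between implementations, and the center-field width used to separate Pull/Cent/Oppo is a hypothetical choice.

```python
import math

# Assumption: approximate home-plate position in hc_x / hc_y units.
HOME_X, HOME_Y = 125.42, 198.27


def spray_angle(hc_x: float, hc_y: float) -> float:
    """Degrees from straightaway center; negative = left field, positive = right field."""
    return math.degrees(math.atan2(hc_x - HOME_X, HOME_Y - hc_y))


def direction(angle: float, stand: str, center_width: float = 15.0) -> str:
    """Classify as Pull / Cent / Oppo given batter side ("R" or "L").

    `center_width` (degrees either side of center) is a hypothetical threshold.
    """
    if abs(angle) <= center_width:
        return "Cent"
    toward_left = angle < 0
    pulled = toward_left if stand == "R" else not toward_left
    return "Pull" if pulled else "Oppo"


print(direction(spray_angle(80.0, 120.0), stand="R"))
```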

Contact Quality

Hard Hit%, Soft%, Barrel%, Sweet Spot%, Avg/Max Exit Velo, Avg Launch Angle
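Several contact-quality stats are thresholds on exit velocity and launch angle. The sketch below uses MLB's definitions of a hard-hit ball (exit velocity of 95 mph or more) and the sweet spot (launch angle from 8 to 32 degrees). Barrel% depends on a sliding combination of exit velocity and launch angle, and Soft% uses FanGraphs-style contact buckets, so both are left out. The input shape is hypothetical.

```python
def contact_quality(batted: list[tuple[float, float]]) -> dict[str, float]:
    """batted: (exit_velo_mph, launch_angle_deg) for each batted-ball event."""
    n = len(batted)
    exit_velos = [ev for ev, _ in batted]
    launch_angles = [la for _, la in batted]
    return {
        "Hard Hit%": sum(ev >= 95 for ev in exit_velos) / n,
        "Sweet Spot%": sum(8 <= la <= 32 for la in launch_angles) / n,
        "Avg EV": sum(exit_velos) / n,
        "Max EV": max(exit_velos),
        "Avg LA": sum(launch_angles) / n,
    }


print(contact_quality([(100.0, 25.0), (88.0, -5.0), (95.0, 10.0), (70.0, 45.0)]))
```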

Plate Discipline

O-Swing%, Z-Swing%, Swing%, O-Contact%, Z-Contact%, Contact%, SwStr%, CStr%, CSW%
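Plate-discipline rates split pitches by zone and by swing outcome. The sketch below assumes the Statcast `zone` field (zones 1 to 9 are inside the strike zone) and classifies swings from `description`. The swing and whiff lists are assumptions (bunt attempts are ignored), since the site's exact lists are not published.

```python
# Statcast `description` values counted as swings and as whiffs (assumed lists).
SWINGS = {"swinging_strike", "swinging_strike_blocked", "foul", "foul_tip", "hit_into_play"}
WHIFFS = {"swinging_strike", "swinging_strike_blocked"}


def _rate(num: int, den: int) -> float:
    """Safe division: NaN when there is no denominator."""
    return num / den if den else float("nan")


def plate_discipline(pitches: list[tuple[int, str]]) -> dict[str, float]:
    """pitches: (Statcast zone, description) pairs."""
    in_zone = [d for z, d in pitches if 1 <= z <= 9]
    out_zone = [d for z, d in pitches if not 1 <= z <= 9]
    z_swing = [d for d in in_zone if d in SWINGS]
    o_swing = [d for d in out_zone if d in SWINGS]
    z_contact = sum(d not in WHIFFS for d in z_swing)
    o_contact = sum(d not in WHIFFS for d in o_swing)
    whiffs = sum(d in WHIFFS for _, d in pitches)
    called = sum(d == "called_strike" for _, d in pitches)
    n = len(pitches)
    return {
        "O-Swing%": _rate(len(o_swing), len(out_zone)),
        "Z-Swing%": _rate(len(z_swing), len(in_zone)),
        "Swing%": _rate(len(z_swing) + len(o_swing), n),
        "O-Contact%": _rate(o_contact, len(o_swing)),
        "Z-Contact%": _rate(z_contact, len(z_swing)),
        "Contact%": _rate(z_contact + o_contact, len(z_swing) + len(o_swing)),
        "SwStr%": _rate(whiffs, n),
        "CStr%": _rate(called, n),
        "CSW%": _rate(called + whiffs, n),
    }


sample = [(5, "called_strike"), (5, "foul"), (14, "ball"), (13, "swinging_strike"), (2, "hit_into_play")]
print(plate_discipline(sample)["CSW%"])  # 0.4
```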

Pitch Movement

iVB and iHB from raw Statcast pfx data — more accurate than FanGraphs, which measures at a shorter distance
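Statcast reports `pfx_x` and `pfx_z` in feet from the catcher's perspective, as movement relative to a spinless trajectory (so gravity is already removed). Converting them to iHB and iVB in inches is a unit change. The sketch below leaves any handedness-based sign flip (for example, arm side shown as positive) to the caller, and the function name is hypothetical.

```python
FEET_TO_INCHES = 12.0


def induced_break(pfx_x: float, pfx_z: float) -> tuple[float, float]:
    """Convert Statcast pfx_x / pfx_z (feet) to (iVB, iHB) in inches.

    Both values stay in the catcher's perspective; no handedness flip is applied.
    """
    ivb = pfx_z * FEET_TO_INCHES
    ihb = pfx_x * FEET_TO_INCHES
    return ivb, ihb


print(induced_break(-0.5, 1.25))  # (15.0, -6.0)
```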

Arm-Slot Adjusted

Movement and acceleration adjusted for arm angle via regression, with submarine outlier detection
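One way to read "adjusted for arm angle via regression" is to fit movement against arm angle and keep the residual, so each pitch is compared with what its slot would predict. The sketch below uses a single-variable least-squares line. The cutoff that flags submarine deliveries as outliers (excluded from the fit and returned as `None`) is a hypothetical threshold, not the site's rule.

```python
from typing import Optional


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def arm_slot_adjusted(
    arm_angles: list[float], movement: list[float], submarine_cutoff: float = 0.0
) -> list[Optional[float]]:
    """Movement minus what a linear arm-angle trend predicts.

    Arm angles below `submarine_cutoff` (hypothetical) are treated as outliers:
    they are excluded from the fit and returned as None.
    """
    keep = [i for i, a in enumerate(arm_angles) if a >= submarine_cutoff]
    intercept, slope = fit_line([arm_angles[i] for i in keep], [movement[i] for i in keep])
    return [
        m - (intercept + slope * a) if a >= submarine_cutoff else None
        for a, m in zip(arm_angles, movement)
    ]


print(arm_slot_adjusted([30.0, 40.0, 50.0, -10.0], [12.0, 15.0, 16.0, 5.0]))
```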

Per-Pitch-Type

Velocity, usage, CSW%, Barrel%, and xRV per 100 for each of 7 pitch types
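Per-pitch-type stats are group-by aggregations. The sketch below computes usage, average velocity, and run value per 100 pitches from hypothetical `(pitch_type, velo, run_value)` rows. How Stockyard's xRV is produced per pitch, and its sign convention, are not described here, so the run values are simply taken as inputs.

```python
from collections import defaultdict


def per_pitch_type(pitches: list[tuple[str, float, float]]) -> dict[str, dict[str, float]]:
    """pitches: (pitch_type, velo_mph, run_value) rows for one pitcher."""
    groups: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for pitch_type, velo, run_value in pitches:
        groups[pitch_type].append((velo, run_value))
    total = len(pitches)
    out = {}
    for pitch_type, rows in groups.items():
        n = len(rows)
        out[pitch_type] = {
            "usage": n / total,
            "velo": sum(v for v, _ in rows) / n,
            "rv_per_100": 100 * sum(rv for _, rv in rows) / n,
        }
    return out


print(per_pitch_type([("FF", 95.0, -0.02), ("FF", 96.0, 0.04), ("SL", 85.0, -0.05)]))
```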

Stockyard Originals

xRV (BABIP-neutral expected run value), Stuff+, Location+, Pitching+ grades for every pitch