How AI Projects Pitcher Performance: Make Smarter Picks
As a sports analyst who builds AI models for a living, I am obsessed with what truly drives pitcher outcomes. This piece shows how I turn Statcast data, weather, and situational context into reliable next-start projections you can use for betting decisions and smarter fantasy moves. I will explain this in plain language, with practical steps and tools you can try yourself.
Table of Contents
- Data Signals AI Uses to Project Pitcher Performance
- Modeling Approaches That Actually Work
- Workflow and Evaluation That Prevents Leakage
- Practical Build and Tools for Analysts
- Use Cases and Limits Worth Flagging
- Step-by-Step: Build a Next-30-Day K% and xERA Projection
- Frequently Asked Questions (FAQs)
Data Signals AI Uses to Project Pitcher Performance
The backbone of modern pitcher projection is Statcast data. Pitch-level inputs capture how a baseball actually behaved out of the hand, not just the final outcome of the play. When building an advanced predictive model, we track four-seam and sinker velocity trends because they matter most for strikeout projections and hard-contact suppression.
Our algorithms evaluate rolling averages over 5-, 10-, and 30-day windows while accounting for the times-through-the-order effect. We also feed the model absolute spin rate and spin axis stability for each pitch type, converting induced vertical break and horizontal break into z-scores relative to league averages.
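The two transforms in the paragraph above can be sketched in a few lines. This sketch assumes pitch data has already been reduced to one value per appearance (for example, mean four-seam velocity). The function names and sample readings are hypothetical. Only dates strictly before the projection date count, so a start never sees its own data.

```python
from datetime import date, timedelta
from statistics import mean, pstdev


def rolling_mean(observations, as_of, days):
    """Mean of (date, value) observations in the `days` window ending before `as_of`.

    Only dates strictly before `as_of` count, so a start never sees its own data.
    """
    start = as_of - timedelta(days=days)
    window = [value for day, value in observations if start <= day < as_of]
    return mean(window) if window else None


def league_z_score(value, league_values):
    """Standardize a pitch-level metric (e.g. induced vertical break) against league peers."""
    mu = mean(league_values)
    sigma = pstdev(league_values)
    return (value - mu) / sigma


# Hypothetical four-seam velocity readings (date, mph) for one pitcher.
velo = [
    (date(2024, 5, 1), 95.1),
    (date(2024, 5, 7), 94.8),
    (date(2024, 5, 13), 94.2),
    (date(2024, 5, 19), 93.9),
]
as_of = date(2024, 5, 25)
features = {f"velo_{d}d": rolling_mean(velo, as_of, d) for d in (5, 10, 30)}
```

An empty window returns `None` rather than zero. That way a missing reading stays visibly missing instead of looking like a velocity collapse.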
Beyond ball flight, tracking release point variance helps preview command issues or underlying physical fatigue. Extension boosts perceived velocity, which helps pitchers in two-strike counts, though that edge tends to fade in high humidity. Finally, heatmaps are transformed into actionable features such as edge percentage, shadow-zone tendency, and the share of uncompetitive waste pitches.
A pitcher can overhaul his arsenal in as little as two weeks, so our AI is engineered to catch these sudden shifts. The algorithm compares pitch type shares over short windows against a baseline to flag changes. An unsupervised k-means clustering model on movement, spin, and velocity triggers a new-pitch indicator when a pitcher unveils a novel offering.
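Here is a minimal sketch of the new-pitch indicator, written in plain NumPy so it runs without scikit-learn. It fits k-means (Lloyd's algorithm) to a pitcher's baseline pitches, with columns for velocity, spin, induced vertical break, and horizontal break. It then flags any recent pitch that sits far from every baseline cluster in standardized units. Several pieces are illustrative assumptions, not the production configuration:

- the farthest-first seeding,
- the 3.0 distance threshold,
- the simulated arsenal.

```python
import numpy as np


def farthest_first_init(X, k):
    """Deterministic seeding: start at the first point, then repeatedly take the farthest point."""
    chosen = [0]
    for _ in range(1, k):
        dist = np.linalg.norm(X[:, None, :] - X[chosen][None, :, :], axis=2).min(axis=1)
        chosen.append(int(dist.argmax()))
    return X[chosen].copy()


def kmeans(X, k, iters=100):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids."""
    centroids = farthest_first_init(X, k)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids


def flag_new_pitches(baseline, recent, k, threshold=3.0):
    """Flag recent pitches that sit far from every baseline pitch cluster.

    Columns: velocity, spin rate, induced vertical break, horizontal break.
    Features are z-scored with the baseline mean and std so no single unit dominates.
    """
    mu, sd = baseline.mean(axis=0), baseline.std(axis=0)
    centroids = kmeans((baseline - mu) / sd, k)
    z_recent = (recent - mu) / sd
    nearest = np.linalg.norm(z_recent[:, None, :] - centroids[None, :, :], axis=2).min(axis=1)
    return nearest > threshold


# Simulated baseline arsenal: a four-seamer and a slider.
rng = np.random.default_rng(7)
noise = [0.5, 30.0, 0.8, 0.8]
fastballs = rng.normal([95.0, 2300.0, 16.0, 8.0], noise, size=(200, 4))
sliders = rng.normal([86.0, 2500.0, 1.0, -5.0], noise, size=(200, 4))
baseline = np.vstack([fastballs, sliders])

recent = np.array([
    [95.3, 2310.0, 15.7, 8.1],   # looks like the established four-seamer
    [87.0, 1800.0, 8.0, 14.0],   # low-spin, arm-side offering never seen before
])
flags = flag_new_pitches(baseline, recent, k=2)
```

In practice, k would be set to the number of pitch types in the baseline window. Standardizing before clustering matters here, because raw spin (in rpm) would otherwise swamp velocity and movement.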
Usage deltas by hitter handedness and shifts in two-strike sequencing offer early signals for future strikeout and walk rates. Because we cannot directly observe a pitcher's intended location, we measure command through proxies: zone rate, edge rate, chase rate, and first-pitch strike percentage. Called strikes plus whiffs (CSW) is tracked overall and broken down by pitch type and count.
We also isolate in-zone whiff percentage and out-of-zone swing percentage to evaluate sequencing efficiency. Walk proxies like three-ball rate and uncompetitive misses after two strikes give us a clearer picture of control than traditional base-on-balls statistics.
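A small sketch of two of these command proxies follows, computed from pitch rows. The `description` labels are assumed Statcast-style values and may need adjusting to your data source. The sample rows are hypothetical, and the first-pitch strike definition treats any 0-0 pitch that is not a ball as a strike.

```python
from collections import defaultdict

# Assumed Statcast-style `description` labels; adjust to your data source.
CSW_EVENTS = {"called_strike", "swinging_strike", "swinging_strike_blocked"}
BALL_EVENTS = {"ball", "blocked_ball", "hit_by_pitch"}


def csw_by_pitch_type(pitches):
    """Called strikes plus whiffs divided by total pitches, per pitch type."""
    thrown, csw = defaultdict(int), defaultdict(int)
    for p in pitches:
        thrown[p["pitch_type"]] += 1
        csw[p["pitch_type"]] += int(p["description"] in CSW_EVENTS)
    return {pt: csw[pt] / thrown[pt] for pt in thrown}


def first_pitch_strike_rate(pitches):
    """Share of 0-0 pitches that were not balls (fouls and balls in play count as strikes)."""
    first = [p for p in pitches if p["balls"] == 0 and p["strikes"] == 0]
    strikes = sum(p["description"] not in BALL_EVENTS for p in first)
    return strikes / len(first) if first else None


# Hypothetical pitch rows (not a single coherent at-bat).
pitches = [
    {"pitch_type": "FF", "description": "called_strike", "balls": 0, "strikes": 0},
    {"pitch_type": "SL", "description": "swinging_strike", "balls": 0, "strikes": 1},
    {"pitch_type": "SL", "description": "swinging_strike_blocked", "balls": 0, "strikes": 2},
    {"pitch_type": "FF", "description": "foul", "balls": 0, "strikes": 0},
    {"pitch_type": "FF", "description": "ball", "balls": 0, "strikes": 0},
    {"pitch_type": "SL", "description": "hit_into_play", "balls": 1, "strikes": 1},
]
```

To get the count-level breakdown described above, the same functions can be applied to rows filtered by `balls` and `strikes`.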
Matchup context is huge for near-term projections, so batter quality and stylistic fit receive heavy weight. The model ingests the projected opposing lineup, including metrics like weighted on-base average (wOBA) and expected wOBA. It evaluates strikeout and walk tendencies by pitch type sensitivity, such as how a lineup handles sliders from right-handed pitchers.
Platoon splits adjust the expected value of the pitcher's arsenal against lefty-heavy or righty-heavy lineups. We aggregate individual hitter profiles, including out-of-zone swing and in-zone contact percentages, into lineup-weighted expectations. Lineups that work deep counts tend to raise walk rates and inflate pitch counts, which can force an early hook.
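One simple way to build a lineup-weighted expectation is to weight each hitter's platoon-appropriate metric by the plate appearances his batting-order slot is expected to get. Both the slot weights and the hitter values below are hypothetical placeholders. Real weights depend on league, park, and team run environment.

```python
# Hypothetical expected plate appearances by batting-order slot (1 through 9);
# real values depend on league, park, and team run environment.
PA_BY_SLOT = [4.6, 4.5, 4.4, 4.3, 4.2, 4.1, 4.0, 3.9, 3.8]


def lineup_weighted(hitters, metric, pitcher_throws):
    """PA-weighted average of one hitter metric, using each hitter's split vs. the pitcher's hand.

    `hitters` is in batting order; each dict holds keys like "o_swing_vs_R".
    """
    key = f"{metric}_vs_{pitcher_throws}"
    weights = PA_BY_SLOT[: len(hitters)]
    total = sum(weights)
    return sum(w * h[key] for w, h in zip(weights, hitters)) / total


# Hypothetical top of a lineup with out-of-zone swing rates by pitcher hand.
lineup = [
    {"o_swing_vs_R": 0.26, "o_swing_vs_L": 0.31},
    {"o_swing_vs_R": 0.34, "o_swing_vs_L": 0.29},
    {"o_swing_vs_R": 0.30, "o_swing_vs_L": 0.30},
]
chase_vs_rhp = lineup_weighted(lineup, "o_swing", "R")
chase_vs_lhp = lineup_weighted(lineup, "o_swing", "L")
```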
Catcher quality moves the needle a bit more than casual fans think. We convert framing runs by zone into a called-strike lift across specific edge regions. Blocking ability and passed-ball risk affect a pitcher’s willingness to bury breaking balls in the dirt, while catcher familiarity metrics suggest that established batteries often post stronger CSW numbers.
Run environments are never static, so park and weather factors must be integrated carefully. We avoid using a single composite park factor during training. Instead, we isolate separate environmental coefficients for strikeouts, home runs, and batting average on balls in play. Weather data, including temperature, humidity, wind direction, wind speed, and barometric pressure, is converted into clean inputs, and the model learns how those inputs interact with historical fly-ball rates. Altitude and roof-status flags round out the environmental profile.
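The sketch below turns raw weather into clean inputs. It projects wind onto the home-plate-to-center-field axis and adds explicit products with fly-ball rate. Tree models can learn such interactions on their own, and hand-built products simply give them a head start.

The function names and the 70°F reference temperature are assumptions. Note that weather feeds usually report the direction the wind blows *from*, so add 180 degrees before passing it in.

```python
import math


def wind_out_mph(speed_mph, wind_toward_deg, cf_bearing_deg):
    """Component of the wind blowing toward center field (negative means blowing in).

    Both angles are compass bearings in degrees: where the wind is heading,
    and the direction from home plate to center field.
    """
    return speed_mph * math.cos(math.radians(wind_toward_deg - cf_bearing_deg))


def weather_features(temp_f, wind_speed_mph, wind_toward_deg, cf_bearing_deg,
                     fly_ball_rate, roof_closed):
    """Clean weather inputs plus explicit interactions with a fly-ball rate."""
    wind_out = 0.0 if roof_closed else wind_out_mph(wind_speed_mph, wind_toward_deg, cf_bearing_deg)
    temp_delta = temp_f - 70.0  # hypothetical reference temperature
    return {
        "temp_delta": temp_delta,
        "wind_out": wind_out,
        "temp_x_fb": temp_delta * fly_ball_rate,
        "wind_out_x_fb": wind_out * fly_ball_rate,
        "roof_closed": int(roof_closed),
    }
```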
Fatigue and feel do not live in the traditional box score, so we track rest days and workload closely. The model monitors days since the last outing, pitch counts from the prior start, and stress reps, which we define as pitches thrown at or above 95 mph. Times through the order allowed in recent starts are heavily weighted. If a manager has been actively limiting a starter's third-time-through exposure, your outs-recorded projection should drop immediately.
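Here is a minimal sketch of these workload inputs for an upcoming start. It assumes each prior outing is stored as a date plus a list of pitch velocities, and the function name and sample outings are hypothetical. The 95 mph cutoff follows the stress-rep definition in the paragraph above.

```python
from datetime import date

STRESS_VELO_MPH = 95.0  # stress reps: pitches thrown at or above 95 mph


def workload_features(outings, start_date):
    """Workload inputs for an upcoming start, using only outings before `start_date`.

    `outings` is a list of (date, [pitch velocities]) tuples.
    """
    prior = sorted(((d, v) for d, v in outings if d < start_date), key=lambda o: o[0])
    if not prior:
        return {"days_since_last": None, "last_pitch_count": 0, "last_stress_reps": 0}
    last_date, last_velos = prior[-1]
    return {
        "days_since_last": (start_date - last_date).days,
        "last_pitch_count": len(last_velos),
        "last_stress_reps": sum(v >= STRESS_VELO_MPH for v in last_velos),
    }


# Hypothetical (heavily truncated) outings.
outings = [
    (date(2024, 6, 1), [96.1, 95.0, 94.2, 88.0, 95.4]),
    (date(2024, 6, 7), [94.9, 95.2, 86.5, 95.0]),
]
feats = workload_features(outings, date(2024, 6, 12))
```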
When modeling earned runs or expected ERA, we calculate baseline quality for the relievers behind the starter, because bullpen performance changes how often inherited runners score. For injury history, the model tracks injured list stints, return-to-throw schedules, and post-IL velocity trajectories. Acute mechanical changes, such as sudden release point drift or rapid pitch mix adjustments, are flagged automatically as possible signs of physical discomfort.
To make these outputs actionable for sports bettors, we set clear and objective targets. The model generates a next-30-day rolling projection for strikeout and walk percentages across expected opponents. It calculates the expected CSW delta for a forward two to four start window and estimates xERA or expected run prevention per start using contact quality, park factors, and defensive metrics. For player props, we predict outs recorded and pitch count distributions. We also calculate durability risk to determine the probability of a shortened outing or a missed turn in the rotation.
Modeling Approaches That Actually Work
Gradient-boosted trees, including CatBoost, are our preferred choice for tabular accuracy on per-start projections. Heterogeneous features like park factors, weather bins, pitch mix deltas, and catcher tags fit tree-based models well. We train a separate model for each target: one for strikeout percentage, one for walk percentage, and a third for xERA.
This architectural choice reduces target interference and simplifies calibration. We apply monotonic constraints to enforce intuitive behavior, for example requiring projected home run risk to rise, or at least not fall, as temperature increases. Time decay is encoded using exponentially weighted moving features. With CatBoost, we pass raw categorical features like pitcher ID, catcher ID, and park directly to the algorithm so it can learn their interactions natively.
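An exponentially weighted feature can be expressed with a half-life, which is often easier to reason about than a raw decay rate. This minimal sketch assumes each reading is tagged with its age in days. The half-life values and slider-usage numbers are hypothetical.

```python
def ewma(values, ages_days, half_life_days):
    """Exponentially weighted mean: a reading `half_life_days` old gets half the weight of today's."""
    weights = [0.5 ** (age / half_life_days) for age in ages_days]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical slider usage by start, with each start's age in days.
slider_usage = [0.22, 0.25, 0.31, 0.34]
ages = [20, 14, 8, 2]
fast = ewma(slider_usage, ages, half_life_days=7)   # reacts quickly to the recent jump
slow = ewma(slider_usage, ages, half_life_days=30)  # closer to the plain average
```

Computing the same feature at several half-lives lets the tree model choose how much recency to trust for each target.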
For capturing localized patterns within a game, sequence models offer immense value. Temporal convolutional neural networks look at compressed windows of 10 to 20 pitches to catch physical fatigue, tunneling efficiency, and sequencing shifts. Long Short-Term Memory models and Transformers excel at analyzing multi-appearance sequences to recommend pitch mix optimizations.
The input pipeline ingests velocity, spin, movement, release metrics, count leverage, and target zone quadrant for every single pitch. The model then outputs the probability of a called strike, whiff, or specific ball-in-play quality. To keep compute costs manageable and maintain focus on current form, we limit these sequence windows to the last 200 to 400 pitches thrown.
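The sliding-window framing above can be sketched as a windowing step. It turns one pitcher's chronological pitch features into (previous-pitches window, next-pitch outcome) training pairs and caps history at the most recent pitches. The window length, the 400-pitch cap, and the integer outcome coding are illustrative assumptions. Another valid framing appends the upcoming pitch's own characteristics to the input.

```python
import numpy as np


def make_pitch_windows(pitch_features, outcomes, window=20, max_history=400):
    """Turn one pitcher's pitch sequence into (window, next-pitch outcome) training pairs.

    pitch_features: (n_pitches, n_features) array in chronological order.
    outcomes: (n_pitches,) integer labels (e.g. 0=ball, 1=called strike, 2=whiff).
    Only the most recent `max_history` pitches are used.
    """
    X = np.asarray(pitch_features, dtype=float)[-max_history:]
    y = np.asarray(outcomes)[-max_history:]
    n = len(X) - window
    if n <= 0:
        return np.empty((0, window, X.shape[1])), np.empty((0,), dtype=y.dtype)
    inputs = np.stack([X[i:i + window] for i in range(n)])  # pitches i .. i+window-1
    targets = y[window:window + n]                           # outcome of pitch i+window
    return inputs, targets


rng = np.random.default_rng(0)
features = rng.normal(size=(500, 6))   # e.g. velocity, spin, movement, release, count, zone
outcome_labels = np.arange(500) % 3
inputs, targets = make_pitch_windows(features, outcome_labels)
```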
Hierarchical Bayesian models are essential for stabilizing small samples and projecting rookies. By structuring random effects at the pitcher and pitch-type levels alongside season-level pools, the model uses partial pooling to prevent wild overreactions. If a newly called-up rookie has thrown only 25 innings, his projection is pulled toward the league average. However, if his four-seamer ranks in the upper tier for induced vertical break, the model shrinks his projection toward that high-IVB cohort rather than the general league baseline. We fit these partial pooling models with varying intercepts and slopes, then export the posterior means into our primary tree models as hybrid features.
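The PyMC model described above fits varying intercepts and slopes. As a compact stand-in, the conjugate beta-binomial case shows the same partial-pooling mechanics in closed form: a pitcher's observed rate is blended with the rate of the group he is pooled toward.

The numbers below are hypothetical:

- 35 strikeouts in 105 batters faced, roughly the 25-inning rookie above,
- a 22% league rate and a 25% high-IVB cohort rate,
- a prior worth 70 pseudo-batters.

```python
def shrink_rate(successes, trials, prior_mean, prior_strength):
    """Posterior mean of a rate under a Beta prior (conjugate beta-binomial partial pooling).

    prior_mean: rate of the group the pitcher is pooled toward (league or cohort).
    prior_strength: pseudo-trials the prior is worth; larger values shrink harder.
    """
    alpha = prior_mean * prior_strength
    beta = (1.0 - prior_mean) * prior_strength
    return (successes + alpha) / (trials + alpha + beta)


raw = 35 / 105
toward_league = shrink_rate(35, 105, prior_mean=0.22, prior_strength=70)  # (35 + 15.4) / 175
toward_cohort = shrink_rate(35, 105, prior_mean=0.25, prior_strength=70)  # (35 + 17.5) / 175
```

Pooling toward the cohort keeps more of the rookie's strikeout upside than pooling toward the league. As batters faced grow, both estimates converge on the observed rate.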
Quantifying uncertainty is just as important as predicting the median outcome. We utilize quantile regression within LightGBM to predict specific percentiles for strikeouts and outs recorded. These ranges are highly valuable for pricing player props and assessing risk.
Converting Bayesian posterior draws into explicit prediction intervals lets us report central ranges, such as the 10th to 90th percentile, instead of a single point estimate. These ranges convey volatility much faster than a single number. We also run survival models to estimate durability: time to removal is modeled as a hazard function with time-varying covariates like pitch count and lineup strength, with manager tendencies and bullpen rest factored in.
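One common way to implement the survival idea is a discrete-time hazard model. For each inning, a logistic model gives the probability that the starter is removed before finishing it, given covariates such as the cumulative pitch count entering the inning. The probability of completing six innings is then the product of the per-inning survival probabilities. The coefficients below are hypothetical placeholders, not fitted values.

```python
import math


def inning_hazard(pitch_count, lineup_strength, b0=-6.0, b_pitch=0.045, b_lineup=0.8):
    """Probability of removal before finishing an inning, given time-varying covariates.

    The coefficients are hypothetical placeholders, not fitted values.
    """
    z = b0 + b_pitch * pitch_count + b_lineup * lineup_strength
    return 1.0 / (1.0 + math.exp(-z))


def prob_completes(pitch_counts_entering, lineup_strength, innings=6):
    """P(starter completes `innings`) = product of (1 - hazard) over innings 1..innings."""
    survive = 1.0
    for cumulative_pitches in pitch_counts_entering[:innings]:
        survive *= 1.0 - inning_hazard(cumulative_pitches, lineup_strength)
    return survive


# Cumulative pitch counts entering innings 1-6 for an efficient vs. a laborious outing.
light = prob_completes([0, 14, 28, 42, 56, 70], lineup_strength=0.0)
heavy = prob_completes([0, 20, 40, 60, 80, 100], lineup_strength=0.0)
```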
Workflow and Evaluation That Prevents Leakage
To build a predictive model that holds up in production, you must use a time-aware data splitting strategy. Random splits leak future information into training and inflate backtest accuracy. We use rolling-origin cross-validation: the model trains on data up to a specific date and validates strictly on the window that follows, and that origin then slides forward across seasons. For daily matchup models, we use only information available before the first pitch, so no postgame updates creep into the target windows.
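A minimal rolling-origin splitter over pitcher-start rows is sketched below. It assumes each row carries its game date, and it uses a hypothetical 30-day horizon and step. Features for each row must also be computed as of that date, because the splitter only guarantees that training rows precede validation rows.

```python
from datetime import date, timedelta


def rolling_origin_splits(start_dates, first_cutoff, horizon_days=30, step_days=30):
    """Yield (train_idx, valid_idx) where training uses only starts before each cutoff.

    start_dates: list of datetime.date, one per pitcher-start row.
    """
    last = max(start_dates)
    cutoff = first_cutoff
    while cutoff <= last:
        end = cutoff + timedelta(days=horizon_days)
        train = [i for i, d in enumerate(start_dates) if d < cutoff]
        valid = [i for i, d in enumerate(start_dates) if cutoff <= d < end]
        if train and valid:
            yield train, valid
        cutoff += timedelta(days=step_days)


# One hypothetical row per day from April 1 through June 29.
dates = [date(2024, 4, 1) + timedelta(days=i) for i in range(90)]
splits = list(rolling_origin_splits(dates, date(2024, 5, 1)))
```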
Robust backtests must be segmented across distinct rule and environmental eras. We isolate historical data around the 2021 foreign-substance enforcement period, the 2023 pitch clock implementation, and known changes to the ball's composition. If a regime shift breaks calibration, the models are refit and reweighted using period-specific baselines for spin and home-run-to-fly-ball ratios.
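Here is a sketch of era tagging and period-specific baselines. It assumes two boundaries: foreign-substance checks beginning on June 21, 2021, and the pitch clock debuting with the 2023 season. The pitch clock boundary is approximated here as January 1, 2023. Ball-composition changes could be added as further boundaries. The helper names and sample rows are hypothetical.

```python
from datetime import date

# Era boundaries, in chronological order.
ERA_BOUNDARIES = [
    (date(2021, 6, 21), "substance_enforcement"),
    (date(2023, 1, 1), "pitch_clock"),  # approximates the start of the 2023 season
]


def era_for(game_date):
    """Label a game date with the latest era boundary it has passed."""
    label = "pre_enforcement"
    for boundary, name in ERA_BOUNDARIES:
        if game_date >= boundary:
            label = name
    return label


def era_relative(rows, key):
    """Express a metric (e.g. HR/FB) relative to its era's mean: value / era mean."""
    by_era = {}
    for r in rows:
        by_era.setdefault(era_for(r["date"]), []).append(r[key])
    means = {era: sum(v) / len(v) for era, v in by_era.items()}
    return [r[key] / means[era_for(r["date"])] for r in rows]


rows = [
    {"date": date(2023, 5, 1), "hr_fb": 0.10},
    {"date": date(2023, 6, 1), "hr_fb": 0.14},
]
relative = era_relative(rows, "hr_fb")
```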
We evaluate binary and ordinal events using Brier scores and reliability plots. For distributional targets like total strikeout counts, we implement pinball loss for quantiles and continuous ranked probability scores to verify calibration. Subgroup calibration is checked regularly across park types, temperature bins, and handedness splits.
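The three scoring rules named above fit in a few lines of NumPy. This is a sketch. The CRPS function uses the sample identity E|X − y| − 0.5·E|X − X′| over all ordered pairs of forecast samples, including each sample with itself. A common "fair" variant divides the pair term by m(m − 1) instead.

```python
import numpy as np


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    p, o = np.asarray(probs, float), np.asarray(outcomes, float)
    return float(np.mean((p - o) ** 2))


def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss for predictions of the q-th quantile."""
    diff = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))


def crps_from_samples(samples, y):
    """CRPS for one observation from forecast samples: E|X - y| - 0.5 * E|X - X'|."""
    x = np.asarray(samples, float)
    term1 = np.mean(np.abs(x - y))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return float(term1 - term2)
```

For example, a 90th-percentile strikeout prediction of 5 is penalized 1.8 if the pitcher records 7, but only 0.2 if he records 3. That asymmetry is what keeps quantile forecasts honest.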
Feature stability is monitored closely by measuring the Population Stability Index for key inputs on a monthly basis. We run SHAP stability tests to compare top drivers across months, ensuring that seasonal features do not dominate the model too early in the spring.
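The Population Stability Index compares how a feature's values spread across bins in two periods. The bins here are cut at the baseline period's quantiles. A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift. The function name, bin count, and epsilon floor are assumptions.

```python
import numpy as np


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI = sum((a - e) * ln(a / e)) over bins cut at the baseline's quantiles."""
    expected = np.asarray(expected, float)
    actual = np.asarray(actual, float)
    inner_edges = np.quantile(expected, np.linspace(0, 1, bins + 1)[1:-1])
    e_idx = np.searchsorted(inner_edges, expected, side="right")
    a_idx = np.searchsorted(inner_edges, actual, side="right")
    e = np.bincount(e_idx, minlength=bins) / len(expected)
    a = np.bincount(a_idx, minlength=bins) / len(actual)
    e, a = np.clip(e, eps, None), np.clip(a, eps, None)  # avoid log(0) on empty bins
    return float(np.sum((a - e) * np.log(a / e)))


# Hypothetical four-seam velocities: training window vs. a month with a 1.5 mph drop.
rng = np.random.default_rng(0)
baseline_velo = rng.normal(94.0, 1.5, 5000)
same = population_stability_index(baseline_velo, baseline_velo)
shifted = population_stability_index(baseline_velo, baseline_velo - 1.5)
```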
Data schemas are locked down tightly, and features are versioned with hashes to catch silent data drift. To explain why a projection moved, we leverage SHAP values to isolate exactly how much a velocity drop or a tough opponent lineup altered the output. Partial dependence plots show expected strikeout trends as we adjust velocity or slider usage, and we use integrated gradients on sequence models to narrate the exact pitch sequences driving our forecasts.
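Feature versioning with hashes can be as simple as hashing a canonical JSON description of the feature set. Any change to a name, dtype, or transform parameter then produces a new version ID. This is a minimal sketch, and the schema contents are hypothetical.

```python
import hashlib
import json


def feature_schema_hash(schema):
    """Stable short hash of a feature schema (names, dtypes, and transform parameters)."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


schema_a = {
    "velo_ewm": {"dtype": "float32", "half_life_days": 7},
    "csw_sl": {"dtype": "float32", "window_days": 30},
}
# Same content, different key order.
schema_b = {
    "csw_sl": {"window_days": 30, "dtype": "float32"},
    "velo_ewm": {"half_life_days": 7, "dtype": "float32"},
}
schema_c = {
    "velo_ewm": {"dtype": "float32", "half_life_days": 14},
    "csw_sl": {"dtype": "float32", "window_days": 30},
}
```

Storing this hash next to every prediction makes it easy to tell whether a shift in outputs came from the data or from a silent change in feature definitions.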
Practical Build and Tools for Analysts
Building a lean pipeline requires structured data ingestion, thorough cleaning, and disciplined deployment. We pull pitch-by-pitch Statcast data directly from Baseball Savant and supplement it with contextual metrics from FanGraphs. Historical baselines can be verified using curated public repositories on Kaggle.
During the cleaning phase, we normalize pitch type labels across seasons and impute missing weather metrics using nearest-station logs. The feature engineering stage builds the rolling windows, platoon splits, pitch mix deltas, and environmental interaction terms. For sequence models, sliding windows are engineered to target next-pitch outcomes before aggregating everything to a per-start baseline.
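The two cleaning steps above can be sketched with the standard library. Pitch type codes are folded into one taxonomy, and a missing weather field is filled from the closest station that reported it, using great-circle distance. The label map, station records, and field names are hypothetical.

```python
import math

# Hypothetical label map: fold legacy or granular codes into one stable taxonomy.
PITCH_TYPE_MAP = {"FT": "SI", "KC": "CU"}


def normalize_pitch_type(code):
    return PITCH_TYPE_MAP.get(code, code)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def impute_from_nearest(park_lat, park_lon, stations, field):
    """Fill a missing weather field from the closest station that reported it."""
    reporting = [s for s in stations if s.get(field) is not None]
    if not reporting:
        return None
    nearest = min(reporting, key=lambda s: haversine_km(park_lat, park_lon, s["lat"], s["lon"]))
    return nearest[field]


stations = [
    {"lat": 40.1, "lon": -75.0, "temp_f": None},  # closest, but missing the reading
    {"lat": 40.3, "lon": -75.0, "temp_f": 71.0},
    {"lat": 41.0, "lon": -75.0, "temp_f": 65.0},
]
temp = impute_from_nearest(40.0, -75.0, stations, "temp_f")
```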
Baseline training begins in scikit-learn for sanity checks before moving to LightGBM, XGBoost, or CatBoost for tabular modeling. Hierarchical models are handled in PyMC to generate partial pooling summaries. All configurations, seeds, feature versions, and evaluation windows are logged to prevent tracking errors.
Daily deployment runs early in the afternoon as expected lineups lock. The system exports per-start metrics including strikeout distributions, xERA, and the probability of completing six innings. Automated drift alerts trip if the Population Stability Index exceeds our threshold, and rolling error metrics open an investigation ticket if performance drops over a two-week window. Comprehensive documentation is maintained for every prediction target to outline all core data assumptions.
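Here is a sketch of the two alert checks described above. It assumes PSI values are already computed per feature and that daily MAE values are stored in date order. Both thresholds are hypothetical. The rolling figure is an unweighted average of daily MAEs, which only approximates the MAE over all predictions when daily slate sizes differ.

```python
from statistics import mean

PSI_THRESHOLD = 0.25         # hypothetical alert threshold
MAE_RATIO_THRESHOLD = 1.15   # hypothetical: 15% worse than the backtest MAE


def drift_alerts(psi_by_feature, daily_mae, backtest_mae, window_days=14):
    """Return drifting features and whether the rolling error check trips."""
    drifting = sorted(f for f, v in psi_by_feature.items() if v > PSI_THRESHOLD)
    recent = daily_mae[-window_days:]
    rolling_mae = mean(recent) if recent else None
    error_alert = rolling_mae is not None and rolling_mae > MAE_RATIO_THRESHOLD * backtest_mae
    return drifting, error_alert


drifting, error_alert = drift_alerts(
    {"velo_ewm": 0.31, "chase_rate": 0.05}, daily_mae=[1.0] * 14, backtest_mae=0.8
)
```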
Use Cases and Limits Worth Flagging
Matchup-level projections excel at finding edges in daily player props and team totals. For strikeout lines, we convert the predicted distribution into a precise win probability against the market price. Outs recorded props combine our durability hazard model with lineup patience metrics.
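Converting a strikeout distribution into a bet decision takes three small pieces:

- the model's probability of clearing the line,
- the market's implied probability,
- the expected value at the offered price.

This sketch uses a hypothetical strikeout distribution. Note that implied probabilities taken from a single side include the sportsbook's margin.

```python
def american_to_implied(odds):
    """Implied probability from American odds (includes the bookmaker's margin)."""
    return 100 / (odds + 100) if odds > 0 else -odds / (-odds + 100)


def prob_over(pmf, line):
    """P(strikeouts > line) from a {strikeouts: probability} distribution."""
    return sum(p for k, p in pmf.items() if k > line)


def expected_value(prob_win, odds, stake=1.0):
    """Expected profit per `stake` at American odds."""
    payout = stake * (odds / 100 if odds > 0 else 100 / -odds)
    return prob_win * payout - (1 - prob_win) * stake


# Hypothetical projected strikeout distribution for one start.
pmf = {3: 0.05, 4: 0.10, 5: 0.17, 6: 0.22, 7: 0.20, 8: 0.14, 9: 0.12}
p_over = prob_over(pmf, 5.5)
edge = p_over - american_to_implied(-130)
ev = expected_value(p_over, -130)
```

The same `prob_over` call at other lines gives the probabilities needed for alternate lines and ladders. The distribution is only as useful as its calibration, which is why calibration checks come before any edge threshold.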
When projecting earned runs, we look closely at localized weather, such as wind blowing out to left field against a pull-heavy lineup. In pre-series planning, coaches use these models to identify the pitch types that offer the highest expected CSW lift against an opponent's structural weaknesses. If an upcoming opponent crushes sweepers, the model estimates the value shift from moving to a four-seam and changeup mix.
Workload and pitch-mix optimization are driven by counterfactual analysis. The model can simulate scenarios, showing that a specific increase in slider usage during two-strike counts yields a predictable lift in strikeouts per hundred batters.
If the hazard model detects early physical fatigue, teams can plan a shorter leash while bettors lean toward the under on outs recorded. For rookies and players returning from the injured list, wide uncertainty bands are treated as a feature rather than a bug. Shrinkage prevents overreacting to a single high-velocity performance, and separating post-IL returns into structured baseline stages keeps projections grounded.
Early-season noise and cold weather require distinct handling. Cold April weather suppresses ball carry, so the model applies heavier priors based on the previous season's established talent level until samples stabilize. We also reduce the weight of park factors early in the year, before the schedule has cycled through enough parks.
Umpire zone tendencies are integrated to adjust called strike probabilities at the edges of the plate. When distributing these outputs, we strictly respect data licensing policies, store complete provenance metadata, and only share derived analytical metrics with external partners to maintain complete data compliance.
Bettors use these precise model outputs to drive their daily decision-making process. For strikeout props, they target markets where the calculated edge exceeds a set percentage after validating calibration. When betting team totals, comparing xERA against bullpen fatigue metrics reveals mispriced lines where the market overreacts to short-term hot streaks.
Quantile outputs are used to build ladder betting strategies, while tracking expected value against actual profit and loss helps refine model-calibrated hit rates over a long season. This disciplined approach to bankroll management and performance tracking mirrors the modeling strategies used by high-volume bettors across many sports.
Step-by-Step: Build a Next-30-Day K% and xERA Projection
To build your own projection system, start by defining the unit of prediction as the individual pitcher-start. This is more granular than projecting a pitcher-month directly, and it lets you aggregate up to 30-day summaries later.
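Aggregating per-start projections into a 30-day K% should weight each start by its projected batters faced. A plain average of per-start K% would over-weight short outings. The field names and numbers below are hypothetical.

```python
def window_k_rate(starts):
    """Aggregate per-start K% projections into one window K%, weighted by projected batters faced."""
    total_bf = sum(s["proj_bf"] for s in starts)
    total_k = sum(s["proj_k_pct"] * s["proj_bf"] for s in starts)
    return total_k / total_bf


# Hypothetical projections for three starts in the next 30 days.
starts = [
    {"proj_k_pct": 0.28, "proj_bf": 24},
    {"proj_k_pct": 0.22, "proj_bf": 22},
    {"proj_k_pct": 0.25, "proj_bf": 25},
]
k_rate_30d = window_k_rate(starts)
```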
First, pull pitch-by-pitch data for the last three seasons from Baseball Savant, and gather updated hitter splits from FanGraphs. Next, engineer labels for each historical start: strikeout percentage, plus an xERA baseline derived from the expected weighted on-base average (xwOBA) allowed, which reflects strikeouts, walks, and contact quality.
Build out your feature pipeline by calculating 7-, 14-, and 30-day exponentially weighted moving averages for velocity, spin, induced vertical break, and horizontal break. Compute CSW by pitch type, zone rate, chase rate, and pitch mix shares relative to the season baseline. Factor in matchup features like the opposing lineup's weighted strikeout rate against specific pitch types, platoon percentages, park factors, and daily weather bins.
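Here is a minimal sketch of the pitch mix delta feature. It computes usage shares in a recent window, subtracts the season-baseline shares, and includes pitch types that appear in only one of the two windows. The inputs are hypothetical lists of pitch type codes, and both lists must be non-empty.

```python
from collections import Counter


def pitch_mix(pitch_types):
    """Usage share of each pitch type (input must be non-empty)."""
    counts = Counter(pitch_types)
    total = sum(counts.values())
    return {pt: n / total for pt, n in counts.items()}


def mix_deltas(recent_types, season_types):
    """Recent usage share minus season-baseline share for every pitch type seen in either."""
    recent, season = pitch_mix(recent_types), pitch_mix(season_types)
    return {pt: recent.get(pt, 0.0) - season.get(pt, 0.0)
            for pt in sorted(set(recent) | set(season))}


season = ["FF"] * 60 + ["SL"] * 30 + ["CH"] * 10
recent = ["FF"] * 45 + ["SL"] * 40 + ["CH"] * 10 + ["ST"] * 5
deltas = mix_deltas(recent, season)
```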
Add workload and team features, including rest days, previous pitch counts, and manager hook tendencies. Pass categorical variables like park IDs and catcher IDs to CatBoost as native categorical features.
Implement a rolling-origin cross-validation scheme at monthly increments, enforcing strict time cutoffs so no post-start information leaks into the training set. Segment your validation windows across rule changes to ensure long-term stability.
For your baseline modeling, run a LightGBM regression for point estimates alongside a quantile LightGBM regression to lock in the 10th, 50th, and 90th percentiles for strikeouts. Use CatBoost regression to handle the complex categorical interactions required for xERA projections.
To improve small-sample reliability, integrate a hierarchical Bayesian model built in PyMC to establish pitcher and season random effects, then feed those posterior means back into your tree models. For advanced sequencing, you can layer in a temporal convolutional network that predicts pitch-by-pitch called strikes and whiffs, converting those outputs into start-level features.
Diagnose your system by binning predicted strikeout percentages into deciles, plotting predicted versus actual outcomes to check your calibration slope, and measuring xERA performance using Mean Absolute Error across various temperature and park subgroups.
[Decile 1: Pred 10%] -> Actual 11% (Well Calibrated)
[Decile 5: Pred 50%] -> Actual 49% (Well Calibrated)
[Decile 9: Pred 90%] -> Actual 78% (Over-projecting High End - Adjust Priors)
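The decile readout above can be produced with two small helpers:

- one that sorts predictions into equal-count bins and compares the mean prediction with the mean outcome in each bin,
- one that fits an ordinary least squares slope of actual on predicted.

A slope of 1.0 is ideal. A slope below 1.0 means predictions are too extreme, as in a top decile that projects higher than it delivers. The synthetic data is hypothetical.

```python
import numpy as np


def decile_calibration(pred, actual, bins=10):
    """(mean predicted, mean actual) within equal-count bins sorted by prediction."""
    pred, actual = np.asarray(pred, float), np.asarray(actual, float)
    groups = np.array_split(np.argsort(pred), bins)
    return [(float(pred[g].mean()), float(actual[g].mean())) for g in groups]


def calibration_slope(pred, actual):
    """OLS slope of actual on predicted; 1.0 is ideal, below 1.0 means predictions are too extreme."""
    slope, _intercept = np.polyfit(np.asarray(pred, float), np.asarray(actual, float), 1)
    return float(slope)


# Synthetic example: outcomes move only half as much as the predictions do.
pred = np.linspace(0.1, 0.4, 100)
actual = 0.5 * pred + 0.1
table = decile_calibration(pred, actual)
slope = calibration_slope(pred, actual)
```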
Test your backtest stability across multiple historical seasons. If performance drops in a specific era, add explicit rule-interaction features or ensemble your models across era-specific data splits.
Deploy the system daily by running an automated script that pulls probables and expected lineups, computes features based on the previous day's data, and writes the final distributions to a database. Bettors can map these strikeout and walk probabilities to market lines to find value, while coaches use the resulting SHAP charts to simulate pitch mix adjustments.
Frequently Asked Questions (FAQs)
How do your models handle sudden velocity jumps or drops in a single start?
Our models handle sudden velocity changes through a combination of short-term rolling windows and Bayesian shrinkage. A single start with a major velocity shift will move the 7-day exponentially weighted moving average quickly, but the hierarchical model pulls the overall projection back toward the pitcher's established talent level until the shift holds over multiple starts. If a velocity drop is accompanied by a sudden change in release point or pitch mix, the system flags it as a potential injury risk and widens the uncertainty intervals for the next start. This balance helps ensure that ATSWins users receive projections that reflect true underlying performance changes rather than short-term statistical noise.
Why do you prefer xERA over traditional ERA for projecting future performance?
Traditional ERA is heavily confounded by sequencing luck, defensive quality, and how the bullpen handles inherited runners. Expected ERA isolates the factors a pitcher controls most directly: strikeout rate, walk rate, and contact quality allowed, measured via exit velocity and launch angle. By training our models on xERA, we strip out much of the noise in standard earned runs, producing a metric that stabilizes much faster and typically offers more predictive power for a pitcher's next start.
How does the pitch clock implementation affect historical data training?
The pitch clock introduced in 2023 altered pitcher fatigue cycles, time-between-pitch dynamics, and stolen base environments. To prevent this from ruining our model's calibration, we segment our backtests and apply era-specific adjustments. Features trained on pre-2023 data are reweighted, and we include explicit interaction terms that account for how individual pitchers adjusted to the quicker pace, focusing heavily on velocity retention and command degradation within games under the new rules.
Can your model predict player props other than strikeouts and outs recorded?
Yes. The core distributional outputs can be adapted to project walk props, hits allowed, and earned run totals. Quantile regression lets us approximate the full distribution of a pitcher's performance rather than a single average. That lets us estimate the probability of a pitcher going over or under any market line, including the alternate lines and ladder bets that sportsbooks offer.
How often should the model be retrained to maintain accuracy?
We run batch projections daily using fresh data from the previous night's games, but full model retraining occurs on a monthly schedule. Complete retraining allows the tree algorithms to update their feature weights based on seasonal environmental shifts and league-wide offensive trends. We monitor feature stability and prediction errors daily, and an automatic retraining process is triggered immediately if the Population Stability Index or Mean Absolute Error exceeds our strict operational thresholds.