The Complete Guide to an MLB Bullpen Fatigue Adjustment Model and Strategic Pen Usage

Posted Feb. 23, 2026, 10:31 a.m. by Lesly Shone

Bullpens win and lose more games than most fans realize, and fatigue is the hidden lever. Workload, leverage, and travel quietly bend run prevention and late-inning win odds in ways that are subtle but powerful. By turning messy usage data into a clear, actionable signal, bettors can gain an edge that isn’t obvious just from looking at box scores. This guide explains how to build an MLB bullpen fatigue adjustment model, what signals to track, and how to turn reliever stress into team-level adjustments that move win probabilities and totals. The goal is not to produce fancy numbers, but to create outputs that are practical, backtestable, and ready for daily use.

Table of Contents

Building an MLB Bullpen Fatigue Adjustment Model for Bettors and Quants
Problem Framing and Objectives
Data and Signals to Ingest
Model Architecture and Math
Adjustment and Deployment
Validation and Maintenance
Step-by-Step Build Plan
Practical Notes From a Bettor’s Lens
Tips for Better Signals
Example of a Daily Bullpen Sheet
Common Pitfalls and How to Avoid Them
Extending the Model In-Season
Cost-Effective Data Stack
Lightweight Heuristics When You’re Time-Constrained
Feature Importance Sanity Checks
Communication With Traders and Content Teams
Key Resources and References
Final Quick-Hit Checklist
Conclusion
Frequently Asked Questions

Key Takeaways

Bullpen fatigue can move run prevention far more than most bettors recognize. On thin days, such as the third straight day of work for key relievers or after cross-country travel, run prevention can shift by around 0.2 to 0.5 runs per nine innings. Even small adjustments in late innings can affect win expectancy and pricing. The signals that matter are straightforward: days of rest, back-to-back appearances, recent pitch counts, leverage, traffic with inherited runners, small dips in velocity or spin, and travel or altitude effects. Using short rolling windows over one, three, five, or ten days helps quantify acute and lingering fatigue.

A simple fatigue index can combine workload, pitch-type stress, and leverage multipliers. This index can then adjust each reliever’s baseline metrics, such as FIP or expected ERA, applying a capped penalty for fatigue. Translating reliever scores into team-level edges allows bettors to simulate bullpen usage, adjust RA/9 and win expectancy, and ensure the team’s late-inning performance is realistically projected. The edge is practical and actionable, rather than theoretical. ATSWins integrates these insights into daily MLB models and reports, delivering AI-powered picks, player props, betting splits, and profit tracking across multiple sports.

Building an MLB Bullpen Fatigue Adjustment Model for Bettors and Quants

Bullpens are dynamic systems. Each reliever carries a workload, a role, and a leash. Fatigue impacts their ability to prevent runs, and small differences in late innings can translate into significant swings in win probability and betting outcomes. A bullpen fatigue adjustment model measures recent workload and applies a structured penalty to estimate the true run prevention capability of a team’s bullpen on any given day. It combines usage, leverage, travel, and pitch-stress signals to adjust each reliever’s baseline performance and aggregates these into a team-level projection.

On days where the bullpen is thin, due to back-to-back appearances, long extra-inning games, or long-distance travel, reliever RA/9 can swing by 0.2 to 0.5 runs. These small differences matter because leverage amplifies the effect. Closers, setup men, and bulk arms operate in tightly linked roles, and when one arm is limited, the leverage chain shifts. Late-inning win expectancy is fragile under these conditions, so quantifying fatigue is essential for accurate projections.

Problem Framing and Objectives

The purpose of the model is simple. Bettors and quants want to know how tired the bullpen is and what that means in practical terms for expected runs and win probability. Fatigue affects more than velocity. It erodes command, reduces strike rate on secondary pitches, and worsens location on out-of-zone waste pitches. Closers working a third consecutive day may still pitch, but with a shorter leash and less ability to put away batters efficiently. Multi-inning relievers who threw 35 to 45 pitches two days ago may be limited to only a handful of batters. Managerial behavior shifts as well, with earlier hooks for struggling pitchers and different usage patterns for bridge arms or middle relievers.

These behavioral and physical effects combine to create a measurable impact on RA/9. On extreme days, such as after cross-country travel or at high-altitude ballparks, the penalty can climb higher still. Modeling these shifts requires both a quantitative and contextual approach. Role, leverage, workload, and travel all factor into realistic adjustments. The result is a team-level projection that more accurately reflects the bullpen’s late-inning potential.

Leverage Chains, Roles, and Late-Inning Win Expectancy

Bullpen performance is not uniform. Closers face the highest leverage situations, and their failures cascade into potential extra innings or blown saves. Setup men share leverage in the eighth inning against the heart of the lineup, bridge arms absorb medium leverage, and bulk relievers cover multi-inning or middle innings, particularly when starters exit early. When one link in the bullpen chain is unavailable, leverage ripples down. An eighth-inning setup arm might shift to the ninth, and a bridge reliever may enter earlier than expected. These adjustments affect late-inning win expectancy more than raw ERA or FIP numbers. By modeling the availability and likely pitch limits of each reliever, the model can reprice win expectancy in a more nuanced and accurate way.

Prioritizing measurable, established signals is critical. Since there is no public off-the-shelf bullpen fatigue model, proxies such as leverage exposure, back-to-backs, pitch count spikes, travel effects, pitch-type stress, velocity and spin changes, and inherited runner sequences become the foundation of the model. These signals feed into a decaying index, layered with role-based context and uncertainty, yielding outputs that are actionable and interpretable.

Data and Signals to Ingest

Rolling workload windows capture acute and lingering stress. Tracking relievers across overlapping windows, such as one, three, five, ten, and even fifteen days, provides insight into both short-term fatigue and seasonal trends. Pitch counts, innings per outing, consecutive days worked, warmup proxies for early entries, and rest days since the last appearance are all critical metrics. Stress should be measured not only by workload totals but also by the context of when and how the pitcher entered the game.
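A minimal sketch of these rolling windows, assuming a simple per-reliever log of (game date, pitch count) tuples. The function name `workload_features` and the sample log are hypothetical. The one design choice worth copying is that only games strictly before the slate date are counted, so today's outing can never leak into today's features.

```python
from datetime import date, timedelta

# Hypothetical appearance log for one reliever: (game date, pitches thrown).
appearances = [
    (date(2025, 6, 1), 18),
    (date(2025, 6, 3), 24),
    (date(2025, 6, 4), 15),
    (date(2025, 6, 5), 12),
]

def workload_features(log, as_of, windows=(1, 3, 5, 10, 15)):
    """Rolling pitch counts and rest features using only games before `as_of`."""
    past = [(d, p) for d, p in log if d < as_of]
    feats = {}
    for w in windows:
        # Each window covers the w calendar days immediately before as_of.
        start = as_of - timedelta(days=w)
        feats[f"pitches_{w}d"] = sum(p for d, p in past if d >= start)
        feats[f"apps_{w}d"] = sum(1 for d, _ in past if d >= start)
    last = max((d for d, _ in past), default=None)
    # Full days off since the last outing (0 means he pitched yesterday).
    feats["days_rest"] = (as_of - last).days - 1 if last is not None else None
    # Consecutive days worked, counting back from yesterday.
    worked = {d for d, _ in past}
    streak, day = 0, as_of - timedelta(days=1)
    while day in worked:
        streak += 1
        day -= timedelta(days=1)
    feats["consecutive_days"] = streak
    return feats
```

A `consecutive_days` value of 2 means that pitching today would be a third straight day, the red-flag case discussed throughout this guide.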

Leverage and stressful sequences further refine the signal. FanGraphs’ gmLI measures the leverage index when a reliever enters the game, while pLI averages leverage across all of the game events he pitches in. Inherited runners and high-stress plate appearances, such as long battles or those with runners in scoring position, add further granularity. Extra-inning appearances and elevated late-inning leverage carry multipliers in the model.

Pitch-type stress, velocity changes, and spin rate shifts provide physical insight. Small drops in velocity or spin over multiple appearances can indicate fatigue, particularly for breaking balls. Certain pitch types, like sliders and splitters, are more taxing than four-seam fastballs. Declines in command, zone rate, and first-pitch strike rates further signal deteriorating effectiveness.
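One simple way to fold pitch-type stress into workload is to weight each pitch before summing. The sketch below uses Statcast-style pitch codes (FF four-seam, SI sinker, FC cutter, SL slider, CU curveball, FS splitter, CH changeup). The weights are illustrative assumptions, not calibrated values; in practice they would be fit from how quickly velocity and command recover after each pitch mix.

```python
# Hypothetical stress weights per pitch type, with the four-seam fastball as 1.0.
# These numbers are placeholders for illustration, not fitted estimates.
PITCH_STRESS = {"FF": 1.0, "SI": 1.0, "FC": 1.05, "SL": 1.15,
                "CU": 1.10, "FS": 1.20, "CH": 1.05}

def stress_weighted_pitches(pitch_types, weights=PITCH_STRESS, default=1.0):
    """Sum of pitches, each scaled by its type's stress weight.

    Unknown pitch codes fall back to `default` rather than raising.
    """
    return sum(weights.get(pt, default) for pt in pitch_types)

# Example: two fastballs, a slider, and a splitter.
load = stress_weighted_pitches(["FF", "FF", "SL", "FS"])
```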

Contextual features such as travel, altitude, roster moves, and weather are critical. East-west time-zone jumps, Coors Field or other high-altitude games, and heat or humidity extremes affect recovery and performance. Injuries and minor-league option moves thin the bullpen, introducing more uncertainty into role assignments. Proper ETL and schema design prevent data leakage, ensure accurate sequencing, and maintain feature versioning, all of which are essential for reliable model outputs.
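Travel, altitude, and heat can be encoded as a multiplicative bump applied to a reliever's workload or fatigue index. Everything here is hypothetical: the `ALTITUDE_PARKS` set, the 3% per time zone crossed (capped at three zones), the 5% altitude bump, and the 90°F heat-index threshold are placeholders to be replaced by fitted values.

```python
# Hypothetical set of high-altitude parks; "COL" stands in for Coors Field.
ALTITUDE_PARKS = {"COL"}

def context_multiplier(tz_shift_hours: int, park: str, heat_index_f: float) -> float:
    """Multiplicative workload bump for travel, altitude, and heat (assumed values)."""
    mult = 1.0
    # Time-zone jumps in either direction: +3% per zone, capped at coast to coast.
    mult += 0.03 * min(abs(tz_shift_hours), 3)
    # High-altitude venues.
    if park in ALTITUDE_PARKS:
        mult += 0.05
    # Heat or humidity extremes, proxied by the heat index in Fahrenheit.
    if heat_index_f >= 90:
        mult += 0.03
    return mult
```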

Model Architecture and Math

The core of the bullpen fatigue adjustment model is a fatigue index that combines recent workload, pitch-type stress, and leverage to produce a single score for each reliever. This score is decayed over time to give more weight to recent work, and it adjusts for high-stress sequences, travel, and altitude. The fatigue index is unitless but calibrated so that a normal week is around 1.0, stretched arms reach 1.5 to 2.0, and well-rested relievers drop to 0.6 to 0.8. By translating this index into a probabilistic availability metric, the model can assign likely pitch limits and role adjustments for each reliever in a given game.
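A minimal sketch of a decayed index, assuming exponential decay with a two-day half-life and a hypothetical reference week (three 15-unit outings one, three, and five days ago) that is defined to score exactly 1.0. Inputs are "stress units", which could be raw pitches or a stress-weighted count. The half-life and reference week are assumptions; they would be calibrated so that stretched and well-rested arms land in the bands described above.

```python
import math

HALF_LIFE_DAYS = 2.0                  # assumption: stress halves every two days
DECAY = math.log(2) / HALF_LIFE_DAYS

def decayed_load(outings):
    """Sum of stress units, each discounted by how many days ago it occurred.

    outings: list of (days_ago, stress_units) with days_ago >= 1.
    """
    return sum(units * math.exp(-DECAY * days_ago) for days_ago, units in outings)

# Hypothetical reference week: three 15-unit outings 1, 3, and 5 days ago.
NORMAL_WEEK = [(1, 15.0), (3, 15.0), (5, 15.0)]
REFERENCE_LOAD = decayed_load(NORMAL_WEEK)

def fatigue_index(outings, context_mult=1.0):
    """Unitless index scaled so the reference week scores exactly 1.0.

    context_mult lets travel, altitude, or heat scale the result.
    """
    return context_mult * decayed_load(outings) / REFERENCE_LOAD
```

With these settings, three heavy outings on consecutive days (25, 20, and 22 units, one to three days ago) score about 1.9, inside the "stretched" band.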

Leverage multiplies the stress for high-pressure situations, while baserunner pressure adds incremental fatigue per plate appearance. Inherited runners carry slightly more weight if they score, reflecting the mental and physical toll of high-stakes sequences. These stress channels are calibrated from historical patterns, looking at how quickly velocity, command, and spin return to baseline after various workloads.
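The stress channels above can be combined per appearance before any decay is applied. This sketch assumes an entry leverage index (the same idea as gmLI), a count of plate appearances with runners on, and inherited-runner outcomes. The 0.15 leverage slope and its clipping range, the 0.5 and 1.0 unit add-ons, and the 10% extra-innings multiplier are all hypothetical.

```python
def appearance_stress(pitches: int, entry_li: float, pa_with_runners: int,
                      inherited: int, inherited_scored: int,
                      extra_innings: bool = False) -> float:
    """Stress units for one outing; every coefficient is an illustrative assumption."""
    # Leverage multiplier centered on an average LI of 1.0, clipped to a sane range.
    lev_mult = min(max(1.0 + 0.15 * (entry_li - 1.0), 0.85), 1.6)
    stress = pitches * lev_mult
    # Baserunner pressure: a small add-on per plate appearance with runners on.
    stress += 0.5 * pa_with_runners
    # Inherited runners weigh a little more when they come around to score.
    stress += 1.0 * inherited + 0.5 * inherited_scored
    if extra_innings:
        stress *= 1.1
    return stress
```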

Availability is modeled probabilistically rather than as a binary yes/no. Logistic regression or hierarchical Bayesian models estimate the likelihood a reliever can pitch today, considering fatigue index, days of rest, back-to-back streaks, pitch counts, and role. Closers tend to maintain higher baseline availability than middle relievers, while bulk arms show a steeper drop-off under the same workload. A second layer estimates “restricted leash” probability, essentially capping pitch counts for partially fatigued relievers, ensuring that team-level projections are realistic.
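A sketch of the availability layer as a plain logistic function, with a second helper that blends full and restricted leashes. The coefficients are hypothetical placeholders standing in for values a fitted logistic regression or hierarchical model would produce; the role intercepts encode closers' higher baseline availability, and an extra fatigue slope for bulk arms encodes their steeper drop-off.

```python
import math

# Hypothetical coefficients; a real model would estimate these from history.
ROLE_INTERCEPT = {"closer": 2.2, "setup": 1.8, "middle": 1.5, "bulk": 1.3}
FATIGUE_SLOPE = -1.2
BULK_EXTRA_SLOPE = -0.6   # assumption: bulk arms fade faster under the same load
STREAK_SLOPE = -0.9       # per consecutive day already worked
YESTERDAY_SLOPE = -0.04   # per pitch thrown yesterday

def availability(role: str, fatigue_index: float,
                 consecutive_days: int, pitches_yesterday: int) -> float:
    """Probability the reliever is usable today, via a logistic link."""
    slope = FATIGUE_SLOPE + (BULK_EXTRA_SLOPE if role == "bulk" else 0.0)
    z = (ROLE_INTERCEPT[role] + slope * fatigue_index
         + STREAK_SLOPE * consecutive_days + YESTERDAY_SLOPE * pitches_yesterday)
    return 1.0 / (1.0 + math.exp(-z))

def expected_leash(full_leash: float, restricted_leash: float,
                   p_restricted: float) -> float:
    """Blend full and restricted pitch caps by the restricted-leash probability."""
    return p_restricted * restricted_leash + (1.0 - p_restricted) * full_leash
```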

The adjusted baseline performance combines each pitcher’s inherent skill with a nonlinear penalty from the fatigue index, leverage, and role stress. Pitcher-specific random effects account for differences in fatigue tolerance, and asymmetric penalties reflect the physical demands of different pitch mixes. Hybrid architectures are common: a Bayesian availability model combined with a gradient-boosted tree penalty model allows for probabilistic uncertainty and flexible nonlinear interactions. Kalman-style updates keep pitcher baselines current as the season progresses.
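A simplified stand-in for two pieces named above, not the hybrid Bayesian and gradient-boosted model itself: a scalar Kalman-style update that nudges a pitcher's RA/9-scale baseline toward new evidence, and a convex, capped penalty that starts only once the fatigue index exceeds 1.0. The process variance, penalty scale, exponent, and cap are hypothetical; pitcher-specific random effects would replace the single `scale` with a per-pitcher value.

```python
from dataclasses import dataclass

@dataclass
class Baseline:
    mean: float  # estimated skill on an RA/9 scale
    var: float   # uncertainty of that estimate

PROCESS_VAR = 0.02  # assumption: how far true skill can drift between updates

def kalman_update(b: Baseline, observed: float, obs_var: float) -> Baseline:
    """One scalar Kalman step: predict (add drift variance), then correct."""
    prior_var = b.var + PROCESS_VAR
    gain = prior_var / (prior_var + obs_var)
    return Baseline(b.mean + gain * (observed - b.mean), (1.0 - gain) * prior_var)

def fatigue_penalty(fatigue_index: float, scale: float = 0.35, cap: float = 0.9) -> float:
    """Convex RA/9 penalty above an index of 1.0, capped (assumed parameters)."""
    excess = max(0.0, fatigue_index - 1.0)
    return min(cap, scale * excess ** 1.5)

def adjusted_ra9(b: Baseline, fatigue_index: float) -> float:
    """Baseline skill plus the fatigue penalty for today's index."""
    return b.mean + fatigue_penalty(fatigue_index)
```

Noisy observations (large `obs_var`, such as a two-inning sample) move the baseline only slightly, which is the behavior the text describes for keeping baselines current without chasing noise.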

Hierarchical Bayesian models excel early in the season when samples are small and uncertainty is high. Gradient-boosted trees handle messy mid-to-late-season data efficiently and capture complex interactions. Combining both approaches allows daily projections to be stable, interpretable, and responsive to recent workload changes. Simple heuristics remain useful for sanity checks or quick validation, but the hybrid method is the workhorse for production-quality outputs.

Adjustment and Deployment

Once reliever-level projections are calculated, team-level run prevention is derived by aggregating availability probabilities, expected pitch leashes, and adjusted RA/9 or xERA values. Role probabilities define likely assignments for the sixth through ninth innings based on opponent lineup clusters, left-right splits, and matchups. The model simulates inning-by-inning outcomes, sampling expected runs allowed for each reliever and weighting by leverage. Closers and setup arms carry more weight for late-inning win expectancy than middle relievers.
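Before running a full simulation, an expected-value shortcut shows how availability feeds team RA/9. This sketch assumes the bullpen must cover three innings, lets arms claim innings in descending leverage order in proportion to their availability, and hands any uncovered innings to a hypothetical replacement-level arm at 5.00 RA/9. Field names and values are illustrative.

```python
def team_bullpen_ra9(relievers, innings_needed=3.0, replacement_ra9=5.0):
    """Expected bullpen RA/9 from availability-weighted innings (not a simulation).

    relievers: dicts with 'avail' (0-1), 'exp_innings', fatigue-adjusted 'ra9',
    and 'leverage' (higher-leverage arms are assumed to claim innings first).
    """
    remaining, runs = innings_needed, 0.0
    for r in sorted(relievers, key=lambda r: r["leverage"], reverse=True):
        ip = min(remaining, r["avail"] * r["exp_innings"])
        runs += ip * r["ra9"] / 9.0
        remaining -= ip
        if remaining <= 0:
            break
    # Innings nobody above can cover fall to a lower-tier arm.
    runs += max(remaining, 0.0) * replacement_ra9 / 9.0
    return 9.0 * runs / innings_needed

# Hypothetical pens that differ only in the closer's availability.
rested = [
    {"role": "closer", "avail": 0.95, "exp_innings": 1.0, "ra9": 3.2, "leverage": 1.9},
    {"role": "setup", "avail": 0.90, "exp_innings": 1.0, "ra9": 3.6, "leverage": 1.5},
    {"role": "bridge", "avail": 0.90, "exp_innings": 1.2, "ra9": 4.2, "leverage": 1.0},
]
tired = [dict(r, avail=0.33) if r["role"] == "closer" else r for r in rested]
print(round(team_bullpen_ra9(rested), 3), round(team_bullpen_ra9(tired), 3))  # 3.722 4.094
```

Differencing the two results gives a team-level fatigue penalty of roughly +0.37 RA/9 for these made-up inputs.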

Simulating usage trees allows for realistic adjustments to late-inning win probability. For each plate appearance, the model draws run expectancy based on the adjusted reliever skill, batter quality, handedness, and base-out state. If a reliever exhausts their leash, the model shifts to the next arm in the bullpen chain. Win probability changes, moneyline estimates, and derivative metrics are computed and compared to market lines to identify potential edges. On days with uncertain closer status or a thin bullpen, even small pricing differences can decide whether a bet is actionable.
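A stripped-down version of the usage-tree simulation, under simplifying assumptions: it works per inning rather than per plate appearance, draws runs from a Poisson distribution at each arm's fatigue-adjusted RA/9 instead of a base-out run-expectancy table, and measures leashes in whole innings. Availability is sampled in each trial, and when an arm is missing or its leash runs out, the next arm in the chain moves up. The chain values and replacement-level RA/9 are hypothetical, and "hold" ignores any runs the leading team adds late.

```python
import math
import random

def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; fine for small per-inning means."""
    threshold, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1

def simulate_late_innings(chain, lead, n_sims=20000, seed=7, replacement_ra9=5.0):
    """Simulate innings 7-9; return (mean runs allowed, P(runs allowed < lead)).

    chain: list of (avail_prob, leash_innings, adjusted_ra9) in planned usage order.
    """
    rng = random.Random(seed)
    total_runs, holds = 0, 0
    for _ in range(n_sims):
        # Sample tonight's availability; unavailable arms drop out of the chain.
        arms = [(leash, ra9) for p, leash, ra9 in chain if rng.random() < p]
        innings_left, runs = 3, 0
        for leash, ra9 in arms:
            use = min(leash, innings_left)
            runs += sum(poisson(rng, ra9 / 9.0) for _ in range(use))
            innings_left -= use
            if innings_left == 0:
                break
        # If every leash is exhausted, a lower-tier arm finishes the game.
        runs += sum(poisson(rng, replacement_ra9 / 9.0) for _ in range(innings_left))
        total_runs += runs
        holds += runs < lead
    return total_runs / n_sims, holds / n_sims
```

Running the same chain twice, once with the closer's availability lowered, and comparing the hold probabilities gives the late-inning win-expectancy swing to set against the market line.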

Operational outputs include two main artifacts: a quantitative pack with team RA/9 adjustments, intervals, top unavailable relievers, and simulated win expectancy swings, and a human-readable note summarizing fatigue status, likely leashes, and travel effects. Red flags highlight extreme usage, cross-country travel, or multi-day appearances. The outputs are designed to feed both research and daily decision-making without overloading the user with raw math.

Key tools for deployment include PyMC for hierarchical Bayesian modeling, XGBoost or LightGBM for gradient-boosted tree penalty heads, DuckDB over Parquet files for local ETL and feature storage, BigQuery for scalable operations, and Prefect or Airflow for pipeline management. Versioning tools such as DVC or lakeFS maintain reproducibility, while MLflow tracks experiments and model performance. Validation notebooks allow visualization of the fatigue index versus velocity or spin deltas and enable partial dependence plots for penalty sensitivity.

Validation and Maintenance

Backtesting is critical. Out-of-sample evaluation across multiple seasons helps quantify model reliability. Micro-level validation measures reliever availability prediction using AUC, precision-recall, Brier scores, and correlations between fatigue index and subsequent velocity or command. Macro-level validation assesses game-level log loss, late-inning win expectancy calibration, and alignment with observed closing-line movement. Ablation studies quantify the contribution of individual features like travel, leverage, or pitch-type stress weights, ensuring that the model captures meaningful patterns rather than noise.
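The micro-level metrics named above are short enough to write by hand, which keeps them easy to audit. This sketch computes the Brier score, log loss, and a pairwise (Mann-Whitney style) AUC for availability predictions, where an outcome of 1 means the reliever actually pitched. scikit-learn provides equivalent functions; these plain-Python versions are for illustration.

```python
import math

def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def log_loss(probs, outcomes, eps=1e-15):
    """Average negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

def auc(probs, outcomes):
    """Share of (pitched, rested) pairs ranked correctly; ties count half."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```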

Seasonal drift requires monitoring. The All-Star break temporarily inflates the “fresh pen” signal, September call-ups change role probabilities, and postseason leverage differs from regular-season norms. Simple guardrails, such as capping team penalties or winsorizing extreme pitch counts, prevent overreaction to outlier events. Automated scraping of beat reports and transaction feeds ensures that availability signals remain current, and snapshotting features allows for reproducible auditing of decisions.
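The two guardrails mentioned above take only a few lines each. This sketch winsorizes only the upper tail of pitch counts (a nearest-rank percentile computed with integer arithmetic, so no floating-point index rounding) and clamps the team penalty. The 95th percentile and the -0.30 to +0.60 RA/9 bounds are hypothetical defaults.

```python
def winsorize_upper(values, pct=95):
    """Clip values above the nearest-rank `pct` percentile of the sample."""
    s = sorted(values)
    cap = s[(len(s) - 1) * pct // 100]
    return [min(v, cap) for v in values]

def cap_penalty(penalty, lo=-0.30, hi=0.60):
    """Clamp a team RA/9 adjustment to plausible bounds (assumed values)."""
    return max(lo, min(hi, penalty))
```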

Step-by-Step Build Plan

Building the model involves a structured ten-step approach. First, assemble stable identifiers and schema tables for pitchers, games, appearances, pitches, venues, travel legs, and roles. Second, ETL core logs from Retrosheet and Statcast, aligning timestamps, computing leverage metrics, and parsing pitch-level data. Third, engineer base usage features, including rolling pitch counts, innings, back-to-back indicators, spike flags, and warmup proxies. Fourth, layer leverage and sequence stress metrics, such as inherited runners, high-pitch innings, and RISP plate appearances. Fifth, compute pitch-stress adjustments using velocity and spin deltas and assign stress weights by pitch type.

Sixth, construct the fatigue index and availability labels, summing decayed workloads with travel and altitude adjustments. Seventh, train availability and penalty models, using logistic regression, hierarchical Bayesian, and gradient-boosted trees with monotonic constraints. Eighth, simulate innings to compute team adjustments for RA/9 and win expectancy deltas. Ninth, validate through backtests, ablation studies, and control comparisons to baseline ERA or xFIP models. Tenth, deploy in production with morning updates, pre-game refreshes, and human-readable notes highlighting red-flag conditions.

Practical considerations include daily simulation of innings six through nine, red flags for extreme usage, and logging outcomes for iterative improvements. This structured approach ensures that the model remains actionable, interpretable, and reliable for bettors seeking to exploit subtle but real bullpen edges.

Practical Notes from a Bettor’s Lens

Understanding bullpen fatigue is only valuable if it changes how a bettor interacts with lines and totals. Margins matter late in games. A +0.25 RA/9 bullpen penalty may appear small, but it can shift a full-game moneyline by five to ten basis points, or move live odds from the seventh through the ninth inning by fifteen to thirty basis points. Thin bullpen signals should inform position sizing and timing, particularly in close contests. Starter edges dominate early innings, but pen risk becomes disproportionately important as leverage accumulates. Same-game props, including “outs recorded” or “save” lines for closers and bulk arms, respond directly to availability probabilities. Weather and altitude amplify or mitigate these effects. Hot, high-altitude games, such as a hot day at Coors Field with a fatigued pen, produce nonlinear risk exposure. Pricing adjustments should account for these stacked variables, ensuring that bets reflect real late-inning volatility rather than intuition.

Correlating fatigue signals with totals requires careful thought. A thin bullpen raises expected late-inning runs and widens the range of outcomes, so it can push a fair total higher while also increasing the chance of a blowup inning. For live betting, these signals should feed directly into in-game decisions. Awareness of the interplay between pitch counts, consecutive-day appearances, and travel helps identify subtle pricing edges. Tracking the top relievers’ workload and using red flags for multi-day appearances or cross-country travel ensures that bettors position themselves advantageously when markets lag behind real fatigue conditions. These practices keep bets informed, disciplined, and aligned with measurable signals rather than narratives.

Tips for Better Signals (Small but Useful)

Small adjustments can improve model sensitivity without overcomplicating calculations. Backup catcher usage occasionally alters pitcher sequences, especially on heavy bullpen days. Umpire tendencies interact with fatigued pitchers: tight zones punish relievers who cannot locate secondary pitches, amplifying fatigue penalties. Late-inning defensive substitutions influence run prevention, particularly after the seventh inning, and should be incorporated into projections. Managerial patterns are crucial: some managers avoid using relievers on a third consecutive day, while others routinely stretch arms. Encoding these tendencies as priors improves availability predictions. Layering these small, context-driven features allows for more precise, actionable estimates, particularly during stretches of heavy games or complex scheduling.
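Encoding a manager tendency as a prior can be as simple as a Beta-Binomial update. In this sketch, a hypothetical league-wide prior of Beta(2, 8), a 20% third-day usage rate worth ten pseudo-chances, is updated with one manager's record of third-consecutive-day opportunities and actual uses. The posterior mean could then feed the availability model as a manager-level prior.

```python
def third_day_usage_rate(uses: int, chances: int,
                         prior_a: float = 2.0, prior_b: float = 8.0) -> float:
    """Posterior mean of a manager's third-straight-day usage rate (Beta-Binomial).

    prior_a and prior_b are an assumed league-wide prior, not fitted values.
    """
    if not 0 <= uses <= chances:
        raise ValueError("uses must be between 0 and chances")
    return (prior_a + uses) / (prior_a + prior_b + chances)
```

A manager who used relievers on a third straight day in 9 of 15 chances moves to 0.44, while one who did it 0 times in 12 moves to about 0.09; with no history, the estimate stays at the prior's 0.20.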

Example of a Daily Bullpen Sheet (What Your Ops Team Wants)

A clear, concise daily output communicates actionable fatigue information. For example, consider the Mariners. The team-level bullpen penalty might be +0.28 RA/9, with a 95% prediction interval of +0.12 to +0.43. The closer could be 33% available with a projected leash of 12–15 pitches and a fatigue index of 1.9, showing a velocity drop of 0.6 mph over the last two appearances. The setup right-hander could be 85% available, likely to pitch the eighth inning against the 3-4-5 hitters, with a leash of 20–22 pitches and a fatigue index of 1.1. Bulk left-handers might be restricted after a 41-pitch outing two days ago, capped at 15–18 pitches. Travel and weather considerations, such as a west-to-east trip or a heat index of 88°F, further refine expected leverage. Late-inning win expectancy impacts can be displayed as well, for instance an 18 basis point reduction from the seventh through the ninth inning versus a baseline pen. These outputs guide research and trading teams, giving them clarity without overwhelming them with math.

Common Pitfalls and How to Avoid Them

Overfitting is a common trap, particularly to noisy velocity dips. Regressing each dip toward the pitcher’s mean and requiring it to persist across multiple appearances both guard against overreaction. Treating all sliders or breaking balls as equally stressful ignores differences in pitch shape and release characteristics. Platoon pockets are critical; availability matters most in situations with the highest matchup leverage, so usage trees should respect left-right clustering. Extreme fatigue index days should be capped, or the model risks exaggerating penalties and inflating variance. Finally, availability is not performance. A reliever may be used at 60% capacity, so separating the probability of pitching from expected skill ensures realistic projections.
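A sketch of the velocity-dip guard, assuming per-outing mean velocity and pitch counts are available. Each outing's delta from the pitcher's baseline is shrunk toward zero in proportion to its sample size (the `k=10` pseudo-pitches is a hypothetical shrinkage constant), and a fatigue flag fires only if the last `min_consistent` outings all clear the threshold. The -0.4 mph threshold is also an assumption.

```python
def shrunk_delta(outing_velo, n_pitches, baseline_velo, k=10):
    """Regress one outing's velocity delta toward zero; short outings shrink more."""
    return (outing_velo - baseline_velo) * n_pitches / (n_pitches + k)

def sustained_velo_dip(outings, baseline_velo, threshold=-0.4, min_consistent=2, k=10):
    """True only if each of the last `min_consistent` outings shows a shrunk dip.

    outings: list of (mean_velo_mph, n_pitches), oldest first.
    """
    if len(outings) < min_consistent:
        return False
    recent = outings[-min_consistent:]
    return all(shrunk_delta(v, n, baseline_velo, k) <= threshold for v, n in recent)
```

A single four-pitch outing at 1.5 mph below baseline does not trip the flag on its own, which is exactly the overreaction the paragraph warns against.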

Extending the Model In-Season

The season provides opportunities to refine the model. Pitcher-specific decay rates and stress sensitivities can be learned over time, allowing for more personalized fatigue predictions. Role assignments are dynamic, with Bayesian updates helping identify hidden setup men or emerging bulk arms. When the market overreacts to fatigue signals without new information, error bars can be widened or bet sizes reduced to maintain disciplined execution. These extensions improve the model’s responsiveness and robustness while preserving interpretability.

Cost-Effective Data Stack

The data stack should balance speed, scalability, and reproducibility. DuckDB and Parquet provide low-friction local development and fast windowed queries. Historical event logs can be cached alongside derived features like fatigue index, leverage, and role priors. Continuous integration checks prevent schema drift from new Statcast fields, while unit tests verify critical transformations such as pitch counts, back-to-back usage, and time zone adjustments. Maintaining a single “today” snapshot ensures reproducibility and consistent reference points for daily projections.
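A minimal sketch of a leakage-safe "today" snapshot, assuming appearances arrive as dicts with an ISO `game_date`, a `pitcher_id`, and `pitches` (hypothetical field names). Only games strictly before the slate date are kept, rows are ordered by their canonical JSON so input order does not matter, and a SHA-256 content hash makes any silent change to historical inputs visible when a day is re-run.

```python
import hashlib
import json
from datetime import date

def build_snapshot(appearances, as_of: date):
    """Freeze one slate's inputs: games strictly before `as_of`, in a stable order."""
    rows = sorted(
        (a for a in appearances if date.fromisoformat(a["game_date"]) < as_of),
        # Sorting on canonical JSON makes row order independent of input order.
        key=lambda a: json.dumps(a, sort_keys=True),
    )
    payload = json.dumps({"as_of": as_of.isoformat(), "rows": rows}, sort_keys=True)
    return {
        "as_of": as_of.isoformat(),
        "rows": rows,
        "sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }
```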

Lightweight Heuristics When You’re Time-Constrained

Even without a full simulation, simplified heuristics can guide quick decisions. Weighted pitch counts over three to ten days can estimate fatigue, with higher weights for recent outings. Simple red flags—like a third consecutive day of work or cross-country travel—can adjust RA/9 by small increments, capped at plausible extremes. For live betting, these numbers can be halved depending on the game state and lineup position. While these heuristics do not match full simulations, they provide reliable signals when time or computational resources are limited.
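A sketch of the time-constrained heuristic, where every number is an assumption: recency weights of 1.0, 0.7, 0.5, 0.3, and 0.2 for the last five days and 0.1 for days six through ten; a +0.03 RA/9 bump per 20 weighted pitches above 90 across the top arms; +0.08 per key arm facing a third straight day; +0.05 for cross-country travel; a 0.50 cap; and halving for live use.

```python
# All weights and increments below are illustrative assumptions.
RECENCY_WEIGHTS = {1: 1.0, 2: 0.7, 3: 0.5, 4: 0.3, 5: 0.2}
OLDER_WEIGHT = 0.1  # days 6 through 10

def weighted_pitches(pitches_by_days_ago):
    """{days_ago: pitches} -> recency-weighted load; outings past ten days are ignored."""
    return sum(p * RECENCY_WEIGHTS.get(d, OLDER_WEIGHT)
               for d, p in pitches_by_days_ago.items() if 1 <= d <= 10)

def quick_pen_penalty(top_arm_loads, third_day_arms, cross_country,
                      live=False, cap=0.50):
    """Rough team RA/9 bump from weighted loads and red flags, capped."""
    penalty = 0.03 * max(0.0, sum(top_arm_loads) - 90.0) / 20.0
    penalty += 0.08 * third_day_arms
    penalty += 0.05 if cross_country else 0.0
    penalty = min(penalty, cap)
    return penalty / 2.0 if live else penalty
```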

Feature Importance Sanity Checks

Regularly verifying that features behave as expected prevents model drift. The fatigue index should remain the top driver of penalties, with leverage and spike usage contributing significantly. Travel should influence but not dominate predictions, and velocity dips should provide guidance without overwhelming other metrics. Role indicators must consistently appear among the most important features; if not, role inference mechanisms may require recalibration. These sanity checks ensure model outputs remain interpretable and trustworthy.

Communication with Traders and Content Teams

Clear, consistent communication reinforces model impact. Using standardized terminology such as “thin pen,” “restricted bulk,” or “closer 30/70” ensures shared understanding. Outputs should be concise, typically one to two lines of quant plus one to two lines of explanation. Archiving daily notes alongside outcomes supports weekly reviews and small case studies. Where relevant, linking to public references, like the FanGraphs Leverage Index, provides context for non-technical teams.

Key Resources and References

Primary resources include FanGraphs for leverage definitions, Retrosheet for pitch- and game-level logs, and Baseball Savant for velocity and spin tracking. Probabilistic modeling and hierarchical Bayes are implemented with PyMC, while gradient-boosted trees use XGBoost or LightGBM. For ATSWins members, daily fatigue tags are visible on the MLB games board, with results tracked on recent outcomes pages and additional guidance in the ATSWins MLB strategy PDF.

Final Quick-Hit Checklist (Use Daily)

A daily checklist maintains operational discipline. Update rosters and role priors, refresh travel and altitude context, compute fatigue indices, and cap extreme values. Estimate availability and leashes for top arms, adjust baseline skill metrics, and simulate innings six through nine against the opponent lineup. Export adjusted win probabilities and notes to the board, flagging third-day appearances, multi-day spikes, cross-country travel, and extra-inning situations. After games, log actual usage and compare to availability probabilities. Weekly reviews of ablation studies and drift maintain model stability and prevent overcorrection.

Conclusion

Bullpen fatigue profoundly affects late-inning run prevention and game outcomes. By tracking recent pitch counts, days of rest, and role availability, then simulating innings with probabilistic adjustments, bettors can extract meaningful edges from otherwise subtle signals. Simple, disciplined inputs consistently outperform intuition. ATSWins integrates this approach into AI-driven projections, player props, betting splits, and profit tracking across multiple sports, offering members clear, actionable insights without unnecessary noise. Fatigue-aware modeling turns complex workload data into practical betting advantages.

Frequently Asked Questions (FAQs)

What is an MLB bullpen fatigue adjustment model?

It’s a way to measure how tired a team’s relievers are and adjust run prevention and win odds accordingly. The model blends recent pitch counts, days of rest, back-to-backs, leverage faced, and travel or altitude context to score each reliever’s readiness. This score is then translated into a small penalty (or no penalty) on expected runs allowed in the late innings. It’s not fancy, but it’s practical and actionable.

Why does an MLB bullpen fatigue adjustment model matter for daily betting lines and totals?

Late innings swing games more than most casual bettors realize. When two or three high-leverage arms are fatigued, the bullpen can see a 0.2–0.5 RA/9 bump, which nudges win probabilities and affects totals. That might move a moneyline from -115 to -125 or shift a total from 8.0 to 8.5. The effect compounds over a season, making fatigue-aware models a valuable edge.
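To see what a moneyline move like -115 to -125 means in probability terms, convert American odds to implied probability. These are the standard conversions; the implied probabilities still include the bookmaker's margin.

```python
def american_to_prob(odds: float) -> float:
    """Implied probability from American odds (still includes the vig)."""
    if odds < 0:
        return -odds / (-odds + 100.0)
    return 100.0 / (odds + 100.0)

def prob_to_american(p: float) -> float:
    """American odds for a win probability p in (0, 1)."""
    if p >= 0.5:
        return -100.0 * p / (1.0 - p)
    return 100.0 * (1.0 - p) / p
```

-115 implies about 53.5% and -125 about 55.6%, so that move is roughly two percentage points of implied win probability.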

How can I build a quick MLB bullpen fatigue adjustment model using public data?

Start with the last seven to ten days, recording pitch counts, innings pitched, and whether relievers worked back-to-back. Add simple stress flags like high leverage faced, long single-game outings, and travel across time zones. Create a fatigue index with decay weights for recent days. Set availability rules, flagging relievers for third straight days or high pitch counts. Convert that into a team-level penalty on expected RA/9 or xERA. Update daily for best effect; a rough estimate is better than ignoring fatigue entirely.

What signals should I track inside an MLB bullpen fatigue adjustment model during busy stretches?

Key signals include days of rest and back-to-back appearances, pitch count spikes, and multi-inning workloads. Leverage faced, even using a simple proxy, is essential, as are small velocity or spin dips. Emergency warmups, quick turnarounds after long games, travel quirks, altitude changes, and role depth all affect late-inning run prevention. When closers and setup men are limited, penalties accelerate faster.

How does ATSWins use an MLB bullpen fatigue adjustment model, and what do members see?

ATSwins.ai integrates bullpen fatigue into its AI-powered sports platform, tracking workload, leverage, and travel to refine late-inning run expectations and win probabilities. Members see these adjustments rolled into daily projections, with notes on probable availability and team-level penalties. This affects sides, totals, and player props. Free and paid plans provide clear context so bettors can make smarter, data-driven decisions instead of guessing or relying on hype.
