Research
Working Papers
Investigating the Spatial Component of Serving Strategies in Tennis
We conducted an experiment with the Brigham Young University Men's Tennis Team to investigate the spatial element of serving strategies in tennis. Serve data, including precise spatial coordinates, were collected for 12 players, with known targets for each serve. Leveraging these data, we estimate player-specific optimal aim locations, accounting for factors such as first vs. second serve, speed, and the distribution of each player's serves around the intended targets, termed "execution error". Our experiment also provides insights into the interplay between conscious beliefs and on-court performance. Our preliminary results reveal apparent differences between players' subconscious behavior and their explicit articulation of optimal aim locations.
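The interaction between aim location and execution error can be illustrated with a toy calculation (the court geometry, error scale, and objective below are invented for illustration; the paper's player-specific models and serve objectives are richer): choose the aim point that maximizes the probability a serve lands in, given Gaussian scatter around the aim.

```python
import numpy as np

# Hypothetical service box: x in [0, 2] (width), y in [0, 4] (depth), in
# arbitrary units. "Execution error" is Gaussian scatter around the aim
# point; sigma is an invented player-specific error scale.
rng = np.random.default_rng(3)
sigma = np.array([0.5, 0.3])

def p_landing_in(aim_x, n=20_000):
    """Monte Carlo estimate of P(serve lands in the box) for a deep serve
    aimed at (aim_x, 3.5)."""
    pts = np.array([aim_x, 3.5]) + rng.standard_normal((n, 2)) * sigma
    return np.mean((pts[:, 0] >= 0) & (pts[:, 0] <= 2) &
                   (pts[:, 1] >= 0) & (pts[:, 1] <= 4))

candidate_aims = np.linspace(0.0, 2.0, 41)
best_aim = candidate_aims[np.argmax([p_landing_in(a) for a in candidate_aims])]
```

With a symmetric box and this simple objective the optimum sits near the middle; the strategic trade-offs studied in the paper arise when the objective also rewards serves the opponent cannot return, which pulls the optimal aim toward the lines in proportion to the player's execution error.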
Nathan Sandholtz, Gilbert Fellingham, Stephanie Kovalchik, Ron Hager, and Peter Tea
In preparation
Uncertainty Quantification in Inverse Optimization
In applied optimization tasks, human decisions often exhibit "noise" around theoretical optima. Noise may arise from measurement error, model error, or human error in decision-making. The presence of noise adds significant complexity to the inverse optimization problem: inferring unknown parameters of an optimization model such that a collection of observed decisions is rendered optimal with respect to the inferred model. While a number of papers have proposed point-estimation solutions to the noisy inverse optimization problem, to our knowledge none has directly addressed the uncertainty in the resulting estimates, even though uncertainty is inherent to the noisy setting. This paper is the first to do so. We study optimization problems with linear objective functions and convex feasible regions where the objective coefficient vectors are unknown. We propose a parametric approach to estimate the unknown coefficient vectors and construct credible regions around the estimates. This approach leverages the Bayesian paradigm, which provides a natural framework for making inferences on the unknown optimization model parameters while simultaneously enforcing optimality constraints.
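A toy illustration of the forward/inverse structure (the feasible region, noise model, and grid search below are invented, not the paper's method): observed decisions cluster near the optimum of an unknown linear objective over a polytope, and candidate objective vectors are scored by the total suboptimality they assign to the data.

```python
import numpy as np

# Invented toy instance: feasible region is the unit square; decisions are
# noisy copies of the vertex that minimizes the unknown objective c_true . x.
vertices = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
rng = np.random.default_rng(0)
c_true = np.array([-1.0, -0.3])
x_star = vertices[np.argmin(vertices @ c_true)]
decisions = x_star + 0.05 * rng.standard_normal((50, 2))

# Point estimation by grid search over unit-norm candidate objectives:
# score each candidate by the total positive suboptimality of the data,
# i.e. sum_i max(c . x_i - min_v c . v, 0).
best_c, best_loss = None, np.inf
for a in np.linspace(0.0, 2 * np.pi, 360, endpoint=False):
    c = np.array([np.cos(a), np.sin(a)])
    subopt = decisions @ c - np.min(vertices @ c)
    loss = np.sum(np.maximum(subopt, 0.0))
    if loss < best_loss:
        best_c, best_loss = c, loss
```

Many directions within the cone that renders the observed vertex optimal fit the data almost equally well; that flatness in the point-estimation loss is exactly the estimation uncertainty the paper's credible regions are meant to quantify.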
Timothy C. Y. Chan, Nathan Sandholtz, and Nasrin Yousefi
In preparation. Authors are listed alphabetically.
Refereed Publications
Learning Risk Preferences in Markov Decision Processes: An Application to the Fourth Down Decision in the National Football League
For decades, National Football League (NFL) coaches' observed fourth down decisions have been largely inconsistent with prescriptions based on statistical models. In this paper, we develop a framework to explain this discrepancy using an inverse optimization approach. We model the fourth down decision and the subsequent sequence of plays in a game as a Markov decision process (MDP), the dynamics of which we estimate from NFL play-by-play data from the 2014 through 2022 seasons. We assume that coaches' observed decisions are optimal but that the risk preferences governing their decisions are unknown. This yields an inverse decision problem for which the optimality criterion, or risk measure, of the MDP is the estimand. Using the quantile function to parameterize risk, we estimate which quantile-optimal policy yields the coaches' observed decisions as minimally suboptimal. In general, we find that coaches' fourth-down behavior is consistent with optimizing low quantiles of the next-state value distribution, which corresponds to conservative risk preferences. We also find that coaches exhibit higher risk tolerances when making decisions in the opponent's half of the field as opposed to their own half, and that league average fourth down risk tolerances have increased over time.
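A minimal sketch of the quantile-optimality idea (the value distributions below are invented, not estimated from NFL data): a conservative decision maker ranks actions by a low quantile of the next-state value distribution, while a risk-tolerant one ranks them by a high quantile.

```python
import numpy as np

# Invented toy decision: a high-variance "go for it" action vs. a
# low-variance "punt" action, each with a sampled next-state value
# distribution (in arbitrary value units).
rng = np.random.default_rng(1)
values = {
    "go_for_it": rng.normal(loc=0.5, scale=2.0, size=10_000),
    "punt":      rng.normal(loc=0.2, scale=0.3, size=10_000),
}

def quantile_optimal_action(values, tau):
    """Pick the action whose next-state value distribution has the largest
    tau-quantile (tau near 0 = conservative, tau near 1 = risk-seeking)."""
    return max(values, key=lambda a: np.quantile(values[a], tau))

print(quantile_optimal_action(values, 0.25))  # conservative coach -> "punt"
print(quantile_optimal_action(values, 0.75))  # risk-tolerant -> "go_for_it"
```

The inverse problem the paper solves runs this logic backwards: given observed decisions, estimate the tau under which those decisions are minimally suboptimal.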
Nathan Sandholtz, Lucas Wu, Martin Puterman, and Timothy C. Y. Chan
To appear in the Annals of Applied Statistics, 2024
Miss It Like Messi: Extracting Value from Off-Target Shots in Soccer
Measuring soccer shooting skill is a challenging analytics problem due to the scarcity and highly contextual nature of scoring events. The introduction of more advanced data surrounding soccer shots has given rise to model-based metrics which better cope with these challenges. Specifically, metrics such as expected goals added, goals above expectation, and post-shot expected goals all use advanced data to offer an improvement over the classical conversion rate. However, all metrics developed to date assign a value of zero to off-target shots, which account for almost two-thirds of all shots, since these shots have no probability of scoring. We posit that there is non-negligible shooting skill signal contained in the trajectories of off-target shots and propose two shooting skill metrics that incorporate the signal contained in off-target shots. Specifically, we develop a player-specific generative model for shot trajectories based on a mixture of truncated bivariate Gaussian distributions. We use this generative model to compute metrics that allow us to attach non-zero value to off-target shots. We demonstrate that our proposed metrics are more stable than current state-of-the-art metrics and have increased predictive power.
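One ingredient of such a generative model can be sketched as sampling from a single truncated bivariate Gaussian via rejection (the coordinates, bounds, and parameters below are invented; the paper fits a player-specific mixture of such components to shot trajectories):

```python
import numpy as np

# Minimal rejection sampler for a bivariate Gaussian truncated to a
# rectangle, e.g. shot locations in a (hypothetical) goal plane window.
def sample_truncated_gaussian(mean, cov, lo, hi, n, rng):
    out = []
    while len(out) < n:
        draw = rng.multivariate_normal(mean, cov, size=n)
        keep = np.all((draw >= lo) & (draw <= hi), axis=1)
        out.extend(draw[keep])
    return np.array(out[:n])

rng = np.random.default_rng(7)
samples = sample_truncated_gaussian(
    mean=[0.0, 1.0], cov=[[1.0, 0.2], [0.2, 0.5]],
    lo=np.array([-4.0, 0.0]), hi=np.array([4.0, 3.0]),
    n=1_000, rng=rng)
```

A mixture model would first draw a component index from the mixture weights and then sample the trajectory coordinates from that component; off-target samples then carry information about the player's scatter even though they score zero.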
Ethan Baron, Nathan Sandholtz, Devin Pleuler, and Timothy Chan
Journal of Quantitative Analysis in Sports, 2024
Moneyball for Murderball: Using Analytics to Construct Lineups in Wheelchair Rugby
Motivated by the problem of lineup optimization in wheelchair rugby (WCR), this case study covers descriptive, predictive, and prescriptive analytics. The case is presented from the perspective of a new assistant coach of Canada’s national WCR team, who has been tasked by the head coach to use various analytics techniques to improve their lineups. Whereas the data and actors are fictitious, they are based on real data and discussions with the national team coach and sport scientists. To solve the case, students must conduct data analysis, regression modeling, and optimization modeling. These three steps are tightly linked, as the data analysis is needed to prepare the data for regression, and the regression outputs are used as parameters in the optimization. As such, students build proficiency in developing an end-to-end solution approach for a complex real-world problem. The primary learning objectives for the students are to understand the differences between descriptive, predictive, and prescriptive analytics, to build proficiency in implementing the models using appropriate software, and to identify how these techniques can be applied to solve problems in other sports or other application areas.
Timothy C. Y. Chan, Craig Fernandes, Albert Loa, and Nathan Sandholtz
INFORMS Transactions on Education, 2023
Authors are listed alphabetically.
Inverse Bayesian Optimization: Learning Human Acquisition Functions in an Exploration vs Exploitation Search Task
This paper introduces a probabilistic framework to estimate parameters of an acquisition function given observed human behavior that can be modeled as a collection of sample paths from a Bayesian optimization procedure. The methodology involves defining a likelihood on observed human behavior from an optimization task, where the likelihood is parameterized by a Bayesian optimization subroutine governed by an unknown acquisition function. This structure enables us to make inferences on a subject's acquisition function while allowing their behavior to deviate around the solution to the Bayesian optimization subroutine. To test our methods, we designed a sequential optimization task that forced subjects to balance exploration and exploitation in search of an invisible target location. Applying our proposed methods to the resulting data, we find that many subjects exhibit exploration preferences beyond what standard acquisition functions can capture. Guided by the model discrepancies, we augment the candidate acquisition functions to yield a superior fit to the human behavior in this task.
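To make the forward model concrete, here is a minimal Gaussian-process posterior with an upper-confidence-bound (UCB) acquisition step on an invented 1-D task (the kernel, hyperparameters, and exploration weight beta are illustrative choices, not the paper's; in the inverse problem, parameters like beta are what one infers from observed choices):

```python
import numpy as np

def rbf(a, b, lengthscale=0.3):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-4):
    """Posterior mean and sd of a zero-mean GP at x_new given (x_obs, y_obs)."""
    K_inv = np.linalg.solve(rbf(x_obs, x_obs) + noise * np.eye(len(x_obs)),
                            np.eye(len(x_obs)))
    Ks = rbf(x_new, x_obs)
    mu = Ks @ K_inv @ y_obs
    var = np.diag(rbf(x_new, x_new) - Ks @ K_inv @ Ks.T)
    return mu, np.sqrt(np.maximum(var, 0.0))

def ucb_next_point(x_obs, y_obs, candidates, beta=2.0):
    """Larger beta weights posterior uncertainty more, i.e. more exploration."""
    mu, sd = gp_posterior(x_obs, y_obs, candidates)
    return candidates[np.argmax(mu + beta * sd)]

x_obs = np.array([0.2, 0.5, 0.8])
y_obs = np.array([0.1, 0.6, 0.3])
next_x = ucb_next_point(x_obs, y_obs, np.linspace(0.0, 1.0, 101))
```

Treating beta (or richer acquisition parameters) as unknown, and the subject's observed sample paths as noisy outputs of this subroutine, yields the likelihood structure described in the abstract.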
Nathan Sandholtz, Yohsuke Miyamoto, Luke Bornn, and Maurice Smith
Bayesian Analysis, 2023
Measuring Spatial Allocative Efficiency in Basketball
Every shot in basketball has an opportunity cost; one player's shot eliminates all potential opportunities for their teammates on that play. For this reason, a player's shooting efficiency should ultimately be considered relative to the lineup. This aspect of efficiency—the optimal way to allocate shots within a lineup—is the focus of our paper. Allocative efficiency should be considered in a spatial context since the distribution of shot attempts within a lineup is highly dependent on court location. We propose a new metric for spatial allocative efficiency by comparing a player's field goal percentage (FG%) to their field goal attempt (FGA) rate in the context of both their four teammates on the court and the spatial distribution of their shots. Leveraging publicly available data provided by the National Basketball Association (NBA), we estimate player FG% at every location in the offensive half court using a Bayesian hierarchical model. Then, by ordering a lineup's estimated FG%s and pairing these rankings with the lineup's empirical FGA rate rankings, we detect areas where the lineup exhibits inefficient shot allocation. Lastly, we analyze the impact that sub-optimal shot allocation has on a team's overall offensive potential, demonstrating that inefficient shot allocation correlates with reduced scoring.
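The rank-pairing step can be sketched for a single court region (the five FG% and FGA values below are invented): rank the lineup by estimated FG% and by FGA rate, and read mismatches between the two rankings as candidate misallocations.

```python
import numpy as np

# Invented values for one court region, players A-E in a single lineup.
fg_pct   = np.array([0.42, 0.35, 0.51, 0.38, 0.30])  # estimated FG% here
fga_rate = np.array([0.30, 0.25, 0.10, 0.20, 0.15])  # share of lineup FGAs here

# Rank 1 = best shooter / most attempts (double argsort gives rank positions).
fg_rank  = (-fg_pct).argsort().argsort() + 1
fga_rank = (-fga_rate).argsort().argsort() + 1

# Positive mismatch: the player shoots less than his skill rank suggests.
# Here player C is the lineup's best shooter from this region but takes
# the fewest shots, so he is flagged with the largest positive mismatch.
mismatch = fga_rank - fg_rank
print(dict(zip("ABCDE", mismatch)))
```

Because both rankings are permutations of 1-5, the mismatches always sum to zero: over-shooting by one player implies under-shooting by another, which is the opportunity-cost framing of the abstract.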
Nathan Sandholtz, Jacob Mortensen, and Luke Bornn
Journal of Quantitative Analysis in Sports, 2020
Markov Decision Processes with Dynamic Transition Probabilities: An Analysis of Shooting Strategies in Basketball
In this paper we model basketball plays as episodes from team-specific nonstationary Markov decision processes (MDPs) with shot clock dependent transition probabilities. Bayesian hierarchical models are employed in the modeling and parametrization of the transition probabilities to borrow strength across players and through time. To enable computational feasibility, we combine lineup-specific MDPs into team-average MDPs using a novel transition weighting scheme. Specifically, we derive the dynamics of the team-average process such that the expected transition count for an arbitrary state-pair is equal to the weighted sum of the expected counts of the separate lineup-specific MDPs. We then utilize these nonstationary MDPs in the creation of a basketball play simulator with uncertainty propagated via posterior samples of the model components. After calibration, we simulate seasons both on-policy and under altered policies and explore the net changes in efficiency and production under the alternate policies. Additionally, we discuss the game-theoretic ramifications of testing alternative decision policies.
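The transition-weighting idea can be sketched with two invented lineups and two states: weight each lineup's transition row by that lineup's share of expected visits to the state, so that expected transition counts aggregate correctly.

```python
import numpy as np

# Invented example: two lineups, two states. P_lineup[l, s, t] is lineup l's
# probability of moving from state s to state t; visits[l, s] is lineup l's
# expected number of visits to state s.
P_lineup = np.array([
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.5, 0.5], [0.2, 0.8]],
])
visits = np.array([[80., 20.],
                   [40., 60.]])

# State-specific weights: lineup l's share of all visits to state s.
w = visits / visits.sum(axis=0)

# Team-average transition matrix: P_team[s, t] = sum_l w[l, s] * P_lineup[l, s, t].
P_team = np.einsum('ls,lst->st', w, P_lineup)

# By construction, the team-average expected transition count for any
# state pair equals the sum of the lineup-specific expected counts:
# visits.sum(0)[s] * P_team[s, t] == sum_l visits[l, s] * P_lineup[l, s, t]
```

Each row of `P_team` still sums to one, so the team-average process remains a valid MDP while preserving the expected-count identity stated in the abstract.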
Nathan Sandholtz and Luke Bornn
Annals of Applied Statistics, 2020
Modeling sea-level processes on the U.S. Atlantic Coast
One of the major concerns engendered by a warming climate is changing sea levels and their lasting effects on coastal populations, infrastructure, and natural habitats. Sea levels are now monitored by satellites, but long-term records are only available at discrete locations along the coasts. Sea levels and sea-level processes must be better understood at the local level to best inform real-world adaptation decisions. We propose a statistical model that facilitates the characterization of known sea-level processes, which jointly govern the observed sea level along the United States Atlantic Coast. Our model not only incorporates long-term sea-level rise and seasonal cycles but also accurately accounts for residual spatiotemporal processes. By combining a spatially varying coefficient modeling approach with spatiotemporal factor analysis methods in a Bayesian framework, the method represents the contribution of each of these processes and accounts for corresponding dependencies and uncertainties in a coherent way. Additionally, the model provides a consistent way to estimate these processes and sea-level values at unmonitored locations along the coast. We show the outcome of the proposed model using thirty years of sea-level data from 38 stations along the Atlantic (east) Coast of the United States. Among other results, our method estimates the rate of sea-level rise to range from roughly 1 mm/year in the northern and southern regions of the Atlantic coast to 5.4 mm/year in the middle region.
Candace Berrett, William F. Christensen, Stephan R. Sain, Nathan Sandholtz, David W. Coats, Claudia Tebaldi, and Hedibert F. Lopes
Environmetrics, 2020
Other Publications
How to Get Away with Murderball - An End-to-End Analytics Case Study to Construct Lineups in Wheelchair Rugby
Albert Loa, Craig Fernandes, Nathan Sandholtz, and Timothy C. Y. Chan
OR/MS Today, 2023
Chuckers: Measuring Lineup Shot Distribution Optimality Using Spatial Allocative Efficiency Models
Allocative efficiency is fundamentally a spatial problem—the distribution of shot attempts within a lineup is highly dependent on court location. Despite the importance of spatial context, very few allocative efficiency analyses have explicitly accounted for this critical factor. Our unique contribution with this work is a method to analyze allocative efficiency spatially. The main idea behind our approach is to compare a player's field goal percentage (FG%) to his field goal attempt (FGA) rate in the context of his four teammates in any given lineup. To this end, we build Bayesian hierarchical models to estimate player FG% and FGA rates at every location on the floor using publicly available NBA shot location data. We next determine the rank of each player's FG% and FGA rate relative to his four teammates at every location in the half-court. Finally, by pairing each player's lineup-specific FGA rankings with their corresponding FG% rankings, we can explore the relationship between FG% rank and FGA rank and detect areas where the lineup exhibits inefficient allocation of shots. We further analyze the impact that deviations from optimality have on a lineup's overall efficiency. We develop a measure called lineup points lost (LPL), which we define as the difference in expected points between the observed allocation of shot attempts and the optimal redistribution. Using these metrics, we can quantify how many points are being lost through inefficient spatial lineup shot allocation, visualize where they are being lost, and identify which players are responsible.
Nathan Sandholtz, Jacob Mortensen, and Luke Bornn
Proceedings of the 13th MIT Sloan Sports Analytics Conference, 2019
Replaying the NBA
The Cleveland Cavaliers took 329 contested mid-range jump shots with over 10 seconds remaining on the shot clock during the 2015-2016 regular season. What could have happened if they had taken these shots 20% less frequently over the season? We attempt to answer these types of questions by modeling plays from the 2015-2016 NBA regular season as episodes from team-specific Markov decision processes. Using STATS SportVU optical tracking data, we model the transition probabilities as a tensor indexed in time in order to simulate plays with dynamic probabilities across the shot clock. To culminate, we simulate seasons under altered shot policies of interest within the basketball analytics community and explore the net changes in efficiency and production under these alternative shot policies.
Nathan Sandholtz and Luke Bornn
Proceedings of the 12th MIT Sloan Sports Analytics Conference, 2018
Hate crime victimization, 2003-2011
Presents annual counts and rates of hate crime victimization that occurred from 2003 through 2011, using data from the National Crime Victimization Survey (NCVS). The report examines changes over time in hate crime victimizations, including the type of bias that motivated the hate crime, the type of crime, whether the incident was reported to police, and characteristics of the incident, offender, and victim. In addition, the report compares characteristics of hate crime and nonhate crime victimization. NCVS estimates are supplemented by data from official police reports of hate crime from the FBI's Uniform Crime Reporting (UCR) Hate Crime Statistics Program.
Nathan Sandholtz, Lynn Langton, and Mike Planty
US Department of Justice, Office of Justice Programs, Bureau of Justice Statistics, 2013