Predictive modeling has long been an integral part of the sports betting industry. For decades, sportsbooks and service providers have used predictive models when determining the odds of an upcoming sports match or when trying to equalize action across betting propositions.
However, with the advent of AI-driven predictive modeling, things have changed in a big way for both sportsbooks and the data-driven bettor. The latest AI-driven models can sort through massive amounts of data faster than ever before, and techniques such as reinforcement learning (RL) allow models to target profitability rather than just accuracy. Both factors have improved the resulting models and made it possible to update a prediction in near real-time as information changes.
I’ve personally built many of these models over the last 20 years and have seen firsthand the improvements in accuracy, which have accelerated significantly in recent years. These improvements, however, are being seen on both sides of the ledger, leading to an arms race between bettors and operators. Moreover, I can confidently say that the underlying advances in hardware and software libraries spell huge changes in the construction and use of predictive modeling, regardless of which industry sector you’re in.
Understanding predictive models
Like any predictive model, a reinforcement learning (RL) model analyzes historical data and identifies patterns to project a future outcome. Yet with recent developments in both hardware infrastructure and libraries, the new era of models can process more data than ever before and do so more efficiently. In the context of sports betting, this represents a significant leap in capability, as the model can adapt its forecasts based on last-minute changes in the dynamics of a sports match, for example, changes in lineups or weather conditions. The current industry view is that these models will eventually be able to update their odds predictions efficiently during the game based on in-game events, completely replacing the frequency-driven models of the recent past.
A significant difference between the current generation of models and their predecessors is their utilization of the many libraries that underpin large language models (LLMs), such as OpenAI’s ChatGPT. These libraries, combined with a parallel increase in hardware availability, have essentially laid the groundwork for the recent improvements in predictive modeling. While still a nascent technology, LLMs and their underlying libraries are already having a significant impact on the construction of predictive models. For instance, they can enhance accuracy by analyzing larger volumes of text and numeric data, automate processes to reduce costs, and significantly accelerate processing speeds.
Yet out of all these advances, the most tantalizing for me is the use of RL algorithms. Consider how traditional machine learning (ML) algorithms operate: they learn from explicitly labeled data (supervised learning), meaning the features and the desired outcome are identified in advance and the model is trained to map the former to the latter. By contrast, reinforcement learning develops through trial and error, with an “agent” learning to make decisions (e.g., bet small, bet large, bet favorite, bet underdog, don’t bet) by performing actions to achieve a goal – in this context, being profitable. The agent learns through consequences: it is rewarded (in proportion to the winning bet) if it wins and penalized (in proportion to the losing bet) if it loses, and over time it determines the policy that maximizes its rewards.
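To make that trial-and-error loop concrete, here is a minimal sketch in Python. It is not a production betting model: the match simulator, the odds, the stake sizes, and the simple epsilon-greedy value updates are all illustrative assumptions standing in for a full RL library and real market data.

```python
import random

# Discrete actions mirroring the choices described above (illustrative only)
ACTIONS = ["no_bet", "small_favorite", "large_favorite", "small_underdog", "large_underdog"]
STAKES = {"no_bet": 0.0, "small_favorite": 1.0, "large_favorite": 3.0,
          "small_underdog": 1.0, "large_underdog": 3.0}

def simulate_match(true_favorite_prob=0.60):
    """Toy match simulator: returns True if the favorite wins."""
    return random.random() < true_favorite_prob

def settle(action, favorite_won, favorite_odds=1.65, underdog_odds=2.40):
    """Reward is the profit or loss in units, proportional to the size of the bet."""
    stake = STAKES[action]
    if stake == 0.0:
        return 0.0
    backed_favorite = "favorite" in action
    won = favorite_won if backed_favorite else not favorite_won
    odds = favorite_odds if backed_favorite else underdog_odds
    return stake * (odds - 1.0) if won else -stake

def train(episodes=20_000, epsilon=0.1, lr=0.05):
    """Epsilon-greedy action-value estimates: a minimal stand-in for an RL agent."""
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        # Explore occasionally, otherwise exploit the current best estimate
        action = random.choice(ACTIONS) if random.random() < epsilon else max(q, key=q.get)
        reward = settle(action, simulate_match())
        q[action] += lr * (reward - q[action])  # move the estimate toward observed profit
    return q

if __name__ == "__main__":
    for action, value in sorted(train().items(), key=lambda kv: -kv[1]):
        print(f"{action:16s} expected profit per match: {value:+.3f}")
```

The point of the sketch is the reward mechanics rather than the numbers: profit and loss, scaled by the stake, is the only signal the agent ever receives about the quality of its decisions.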
As with LLMs, RL is a rapidly advancing field with a lot of research being conducted in both academic and commercial settings to improve RL algorithms and discover new applications. RL is already in use in the fields of robotics, finance, and self-driving vehicles, and I expect it to play an outsized role in solving the data science challenges of the future.
Challenges and surprises
Anyone who sets out to build an RL model for the first time is sure to encounter a few surprises that show how different these models are from their ML predecessors. With a traditional ML model, for instance, you’d start with a set of data points such as player data, team ratings, and team statistics, with the goal of predicting the winning margin (e.g., the Colts to beat the Patriots by 3). You’d then use that winning margin alongside other techniques, such as push charts, to generate betting probabilities and edges. Finally, you might validate the model by backtesting it against historical odds before deciding whether to use it for betting or for setting the lines.
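For comparison, a rough sketch of that traditional workflow might look like the following. The feature columns, the synthetic training data, and the normal approximation used in place of a true push chart are all illustrative assumptions rather than a description of any real production model.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from scipy.stats import norm

# Hypothetical feature matrix: each row is a game, with illustrative columns such as
# [home_team_rating, away_team_rating, home_rest_days, away_rest_days]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
# Synthetic "true" margins for illustration only; real training data would be historical results
y = 3.0 + 4.0 * (X[:, 0] - X[:, 1]) + rng.normal(scale=10.0, size=500)

model = GradientBoostingRegressor().fit(X, y)

def cover_probability(features, spread, margin_sd=13.0):
    """Probability the home side covers `spread`, using a normal approximation
    around the predicted margin (a stand-in for the push charts mentioned above)."""
    predicted_margin = model.predict(features.reshape(1, -1))[0]
    return 1.0 - norm.cdf(spread, loc=predicted_margin, scale=margin_sd)

game = np.array([1.2, -0.3, 0.5, 0.0])         # illustrative upcoming game features
p_cover = cover_probability(game, spread=3.0)   # e.g., home favorite laying 3 points
edge = p_cover - 0.524                          # vs. break-even probability at -110 odds
print(f"P(cover -3): {p_cover:.3f}, edge vs -110: {edge:+.3f}")
```

In practice the margin model would be trained on real historical results, and the margin-to-probability conversion would account for key numbers, which is precisely what push charts capture.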
By contrast, in reinforcement learning you need the historical odds data at the time you train the model, because each “step” of training is evaluated on the profit and loss the model would have made on the betting markets, with the model rewarded or punished accordingly. In addition, whether and how the model trains its next iteration depends on the results of the previous one.
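The sketch below shows, in simplified form, where that historical odds data enters the loop: each training step is scored by the profit or loss the current policy would have made against past markets. The record layout, the three-game history, and the “back the favorite” policy are assumptions chosen purely for illustration.

```python
# Hypothetical shape of one historical record: closing odds plus the final result.
HISTORY = [
    {"odds": {"home": 1.90, "away": 2.00}, "winner": "home"},
    {"odds": {"home": 2.30, "away": 1.65}, "winner": "away"},
    {"odds": {"home": 1.50, "away": 2.75}, "winner": "home"},
]

def settle_bet(stake, decimal_odds, won):
    """Profit or loss for one wager settled against historical decimal odds."""
    return stake * (decimal_odds - 1.0) if won else -stake

def episode_reward(policy, history=HISTORY):
    """Reward for one training step: cumulative P&L of the policy over past markets."""
    pnl = 0.0
    for game in history:
        side, stake = policy(game)              # e.g. ("home", 1.0) or (None, 0.0)
        if side is not None:
            pnl += settle_bet(stake, game["odds"][side], side == game["winner"])
    return pnl

# Illustrative policy: always back the shorter-priced side for one unit.
back_the_favorite = lambda game: (min(game["odds"], key=game["odds"].get), 1.0)
print(f"Reward this step: {episode_reward(back_the_favorite):+.2f} units")
```

In a real training loop, the decision to keep refining the policy, and how aggressively, would hinge on how this reward compares with the previous iteration’s result.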
Beyond individual bettors, whether hobbyists or professionals, and into institutional settings, there is a substantial risk of RL models becoming black boxes, with predictive reasoning that is incomprehensible to a non-technical user. While many traditional ML approaches produce some form of relative weights showing the contribution of each input field, this is far less common and accessible in current RL libraries. As such, data scientists and engineering teams may need to consider how they can otherwise validate model decisions and communicate the rationale behind them for public or corporate consumption. This will likely follow the same path, and face the same challenges, as previous generations of organizational change.
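The contrast can be illustrated with a short, hypothetical sketch: the feature names, toy data, and stand-in policy below are all invented, but they show the difference between the relative weights a supervised model exposes almost for free and the manual sensitivity probing an opaque RL policy may require.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
feature_names = ["home_rating", "away_rating", "rest_days", "travel_km"]  # illustrative
X = rng.normal(size=(300, 4))
y = 5.0 * (X[:, 0] - X[:, 1]) + rng.normal(scale=8.0, size=300)

# Traditional ML: relative weights come built in.
model = GradientBoostingRegressor().fit(X, y)
for name, importance in zip(feature_names, model.feature_importances_):
    print(f"{name:12s} importance:        {importance:.2f}")

def policy_stake(features):
    """Stand-in for an opaque trained RL policy; returns a recommended stake."""
    return max(0.0, 2.0 * (features[0] - features[1]))

# RL policy: often a black box, so one pragmatic check is to nudge each input
# and record how much the recommended stake moves (a crude sensitivity probe).
baseline = np.zeros(4)
for i, name in enumerate(feature_names):
    bumped = baseline.copy()
    bumped[i] += 1.0
    print(f"{name:12s} stake sensitivity: {policy_stake(bumped) - policy_stake(baseline):+.2f}")
```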
Final thoughts
Predictive modeling has come a long way in the past few years, and I fully expect further advances in the near future. We’ve already seen how fast and accurate these models can be. In the wagering industry, the next frontier is to apply these recent advances to in-play and micro-betting markets, which will require another step forward in both data processing and data acquisition. Work is already underway on these kinds of developments, and we should expect to see new products based on this kind of modeling and analysis within the next 12 months.
About the Author
Dr. Darryl Woodford PhD, is the CTO at sports analytics company Cipher Sports Technology Group. Dr. Woodford is a data scientist and Python developer, with a particular interest in sports betting.