This has been one of those weeks where I couldn't wait to wake up and get to work. Just last week, Luc Demanche threw me a challenge and asked that I help develop an Oracle APEX application that could be used to predict the winner of the Rugby World Cup 2023 (RWC2023). I couldn't resist the challenge!
The first challenge we faced in attempting to predict the outcome of a sports match was data. As with any machine learning (ML) project, having ample, high-quality training data greatly influences the accuracy of the prediction models. Such data can be hard to find, as professional sports is a lucrative market, and the highest-quality data, such as player statistics, game performance, and match history, are typically collected, managed, and made available through paid channels that I didn't have access to.
Fortunately, my French teammate Louis Moreaux stumbled on a Kaggle dataset that contained historical match data that was scraped from Wikipedia and aggregated by a contributor. The dataset contained minimal data on rugby matches played since 1871. At first glance, there wasn't much to work with, but fortunately, another contributor had published a notebook that demonstrated how to enrich the data to better predict match outcomes. This included calculating features such as a team's ranking points and form. This was good enough for the proof-of-concept application that we needed to build.
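To make the idea of "form" a little more concrete, here is a minimal Python sketch. It assumes that form means the share of wins across a team's most recent matches; the notebook's exact definition may well differ. The function name `recent_form`, the five-match window, the "W"/"D"/"L" result codes, and the sample history are all hypothetical.

```python
from typing import Sequence


def recent_form(results: Sequence[str], window: int = 5) -> float:
    """Return the fraction of wins in the last `window` results.

    `results` is ordered oldest to newest and uses the hypothetical
    codes "W" (win), "D" (draw), and "L" (loss). An empty history
    yields 0.0. This definition of form is an assumption.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = results[-window:]
    if not recent:
        return 0.0
    return sum(1 for r in recent if r == "W") / len(recent)


# Made-up history, oldest first: only the last five results count.
history = ["L", "W", "W", "D", "W", "L", "W"]
print(recent_form(history))  # 3 wins in the last 5 matches -> 0.6
```

A feature like this would be computed for both teams as of the match date, so that only earlier results feed into the value.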
It was only a few months ago that I sat for and passed the Oracle Cloud Infrastructure Data Science 2023 Professional certification exam, so this was the perfect opportunity to put that training into practice!
Without any hesitation, I provisioned a new Oracle Cloud Infrastructure (OCI) Data Science project and trained the prediction model in a notebook session. The notebook session is Oracle's managed Jupyter environment for data scientists to perform exploratory data analysis, data visualization, data cleaning and preparation, model training, validation, and deployment. I also used the powerful Oracle Accelerated Data Science (ADS) Python library, which supports a wide variety of activities, including interfacing with OCI, accessing data stored in an Oracle Autonomous Database, and managing and deploying models, to name a few.
Once the ML model is deployed, a web application uses it by invoking its HTTP endpoint. Of course, this being an OCI managed service, calls to the endpoint require signing. This is no different from calling any other OCI REST API, and relies on creating the necessary OCI Identity and Access Management (IAM) users, groups, and policies, and generating the required API keys. Oracle APEX simplifies the signing process: you only need to create the required APEX web credential and reference its static identifier in the web service call. Below is an example PL/SQL snippet:
declare
  c_model_url constant varchar2(32767) := 'https://modeldeployment.eu-paris-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.eu-paris-1.amaaaaa.../predict';
  l_ranking_points_home number; -- The variable declarations are assumed.
  l_ranking_points_away number;
  l_home_form number;
  l_away_form number;
  l_request_body clob;
  l_response clob;
begin
  -- Code to retrieve the required input values based on the teams selected.
  select json_object(
    key 'neutral' value 0
    , key 'world_cup' value 1
    , key 'ranking_points_home' value l_ranking_points_home
    , key 'ranking_points_away' value l_ranking_points_away
    , key 'home_form' value l_home_form
    , key 'away_form' value l_away_form
  ) into l_request_body from dual;
  apex_web_service.g_request_headers(1).name := 'Content-Type';
  apex_web_service.g_request_headers(1).value := 'application/json';
  l_response := apex_web_service.make_rest_request(
    p_url => c_model_url
    , p_http_method => 'POST'
    , p_body => l_request_body
    , p_credential_static_id => pkg_oci_common.g_credential_static_id
  );
  case
    when json_value(l_response, '$.prediction') = 'home_win' then
      null; -- Placeholder: handle a predicted home win.
    else null; -- Placeholder: the other outcomes are not shown here.
  end case;
end;
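For readers more at home in Python, the request body and the response check can be sketched roughly as follows. The feature names and the `home_win` value under `prediction` come from the PL/SQL snippet; the helper names, the sample numbers, and the hand-written sample response are assumptions for illustration only. The signed HTTP call itself is deliberately left out.

```python
import json


def build_request_body(ranking_points_home, ranking_points_away,
                       home_form, away_form,
                       neutral=0, world_cup=1):
    """Serialise the model inputs using the keys from the PL/SQL snippet."""
    return json.dumps({
        "neutral": neutral,
        "world_cup": world_cup,
        "ranking_points_home": ranking_points_home,
        "ranking_points_away": ranking_points_away,
        "home_form": home_form,
        "away_form": away_form,
    })


def is_home_win(response_text):
    """Return True when the response's `prediction` field is 'home_win'."""
    return json.loads(response_text).get("prediction") == "home_win"


# Hypothetical input values and a hand-written sample response.
body = build_request_body(89.5, 83.2, 0.8, 0.6)
sample_response = '{"prediction": "home_win"}'
print(body)
print(is_home_win(sample_response))
```

Keeping the payload construction separate from the HTTP call makes it easy to check the JSON shape before pointing the code at a real model deployment.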
Here's the outcome and predictions for the next two matches to be held this week:
Let's see how our ML model fares this week.
Do take the predicted outcomes with a huge pinch of salt. I did not do sufficient justice to the tremendous amount of work required to train a good ML model. In the spirit of rapid application development, we built this demo application within days, but it clearly demonstrates how we can train, deploy, and perform inferences using machine learning models with minimal need to manage the underlying infrastructure. I have, unfortunately, left out many of the details as I'm rushing to have this post published before the semi-final games tomorrow. However, you can expect more details on how to do this on your own, so stay tuned!
Banner image generated using Stability AI's Stable Diffusion model.