Completed SEEMA Modeling Pipeline Code
Click to expand code
Code
## SEMMA-style multinomial model for UFC [Men's & Women's] fights
## 20% training / 80% testing split
## Includes caret confusion Matrix summary & coefficient visualizations
## Install packages if needed (run once, then comment out)
# install.packages(c("dplyr", "readr", "ggplot2", "nnet",
# "broom", "reshape2", "GGally", "caret"))
library (tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library (nnet) # multinom()
library (broom) # tidy()
library (reshape2) # melt, dcast (required for wide-to-long)
##
## Attaching package: 'reshape2'
##
## The following object is masked from 'package:tidyr':
##
## smiths
library (GGally) # ggpairs
library (caret) # confusion Matrix
## Loading required package: lattice
##
## Attaching package: 'caret'
##
## The following object is masked from 'package:purrr':
##
## lift
## ==========================================================
## Automatic ggplot saver with sequential figure numbering
## ==========================================================
.fig_counter <- 0
save_last_plot <- function (name,
folder = "figures" ,
width = 8 ,
height = 6 ,
dpi = 300 ){
dir.create (folder, showWarnings = FALSE , recursive = TRUE )
.fig_counter <<- .fig_counter + 1
fig_number <- sprintf ("Fig%02d" , .fig_counter)
file_path <- paste0 (
folder, "/" ,
fig_number, "_" ,
name,
".png"
)
ggsave (
filename = file_path,
plot = ggplot2:: last_plot (),
width = width,
height = height,
dpi = dpi
)
message ("Saved: " , file_path)
}
## S — SAMPLE
## 1. Load data (make sure the CSV is in your working directory)
ufc <- read.csv ("UFC Complete Dataset.csv" , stringsAsFactors = FALSE )
#table(ufc)
#summary(ufc)
dim (ufc)
## [1] 7439 95
## 2. Filter to Women's fights using the gender/Gender column
gender_col <- if ("gender" %in% names (ufc)) "gender" else "Gender"
ufc_women <- ufc %>%
filter (grepl ("Women" , .data[[gender_col]], ignore.case = TRUE ))
dim (ufc_women)
## [1] 733 95
## E — EXPLORE
## Quick summary and distribution of fight methods
print (summary (ufc_women))
## event_name r_fighter b_fighter winner
## Length:733 Length:733 Length:733 Length:733
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## weight_class is_title_bout gender method
## Length:733 Min. :0.00000 Length:733 Length:733
## Class :character 1st Qu.:0.00000 Class :character Class :character
## Mode :character Median :0.00000 Mode :character Mode :character
## Mean :0.07503
## 3rd Qu.:0.00000
## Max. :1.00000
##
## finish_round total_rounds time_sec referee
## Min. :1.000 Min. :3.000 Min. : 14.0 Length:733
## 1st Qu.:2.000 1st Qu.:3.000 1st Qu.:222.0 Class :character
## Median :3.000 Median :3.000 Median :300.0 Mode :character
## Mean :2.641 Mean :3.202 Mean :253.5
## 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:300.0
## Max. :5.000 Max. :5.000 Max. :300.0
##
## r_kd r_sig_str r_sig_str_att r_sig_str_acc
## Min. :0.000 Min. : 0.00 Min. : 0.0 Min. :0.0000
## 1st Qu.:0.000 1st Qu.: 23.00 1st Qu.: 51.0 1st Qu.:0.3700
## Median :0.000 Median : 45.00 Median :102.0 Median :0.4600
## Mean :0.105 Mean : 52.03 Mean :116.3 Mean :0.4688
## 3rd Qu.:0.000 3rd Qu.: 72.00 3rd Qu.:162.0 3rd Qu.:0.5600
## Max. :4.000 Max. :225.00 Max. :456.0 Max. :1.0000
##
## r_str r_str_att r_str_acc r_td
## Min. : 0.00 Min. : 0.0 Min. :0.0000 Min. : 0.00
## 1st Qu.: 43.00 1st Qu.: 77.0 1st Qu.:0.4400 1st Qu.: 0.00
## Median : 74.00 Median :143.0 Median :0.5500 Median : 1.00
## Mean : 79.68 Mean :148.4 Mean :0.5534 Mean : 1.16
## 3rd Qu.:107.00 3rd Qu.:207.0 3rd Qu.:0.6600 3rd Qu.: 2.00
## Max. :326.00 Max. :524.0 Max. :1.0000 Max. :10.00
##
## r_td_att r_td_acc r_sub_att r_rev
## Min. : 0.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median : 2.000 Median :0.1800 Median :0.0000 Median :0.0000
## Mean : 3.082 Mean :0.3185 Mean :0.4256 Mean :0.1392
## 3rd Qu.: 5.000 3rd Qu.:0.5000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :18.000 Max. :1.0000 Max. :6.0000 Max. :3.0000
##
## r_ctrl_sec r_wins_total r_losses_total r_age
## Min. : 0.0 Min. : 1.00 Min. : 0.000 Min. :22.00
## 1st Qu.: 17.0 1st Qu.: 9.00 1st Qu.: 4.000 1st Qu.:31.00
## Median : 111.0 Median :13.00 Median : 6.000 Median :35.00
## Mean : 181.2 Mean :13.11 Mean : 6.056 Mean :34.32
## 3rd Qu.: 301.0 3rd Qu.:16.00 3rd Qu.: 8.000 3rd Qu.:37.00
## Max. :1110.0 Max. :25.00 Max. :21.000 Max. :46.00
##
## r_height r_weight r_reach r_stance
## Min. :152.4 Min. :52.16 Min. :147.3 Length:733
## 1st Qu.:162.6 1st Qu.:52.16 1st Qu.:162.6 Class :character
## Median :165.1 Median :56.70 Median :167.6 Mode :character
## Mean :165.7 Mean :56.34 Mean :166.9
## 3rd Qu.:170.2 3rd Qu.:61.23 3rd Qu.:172.7
## Max. :182.9 Max. :65.77 Max. :188.0
## NA's :12
## r_SLpM_total r_SApM_total r_sig_str_acc_total r_td_acc_total
## Min. :1.160 Min. :0.45 Min. :0.1900 Min. :0.0000
## 1st Qu.:3.160 1st Qu.:2.98 1st Qu.:0.4000 1st Qu.:0.2700
## Median :3.890 Median :3.87 Median :0.4500 Median :0.4000
## Mean :4.043 Mean :3.89 Mean :0.4479 Mean :0.3873
## 3rd Qu.:4.670 3rd Qu.:4.75 3rd Qu.:0.5000 3rd Qu.:0.5100
## Max. :8.410 Max. :8.43 Max. :0.7100 Max. :1.0000
##
## r_str_def_total r_td_def_total r_sub_avg r_td_avg
## Min. :0.2700 Min. :0.0000 Min. :0.0000 Min. :0.00
## 1st Qu.:0.5100 1st Qu.:0.4700 1st Qu.:0.1000 1st Qu.:0.65
## Median :0.5500 Median :0.5900 Median :0.4000 Median :1.22
## Mean :0.5446 Mean :0.5786 Mean :0.5447 Mean :1.47
## 3rd Qu.:0.5900 3rd Qu.:0.7100 3rd Qu.:0.7000 3rd Qu.:2.02
## Max. :0.8000 Max. :1.0000 Max. :5.4000 Max. :7.50
##
## b_kd b_sig_str b_sig_str_att b_sig_str_acc
## Min. :0.00000 Min. : 0.00 Min. : 0.0 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.: 21.00 1st Qu.: 43.0 1st Qu.:0.3600
## Median :0.00000 Median : 42.00 Median : 97.0 Median :0.4500
## Mean :0.07094 Mean : 47.35 Mean :108.3 Mean :0.4504
## 3rd Qu.:0.00000 3rd Qu.: 68.00 3rd Qu.:158.0 3rd Qu.:0.5400
## Max. :3.00000 Max. :231.00 Max. :403.0 Max. :0.9200
##
## b_str b_str_att b_str_acc b_td
## Min. : 0.00 Min. : 1.0 Min. :0.0000 Min. : 0.0000
## 1st Qu.: 35.00 1st Qu.: 66.0 1st Qu.:0.4200 1st Qu.: 0.0000
## Median : 68.00 Median :132.0 Median :0.5300 Median : 0.0000
## Mean : 70.08 Mean :134.9 Mean :0.5338 Mean : 0.9482
## 3rd Qu.: 97.00 3rd Qu.:192.0 3rd Qu.:0.6400 3rd Qu.: 1.0000
## Max. :242.00 Max. :433.0 Max. :1.0000 Max. :10.0000
##
## b_td_att b_td_acc b_sub_att b_rev
## Min. : 0.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median : 1.000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean : 2.476 Mean :0.2857 Mean :0.2892 Mean :0.1528
## 3rd Qu.: 4.000 3rd Qu.:0.5000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :19.000 Max. :1.0000 Max. :6.0000 Max. :3.0000
##
## b_ctrl_sec b_wins_total b_losses_total b_age b_height
## Min. : 0.0 Min. : 1.00 Min. : 0.000 Min. :22 Min. :152.4
## 1st Qu.: 16.0 1st Qu.: 9.00 1st Qu.: 3.000 1st Qu.:31 1st Qu.:162.6
## Median : 78.0 Median :11.00 Median : 6.000 Median :34 Median :165.1
## Mean :141.2 Mean :11.93 Mean : 5.937 Mean :34 Mean :165.4
## 3rd Qu.:212.0 3rd Qu.:15.00 3rd Qu.: 8.000 3rd Qu.:37 3rd Qu.:170.2
## Max. :806.0 Max. :25.00 Max. :21.000 Max. :46 Max. :185.4
##
## b_weight b_reach b_stance b_SLpM_total
## Min. :52.16 Min. :147.3 Length:733 Min. :0.000
## 1st Qu.:52.16 1st Qu.:162.6 Class :character 1st Qu.:3.010
## Median :56.70 Median :167.6 Mode :character Median :3.890
## Mean :56.23 Mean :166.8 Mean :3.928
## 3rd Qu.:61.23 3rd Qu.:172.7 3rd Qu.:4.630
## Max. :65.77 Max. :188.0 Max. :8.410
## NA's :29
## b_SApM_total b_sig_str_acc_total b_td_acc_total b_str_def_total
## Min. :1.180 Min. :0.0000 Min. :0.0000 Min. :0.2800
## 1st Qu.:3.190 1st Qu.:0.4000 1st Qu.:0.2600 1st Qu.:0.5000
## Median :3.950 Median :0.4500 Median :0.3700 Median :0.5500
## Mean :4.073 Mean :0.4418 Mean :0.3764 Mean :0.5418
## 3rd Qu.:4.890 3rd Qu.:0.4900 3rd Qu.:0.5000 3rd Qu.:0.5800
## Max. :8.430 Max. :0.6300 Max. :1.0000 Max. :0.8200
##
## b_td_def_total b_sub_avg b_td_avg kd_diff
## Min. :0.0000 Min. :0.0000 Min. :0.000 Min. :-3.00000
## 1st Qu.:0.5000 1st Qu.:0.1000 1st Qu.:0.520 1st Qu.: 0.00000
## Median :0.6400 Median :0.3000 Median :0.950 Median : 0.00000
## Mean :0.5971 Mean :0.4302 Mean :1.246 Mean : 0.03411
## 3rd Qu.:0.7300 3rd Qu.:0.7000 3rd Qu.:1.830 3rd Qu.: 0.00000
## Max. :1.0000 Max. :4.8000 Max. :6.440 Max. : 4.00000
##
## sig_str_diff sig_str_att_diff sig_str_acc_diff str_diff
## Min. :-131.000 Min. :-188.000 Min. :-0.85000 Min. :-195.000
## 1st Qu.: -13.000 1st Qu.: -20.000 1st Qu.:-0.11000 1st Qu.: -19.000
## Median : 4.000 Median : 6.000 Median : 0.02000 Median : 6.000
## Mean : 4.679 Mean : 7.978 Mean : 0.01843 Mean : 9.606
## 3rd Qu.: 21.000 3rd Qu.: 34.000 3rd Qu.: 0.13000 3rd Qu.: 35.000
## Max. : 142.000 Max. : 180.000 Max. : 0.80000 Max. : 300.000
##
## str_att_diff str_acc_diff td_diff td_att_diff
## Min. :-259.00 Min. :-0.88000 Min. :-10.0000 Min. :-15.0000
## 1st Qu.: -26.00 1st Qu.:-0.09000 1st Qu.: -1.0000 1st Qu.: -2.0000
## Median : 10.00 Median : 0.02000 Median : 0.0000 Median : 0.0000
## Mean : 13.59 Mean : 0.01954 Mean : 0.2115 Mean : 0.6057
## 3rd Qu.: 57.00 3rd Qu.: 0.12000 3rd Qu.: 1.0000 3rd Qu.: 3.0000
## Max. : 346.00 Max. : 0.89000 Max. : 8.0000 Max. : 18.0000
##
## td_acc_diff sub_att_diff rev_diff ctrl_sec_diff
## Min. :-1.0000 Min. :-6.0000 Min. :-2.00000 Min. :-806.00
## 1st Qu.:-0.3300 1st Qu.: 0.0000 1st Qu.: 0.00000 1st Qu.:-106.00
## Median : 0.0000 Median : 0.0000 Median : 0.00000 Median : 9.00
## Mean : 0.0328 Mean : 0.1364 Mean :-0.01364 Mean : 40.02
## 3rd Qu.: 0.4600 3rd Qu.: 0.0000 3rd Qu.: 0.00000 3rd Qu.: 195.00
## Max. : 1.0000 Max. : 6.0000 Max. : 3.00000 Max. :1011.00
##
## wins_total_diff losses_total_diff age_diff height_diff
## Min. :-20.000 Min. :-18.0000 Min. :-15.0000 Min. :-15.2400
## 1st Qu.: -3.000 1st Qu.: -3.0000 1st Qu.: -3.0000 1st Qu.: -2.5400
## Median : 1.000 Median : 0.0000 Median : 0.0000 Median : 0.0000
## Mean : 1.177 Mean : 0.1187 Mean : 0.3206 Mean : 0.2668
## 3rd Qu.: 5.000 3rd Qu.: 3.0000 3rd Qu.: 4.0000 3rd Qu.: 5.0800
## Max. : 19.000 Max. : 19.0000 Max. : 17.0000 Max. : 20.3200
##
## weight_diff reach_diff SLpM_total_diff SApM_total_diff
## Min. :-9.0700 Min. :-25.40000 Min. :-5.0700 Min. :-4.920
## 1st Qu.: 0.0000 1st Qu.: -5.08000 1st Qu.:-1.0200 1st Qu.:-1.280
## Median : 0.0000 Median : 0.00000 Median : 0.1100 Median :-0.140
## Mean : 0.1113 Mean : 0.04405 Mean : 0.1154 Mean :-0.183
## 3rd Qu.: 0.0000 3rd Qu.: 5.08000 3rd Qu.: 1.3300 3rd Qu.: 0.980
## Max. : 9.0700 Max. : 25.40000 Max. : 6.1200 Max. : 5.070
## NA's :41
## sig_str_acc_total_diff td_acc_total_diff str_def_total_diff
## Min. :-0.270000 Min. :-0.8500 Min. :-0.340000
## 1st Qu.:-0.060000 1st Qu.:-0.1600 1st Qu.:-0.060000
## Median : 0.000000 Median : 0.0100 Median : 0.000000
## Mean : 0.006126 Mean : 0.0109 Mean : 0.002824
## 3rd Qu.: 0.070000 3rd Qu.: 0.1700 3rd Qu.: 0.060000
## Max. : 0.400000 Max. : 0.7900 Max. : 0.430000
##
## td_def_total_diff sub_avg_diff td_avg_diff
## Min. :-0.78000 Min. :-4.1000 Min. :-6.440
## 1st Qu.:-0.19000 1st Qu.:-0.3000 1st Qu.:-0.640
## Median :-0.03000 Median : 0.0000 Median : 0.110
## Mean :-0.01857 Mean : 0.1146 Mean : 0.224
## 3rd Qu.: 0.15000 3rd Qu.: 0.4000 3rd Qu.: 1.030
## Max. : 0.84000 Max. : 5.4000 Max. : 5.960
##
print (table (ufc_women$ method))
##
## Decision - Majority Decision - Split Decision - Unanimous
## 5 97 355
## DQ KO/TKO Submission
## 1 126 143
## TKO - Doctor's Stoppage
## 6
## 3. Combined distribution plot (Faceted by Gender)
## We filter the original dataset to include both groups for comparison
ufc_eda <- ufc %>%
filter (grepl ("Men|Women" , .data[[gender_col]], ignore.case = TRUE )) %>%
mutate (gender_label = ifelse (grepl ("Women" , .data[[gender_col]], ignore.case = TRUE ), "Women" , "Men" ))
ggplot (ufc_eda, aes (x = method)) +
geom_bar () +
facet_wrap (~ gender_label, scales = "free_y" ) +
theme_minimal () +
theme (axis.text.x = element_text (angle = 45 , hjust = 1 )) +
labs (
title = "Distribution of Fight Methods by Gender" ,
x = "Method" ,
y = "Count"
)
Code
save_last_plot ("combined_method_distribution" )
## Saved: figures/Fig01_combined_method_distribution.png
## M — MODIFY
## 1. Create winner-based variables (collapse red/blue stats into winner_* variables)
women_winner_stats <- ufc_women %>%
mutate (
winner_kd = ifelse (winner == "Red" , r_kd, b_kd),
winner_str_acc = ifelse (winner == "Red" , r_str_acc, b_str_acc),
winner_sub_att = ifelse (winner == "Red" , r_sub_att, b_sub_att),
winner_td_acc = ifelse (winner == "Red" , r_td_acc, b_td_acc),
winner_ctrl_sec = ifelse (winner == "Red" , r_ctrl_sec, b_ctrl_sec),
winner_height = ifelse (winner == "Red" , r_height, b_height),
winner_reach = ifelse (winner == "Red" , r_reach, b_reach),
method = as.factor (method)
) %>%
select (
method,
winner_kd,
winner_str_acc,
winner_sub_att,
winner_td_acc,
winner_ctrl_sec,
winner_height,
winner_reach
)
## Inspect the resulting winner-centric dataset
print (head (women_winner_stats))
## method winner_kd winner_str_acc winner_sub_att winner_td_acc
## 1 Decision - Unanimous 0 0.60 0 0.33
## 2 Decision - Unanimous 0 0.43 0 0.00
## 3 Submission 0 0.45 1 1.00
## 4 Decision - Unanimous 0 0.57 0 1.00
## 5 Submission 0 0.60 4 0.00
## 6 Decision - Unanimous 0 0.65 0 1.00
## winner_ctrl_sec winner_height winner_reach
## 1 419 165.10 165.10
## 2 1 175.26 172.72
## 3 148 180.34 182.88
## 4 422 172.72 172.72
## 5 0 160.02 172.72
## 6 285 165.10 165.10
print (summary (women_winner_stats))
## method winner_kd winner_str_acc
## Decision - Majority : 5 Min. :0.0000 Min. :0.0000
## Decision - Split : 97 1st Qu.:0.0000 1st Qu.:0.4800
## Decision - Unanimous :355 Median :0.0000 Median :0.5900
## DQ : 1 Mean :0.1501 Mean :0.5833
## KO/TKO :126 3rd Qu.:0.0000 3rd Qu.:0.6900
## Submission :143 Max. :4.0000 Max. :1.0000
## TKO - Doctor's Stoppage: 6
## winner_sub_att winner_td_acc winner_ctrl_sec winner_height
## Min. :0.0000 Min. :0.0000 Min. : 0.0 Min. :152.4
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 43.0 1st Qu.:162.6
## Median :0.0000 Median :0.3500 Median : 155.0 Median :165.1
## Mean :0.5389 Mean :0.4044 Mean : 214.3 Mean :165.6
## 3rd Qu.:1.0000 3rd Qu.:0.7500 3rd Qu.: 345.0 3rd Qu.:170.2
## Max. :6.0000 Max. :1.0000 Max. :1110.0 Max. :182.9
##
## winner_reach
## Min. :147.3
## 1st Qu.:162.6
## Median :167.6
## Mean :166.8
## 3rd Qu.:172.7
## Max. :188.0
## NA's :1
summary (women_winner_stats)
## method winner_kd winner_str_acc
## Decision - Majority : 5 Min. :0.0000 Min. :0.0000
## Decision - Split : 97 1st Qu.:0.0000 1st Qu.:0.4800
## Decision - Unanimous :355 Median :0.0000 Median :0.5900
## DQ : 1 Mean :0.1501 Mean :0.5833
## KO/TKO :126 3rd Qu.:0.0000 3rd Qu.:0.6900
## Submission :143 Max. :4.0000 Max. :1.0000
## TKO - Doctor's Stoppage: 6
## winner_sub_att winner_td_acc winner_ctrl_sec winner_height
## Min. :0.0000 Min. :0.0000 Min. : 0.0 Min. :152.4
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 43.0 1st Qu.:162.6
## Median :0.0000 Median :0.3500 Median : 155.0 Median :165.1
## Mean :0.5389 Mean :0.4044 Mean : 214.3 Mean :165.6
## 3rd Qu.:1.0000 3rd Qu.:0.7500 3rd Qu.: 345.0 3rd Qu.:170.2
## Max. :6.0000 Max. :1.0000 Max. :1110.0 Max. :182.9
##
## winner_reach
## Min. :147.3
## 1st Qu.:162.6
## Median :167.6
## Mean :166.8
## 3rd Qu.:172.7
## Max. :188.0
## NA's :1
## Optional: quick ggpairs (without color)
ggpairs (women_winner_stats)
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
Code
## 2. Train/test split: 20% training, 80% testing
set.seed (123 ) # for reproducibility
n <- nrow (women_winner_stats)
train_idx <- sample (1 : n, size = 0.8 * n) # 80% train
train <- women_winner_stats[train_idx, ]
test <- women_winner_stats[- train_idx, ]
cat ("Training rows:" , nrow (train), " \n " )
## Training rows: 586
cat ("Testing rows:" , nrow (test), " \n " )
## Testing rows: 147
## M — MODEL (Multinomial logistic regression)
multi_model <- multinom (
method ~ winner_kd +
winner_str_acc +
winner_sub_att +
winner_td_acc +
winner_ctrl_sec +
winner_height +
winner_reach,
data = train
)
## # weights: 63 (48 variable)
## initial value 1138.357437
## iter 10 value 791.816043
## iter 20 value 691.360916
## iter 30 value 601.141418
## iter 40 value 590.941971
## iter 50 value 588.665045
## iter 60 value 588.159074
## iter 70 value 587.941281
## iter 80 value 587.676662
## iter 90 value 587.578259
## iter 100 value 587.537776
## final value 587.537776
## stopped after 100 iterations
## Model summary
print (summary (multi_model))
## Call:
## multinom(formula = method ~ winner_kd + winner_str_acc + winner_sub_att +
## winner_td_acc + winner_ctrl_sec + winner_height + winner_reach,
## data = train)
##
## Coefficients:
## (Intercept) winner_kd winner_str_acc winner_sub_att
## Decision - Split -10.810442 9.1968003 -11.429288 -0.05546283
## Decision - Unanimous -10.178406 9.9905084 -8.790440 -0.05521339
## DQ 43.072897 -0.2934672 -2.952837 -10.46524225
## KO/TKO -26.416576 11.5303800 -3.409988 -0.28914968
## Submission -10.086723 10.1895925 -3.825616 1.40340678
## TKO - Doctor's Stoppage -7.203071 -0.8728219 -4.378087 -7.45488378
## winner_td_acc winner_ctrl_sec winner_height
## Decision - Split 5.738603 -0.005949618 0.001388355
## Decision - Unanimous 6.419818 -0.005326686 0.010784047
## DQ 12.051987 -0.023491252 -0.066270102
## KO/TKO 7.206487 -0.009744188 0.048697320
## Submission 7.791795 -0.013907071 0.014759894
## TKO - Doctor's Stoppage 3.820045 -0.020135351 -0.274595485
## winner_reach
## Decision - Split 0.12763754
## Decision - Unanimous 0.11121966
## DQ -0.20341467
## KO/TKO 0.14739474
## Submission 0.08257412
## TKO - Doctor's Stoppage 0.34892939
##
## Std. Errors:
## (Intercept) winner_kd winner_str_acc winner_sub_att
## Decision - Split 0.0446168670 4.056484e-01 0.885446548 6.717131e-01
## Decision - Unanimous 0.0310705029 2.224954e-01 0.588455407 6.475801e-01
## DQ 0.0008828157 2.736016e-07 0.006352148 1.958821e-05
## KO/TKO 0.0415063913 2.233966e-01 0.785422963 6.755776e-01
## Submission 0.0203056561 3.180239e-01 0.790873375 6.644776e-01
## TKO - Doctor's Stoppage 0.0009030024 6.646877e-07 0.004885686 4.453873e-04
## winner_td_acc winner_ctrl_sec winner_height
## Decision - Split 0.319491149 0.002565939 0.1513142
## Decision - Unanimous 0.204851355 0.002452143 0.1488587
## DQ 0.017874368 0.020807052 0.3614652
## KO/TKO 0.265815812 0.002547132 0.1512648
## Submission 0.272928926 0.002645282 0.1519418
## TKO - Doctor's Stoppage 0.002698326 0.009600189 0.1969202
## winner_reach
## Decision - Split 0.1523249
## Decision - Unanimous 0.1498989
## DQ 0.3620816
## KO/TKO 0.1522388
## Submission 0.1529580
## TKO - Doctor's Stoppage 0.1959371
##
## Residual Deviance: 1175.076
## AIC: 1271.076
## Coefficients, z-values, and p-values
coef_mat <- summary (multi_model)$ coefficients
se_mat <- summary (multi_model)$ standard.errors
z_values <- coef_mat / se_mat
p_values <- 2 * (1 - pnorm (abs (z_values)))
# Check which rows are which methods
levels (train$ method)
## [1] "Decision - Majority" "Decision - Split"
## [3] "Decision - Unanimous" "DQ"
## [5] "KO/TKO" "Submission"
## [7] "TKO - Doctor's Stoppage"
rownames (p_values) # should correspond to non-baseline methods
## [1] "Decision - Split" "Decision - Unanimous"
## [3] "DQ" "KO/TKO"
## [5] "Submission" "TKO - Doctor's Stoppage"
# 1) List all (method, predictor) combinations with p < .05
sig_idx <- which (p_values < 0.05 , arr.ind = TRUE )
sig_effects <- data.frame (
method = rownames (p_values)[sig_idx[, "row" ]],
predictor = colnames (p_values)[sig_idx[, "col" ]],
p_value = p_values[sig_idx]
)
sig_effects <- sig_effects[order (sig_effects$ predictor, sig_effects$ method), ]
sig_effects
## method predictor p_value
## 1 Decision - Split (Intercept) 0.000000e+00
## 2 Decision - Unanimous (Intercept) 0.000000e+00
## 3 DQ (Intercept) 0.000000e+00
## 4 KO/TKO (Intercept) 0.000000e+00
## 5 Submission (Intercept) 0.000000e+00
## 6 TKO - Doctor's Stoppage (Intercept) 0.000000e+00
## 28 Decision - Split winner_ctrl_sec 2.041184e-02
## 29 Decision - Unanimous winner_ctrl_sec 2.983626e-02
## 30 KO/TKO winner_ctrl_sec 1.304789e-04
## 31 Submission winner_ctrl_sec 1.461773e-07
## 32 TKO - Doctor's Stoppage winner_ctrl_sec 3.595896e-02
## 7 Decision - Split winner_kd 0.000000e+00
## 8 Decision - Unanimous winner_kd 0.000000e+00
## 9 DQ winner_kd 0.000000e+00
## 10 KO/TKO winner_kd 0.000000e+00
## 11 Submission winner_kd 0.000000e+00
## 12 TKO - Doctor's Stoppage winner_kd 0.000000e+00
## 13 Decision - Split winner_str_acc 0.000000e+00
## 14 Decision - Unanimous winner_str_acc 0.000000e+00
## 15 DQ winner_str_acc 0.000000e+00
## 16 KO/TKO winner_str_acc 1.414521e-05
## 17 Submission winner_str_acc 1.316777e-06
## 18 TKO - Doctor's Stoppage winner_str_acc 0.000000e+00
## 19 DQ winner_sub_att 0.000000e+00
## 20 Submission winner_sub_att 3.468256e-02
## 21 TKO - Doctor's Stoppage winner_sub_att 0.000000e+00
## 22 Decision - Split winner_td_acc 0.000000e+00
## 23 Decision - Unanimous winner_td_acc 0.000000e+00
## 24 DQ winner_td_acc 0.000000e+00
## 25 KO/TKO winner_td_acc 0.000000e+00
## 26 Submission winner_td_acc 0.000000e+00
## 27 TKO - Doctor's Stoppage winner_td_acc 0.000000e+00
cat (" \n Z-values: \n " )
##
## Z-values:
print (z_values)
## (Intercept) winner_kd winner_str_acc winner_sub_att
## Decision - Split -242.2950 2.267185e+01 -12.907937 -8.256922e-02
## Decision - Unanimous -327.5906 4.490209e+01 -14.938159 -8.526110e-02
## DQ 48790.3613 -1.072608e+06 -464.856471 -5.342622e+05
## KO/TKO -636.4460 5.161394e+01 -4.341595 -4.280037e-01
## Submission -496.7445 3.204034e+01 -4.837205 2.112045e+00
## TKO - Doctor's Stoppage -7976.8021 -1.313131e+06 -896.104892 -1.673798e+04
## winner_td_acc winner_ctrl_sec winner_height
## Decision - Split 17.96170 -2.318690 0.009175313
## Decision - Unanimous 31.33891 -2.172257 0.072444846
## DQ 674.26085 -1.129004 -0.183337437
## KO/TKO 27.11083 -3.825553 0.321934315
## Submission 28.54881 -5.257311 0.097141788
## TKO - Doctor's Stoppage 1415.70926 -2.097391 -1.394450649
## winner_reach
## Decision - Split 0.8379294
## Decision - Unanimous 0.7419647
## DQ -0.5617924
## KO/TKO 0.9681812
## Submission 0.5398483
## TKO - Doctor's Stoppage 1.7808233
cat (" \n P-values: \n " )
##
## P-values:
print (p_values)
## (Intercept) winner_kd winner_str_acc winner_sub_att
## Decision - Split 0 0 0.000000e+00 0.93419408
## Decision - Unanimous 0 0 0.000000e+00 0.93205382
## DQ 0 0 0.000000e+00 0.00000000
## KO/TKO 0 0 1.414521e-05 0.66864845
## Submission 0 0 1.316777e-06 0.03468256
## TKO - Doctor's Stoppage 0 0 0.000000e+00 0.00000000
## winner_td_acc winner_ctrl_sec winner_height
## Decision - Split 0 2.041184e-02 0.9926793
## Decision - Unanimous 0 2.983626e-02 0.9422479
## DQ 0 2.588960e-01 0.8545333
## KO/TKO 0 1.304789e-04 0.7475025
## Submission 0 1.461773e-07 0.9226138
## TKO - Doctor's Stoppage 0 3.595896e-02 0.1631816
## winner_reach
## Decision - Split 0.40207036
## Decision - Unanimous 0.45810870
## DQ 0.57425746
## KO/TKO 0.33295391
## Submission 0.58930168
## TKO - Doctor's Stoppage 0.07494132
## A — ASSESS
## Predictions on test set
pred_test <- predict (multi_model, newdata = test, type = "class" )
## FIX: ensure factor levels of predictions match the actuals
pred_test <- factor (pred_test, levels = levels (test$ method))
## Confusion matrix (raw table)
conf_mat <- table (Predicted = pred_test, Actual = test$ method)
cat (" \n Confusion matrix (Test set): \n " )
##
## Confusion matrix (Test set):
print (conf_mat)
## Actual
## Predicted Decision - Majority Decision - Split
## Decision - Majority 0 0
## Decision - Split 0 0
## Decision - Unanimous 1 19
## DQ 0 0
## KO/TKO 0 0
## Submission 0 0
## TKO - Doctor's Stoppage 0 0
## Actual
## Predicted Decision - Unanimous DQ KO/TKO Submission
## Decision - Majority 1 0 0 0
## Decision - Split 0 0 0 1
## Decision - Unanimous 61 0 13 14
## DQ 0 0 0 0
## KO/TKO 4 0 9 2
## Submission 2 0 1 17
## TKO - Doctor's Stoppage 0 0 0 0
## Actual
## Predicted TKO - Doctor's Stoppage
## Decision - Majority 0
## Decision - Split 0
## Decision - Unanimous 1
## DQ 1
## KO/TKO 0
## Submission 0
## TKO - Doctor's Stoppage 0
## Accuracy (direct comparison)
accuracy <- mean (pred_test == test$ method, na.rm = TRUE )
cat (" \n Accuracy on test set (direct):" , accuracy, " \n " )
##
## Accuracy on test set (direct): 0.5918367
## Accuracy from confusion matrix
accuracy_from_confmat <- sum (diag (conf_mat)) / sum (conf_mat)
cat ("Accuracy from confusion matrix:" , accuracy_from_confmat, " \n " )
## Accuracy from confusion matrix: 0.5918367
cat ("Difference:" , accuracy - accuracy_from_confmat, " \n " )
## Difference: 0
## caret confusionMatrix summary
cm <- confusionMatrix (
data = pred_test, # predictions
reference = test$ method # true labels
)
cat (" \n === caret::confusionMatrix Summary === \n " )
##
## === caret::confusionMatrix Summary ===
print (cm)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Decision - Majority Decision - Split
## Decision - Majority 0 0
## Decision - Split 0 0
## Decision - Unanimous 1 19
## DQ 0 0
## KO/TKO 0 0
## Submission 0 0
## TKO - Doctor's Stoppage 0 0
## Reference
## Prediction Decision - Unanimous DQ KO/TKO Submission
## Decision - Majority 1 0 0 0
## Decision - Split 0 0 0 1
## Decision - Unanimous 61 0 13 14
## DQ 0 0 0 0
## KO/TKO 4 0 9 2
## Submission 2 0 1 17
## TKO - Doctor's Stoppage 0 0 0 0
## Reference
## Prediction TKO - Doctor's Stoppage
## Decision - Majority 0
## Decision - Split 0
## Decision - Unanimous 1
## DQ 1
## KO/TKO 0
## Submission 0
## TKO - Doctor's Stoppage 0
##
## Overall Statistics
##
## Accuracy : 0.5918
## 95% CI : (0.5078, 0.6721)
## No Information Rate : 0.4626
## P-Value [Acc > NIR] : 0.001116
##
## Kappa : 0.3294
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Decision - Majority Class: Decision - Split
## Sensitivity 0.000000 0.000000
## Specificity 0.993151 0.992188
## Pos Pred Value 0.000000 0.000000
## Neg Pred Value 0.993151 0.869863
## Prevalence 0.006803 0.129252
## Detection Rate 0.000000 0.000000
## Detection Prevalence 0.006803 0.006803
## Balanced Accuracy 0.496575 0.496094
## Class: Decision - Unanimous Class: DQ Class: KO/TKO
## Sensitivity 0.8971 NA 0.39130
## Specificity 0.3924 0.993197 0.95161
## Pos Pred Value 0.5596 NA 0.60000
## Neg Pred Value 0.8158 NA 0.89394
## Prevalence 0.4626 0.000000 0.15646
## Detection Rate 0.4150 0.000000 0.06122
## Detection Prevalence 0.7415 0.006803 0.10204
## Balanced Accuracy 0.6447 NA 0.67146
## Class: Submission Class: TKO - Doctor's Stoppage
## Sensitivity 0.5000 0.00000
## Specificity 0.9735 1.00000
## Pos Pred Value 0.8500 NaN
## Neg Pred Value 0.8661 0.98639
## Prevalence 0.2313 0.01361
## Detection Rate 0.1156 0.00000
## Detection Prevalence 0.1361 0.00000
## Balanced Accuracy 0.7367 0.50000
## Overall stats (Accuracy, Kappa, etc.)
cm$ overall
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.591836735 0.329379562 0.507798491 0.672096274 0.462585034
## AccuracyPValue McnemarPValue
## 0.001115703 NaN
## Per-class stats (Sensitivity, Specificity, Precision, F1, etc.)
cm_byclass_df <- as.data.frame (cm$ byClass)
cm_byclass_df # nice per-class table
## Sensitivity Specificity Pos Pred Value
## Class: Decision - Majority 0.0000000 0.9931507 0.000000
## Class: Decision - Split 0.0000000 0.9921875 0.000000
## Class: Decision - Unanimous 0.8970588 0.3924051 0.559633
## Class: DQ NA 0.9931973 NA
## Class: KO/TKO 0.3913043 0.9516129 0.600000
## Class: Submission 0.5000000 0.9734513 0.850000
## Class: TKO - Doctor's Stoppage 0.0000000 1.0000000 NaN
## Neg Pred Value Precision Recall F1
## Class: Decision - Majority 0.9931507 0.000000 0.0000000 NaN
## Class: Decision - Split 0.8698630 0.000000 0.0000000 NaN
## Class: Decision - Unanimous 0.8157895 0.559633 0.8970588 0.6892655
## Class: DQ NA 0.000000 NA NA
## Class: KO/TKO 0.8939394 0.600000 0.3913043 0.4736842
## Class: Submission 0.8661417 0.850000 0.5000000 0.6296296
## Class: TKO - Doctor's Stoppage 0.9863946 NA 0.0000000 NA
## Prevalence Detection Rate Detection Prevalence
## Class: Decision - Majority 0.006802721 0.00000000 0.006802721
## Class: Decision - Split 0.129251701 0.00000000 0.006802721
## Class: Decision - Unanimous 0.462585034 0.41496599 0.741496599
## Class: DQ 0.000000000 0.00000000 0.006802721
## Class: KO/TKO 0.156462585 0.06122449 0.102040816
## Class: Submission 0.231292517 0.11564626 0.136054422
## Class: TKO - Doctor's Stoppage 0.013605442 0.00000000 0.000000000
## Balanced Accuracy
## Class: Decision - Majority 0.4965753
## Class: Decision - Split 0.4960938
## Class: Decision - Unanimous 0.6447319
## Class: DQ NA
## Class: KO/TKO 0.6714586
## Class: Submission 0.7367257
## Class: TKO - Doctor's Stoppage 0.5000000
## McFadden R2 (manual)
null_model <- multinom (method ~ 1 , data = train, trace = FALSE )
LL_full <- as.numeric (logLik (multi_model))
LL_null <- as.numeric (logLik (null_model))
McFadden_R2 <- 1 - (LL_full / LL_null)
cat (" \n McFadden R2:" , McFadden_R2, " \n " )
##
## McFadden R2: 0.2378017
## Optional: visual comparison of predicted vs actual
comparison_df <- data.frame (
actual = test$ method,
predicted = pred_test
)
ggplot (comparison_df, aes (x = actual, fill = predicted)) +
geom_bar (position = "dodge" ) +
theme_minimal () +
labs (
title = "Predicted vs Actual Methods (Test Set) - Women" ,
x = "Actual Fight Outcome" ,
y = "Predicted Fight Outcome" ,
fill = "Predicted"
)
Code
save_last_plot ("women_predicted_vs_actual" )
## Saved: figures/Fig02_women_predicted_vs_actual.png
## Coefficient plot
coef_df <- tidy (multi_model) %>%
filter (term != "(Intercept)" )
coef_df <- coef_df %>%
mutate (
term_pretty = recode (
term,
winner_kd = "Knockdowns" ,
winner_str_acc = "Striking Accuracy" ,
winner_sub_att = "Submission Attempts" ,
winner_td_acc = "Takedown Accuracy" ,
winner_ctrl_sec = "Control Time (sec)" ,
winner_height = "Height" ,
winner_reach = "Reach" ,
.default = term
)
)
ggplot (coef_df, aes (x = term_pretty, y = estimate, fill = y.level)) +
geom_col (position = "dodge" ) +
coord_flip () +
labs (
title = "Coefficient Estimates by Win Condition - Women" ,
x = "Predictor" ,
y = "Log-odds Coefficient" ,
fill = "Win Condition"
) +
theme_minimal ()
Code
save_last_plot ("women_coefficients" )
## Saved: figures/Fig03_women_coefficients.png
## Coefficient heatmap
heat_df_long <- coef_df %>%
select (y.level, term_pretty, estimate)
heat_df_wide <- dcast (heat_df_long, term_pretty ~ y.level, value.var = "estimate" )
heat_df_melt <- melt (
heat_df_wide,
id.vars = "term_pretty" ,
variable.name = "win_condition" ,
value.name = "coef"
)
ggplot (heat_df_melt, aes (x = win_condition, y = term_pretty, fill = coef)) +
geom_tile () +
scale_fill_gradient2 (
low = "blue" ,
mid = "white" ,
high = "red" ,
midpoint = 0
) +
labs (
title = "Predictor Importance Heatmap (Coefficients) - Women" ,
x = "Win Condition" ,
y = "Predictor" ,
fill = "Coef"
) +
theme_minimal () +
theme (axis.text.x = element_text (angle = 45 , hjust = 1 ))
Code
save_last_plot ("women_heatmap" )
## Saved: figures/Fig04_women_heatmap.png
#Men's Section
## ==========================================================
## SEMMA-style multinomial model for UFC Men's fights
## S — SAMPLE
## 1. Load data (make sure the CSV is in your working directory)
ufc <- read.csv ("UFC Complete Dataset.csv" , stringsAsFactors = FALSE )
#table(ufc)
#summary(ufc)
dim (ufc)
## [1] 7439 95
## S — SAMPLE
## Filter to Men's fights
ufc_men <- ufc %>%
filter (grepl ("Men" , .data[[gender_col]], ignore.case = TRUE ))
dim (ufc_men)
## [1] 7439 95
## E — EXPLORE
print (summary (ufc_men))
## event_name r_fighter b_fighter winner
## Length:7439 Length:7439 Length:7439 Length:7439
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## weight_class is_title_bout gender method
## Length:7439 Min. :0.00000 Length:7439 Length:7439
## Class :character 1st Qu.:0.00000 Class :character Class :character
## Mode :character Median :0.00000 Mode :character Mode :character
## Mean :0.05579
## 3rd Qu.:0.00000
## Max. :1.00000
##
## finish_round total_rounds time_sec referee
## Min. :1.000 Min. :1.000 Min. : 5 Length:7439
## 1st Qu.:1.000 1st Qu.:3.000 1st Qu.: 149 Class :character
## Median :3.000 Median :3.000 Median : 287 Mode :character
## Mean :2.336 Mean :3.129 Mean : 227
## 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.: 300
## Max. :5.000 Max. :5.000 Max. :1080
## NA's :31
## r_kd r_sig_str r_sig_str_att r_sig_str_acc
## Min. :0.0000 Min. : 0.00 Min. : 0.00 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.: 14.00 1st Qu.: 29.00 1st Qu.:0.3700
## Median :0.0000 Median : 31.00 Median : 66.00 Median :0.4700
## Mean :0.2492 Mean : 38.36 Mean : 83.79 Mean :0.4753
## 3rd Qu.:0.0000 3rd Qu.: 54.00 3rd Qu.:120.00 3rd Qu.:0.5700
## Max. :5.0000 Max. :445.00 Max. :744.00 Max. :1.0000
##
## r_str r_str_att r_str_acc r_td
## Min. : 0.0 Min. : 0.0 Min. :0.0000 Min. : 0.000
## 1st Qu.: 22.0 1st Qu.: 40.0 1st Qu.:0.4400 1st Qu.: 0.000
## Median : 50.0 Median : 94.0 Median :0.5500 Median : 1.000
## Mean : 58.2 Mean :106.4 Mean :0.5554 Mean : 1.227
## 3rd Qu.: 83.0 3rd Qu.:156.0 3rd Qu.:0.6800 3rd Qu.: 2.000
## Max. :447.0 Max. :746.0 Max. :1.0000 Max. :21.000
##
## r_td_att r_td_acc r_sub_att r_rev
## Min. : 0.000 Min. :0.0000 Min. : 0.0000 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.: 0.0000 1st Qu.:0.0000
## Median : 1.000 Median :0.1100 Median : 0.0000 Median :0.0000
## Mean : 2.941 Mean :0.3111 Mean : 0.4557 Mean :0.1344
## 3rd Qu.: 4.000 3rd Qu.:0.5700 3rd Qu.: 1.0000 3rd Qu.:0.0000
## Max. :27.000 Max. :1.0000 Max. :10.0000 Max. :6.0000
##
## r_ctrl_sec r_wins_total r_losses_total r_age
## Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. :19.00
## 1st Qu.: 7.0 1st Qu.: 13.00 1st Qu.: 5.00 1st Qu.:34.00
## Median : 72.0 Median : 18.00 Median : 7.00 Median :38.00
## Mean : 152.1 Mean : 19.19 Mean : 7.94 Mean :38.32
## 3rd Qu.: 230.0 3rd Qu.: 24.00 3rd Qu.:11.00 3rd Qu.:42.00
## Max. :1342.0 Max. :253.00 Max. :53.00 Max. :65.00
## NA's :76
## r_height r_weight r_reach r_stance
## Min. :152.4 Min. : 52.16 Min. :147.3 Length:7439
## 1st Qu.:172.7 1st Qu.: 65.77 1st Qu.:177.8 Class :character
## Median :177.8 Median : 77.11 Median :182.9 Mode :character
## Mean :178.6 Mean : 76.49 Mean :183.2
## 3rd Qu.:185.4 3rd Qu.: 83.91 3rd Qu.:190.5
## Max. :210.8 Max. :156.49 Max. :213.4
## NA's :412
## r_SLpM_total r_SApM_total r_sig_str_acc_total r_td_acc_total
## Min. : 0.000 Min. : 0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.: 2.570 1st Qu.: 2.510 1st Qu.:0.4000 1st Qu.:0.2900
## Median : 3.330 Median : 3.180 Median :0.4500 Median :0.3900
## Mean : 3.412 Mean : 3.283 Mean :0.4414 Mean :0.3883
## 3rd Qu.: 4.205 3rd Qu.: 3.945 3rd Qu.:0.5000 3rd Qu.:0.5000
## Max. :23.330 Max. :15.480 Max. :0.8300 Max. :1.0000
##
## r_str_def_total r_td_def_total r_sub_avg r_td_avg
## Min. :0.0000 Min. :0.0000 Min. : 0.0000 Min. : 0.000
## 1st Qu.:0.5100 1st Qu.:0.5000 1st Qu.: 0.1000 1st Qu.: 0.620
## Median :0.5600 Median :0.6300 Median : 0.5000 Median : 1.330
## Mean :0.5431 Mean :0.6028 Mean : 0.6452 Mean : 1.598
## 3rd Qu.:0.6000 3rd Qu.:0.7400 3rd Qu.: 0.9000 3rd Qu.: 2.290
## Max. :0.8400 Max. :1.0000 Max. :13.8000 Max. :11.110
##
## b_kd b_sig_str b_sig_str_att b_sig_str_acc
## Min. :0.0000 Min. : 0.0 Min. : 0.00 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.: 10.0 1st Qu.: 24.00 1st Qu.:0.3300
## Median :0.0000 Median : 25.0 Median : 60.00 Median :0.4300
## Mean :0.1813 Mean : 33.5 Mean : 78.22 Mean :0.4298
## 3rd Qu.:0.0000 3rd Qu.: 48.0 3rd Qu.:115.00 3rd Qu.:0.5300
## Max. :4.0000 Max. :241.0 Max. :510.00 Max. :1.0000
##
## b_str b_str_att b_str_acc b_td
## Min. : 0.00 Min. : 0.00 Min. :0.0000 Min. : 0.000
## 1st Qu.: 16.00 1st Qu.: 33.00 1st Qu.:0.4000 1st Qu.: 0.000
## Median : 41.00 Median : 83.00 Median :0.5200 Median : 0.000
## Mean : 49.42 Mean : 96.38 Mean :0.5152 Mean : 0.893
## 3rd Qu.: 72.00 3rd Qu.:143.00 3rd Qu.:0.6400 3rd Qu.: 1.000
## Max. :336.00 Max. :556.00 Max. :1.0000 Max. :12.000
##
## b_td_att b_td_acc b_sub_att b_rev
## Min. : 0.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median : 1.000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean : 2.653 Mean :0.2262 Mean :0.3254 Mean :0.1331
## 3rd Qu.: 4.000 3rd Qu.:0.4000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :49.000 Max. :1.0000 Max. :7.0000 Max. :4.0000
##
## b_ctrl_sec b_wins_total b_losses_total b_age
## Min. : 0.0 Min. : 0.00 Min. : 0.000 Min. :21.00
## 1st Qu.: 2.0 1st Qu.: 11.00 1st Qu.: 4.000 1st Qu.:33.00
## Median : 42.0 Median : 16.00 Median : 7.000 Median :37.00
## Mean : 109.2 Mean : 17.17 Mean : 7.399 Mean :37.72
## 3rd Qu.: 159.5 3rd Qu.: 21.00 3rd Qu.:10.000 3rd Qu.:42.00
## Max. :1193.0 Max. :253.00 Max. :53.000 Max. :81.00
## NA's :190
## b_height b_weight b_reach b_stance
## Min. :152.4 Min. : 52.16 Min. :147.3 Length:7439
## 1st Qu.:172.7 1st Qu.: 65.77 1st Qu.:175.3 Class :character
## Median :177.8 Median : 70.31 Median :182.9 Mode :character
## Mean :178.6 Mean : 76.32 Mean :182.8
## 3rd Qu.:185.4 3rd Qu.: 83.91 3rd Qu.:190.5
## Max. :210.8 Max. :349.27 Max. :213.4
## NA's :888
## b_SLpM_total b_SApM_total b_sig_str_acc_total b_td_acc_total
## Min. : 0.000 Min. : 0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.: 2.340 1st Qu.: 2.600 1st Qu.:0.3900 1st Qu.:0.2500
## Median : 3.250 Median : 3.290 Median :0.4400 Median :0.3600
## Mean : 3.269 Mean : 3.455 Mean :0.4293 Mean :0.3602
## 3rd Qu.: 4.110 3rd Qu.: 4.180 3rd Qu.:0.4900 3rd Qu.:0.4800
## Max. :11.030 Max. :42.000 Max. :1.0000 Max. :1.0000
##
## b_str_def_total b_td_def_total b_sub_avg b_td_avg
## Min. :0.0000 Min. :0.0000 Min. : 0.0000 Min. : 0.000
## 1st Qu.:0.4900 1st Qu.:0.4500 1st Qu.: 0.0000 1st Qu.: 0.490
## Median :0.5400 Median :0.6100 Median : 0.4000 Median : 1.160
## Mean :0.5222 Mean :0.5653 Mean : 0.5995 Mean : 1.464
## 3rd Qu.:0.5900 3rd Qu.:0.7200 3rd Qu.: 0.8000 3rd Qu.: 2.120
## Max. :1.0000 Max. :1.0000 Max. :16.4000 Max. :13.950
##
## kd_diff sig_str_diff sig_str_att_diff sig_str_acc_diff
## Min. :-4.00000 Min. :-157.000 Min. :-318.000 Min. :-1.00000
## 1st Qu.: 0.00000 1st Qu.: -8.000 1st Qu.: -14.000 1st Qu.:-0.09000
## Median : 0.00000 Median : 4.000 Median : 4.000 Median : 0.04000
## Mean : 0.06789 Mean : 4.861 Mean : 5.564 Mean : 0.04549
## 3rd Qu.: 0.00000 3rd Qu.: 18.000 3rd Qu.: 26.000 3rd Qu.: 0.17000
## Max. : 5.00000 Max. : 312.000 Max. : 461.000 Max. : 1.00000
##
## str_diff str_att_diff str_acc_diff td_diff
## Min. :-276.000 Min. :-339.000 Min. :-1.00000 Min. :-12.0000
## 1st Qu.: -12.000 1st Qu.: -19.000 1st Qu.:-0.09000 1st Qu.: -1.0000
## Median : 6.000 Median : 6.000 Median : 0.03000 Median : 0.0000
## Mean : 8.783 Mean : 9.993 Mean : 0.04029 Mean : 0.3335
## 3rd Qu.: 29.000 3rd Qu.: 38.000 3rd Qu.: 0.17000 3rd Qu.: 1.0000
## Max. : 315.000 Max. : 462.000 Max. : 1.00000 Max. : 20.0000
##
## td_att_diff td_acc_diff sub_att_diff rev_diff
## Min. :-44.0000 Min. :-1.00000 Min. :-7.0000 Min. :-3.000000
## 1st Qu.: -2.0000 1st Qu.:-0.20000 1st Qu.: 0.0000 1st Qu.: 0.000000
## Median : 0.0000 Median : 0.00000 Median : 0.0000 Median : 0.000000
## Mean : 0.2883 Mean : 0.08486 Mean : 0.1303 Mean : 0.001344
## 3rd Qu.: 3.0000 3rd Qu.: 0.50000 3rd Qu.: 0.0000 3rd Qu.: 0.000000
## Max. : 26.0000 Max. : 1.00000 Max. :10.0000 Max. : 5.000000
##
## ctrl_sec_diff wins_total_diff losses_total_diff age_diff
## Min. :-1164.00 Min. :-241.000 Min. :-47.0000 Min. :-24.000
## 1st Qu.: -53.00 1st Qu.: -4.000 1st Qu.: -3.0000 1st Qu.: -3.000
## Median : 5.00 Median : 2.000 Median : 0.0000 Median : 0.000
## Mean : 42.92 Mean : 2.019 Mean : 0.5412 Mean : 0.388
## 3rd Qu.: 144.00 3rd Qu.: 8.000 3rd Qu.: 4.0000 3rd Qu.: 4.000
## Max. : 1301.00 Max. : 249.000 Max. : 46.0000 Max. : 17.000
## NA's :213
## height_diff weight_diff reach_diff SLpM_total_diff
## Min. :-30.48000 Min. :-258.550 Min. :-27.9400 Min. :-8.9900
## 1st Qu.: -5.08000 1st Qu.: 0.000 1st Qu.: -5.0800 1st Qu.:-0.8600
## Median : 0.00000 Median : 0.000 Median : 0.0000 Median : 0.1300
## Mean : 0.04609 Mean : 0.171 Mean : 0.1901 Mean : 0.1426
## 3rd Qu.: 5.08000 3rd Qu.: 0.000 3rd Qu.: 5.0800 3rd Qu.: 1.1600
## Max. : 33.02000 Max. : 52.160 Max. : 33.0200 Max. :18.7800
## NA's :1038
## SApM_total_diff sig_str_acc_total_diff td_acc_total_diff
## Min. :-39.4900 Min. :-0.70000 Min. :-1.00000
## 1st Qu.: -1.0400 1st Qu.:-0.06000 1st Qu.:-0.13000
## Median : -0.1200 Median : 0.01000 Median : 0.02000
## Mean : -0.1716 Mean : 0.01211 Mean : 0.02816
## 3rd Qu.: 0.7900 3rd Qu.: 0.08000 3rd Qu.: 0.19000
## Max. : 12.6400 Max. : 0.83000 Max. : 1.00000
##
## str_def_total_diff td_def_total_diff sub_avg_diff td_avg_diff
## Min. :-0.58000 Min. :-1.00000 Min. :-15.10000 Min. :-11.7700
## 1st Qu.:-0.04000 1st Qu.:-0.14000 1st Qu.: -0.40000 1st Qu.: -0.8700
## Median : 0.01000 Median : 0.02000 Median : 0.00000 Median : 0.0900
## Mean : 0.02091 Mean : 0.03751 Mean : 0.04565 Mean : 0.1345
## 3rd Qu.: 0.08000 3rd Qu.: 0.21000 3rd Qu.: 0.50000 3rd Qu.: 1.1600
## Max. : 0.72000 Max. : 1.00000 Max. : 13.80000 Max. : 11.1100
##
print (table (ufc_men$ method))
##
## Decision - Majority Decision - Split Decision - Unanimous
## 59 704 2660
## DQ KO/TKO Submission
## 22 2404 1501
## TKO - Doctor's Stoppage
## 89
## M — MODIFY
men_winner_stats <- ufc_men %>%
mutate (
winner_kd = ifelse (winner == "Red" , r_kd, b_kd),
winner_str_acc = ifelse (winner == "Red" , r_str_acc, b_str_acc),
winner_sub_att = ifelse (winner == "Red" , r_sub_att, b_sub_att),
winner_td_acc = ifelse (winner == "Red" , r_td_acc, b_td_acc),
winner_ctrl_sec = ifelse (winner == "Red" , r_ctrl_sec, b_ctrl_sec),
winner_height = ifelse (winner == "Red" , r_height, b_height),
winner_reach = ifelse (winner == "Red" , r_reach, b_reach),
method = as.factor (method)
) %>%
select (
method,
winner_kd,
winner_str_acc,
winner_sub_att,
winner_td_acc,
winner_ctrl_sec,
winner_height,
winner_reach
)
print (head (men_winner_stats))
## method winner_kd winner_str_acc winner_sub_att winner_td_acc
## 1 Decision - Unanimous 0 0.60 0 0.33
## 2 Decision - Unanimous 0 0.70 1 0.58
## 3 KO/TKO 1 0.61 0 0.33
## 4 KO/TKO 1 0.61 0 0.00
## 5 Submission 0 0.60 2 1.00
## 6 Submission 1 0.55 1 0.00
## winner_ctrl_sec winner_height winner_reach
## 1 419 165.10 165.10
## 2 631 190.50 200.66
## 3 49 187.96 190.50
## 4 30 177.80 177.80
## 5 175 177.80 182.88
## 6 47 185.42 193.04
print (summary (men_winner_stats))
## method winner_kd winner_str_acc
## Decision - Majority : 59 Min. :0.0000 Min. :0.0000
## Decision - Split : 704 1st Qu.:0.0000 1st Qu.:0.4800
## Decision - Unanimous :2660 Median :0.0000 Median :0.5900
## DQ : 22 Mean :0.3702 Mean :0.5864
## KO/TKO :2404 3rd Qu.:1.0000 3rd Qu.:0.7000
## Submission :1501 Max. :5.0000 Max. :1.0000
## TKO - Doctor's Stoppage: 89
## winner_sub_att winner_td_acc winner_ctrl_sec winner_height
## Min. : 0.0000 Min. :0.0000 Min. : 0.0 Min. :152.4
## 1st Qu.: 0.0000 1st Qu.:0.0000 1st Qu.: 14.0 1st Qu.:172.7
## Median : 0.0000 Median :0.2600 Median : 99.0 Median :180.3
## Mean : 0.5356 Mean :0.3557 Mean : 180.5 Mean :178.7
## 3rd Qu.: 1.0000 3rd Qu.:0.6600 3rd Qu.: 283.0 3rd Qu.:185.4
## Max. :10.0000 Max. :1.0000 Max. :1342.0 Max. :210.8
##
## winner_reach
## Min. :147.3
## 1st Qu.:177.8
## Median :182.9
## Mean :183.3
## 3rd Qu.:190.5
## Max. :213.4
## NA's :323
ggpairs (men_winner_stats)
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
Code
## 2. Train/test split (same seed for comparability)
set.seed (213 )
n_men <- nrow (men_winner_stats)
train_idx_men <- sample (1 : n_men, size = 0.8 * n_men)
train_men <- men_winner_stats[train_idx_men, ]
test_men <- men_winner_stats[- train_idx_men, ]
cat ("Men - Training rows:" , nrow (train_men), " \n " )
## Men - Training rows: 5951
cat ("Men - Testing rows:" , nrow (test_men), " \n " )
## Men - Testing rows: 1488
## M — MODEL
multi_model_men <- multinom (
method ~ winner_kd +
winner_str_acc +
winner_sub_att +
winner_td_acc +
winner_ctrl_sec +
winner_height +
winner_reach,
data = train_men
)
## # weights: 63 (48 variable)
## initial value 11091.687850
## iter 10 value 7802.080742
## iter 20 value 6500.315388
## iter 30 value 5812.675640
## iter 40 value 5552.226957
## iter 50 value 5522.094640
## iter 60 value 5519.859297
## iter 70 value 5519.756607
## final value 5519.754811
## converged
print (summary (multi_model_men))
## Call:
## multinom(formula = method ~ winner_kd + winner_str_acc + winner_sub_att +
## winner_td_acc + winner_ctrl_sec + winner_height + winner_reach,
## data = train_men)
##
## Coefficients:
## (Intercept) winner_kd winner_str_acc winner_sub_att
## Decision - Split 11.9259027 -0.57550450 -4.2315040 -0.3681001
## Decision - Unanimous 11.0940044 0.31323481 -3.7130109 -0.1299592
## DQ -0.5565765 -1.47826416 -0.8312199 0.1184727
## KO/TKO 1.5189475 1.83269529 2.8011993 -0.7924909
## Submission 5.3776496 -0.04151452 0.1349387 1.3846509
## TKO - Doctor's Stoppage 4.3854781 0.61144859 1.5668175 -0.8427312
## winner_td_acc winner_ctrl_sec winner_height
## Decision - Split 0.1926110 -0.0006225035 -0.002752281
## Decision - Unanimous 0.6289374 0.0016636293 0.009336466
## DQ 0.9883153 -0.0058225947 0.077286867
## KO/TKO 0.9778683 -0.0065802786 0.020795540
## Submission 1.5306409 -0.0069957298 0.017605809
## TKO - Doctor's Stoppage 0.8497547 -0.0030891465 -0.033462065
## winner_reach
## Decision - Split -0.034295218
## Decision - Unanimous -0.040544620
## DQ -0.070235466
## KO/TKO -0.016750200
## Submission -0.031934803
## TKO - Doctor's Stoppage 0.007664103
##
## Std. Errors:
## (Intercept) winner_kd winner_str_acc winner_sub_att
## Decision - Split 0.126692885 0.12762021 0.27734166 0.2239190
## Decision - Unanimous 0.491528798 0.08274134 0.19929015 0.2100884
## DQ 0.004058922 0.03559572 0.02809145 0.4209908
## KO/TKO 0.250674538 0.07909873 0.22779100 0.2211581
## Submission 0.106951273 0.10472096 0.22922943 0.2132005
## TKO - Doctor's Stoppage 0.005217409 0.20756090 0.02538591 0.3659162
## winner_td_acc winner_ctrl_sec winner_height
## Decision - Split 0.1915157 0.0007282785 0.03241923
## Decision - Unanimous 0.1594633 0.0006809868 0.03154756
## DQ 0.4909471 0.0020094514 0.05782793
## KO/TKO 0.1637804 0.0007298580 0.03175879
## Submission 0.1682738 0.0007457273 0.03206719
## TKO - Doctor's Stoppage 0.3378717 0.0010397530 0.04080604
## winner_reach
## Decision - Split 0.03139503
## Decision - Unanimous 0.03040381
## DQ 0.05642828
## KO/TKO 0.03072872
## Submission 0.03106221
## TKO - Doctor's Stoppage 0.03952539
##
## Residual Deviance: 11039.51
## AIC: 11135.51
## Coefficient inference
coef_mat_men <- summary (multi_model_men)$ coefficients
se_mat_men <- summary (multi_model_men)$ standard.errors
z_values_men <- coef_mat_men / se_mat_men
p_values_men <- 2 * (1 - pnorm (abs (z_values_men)))
levels (train_men$ method)
## [1] "Decision - Majority" "Decision - Split"
## [3] "Decision - Unanimous" "DQ"
## [5] "KO/TKO" "Submission"
## [7] "TKO - Doctor's Stoppage"
sig_idx_men <- which (p_values_men < 0.05 , arr.ind = TRUE )
sig_effects_men <- data.frame (
method = rownames (p_values_men)[sig_idx_men[, "row" ]],
predictor = colnames (p_values_men)[sig_idx_men[, "col" ]],
p_value = p_values_men[sig_idx_men]
)
sig_effects_men <- sig_effects_men[
order (sig_effects_men$ predictor, sig_effects_men$ method),
]
sig_effects_men
## method predictor p_value
## 1 Decision - Split (Intercept) 0.000000e+00
## 2 Decision - Unanimous (Intercept) 0.000000e+00
## 3 DQ (Intercept) 0.000000e+00
## 4 KO/TKO (Intercept) 1.365957e-09
## 5 Submission (Intercept) 0.000000e+00
## 6 TKO - Doctor's Stoppage (Intercept) 0.000000e+00
## 25 Decision - Unanimous winner_ctrl_sec 1.456701e-02
## 26 DQ winner_ctrl_sec 3.760249e-03
## 27 KO/TKO winner_ctrl_sec 0.000000e+00
## 28 Submission winner_ctrl_sec 0.000000e+00
## 29 TKO - Doctor's Stoppage winner_ctrl_sec 2.967941e-03
## 7 Decision - Split winner_kd 6.497782e-06
## 8 Decision - Unanimous winner_kd 1.532698e-04
## 9 DQ winner_kd 0.000000e+00
## 10 KO/TKO winner_kd 0.000000e+00
## 11 TKO - Doctor's Stoppage winner_kd 3.220418e-03
## 12 Decision - Split winner_str_acc 0.000000e+00
## 13 Decision - Unanimous winner_str_acc 0.000000e+00
## 14 DQ winner_str_acc 0.000000e+00
## 15 KO/TKO winner_str_acc 0.000000e+00
## 16 TKO - Doctor's Stoppage winner_str_acc 0.000000e+00
## 17 KO/TKO winner_sub_att 3.391920e-04
## 18 Submission winner_sub_att 8.325718e-11
## 19 TKO - Doctor's Stoppage winner_sub_att 2.127481e-02
## 20 Decision - Unanimous winner_td_acc 8.010377e-05
## 21 DQ winner_td_acc 4.410631e-02
## 22 KO/TKO winner_td_acc 2.363758e-09
## 23 Submission winner_td_acc 0.000000e+00
## 24 TKO - Doctor's Stoppage winner_td_acc 1.190249e-02
cat (" \n Men Z-values: \n " )
##
## Men Z-values:
print (z_values_men)
## (Intercept) winner_kd winner_str_acc winner_sub_att
## Decision - Split 94.132379 -4.509509 -15.2573688 -1.6438986
## Decision - Unanimous 22.570406 3.785711 -18.6311808 -0.6185932
## DQ -137.124197 -41.529268 -29.5897874 0.2814140
## KO/TKO 6.059441 23.169718 12.2972344 -3.5833682
## Submission 50.281305 -0.396430 0.5886623 6.4945946
## TKO - Doctor's Stoppage 840.547060 2.945876 61.7199557 -2.3030716
## winner_td_acc winner_ctrl_sec winner_height
## Decision - Split 1.005719 -0.8547603 -0.08489654
## Decision - Unanimous 3.944089 2.4429684 0.29594891
## DQ 2.013079 -2.8976041 1.33649730
## KO/TKO 5.970605 -9.0158337 0.65479640
## Submission 9.096133 -9.3810829 0.54902868
## TKO - Doctor's Stoppage 2.515022 -2.9710389 -0.82002732
## winner_reach
## Decision - Split -1.0923774
## Decision - Unanimous -1.3335376
## DQ -1.2446856
## KO/TKO -0.5450992
## Submission -1.0280917
## TKO - Doctor's Stoppage 0.1939033
cat (" \n Men P-values: \n " )
##
## Men P-values:
print (p_values_men)
## (Intercept) winner_kd winner_str_acc winner_sub_att
## Decision - Split 0.000000e+00 6.497782e-06 0.0000000 1.001972e-01
## Decision - Unanimous 0.000000e+00 1.532698e-04 0.0000000 5.361844e-01
## DQ 0.000000e+00 0.000000e+00 0.0000000 7.783929e-01
## KO/TKO 1.365957e-09 0.000000e+00 0.0000000 3.391920e-04
## Submission 0.000000e+00 6.917879e-01 0.5560878 8.325718e-11
## TKO - Doctor's Stoppage 0.000000e+00 3.220418e-03 0.0000000 2.127481e-02
## winner_td_acc winner_ctrl_sec winner_height
## Decision - Split 3.145507e-01 0.392683875 0.9323436
## Decision - Unanimous 8.010377e-05 0.014567012 0.7672691
## DQ 4.410631e-02 0.003760249 0.1813868
## KO/TKO 2.363758e-09 0.000000000 0.5125989
## Submission 0.000000e+00 0.000000000 0.5829858
## TKO - Doctor's Stoppage 1.190249e-02 0.002967941 0.4122005
## winner_reach
## Decision - Split 0.2746673
## Decision - Unanimous 0.1823555
## DQ 0.2132473
## KO/TKO 0.5856853
## Submission 0.3039067
## TKO - Doctor's Stoppage 0.8462516
## A — ASSESS
pred_test_men <- predict (multi_model_men, newdata = test_men, type = "class" )
pred_test_men <- factor (pred_test_men, levels = levels (test_men$ method))
conf_mat_men <- table (Predicted = pred_test_men, Actual = test_men$ method)
cat (" \n Men Confusion matrix (Test set): \n " )
##
## Men Confusion matrix (Test set):
print (conf_mat_men)
## Actual
## Predicted Decision - Majority Decision - Split
## Decision - Majority 0 0
## Decision - Split 0 0
## Decision - Unanimous 5 102
## DQ 0 0
## KO/TKO 3 24
## Submission 4 21
## TKO - Doctor's Stoppage 0 0
## Actual
## Predicted Decision - Unanimous DQ KO/TKO Submission
## Decision - Majority 0 0 0 0
## Decision - Split 0 0 0 0
## Decision - Unanimous 408 1 76 55
## DQ 0 0 0 0
## KO/TKO 82 1 343 19
## Submission 49 0 17 192
## TKO - Doctor's Stoppage 0 0 0 0
## Actual
## Predicted TKO - Doctor's Stoppage
## Decision - Majority 0
## Decision - Split 0
## Decision - Unanimous 8
## DQ 0
## KO/TKO 6
## Submission 0
## TKO - Doctor's Stoppage 0
accuracy_men <- mean (pred_test_men == test_men$ method, na.rm = TRUE )
cat (" \n Men Accuracy (direct):" , accuracy_men, " \n " )
##
## Men Accuracy (direct): 0.6659605
accuracy_from_confmat_men <- sum (diag (conf_mat_men)) / sum (conf_mat_men)
cat ("Men Accuracy (confusion matrix):" , accuracy_from_confmat_men, " \n " )
## Men Accuracy (confusion matrix): 0.6659605
cm_men <- confusionMatrix (
data = pred_test_men,
reference = test_men$ method
)
cat (" \n === caret::confusionMatrix Summary (Men) === \n " )
##
## === caret::confusionMatrix Summary (Men) ===
print (cm_men)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Decision - Majority Decision - Split
## Decision - Majority 0 0
## Decision - Split 0 0
## Decision - Unanimous 5 102
## DQ 0 0
## KO/TKO 3 24
## Submission 4 21
## TKO - Doctor's Stoppage 0 0
## Reference
## Prediction Decision - Unanimous DQ KO/TKO Submission
## Decision - Majority 0 0 0 0
## Decision - Split 0 0 0 0
## Decision - Unanimous 408 1 76 55
## DQ 0 0 0 0
## KO/TKO 82 1 343 19
## Submission 49 0 17 192
## TKO - Doctor's Stoppage 0 0 0 0
## Reference
## Prediction TKO - Doctor's Stoppage
## Decision - Majority 0
## Decision - Split 0
## Decision - Unanimous 8
## DQ 0
## KO/TKO 6
## Submission 0
## TKO - Doctor's Stoppage 0
##
## Overall Statistics
##
## Accuracy : 0.666
## 95% CI : (0.6407, 0.6905)
## No Information Rate : 0.3806
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.5105
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Decision - Majority Class: Decision - Split
## Sensitivity 0.000000 0.0000
## Specificity 1.000000 1.0000
## Pos Pred Value NaN NaN
## Neg Pred Value 0.991525 0.8962
## Prevalence 0.008475 0.1038
## Detection Rate 0.000000 0.0000
## Detection Prevalence 0.000000 0.0000
## Balanced Accuracy 0.500000 0.5000
## Class: Decision - Unanimous Class: DQ Class: KO/TKO
## Sensitivity 0.7570 0.000000 0.7867
## Specificity 0.7184 1.000000 0.8622
## Pos Pred Value 0.6229 NaN 0.7176
## Neg Pred Value 0.8279 0.998588 0.9009
## Prevalence 0.3806 0.001412 0.3079
## Detection Rate 0.2881 0.000000 0.2422
## Detection Prevalence 0.4626 0.000000 0.3376
## Balanced Accuracy 0.7377 0.500000 0.8245
## Class: Submission Class: TKO - Doctor's Stoppage
## Sensitivity 0.7218 0.000000
## Specificity 0.9209 1.000000
## Pos Pred Value 0.6784 NaN
## Neg Pred Value 0.9347 0.990113
## Prevalence 0.1879 0.009887
## Detection Rate 0.1356 0.000000
## Detection Prevalence 0.1999 0.000000
## Balanced Accuracy 0.8213 0.500000
cm_men$ overall
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 6.659605e-01 5.105198e-01 6.407161e-01 6.905142e-01 3.806497e-01
## AccuracyPValue McnemarPValue
## 1.838332e-104 NaN
cm_byclass_df_men <- as.data.frame (cm_men$ byClass)
cm_byclass_df_men
## Sensitivity Specificity Pos Pred Value
## Class: Decision - Majority 0.0000000 1.0000000 NaN
## Class: Decision - Split 0.0000000 1.0000000 NaN
## Class: Decision - Unanimous 0.7569573 0.7183580 0.6229008
## Class: DQ 0.0000000 1.0000000 NaN
## Class: KO/TKO 0.7866972 0.8622449 0.7175732
## Class: Submission 0.7218045 0.9208696 0.6784452
## Class: TKO - Doctor's Stoppage 0.0000000 1.0000000 NaN
## Neg Pred Value Precision Recall F1
## Class: Decision - Majority 0.9915254 NA 0.0000000 NA
## Class: Decision - Split 0.8961864 NA 0.0000000 NA
## Class: Decision - Unanimous 0.8278581 0.6229008 0.7569573 0.6834171
## Class: DQ 0.9985876 NA 0.0000000 NA
## Class: KO/TKO 0.9008529 0.7175732 0.7866972 0.7505470
## Class: Submission 0.9346867 0.6784452 0.7218045 0.6994536
## Class: TKO - Doctor's Stoppage 0.9901130 NA 0.0000000 NA
## Prevalence Detection Rate Detection Prevalence
## Class: Decision - Majority 0.008474576 0.0000000 0.0000000
## Class: Decision - Split 0.103813559 0.0000000 0.0000000
## Class: Decision - Unanimous 0.380649718 0.2881356 0.4625706
## Class: DQ 0.001412429 0.0000000 0.0000000
## Class: KO/TKO 0.307909605 0.2422316 0.3375706
## Class: Submission 0.187853107 0.1355932 0.1998588
## Class: TKO - Doctor's Stoppage 0.009887006 0.0000000 0.0000000
## Balanced Accuracy
## Class: Decision - Majority 0.5000000
## Class: Decision - Split 0.5000000
## Class: Decision - Unanimous 0.7376577
## Class: DQ 0.5000000
## Class: KO/TKO 0.8244711
## Class: Submission 0.8213370
## Class: TKO - Doctor's Stoppage 0.5000000
## McFadden R2
null_model_men <- multinom (method ~ 1 , data = train_men, trace = FALSE )
LL_full_men <- as.numeric (logLik (multi_model_men))
LL_null_men <- as.numeric (logLik (null_model_men))
McFadden_R2_men <- 1 - (LL_full_men / LL_null_men)
cat (" \n Men McFadden R2:" , McFadden_R2_men, " \n " )
##
## Men McFadden R2: 0.332038
## Predicted vs Actual plot
comparison_df_men <- data.frame (
actual = test_men$ method,
predicted = pred_test_men
)
ggplot (comparison_df_men, aes (x = actual, fill = predicted)) +
geom_bar (position = "dodge" ) +
theme_minimal () +
labs (
title = "Predicted vs Actual Methods (Men Test Set)" ,
x = "Actual Method" ,
y = "Count" ,
fill = "Predicted"
)
Code
save_last_plot ("men_predicted_vs_actual" )
## Saved: figures/Fig05_men_predicted_vs_actual.png
## Coefficient plot
coef_df_men <- tidy (multi_model_men) %>%
filter (term != "(Intercept)" ) %>%
mutate (
term_pretty = recode (
term,
winner_kd = "Knockdowns" ,
winner_str_acc = "Striking Accuracy" ,
winner_sub_att = "Submission Attempts" ,
winner_td_acc = "Takedown Accuracy" ,
winner_ctrl_sec = "Control Time (sec)" ,
winner_height = "Height" ,
winner_reach = "Reach" ,
.default = term
)
)
ggplot (coef_df_men, aes (x = term_pretty, y = estimate, fill = y.level)) +
geom_col (position = "dodge" ) +
coord_flip () +
labs (
title = "Coefficient Estimates by Win Condition (Men)" ,
x = "Predictor" ,
y = "Log-odds Coefficient" ,
fill = "Win Condition"
) +
theme_minimal ()
Code
save_last_plot ("men_coefficients" )
## Saved: figures/Fig06_men_coefficients.png
## Coefficient heatmap
heat_df_long_men <- coef_df_men %>%
select (y.level, term_pretty, estimate)
heat_df_wide_men <- dcast (
heat_df_long_men,
term_pretty ~ y.level,
value.var = "estimate"
)
heat_df_melt_men <- melt (
heat_df_wide_men,
id.vars = "term_pretty" ,
variable.name = "win_condition" ,
value.name = "coef"
)
ggplot (heat_df_melt_men, aes (x = win_condition, y = term_pretty, fill = coef)) +
geom_tile () +
scale_fill_gradient2 (
low = "blue" ,
mid = "white" ,
high = "red" ,
midpoint = 0
) +
labs (
title = "Predictor Importance Heatmap (Men)" ,
x = "Win Condition" ,
y = "Predictor" ,
fill = "Coef"
) +
theme_minimal () +
theme (axis.text.x = element_text (angle = 45 , hjust = 1 ))
Code
## ==========================================================
## Unified Comparison Plots
## ==========================================================
## 1. Combined Coefficient Plot
coef_combined <- bind_rows (
mutate (coef_df, gender = "Women" ),
mutate (coef_df_men, gender = "Men" )
)
ggplot (coef_combined, aes (x = term_pretty, y = estimate, fill = y.level)) +
geom_col (position = "dodge" ) +
facet_wrap (~ gender) +
coord_flip () +
labs (
title = "Coefficient Estimates: Men vs Women" ,
x = "Predictor" ,
y = "Log-odds Coefficient" ,
fill = "Win Condition"
) +
theme_minimal ()
Code
save_last_plot ("combined_coefficients" )
## Saved: figures/Fig07_combined_coefficients.png
## 2. Combined Heatmap
heat_combined <- bind_rows (
mutate (heat_df_melt, gender = "Women" ),
mutate (heat_df_melt_men, gender = "Men" )
)
ggplot (heat_combined, aes (x = win_condition, y = term_pretty, fill = coef)) +
geom_tile () +
facet_wrap (~ gender) +
scale_fill_gradient2 (low = "blue" , mid = "white" , high = "red" , midpoint = 0 ) +
labs (
title = "Predictor Importance Heatmap: Men vs Women" ,
x = "Win Condition" ,
y = "Predictor" ,
fill = "Coef"
) +
theme_minimal () +
theme (axis.text.x = element_text (angle = 45 , hjust = 1 ))
Code
save_last_plot ("combined_heatmap" )
## Saved: figures/Fig08_combined_heatmap.png
Background
The United Fighting Championship (UFC) is a competitive mixed martial arts organization capable of generating up to 1.4 billion USD alone in 2024. To maintain this high amount of global revenue, addressing issues with nurturing new talent across different weight classes for future fights and concerns about boxer pay percentage for televised fights is paramount to the continued success of UFC. Tabling these issues could lead to a drastic decline in revenue and stall the future progression of this globally popular combat sport.
Introduction
To this point, this project seeks to analyze boxer performance data across (male and female) boxers that could provide solid metrics that coaches and prospective boxers can use for: (i) scouting, (ii) performance improvement, (iii) and metrics that can be used to argue for higher boxer pay. The data for this project was obtained and scrubbed from the UFC Website and compiled on Kaggle; with fights in recent history being cataloged (1996 to 2024).
Even though the dataset is quite robust, the following independent variables were chosen for analysis:
Amount of Knockdowns
Amount of times a boxer is knocked onto ring canvas, hangs on the ropes to prevent his/her fall, or any part of the body other than the soles of the feet touches the ring canvas
Striking Accuracy
Total % of strikes that connect to opponent boxer
Takedown Accuracy
Total % of successful takedowns over total takedown attempts
Submission Attempts
Number of attempts when fighter applies a hold to force an opponent to “tap out”
Control Time
Amount of time spent in dominant cinch or ground positions
Height
Overall height of boxer
Arm Reach
Measured arm length of boxer
Given that there are multiple ways to win in a UFC fight, the win conditions (dependent variables) in this dataset were coded as follows:
KO/TKO
Knockout / Technical Knock-Out
TKO – Doctor’s Stoppage
Doctor ruling of TKO
Decision – Split
Split decision from judges
Decision – Majority
Majority ruling from judges
Decision – Unanimous
Judge consensus of match win
Submission
Successful “tap-out” of opponent
DQ
Disqualification
Methods
Using the sequential steps of the SEMMA (Sampling, Exploring, Modify, Model, Assess) analysis, a multinomial logistic regression was developed for use with this data. All data processing was done within R using the following libraries: dplyr, readr, ggplot2, nnet, broom, reshape2, GGally, and caret.
After loading the UFC dataset, the overall document was accepted as a 7439 x 95 Matrix (n = 7439 observations). To ease data calculation and get a representative sample, the UFC data was filtered to only include women. This now reduced the matrix to 733 x 95. From here, the specified chosen variables above were selected as this dataset had variables and differences between the winning and losing boxer; these variables were separated to produce new datasets for our men and women fighters. These results of each method were also graphed via histogram to showcase the spread of each win condition.(FIGURE INSERTION HERE)
SAMPLING
The seed function was used and set to 123231 for replicability when deciding to plot the datasets for training and test validation. An 80/20 training split was utilized creating a result of 586 training matches and 147 validation matches for women and training matches and validation matches for men.
EXPLORING
A covariance matrix was made to visualize and confirm dataset relationships by using the GGally:: ggpairs() command. These results are displayed and showcase covariance between our specified variables. There seemed to be some promise with the predictive power of these variables and also showcased some of our other independent variables sharing significant correlations with each other.
Distrubution Plots (Men & Women) This is just a test to visualize the plots.
MODIFY
As most variables do not show any sign of conflicting or duplicate measures, there was no need to reduce dimensionality or number of variables in analysis. Given the robustness of data and specification of variables, there should be no need for modification to the data.
MODEL
A multinomial regression command (nnet: multinom) was used with this data to assess the predictive power of these variables with this model. Deviance and AIC was also assessed for these variables (Figure 3). These values were then taken to calculate their respective z-scores and p-values, to identify which of these variables were significant predictors (Figure 3). These were put into a correlational heatmap to be able to visualize which variables are the most important across the different methods of winning.
Predictions vs. Actual (Men & Women)
This is just a test to visualize the predictive model & actual data plots.
Coefficeint Plots (Men & Women)
This is just a test to visualize the coefficient plots.
Heatmaps (Men & Women)
This is just a test to visualize the heatmap models.
Discussion
(NEED TO MODIFY FOR BOTH MEN AND WOMEN NOW)
This study attempts to identify key predictors across UFC win conditions and if these predictors are consistent. From these chosen variables, the amount of knockdown attempts, takedown accuracy, striking accuracy, submission attempts, and control time were all significant predictors. However, submission attempts was only significant in the submission win condition which was expected. When examining the multinomial logistic model’s ability to predict win condition with the confusion matrix, there was no loss in accuracy.
The generalizability in this model prediction may be affected by the smaller sample pool that was utilized and low amount of win conditions. While certain win conditions like Submission and Unanimous decisions seemed to be predicted fairly well, there seems to also be issues in overall predictability. First off, choosing women may have limited the amount of win conditions observed across these matches. In total there was only 1 disqualification (DQ), 5 technical knockouts due to Doctor Stoppage (TKO – Doctor Stoppage), and 5 Majority Decision (Decision – Majority) rulings. This limited training of the model may explain the incorrect identification of these matches as Unanimous Decision. This leads to the second limitation which may stem from not enough variables to explain variance in these matches. While there may be some variables like submission amount and knockdowns that seem to have inherent relatability to these different win conditions, these span across win rulings causing this variability to range. Using the submission win as the example, while submissions were the lead prediction, there are not enough variables to differentiate it from the unanimous decision ruling which may explain why the other model incorrectly predicted Submission wins as a unanimous win condition. This inclusion of more match specific data or including a more varying measurements can help to explain the variance seen in these matches. To reiterate the important predictors from this model, having higher amount of takedown accuracy and higher amount of knockdowns during a match will increase the likelihood of a win across all conditions. Interestingly enough, this also presents striking accuracy as a negative effect on KO/TKO and Submission Win conditions; illustrating that more dynamic or aggressive moves that result in Knockdowns or submissions could be more important in a match. From this, merely connecting punches may not be enough and negatively impact the method of winning in a UFC match.
To summarize, this exploratory analysis into UFC fighter data provides more context into predictors needed to determine a win across the different methods of winning a match. This methodology provides a proof of concept that could be adapted to accommodate a wider sample size and enhance the predictability of these variables across matches, providing ample opportunity to examine or swap variables of interest to increase the predictive power of this model and the applicability of these findings to UFC coaches, players, and enthusiasts.
Conclusion
Based off of this, …