Machine Learning and Multivariate Statistics models to predict the position of first-division teams
Abstract
This research aims to identify which Machine Learning and Multivariate Statistics models have the greatest predictive capacity for determining a team's classification at the end of the season. The teams studied are those that competed in the first-division Bundesliga, Premier League, LaLiga, Ligue 1 and Serie A during the 2018-2019 season. The teams misclassified by the best of the models, a Random Forest trained on balanced data, were analyzed in depth to determine which game actions caused the classification error. The results indicate that, in general, effectiveness in front of goal and ball possession are the statistics in which misclassified teams differ most from the average of their actual position. In conclusion, this research shows that Machine Learning and Multivariate Statistical techniques can be used successfully to discriminate between Top and Bottom teams competing in the best leagues in the world.
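As a rough illustration of what "balanced data" can mean before a classifier such as a Random Forest is trained, the sketch below balances class counts by random oversampling. This is a swapped-in, simpler technique, not necessarily the balancing method used in the study; the function name `random_oversample`, the two features, and every number in the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows of this class in the original (unbalanced) data.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: (goals per shot, possession %) for six teams.
rows = [(0.14, 61.0), (0.12, 58.5), (0.09, 47.0),
        (0.08, 45.5), (0.10, 49.0), (0.07, 44.0)]
labels = ["Top", "Top", "Bottom", "Bottom", "Bottom", "Bottom"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

The balanced rows and labels would then be passed to whichever classifier is being evaluated. To avoid overoptimistic estimates, oversampling is normally applied only to the training folds and never to the held-out data.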