Talks & Events
All dates and further information can be found here.
Date: Monday, June 23, 2025
Time: 4:15 pm to 5:45 pm
Location: Lecture Hall of the Institute of Medical Biometry and Statistics, Stefan-Meier-Str. 26
The talk will introduce the new PrInDT (Prediction and Interpretation in Decision Trees) approach for optimizing decision trees for classification and regression problems (cf. Weihs & Buschfeld 2021 for first ideas and Weihs & Buschfeld 2023 for the first version of a corresponding R package). In the PrInDT approach, the model space is searched randomly by repeated subsampling: a tree is fitted on each subsample, and the tree with the highest accuracy on the full sample is selected. The interpretability of the resulting trees can be controlled by restricting the model size.
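The subsampling search described above can be illustrated with a minimal Python sketch. This is not the PrInDT R package or its API: the tree learner, the function names (`fit_tree`, `subsample_search`), the parameters (`n_repeats`, `fraction`, `max_depth`), and the toy data are all assumptions made for illustration. The sketch fits a small tree on each random subsample, scores it on the full sample, and keeps the best one, with `max_depth` standing in for the restriction on model size.

```python
import random
from collections import Counter

def majority(labels):
    """Most frequent label (labels are assumed to be strings or ints)."""
    return Counter(labels).most_common(1)[0][0]

def fit_tree(rows, labels, max_depth):
    """Grow a small binary tree. Leaves are labels, inner nodes are
    tuples (feature index, threshold, left subtree, right subtree)."""
    if max_depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for j in range(len(rows[0])):
        # Candidate thresholds: every observed value except the largest.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            # Errors if each side predicts its own majority class.
            errors = (len(left) - Counter(left).most_common(1)[0][1]
                      + len(right) - Counter(right).most_common(1)[0][1])
            if best is None or errors < best[0]:
                best = (errors, j, t)
    if best is None:  # no feature has two distinct values
        return majority(labels)
    _, j, t = best
    lo = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return (j, t,
            fit_tree([r for r, _ in lo], [y for _, y in lo], max_depth - 1),
            fit_tree([r for r, _ in hi], [y for _, y in hi], max_depth - 1))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def accuracy(tree, rows, labels):
    """Share of correctly predicted observations."""
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)

def subsample_search(rows, labels, n_repeats=50, fraction=0.7,
                     max_depth=2, seed=0):
    """Fit one tree per random subsample; keep the tree that scores
    best on the FULL sample."""
    rng = random.Random(seed)
    best_tree, best_acc = None, -1.0
    k = max(2, int(fraction * len(rows)))
    for _ in range(n_repeats):
        idx = rng.sample(range(len(rows)), k)
        tree = fit_tree([rows[i] for i in idx],
                        [labels[i] for i in idx], max_depth)
        acc = accuracy(tree, rows, labels)
        if acc > best_acc:
            best_tree, best_acc = tree, acc
    return best_tree, best_acc

# Hypothetical toy data: two made-up voice features per recording.
rows = [[0.10, 1.00], [0.20, 0.90], [0.30, 1.10], [0.15, 1.00],
        [0.80, 1.00], [0.90, 0.95], [0.70, 1.05], [0.85, 1.00]]
labels = ["healthy"] * 4 + ["pd"] * 4

tree, acc = subsample_search(rows, labels)
print("selected tree:", tree)
print("accuracy on full sample:", acc)
```

Lowering `max_depth` yields smaller trees that are easier to read, possibly at the cost of accuracy; this is the trade-off between prediction and interpretation that the approach is named after.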
Originally, the approach was developed for applications in linguistics and music data analysis. For first publications in linguistics, see Buschfeld & Weihs 2024 and Buschfeld et al. 2024.
This talk applies the idea to two problems: distinguishing people with Parkinson's disease from healthy people as a classification example, and determining the severity of Parkinson's disease as a regression example. In both cases, voice properties indicating dysphonia are used as explanatory variables.
The classification dataset (cf. UCI repository, dataset 174) is composed of voice measurements from 32 people, 24 of them with Parkinson's disease. The dataset includes 22 different voice characteristics for 195 voice recordings from these individuals. The trees resulting from the PrInDT approach are easily interpretable and show only one classification error: one person diagnosed with Parkinson's disease was erroneously predicted as healthy.
The regression dataset (cf. UCI repository, dataset 189) is used to predict two different UPDRS (Unified Parkinson's Disease Rating Scale) scores from age, sex, and 16 different voice characteristics of 42 people, all diagnosed with Parkinson's disease. The dataset comprises 5875 voice recordings from these individuals. The trees resulting from the PrInDT approach predict the given UPDRS values with a mean absolute deviation below 4.1 UPDRS points.
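For readers unfamiliar with the error measure, the mean absolute deviation is the average of the absolute differences between observed and predicted scores. The following Python sketch shows the calculation; the function name and the four pairs of UPDRS values are made up for illustration and are not taken from the dataset or from the talk.

```python
def mean_absolute_deviation(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical UPDRS scores, for illustration only.
observed = [20.0, 35.5, 28.0, 41.0]
predicted = [22.0, 33.5, 31.0, 40.0]

mad = mean_absolute_deviation(observed, predicted)
print(mad)  # 2.0, i.e. (2 + 2 + 3 + 1) / 4
```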
References
Weihs, C., Buschfeld, S. 2021. Combining Prediction and Interpretation in Decision Trees (PrInDT) - a Linguistic Example. arXiv: http://arxiv.org/abs/2103.02336.
Weihs, C., Buschfeld, S. 2023. PrInDT: Prediction and Interpretation in Decision Trees for Classification and Regression. R package version 1.01, https://CRAN.R-project.org/package=PrInDT.
Buschfeld, S., Weihs, C. 2024. Statistical Modeling of Current Linguistic Realities Around the World: The Case of Singapore. In C. Weihs, W. Krämer, S. Buschfeld (eds.), Statistics today, Springer: 213-223.
Buschfeld, S., Weihs, C., Ronan, P. 2024. Modeling Linguistic Landscapes: the case of St. Martin. Linguistic Landscape 10.3: 302-334, https://doi.org/10.1075/ll.23070.bus.