Multivariable Model-building

A pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables

Patrick Royston and Willi Sauerbrei, Wiley Series in Probability and Statistics, Wiley, 2008

Additional material including datasets, programs and teaching material:

Book Description:

Multivariable regression models are widely used in all areas of science in which empirical data are analysed. Using the multivariable fractional polynomials (MFP) approach, this book focuses on the selection of important variables and the determination of functional form for continuous predictors. Despite being relatively simple, the selected models often extract most of the important information from the data. The authors have chosen to concentrate on examples drawn from medical statistics, although the MFP method has applications in many other subject-matter areas as well.

Multivariable Model-Building:

  • Focuses on normal-error models for continuous outcomes, logistic regression for binary outcomes and Cox regression for censored time-to-event data.
  • Concentrates on fractional polynomial models and illustrates new approaches to model criticism and stability.
  • Provides comparisons with and discussion of other techniques such as spline models.
  • Features new strategies for modelling interactions with continuous covariates, which are important in the context of randomized trials and observational studies.
  • Does not consider high-dimensional data, such as gene expression data.
  • Is illustrated throughout with working examples from 23 substantial real datasets; most datasets and the Stata programs are available on a website, enabling the reader to apply the techniques directly.
  • Is written in an accessible and informal style making it suitable for researchers from a range of disciplines with minimal mathematical background.

This book provides a readable text giving the rationale of, and practical advice on, a unified approach to multivariable modelling. It aims to make multivariable model building simpler, more transparent and more effective. This book is aimed at graduate students studying regression modelling and professionals in statistics as well as researchers from medical, physical, social and many other sciences where regression models play a central role.

Table of Contents:

1. Introduction
2. Selection of variables
3. Handling categorical and continuous predictors
4. Fractional polynomials for one variable
5. Some issues with univariate FP models
6. MFP: multivariable model-building with fractional polynomials
7. Interactions
8. Model stability
9. Some comparisons of MFP with splines
10. How to work with MFP
11. Special topics involving fractional polynomials
12. Epilogue
Appendix A: Data and software resources 
Appendix B: Glossary of Abbreviations


For more details about the data, see Appendix A of the book.

Datasets used once in our book:

No.  Name                         Outcome   Obs    Events  Vars
01   Myeloma                      Survival  65     48      16
02   Freiburg DNA breast cancer   Survival  109    56      1
03   Cervix cancer                Binary    899    141     21
04   Nerve conduction             Cont.     406    N/A     1
05   Triceps skinfold thickness   Cont.     892    N/A     1
06   Diabetes                     Cont.     42     N/A     2
07   Advanced prostate cancer     Survival  475    338     13
08   Quit smoking study           Cont.     250    N/A     3
09   Breast cancer diagnosis      Binary    458    133     6
10   Boston housing               Cont.     506    N/A     13
11   Pima Indians                 Binary    768    268     8
12   Rotterdam breast cancer      Survival  2982   1518    11
13   Fetal growth                 Cont.     574    N/A     1
14   Cholesterol (not available)  Cont.     553    N/A     1

Datasets used more than once in our book:

No.  Name                         Outcome   Obs    Events  Vars
15   Research body fat            Cont.     326    N/A     1
16   GBSG breast cancer           Survival  686    299     9
17   Educational body fat         Cont.     252    N/A     13
18   Glioma                       Survival  411    274     15
19   Prostate cancer              Cont.     97     N/A     7
20   Whitehall 1                  Survival  17260  2576    10
     Whitehall 1                  Binary    17260  1670    10
21   PBC                          Survival  418    161     17
22   Oral cancer                  Binary    397    194     1
23   Kidney cancer                Survival  347    322     10

Simulated data set from chapter 10:

ART Study Cont. 250 N/A 10

Extended to 10 replicates of 500 observations, altogether 5000 observations.

Dataset references, background or analyses:

1. Myeloma
     Krall, J. M., Uthoff, V. A. and Harley, J. B. (1975). A step-up procedure for selecting variables
     associated with survival, Biometrics 31: 49-57.

2. Freiburg DNA breast cancer
     Pfisterer, J., Kommoss, F., Sauerbrei, W., Menzel, D., Kiechle, M., Giese, E., Hilgarth, M. and
     Pfleiderer, A. (1995). DNA flow cytometry in node positive breast cancer: Prognostic value
     and correlation to morphological and clinical factors, Analytical and Quantitative Cytology and
     Histology 17: 406-412.

3. Cervix cancer   
     Collett, D. (2003). Modelling binary data, second edn, Chapman & Hall/CRC, Boca Raton.

4. Nerve conduction (no reference)

5. Triceps skinfold thickness
     Cole, T. J. and Green, P. J. (1992). Smoothing reference centile curves: the LMS method and penalized
     likelihood, Statistics in Medicine 11: 1305-1319.

6. Diabetes
     Sockett, E. B., Daneman, D., Clarson, C. and Ehrich, R. M. (1987). Factors affecting and patterns
     of residual insulin secretion during first year of Type I (insulin-dependent) diabetes mellitus in
     children, Diabetologia 30: 453–459.

7. Advanced prostate cancer
      Byar, D. P. and Green, S. B. (1980). The choice of treatment for cancer patients based on covariate information:
      application to prostate cancer, Bulletin du Cancer 67: 477–490.

8. Quit smoking study
      Cohen, J., Cohen, P., West, S. G. and Aiken, L. S. (2003). Applied Multiple Regression/Correlation
      Analysis for the Behavioral Sciences, third edn, Lawrence Erlbaum Associates, New Jersey.

9. Breast cancer diagnosis
      Sauerbrei, W., Madjar, H. and Prömpeler, H. J. (1998). Differentiation of benign and malignant breast
      tumors by logistic regression and a classification tree using Doppler flow signals, Methods of
      Information in Medicine 37: 226–234.

10. Boston housing
       Harrison, D. and Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air, Journal
       of Environmental Economics and Management 5: 81-102.

11. Pima Indians
      Royston, P. (2005). Multiple imputation of missing values: update of ICE, Stata Journal 5: 527-536.

12. Rotterdam breast cancer
      Sauerbrei, W., Royston, P. and Look, M. (2007). A new proposal for multivariable modelling
      of time-varying effects in survival data based on fractional polynomial time-transformation,
      Biometrical Journal 49: 453-473.

13. Fetal growth
      Altman, D. G. and Chitty, L. S. (1993). Design and analysis of studies to derive charts of fetal size,
      Ultrasound in Obstetrics and Gynecology 3: 378-384.

14. Cholesterol dataset (not available)
     Mann, J. I., Lewis, B., Shepherd, J., Winder, A. F., Fenster, S., Rose, L. and Morgan, B. (1988). Blood
     lipid concentrations and other cardiovascular risk factors: distribution, prevalence and detection in
     Britain, British Medical Journal 296: 1702–1706.

15. Research body fat
      Luke, A., Durazo-Arvizu, R. and others (1997). Relation between body mass index and body fat in
      black population samples from Nigeria, Jamaica, and the United States, American Journal of
      Epidemiology 145: 620-628.

16. GBSG breast cancer
      Sauerbrei, W. and Royston, P. (1999). Building multivariable prognostic and diagnostic models:
      transformation of the predictors using fractional polynomials, Journal of the Royal Statistical
      Society, Series A 162: 71-94.

17. Educational body fat
      Johnson, R. W. (1996). Fitting percentage of body fat to simple body measurements, Journal of
      Statistics Education 4(1).

18. Glioma
      Sauerbrei, W. and Schumacher, M. (1992). A bootstrap resampling procedure for model building:
      application to the Cox regression model, Statistics in Medicine 11: 2093–2109.

19. Prostate cancer
      Stamey, T. A., Kabalin, J. N., McNeal, J. E., Johnstone, I. M., Freiha, F., Redwine, E. A. and Yang, N.
      (1989). Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the
      prostate. II. Radical prostatectomy treated patients, Journal of Urology 141: 1076–1083.

20. Whitehall 1
     Royston, P., Ambler, G. and Sauerbrei, W. (1999). The use of fractional polynomials to model
     continuous risk variables in epidemiology, International Journal of Epidemiology 28: 964-974.

21. PBC
     Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis, John Wiley &
     Sons, Ltd/Inc., New York.

22. Oral cancer
      Rosenberg, P. S., Katki, H., Swanson, C. A., Brown, L. M., Wacholder, S. and Hoover, R. N. (2003).
      Quantifying epidemiologic risk factors using nonparametric regression: model selection remains the
      greatest challenge, Statistics in Medicine 22: 3369-3381.

23. Kidney cancer
      Royston, P., Sauerbrei, W. and Ritchie, A. W. S. (2004). Is treatment with interferon-α effective in
      all patients with metastatic renal carcinoma? A new approach to the investigation of interactions,
      British Journal of Cancer 90: 794–799.

Programs (only Stata programs are available):


This website was last updated 2011-02-28.

In 2016 we released the MFP website http://mfp.imbi.uni-freiburg.de/.