The goals of this paper are to review the most popular methods of predictor selection in regression models, to explain why some fail when the number P of explanatory variables exceeds the number N of participants, and to discuss alternative statistical methods that can be employed in this case. We focus on penalized least squares methods in regression models, and discuss in detail two such methods that are well established in the statistical literature, the LASSO and Elastic Net. We introduce bootstrap enhancements of these methods, the BE-LASSO and BE-Enet, that allow the user to attach a measure of uncertainty to each variable selected. Our work is motivated by a multimodal neuroimaging dataset that consists of morphometric measures (volumes at several anatomical regions of interest), white matter integrity measures from diffusion weighted data (fractional anisotropy, mean diffusivity, axial diffusivity and radial diffusivity) and clinical and demographic variables (age, education, alcohol and drug history). In this dataset, the number P of explanatory variables exceeds the number N of participants. We use the BE-LASSO and BE-Enet to provide the first statistical analysis that allows the assessment of neurocognitive performance from high dimensional neuroimaging and clinical predictors, including their interactions. The major novelty of this analysis is that biomarker selection and dimension reduction are accomplished with a view towards obtaining good predictions for the outcome of interest (i.e., the neurocognitive indices), unlike principal component analysis that are performed only on the predictors' space independently of the outcome of interest.
ASJC Scopus subject areas
- Cognitive Neuroscience