2019 PharmSci 360
Whenever a solitary model is presented without reference to its development, for all practical purposes that model is correct *by assumption*, whether or not it is actually a good model. Discriminating between candidate models is therefore an essential part of model development. For decades, modelers and statisticians have based model selection predominantly on random variables such as P-values and penalized likelihoods (e.g., AIC) computed from the original data and a few models of interest. These approaches predate the modern era of powerful computing but have nevertheless persisted in everyday practice. Scientists behave as if a single realization of such a random variable generalizes asymptotically to the unobserved underlying population and is therefore a trustworthy basis for decisions about models. Unfortunately this is not so; P-values, for example, are not robust to sampling variation and are valid only to an order of magnitude. Furthermore, model development in pharmacometrics usually follows the forward-addition, backward-elimination paradigm, in which the goal is to improve the fit between a data set and a working model under development. Working to improve the fit can easily, and unsurprisingly, lead to over-fitting, which degrades predictive performance. Fortunately, cross-validation offers an objective and robust way to assess, and therefore rank, models by their predictive ability. Moreover, the nonparametric bootstrap can serve as the basis for cross-validation; such bootstrap cross-validation (BS-CV) lies between the extremes of leave-one-out and 2-fold cross-validation and thus has favorable bias-versus-variance properties. BS-CV allows model selection to be made on the basis of predictive ability by comparing the median values of ensembles of summary statistics computed on the testing data.
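The BS-CV idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes simple least-squares polynomial models, uses out-of-bag observations as the testing data for each bootstrap replicate, and uses RMSE as the summary statistic; the names `fit_fn`, `predict_fn`, and `make_poly` are hypothetical.

```python
# Sketch of bootstrap cross-validation (BS-CV): fit each candidate model on a
# bootstrap (training) sample, evaluate a summary statistic on the out-of-bag
# (testing) rows, and rank models by the median of that statistic.
# Assumptions (not from the abstract): least-squares polynomial models, RMSE
# as the summary statistic, simple (x, y) regression data.
import numpy as np

def bs_cv_median_stat(x, y, fit_fn, predict_fn, n_boot=200, seed=0):
    """Median out-of-bag RMSE over n_boot bootstrap replicates."""
    rng = np.random.default_rng(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)              # bootstrap (training) sample
        oob = np.setdiff1d(np.arange(n), idx)          # out-of-bag (testing) rows
        if oob.size == 0:
            continue
        params = fit_fn(x[idx], y[idx])                # fit on the bootstrap sample
        resid = y[oob] - predict_fn(params, x[oob])    # predict the unseen rows
        stats.append(np.sqrt(np.mean(resid ** 2)))     # summary statistic (RMSE)
    return np.median(stats)

# Two hypothetical candidate models (a line and a cubic) ranked on noisy
# linear data; the lower median OOB RMSE indicates better predictive ability.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 60)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=x.size)

def make_poly(deg):
    fit = lambda xt, yt: np.polyfit(xt, yt, deg)
    pred = lambda p, xv: np.polyval(p, xv)
    return fit, pred

for deg in (1, 3):
    fit, pred = make_poly(deg)
    print(deg, bs_cv_median_stat(x, y, fit, pred))
```

Because the median is taken over an ensemble of testing-data statistics, the ranking reflects the sampling distribution of predictive performance rather than a single realization.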
BS-CV is herein demonstrated using real data and several summary statistics, including a new one termed the simple metric for prediction quality (SMPQ). Of note, the two best PK models by AIC had the worst predictive ability, underscoring the danger of using a single realization of a random variable (such as AIC) as the basis for model selection.
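The point that an AIC value computed from one data set is just one draw of a random variable can be illustrated by recomputing AIC over bootstrap resamples and inspecting its spread. This is a hedged sketch, not the study's analysis: it uses the standard least-squares form AIC = n ln(RSS/n) + 2k (additive constants dropped) on synthetic data, whereas the abstract concerns real PK models.

```python
# Sketch: the sampling variability of AIC for one fixed model.
# Assumptions (not from the abstract): a least-squares linear model on
# synthetic data, with AIC = n*ln(RSS/n) + 2k up to an additive constant.
import numpy as np

def aic_ls(xs, ys, deg):
    """Least-squares AIC for a degree-`deg` polynomial (constants dropped)."""
    p = np.polyfit(xs, ys, deg)
    rss = np.sum((ys - np.polyval(p, xs)) ** 2)
    n, k = len(xs), deg + 1
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 60)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=x.size)

aics = []
for _ in range(200):
    idx = rng.integers(0, len(x), size=len(x))   # nonparametric bootstrap resample
    aics.append(aic_ls(x[idx], y[idx], 1))

# The "single number" reported in practice is one draw from this spread.
print(np.percentile(aics, [2.5, 50, 97.5]))
```

The width of the resulting interval makes concrete why ranking models on one realized AIC value can mislead, which is the failure mode the abstract reports for the two best-by-AIC PK models.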