Cross-Validation Breakdown


V-fold cross-validation works by partitioning your data into v equal-sized segments and holding out one segment at a time for test purposes. If certain classes of the target variable have very small sample sizes, it may not be possible to subdivide each class into v subsets; a class with fewer than v cases cannot contribute a case to every subset. Since CART requires at least one case in each class to run, the cross-validation trees cannot be constructed in that situation.
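Whether a data set runs into this problem can be checked before any tree is grown by counting the cases in each class. The following is a minimal sketch in Python rather than CART's own command language; the function name `classes_too_small` and the sample labels are hypothetical and used only for illustration.

```python
from collections import Counter

def classes_too_small(labels, v):
    """Return {class: count} for every class with fewer than v cases.

    Such a class cannot be split into v non-empty subsets.
    """
    counts = Counter(labels)
    return {cls: n for cls, n in counts.items() if n < v}

# Hypothetical target variable: class "c" has only 4 cases.
labels = ["a"] * 50 + ["b"] * 30 + ["c"] * 4

problem = classes_too_small(labels, v=10)
print(problem)  # {'c': 4}
```

An empty result means every class is large enough to be subdivided v ways.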


SOLUTION 1: Fewer Cross Validations

Reducing the number of cross validations (the value of v) may make it possible to generate the test runs. However, Breiman, Friedman, Olshen and Stone warn that using fewer than 10 cross validations can lead to a serious overestimate of the error rate of any tree.
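One way to pick the reduced number is to cap v at the size of the smallest class, then note whether the result falls below 10. This is only a sketch of that arithmetic, not a CART feature; `largest_feasible_v` and the sample labels are hypothetical names chosen for illustration.

```python
from collections import Counter

def largest_feasible_v(labels, requested_v=10):
    """Largest v, up to requested_v, that every class can be split into."""
    smallest_class = min(Counter(labels).values())
    return min(requested_v, smallest_class)

# Hypothetical target variable: the smallest class has 4 cases.
labels = ["a"] * 50 + ["b"] * 30 + ["c"] * 4

v = largest_feasible_v(labels)
below_ten = v < 10  # True here: the warning about fewer than 10 applies
print(v, below_ten)
```

When the feasible value is well below 10, a dedicated test sample may be the safer choice.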


SOLUTION 2: Dedicated Test Samples

Define your own test subsamples with a random number generator in a data step. The best way is to define a new variable, say TEST1, set to 1 for test cases, and 0 otherwise. By adjusting the random number seed you can repeat the process until the desired balance of cases within each class is obtained. The process can be repeated with separate random test samples identified with variables TEST2, TEST3, etc. Running half a dozen CART trees, each with a different test set, will let you know if your results are stable.
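The seed-adjusting procedure can be sketched as follows. This is an illustration in Python, not a CART data step. It assumes a 30% test fraction and treats "desired balance" as every class appearing in both the learning and the test portion; both are assumptions to adjust for your data. The helper names (`make_test_flag`, `is_balanced`, `find_test_flags`) are hypothetical.

```python
import random

def make_test_flag(labels, test_fraction, seed):
    """Return a 0/1 list: 1 marks a test case, 0 a learning case."""
    rng = random.Random(seed)
    return [1 if rng.random() < test_fraction else 0 for _ in labels]

def is_balanced(labels, flags):
    """Assumed balance rule: every class occurs in both learn and test."""
    classes = set(labels)
    in_test = {y for y, f in zip(labels, flags) if f == 1}
    in_learn = {y for y, f in zip(labels, flags) if f == 0}
    return in_test == classes and in_learn == classes

def find_test_flags(labels, n_sets=6, test_fraction=0.3,
                    first_seed=1, max_tries=10000):
    """Try successive seeds until n_sets balanced test flags are found."""
    found = {}
    seed = first_seed
    for _ in range(max_tries):
        if len(found) == n_sets:
            break
        flags = make_test_flag(labels, test_fraction, seed)
        if is_balanced(labels, flags):
            # Name the variables TEST1, TEST2, ... as in the text above.
            found[f"TEST{len(found) + 1}"] = flags
        seed += 1
    if len(found) < n_sets:
        raise ValueError("could not find enough balanced test samples")
    return found

# Hypothetical target variable with one very small class.
labels = ["a"] * 50 + ["b"] * 30 + ["c"] * 4
test_flags = find_test_flags(labels)
```

Each entry of `test_flags` can then be attached to the data as its own column, and one tree run per column. Comparing the trees across the six runs gives a rough indication of stability.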


Steinberg, Dan and Phillip Colla. CART--Classification and Regression Trees. San Diego, CA: Salford Systems, 1997.
© Copyright 2003-2004 Salford Systems