Q1. What is MARS?
A1. MARS (Multivariate Adaptive Regression Splines) was developed in the early 1990s by world-renowned Stanford physicist and statistician Jerome Friedman. It is an innovative, flexible modeling tool that automates the building of accurate predictive models for continuous and binary dependent variables. It excels at finding optimal variable transformations and interactions, the kinds of complex structure that often hide in high-dimensional data. This approach to regression modeling uncovers important data patterns and relationships that are difficult, if not impossible, for other methods to reveal.
Q2. How does MARS help analysts with regression modeling?
A2. The major advantage of MARS is that it automates aspects of regression modeling that are difficult and time-consuming. These include:
- selecting which predictor variables to use
- handling missing values
- transforming variables, accounting for non-linear relationships
- detecting interactions
- self-testing, ensuring that the model will perform well on future data
Q3. How does MARS differ from conventional regression?
A3. Conventional regression models typically fit straight lines to data. MARS approaches model construction more flexibly, allowing for bends, thresholds, and other departures from straight-line methods. MARS builds its model by piecing together a series of straight-line segments, each allowed its own slope. This permits MARS to approximate a wide range of nonlinear patterns detected in the data.
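As an illustration of the "series of straight lines" idea, here is a minimal Python sketch of the hinge terms such a model can be built from. The coefficients and the knot at 3.0 are made-up values for illustration only, not output from the MARS software.

```python
def hinge_pos(x, knot):
    """Zero up to the knot, then rises with slope 1."""
    return max(0.0, x - knot)


def hinge_neg(x, knot):
    """Falls with slope 1 toward the knot, then stays at zero."""
    return max(0.0, knot - x)


def predict(x):
    # Hypothetical fitted equation: the slope is 0.5 to the left of the
    # knot at x = 3 and 1.5 to the right of it, so the line bends there.
    return 2.0 + 1.5 * hinge_pos(x, 3.0) - 0.5 * hinge_neg(x, 3.0)


for x in (1.0, 3.0, 5.0):
    print(x, predict(x))
```

Each hinge contributes its own slope on one side of the knot, which is how a sum of such terms can follow bends and thresholds that a single straight line cannot.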
Q4. How does MARS construct its models?
A4. MARS starts from the premise that most relevant variables affect the outcome in a complex way. Therefore, when MARS considers whether to add a variable, it simultaneously searches for appropriate break points – knots. Models are constructed in a two-phase procedure. Phase I tests variables and potential knots, resulting in an overfit model. Phase II eliminates redundant factors and components that do not stand up to testing.
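The knot search in Phase I can be sketched for a single predictor as follows. This is a simplified illustration under stated assumptions, not the MARS implementation: it tries each interior data value as a candidate knot, fits an intercept plus a pair of hinge terms by least squares, and keeps the knot with the smallest residual sum of squares. The function names (solve, fit_rss, best_knot) and the toy data are hypothetical, and the Phase II pruning step is not shown.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_rss(xs, ys, knot):
    """Fit intercept + two hinges at `knot`; return the residual sum of squares."""
    rows = [[1.0, max(0.0, x - knot), max(0.0, knot - x)] for x in xs]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    coef = solve(xtx, xty)
    return sum(
        (y - sum(c * v for c, v in zip(coef, r))) ** 2
        for r, y in zip(rows, ys)
    )


def best_knot(xs, ys):
    """Try every interior data value as a knot and keep the best-fitting one."""
    candidates = sorted(set(xs))[1:-1]
    return min(candidates, key=lambda t: fit_rss(xs, ys, t))


# Toy data: flat until x = 5, then rising with slope 2.
xs = [float(i) for i in range(10)]
ys = [1.0 + 2.0 * max(0.0, x - 5.0) for x in xs]
print(best_knot(xs, ys))
```

For this toy data the search should recover the break at x = 5. A full forward pass would repeat this kind of search across all candidate variables and add terms until the model is deliberately overfit.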
Q5. What control over modeling does MARS provide the user?
A5. MARS offers the user extensive control over the model development process. A number of user-defined parameters are available, including:
- selecting variables to have straight line effects – no knots
- specifying a minimum distance between knots
- permitting interactions between select variables only
- permitting interactions only to a specified degree of complexity
- controlling model complexity
Q6. How does MARS handle missing values?
A6. MARS automatically creates a missing value indicator – a dummy variable – that becomes one of the available predictors. These dummy variables represent the absence or the presence of data for the predictor variables in focus.
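A missing value indicator of this kind can be sketched in a few lines of Python. The function name and the choice of 0.0 as the placeholder fill value are assumptions made for the example; how MARS uses the indicator internally is not described here.

```python
def add_missing_indicator(values, fill=0.0):
    """Return (filled_values, indicator) where indicator is 1 for missing entries."""
    indicator = [1 if v is None else 0 for v in values]
    # The fill value is only a placeholder; the indicator column carries
    # the information that the original value was absent.
    filled = [fill if v is None else v for v in values]
    return filled, indicator


income = [52.0, None, 38.5, None]
filled, income_missing = add_missing_indicator(income)
print(filled)
print(income_missing)
```

The indicator column can then be offered to the model as a predictor alongside the original variable.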
Q7. How does MARS ensure that a model will perform as claimed on future data?
A7. Almost all modeling technologies can track training data accurately. MARS protects users from misleading results through its two-stage modeling process. MARS overfits its model initially but then prunes away all components that would not hold up with new data. MARS provides assessments through use of one of two built-in testing regimens: cross validation or reference to independent test data. Using these tests, MARS determines the degree of accuracy that can be expected from the best predictive model.
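The cross validation idea can be sketched generically as below. This is not the MARS testing routine: the helper names (k_fold_mse, fit_mean) are hypothetical, and a trivial mean predictor stands in for the model so that the example stays self-contained.

```python
def k_fold_mse(xs, ys, fit, k=5):
    """Average squared error on held-out folds.

    `fit(train_x, train_y)` must return a function mapping x to a prediction.
    """
    n = len(xs)
    total = 0.0
    for fold in range(k):
        # Every k-th record, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, n, k))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        model = fit(train_x, train_y)
        total += sum((ys[i] - model(xs[i])) ** 2 for i in test_idx)
    return total / n


def fit_mean(train_x, train_y):
    """Stand-in model: always predicts the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 0.0, 2.0]
print(k_fold_mse(xs, ys, fit_mean, k=2))
```

Because every record is scored by a model that never saw it, the resulting error is a more honest estimate of performance on future data than the error measured on the training data.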
Q8. How can MARS models be implemented for predictive purposes?
A8. A MARS predictive model can be implemented in two ways. First, new databases can be scored directly by identifying the MARS model and the data to be scored. MARS will perform all the required data transformations and calculations automatically and output the predicted scores. Second, the MARS predictive equation can be exported as ready-to-run C and SAS®-compatible code that can be deployed in the user’s application framework.
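To give a feel for what an exported scoring equation might look like, the Python sketch below turns a list of hinge terms into a small C function. The export_c helper, the term format, and the generated function name are all hypothetical; the code actually produced by MARS may be structured differently.

```python
def export_c(intercept, terms, func_name="mars_score"):
    """Build C source for a model made of an intercept plus hinge terms.

    `terms` is a list of (coef, var_name, knot, direction) tuples, where
    direction > 0 means max(0, var - knot) and otherwise max(0, knot - var).
    """
    args = sorted({name for _, name, _, _ in terms})
    signature = ", ".join("double " + a for a in args)
    lines = [
        f"double {func_name}({signature}) {{",
        f"    double score = {intercept!r};",
    ]
    for coef, name, knot, direction in terms:
        inner = f"{name} - {knot!r}" if direction > 0 else f"{knot!r} - {name}"
        lines.append(f"    score += {coef!r} * fmax(0.0, {inner});")
    lines.append("    return score;")
    lines.append("}")
    return "#include <math.h>\n\n" + "\n".join(lines) + "\n"


# Made-up model: one predictor, a pair of hinges at age = 30.
print(export_c(2.0, [(1.5, "age", 30.0, 1), (-0.5, "age", 30.0, -1)]))
```

Because the exported model is just arithmetic on the input variables, it can be compiled into an application without any dependency on the modeling software.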
Q9. How does MARS compare with neural nets?
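A9. MARS is not a black box. Its model is an explicit equation built from straight-line pieces, so the effect of each variable can be read directly from the model. Compared with neural nets, MARS is typically faster to build and easier to interpret, and it is often at least as accurate.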
Q10. Why is MARS better than a decision tree for regression?
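A10. A regression tree assigns the same predicted value to every record that falls in the same terminal node, so its predictions move in steps. MARS predictions vary continuously with the predictors, which allows much higher resolution and accuracy, typically producing unique scores for every record in a database. In this way, MARS expands on the capabilities of decision trees for regression.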

