Artificial Intelligence and Data Mining

Data mining, sometimes referred to as "machine learning," is based on automated methods for pattern discovery and general learning from data. Modern data mining technology can handle all three primary learning tasks: classification, regression, and clustering. Classification assigns a label to an object: Is something a bus or a truck (robotic vision), a high-risk or low-risk borrower (finance), an ordinary piece of plastic or an explosive (security)? Regression predicts a continuous measure. Examples from our machine learning consulting experience include predicting the nitrogen oxide levels produced by an engine and forecasting how much a credit card member will charge to a card.
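
To make the difference between classification and regression concrete, here is a minimal Python sketch of a single-split "stump" fitted to one input variable. It is an illustration only and not the CART(R) algorithm or any Salford Systems code: the function name `fit_stump`, the borrower and cardholder figures, and the choice of error measures (misclassification count for labels, squared error for numbers) are all assumptions made for this example.

```python
from collections import Counter
from statistics import mean


def fit_stump(xs, ys, task):
    """Fit a one-split predictor on a single numeric feature.

    Hypothetical helper for illustration. Requires at least two
    distinct values in xs.
    task="classification": each side predicts its majority label.
    task="regression": each side predicts its mean target value.
    """
    def leaf(values):
        if task == "classification":
            return Counter(values).most_common(1)[0][0]
        return mean(values)

    def error(values, prediction):
        if task == "classification":
            return sum(v != prediction for v in values)  # misclassified cases
        return sum((v - prediction) ** 2 for v in values)  # squared error

    best = None
    points = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_pred, right_pred = leaf(left), leaf(right)
        total = error(left, left_pred) + error(right, right_pred)
        if best is None or total < best[0]:
            best = (total, threshold, left_pred, right_pred)
    _, threshold, left_pred, right_pred = best
    return lambda x: left_pred if x <= threshold else right_pred


# Classification: made-up debt-to-income ratios with a risk label.
ratios = [0.10, 0.15, 0.20, 0.45, 0.50, 0.60]
risk = ["low", "low", "low", "high", "high", "high"]
classify = fit_stump(ratios, risk, "classification")

# Regression: made-up credit limits (in thousands) and monthly charges.
limits = [1, 2, 3, 8, 9, 10]
charges = [100.0, 120.0, 110.0, 900.0, 950.0, 1000.0]
predict = fit_stump(limits, charges, "regression")

print(classify(0.12), classify(0.55))
print(predict(2), predict(9))
```

The same split search serves both tasks; only the prediction made on each side of the split and the error being minimized change. A full tree method applies such splits repeatedly and adds safeguards against overfitting that this sketch leaves out.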

Clustering, a form of "unsupervised learning," is used to find groups in data. Often such investigations attempt to find a small to moderate number of prototypes that can represent a much larger volume of data. For example, market researchers may summarize customers by treating each customer as a member of a type, ignoring individual differences. Clustering methods have recently been used in gene research as well: by examining the DNA in the tumors of a number of cancer patients, researchers have found common patterns of gene expression. Practitioners of image compression call clustering vector quantization; an image is approximated by replacing a large number of unique pixel patterns with a small number of judiciously chosen prototypical pixel patterns.
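
The vector quantization idea can be sketched in a few lines of Python. The example below uses plain k-means as the clustering method, which is one common choice and an assumption of this sketch, not a description of V-CART or of any Salford Systems product. The function names, the evenly spaced starting prototypes, and the six RGB pixel values are all made up for illustration, and the sketch assumes `k` is no larger than the number of vectors.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(vector, codebook):
    """Index of the prototype in codebook closest to vector."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def kmeans_codebook(vectors, k, iterations=20):
    """Build k prototypes with plain k-means (hypothetical helper)."""
    # Naive deterministic start: k vectors spaced evenly through the data.
    step = len(vectors) // k
    codebook = [tuple(vectors[i * step]) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: group each vector with its nearest prototype.
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, codebook)].append(v)
        # Update step: move each prototype to the mean of its group.
        updated = []
        for old, group in zip(codebook, groups):
            if group:
                updated.append(tuple(sum(c) / len(group) for c in zip(*group)))
            else:
                updated.append(old)  # an empty group keeps its old prototype
        if updated == codebook:
            break  # assignments have stabilized
        codebook = updated
    return codebook


def quantize(vectors, codebook):
    """Replace each vector by the index of its nearest prototype."""
    return [nearest(v, codebook) for v in vectors]


# Six made-up RGB pixels: three reddish, three bluish.
pixels = [(250, 10, 10), (245, 5, 12), (10, 10, 240),
          (12, 8, 250), (255, 0, 0), (0, 0, 255)]
codebook = kmeans_codebook(pixels, k=2)
codes = quantize(pixels, codebook)
print(codebook)
print(codes)
```

The compressed representation is the small codebook plus one index per pixel; decoding looks each index up in the codebook, so every pixel is shown as its prototype rather than its original value.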

Salford Systems has been working with leading machine learning researchers from UC Berkeley and Stanford University for the past 11 years. Two of those researchers, Leo Breiman and Charles Stone, have been elected to the National Academy of Sciences, and Jerome Friedman has served both as a Director of the Stanford Linear Accelerator Center (SLAC) and as chairman of the Stanford Statistics Department. Based on their extraordinary research and source code, we have released what we believe are the finest classification and regression tools available -- CART® and MARS®. This site contains white papers, FAQs, and fully functional evaluation versions of the software. Please start with our home page or our site map for further information.

TreeNet®, developed by Jerome Friedman, is the latest addition to our suite of machine learning methods and is expected to be released in December 2001. A brief description appears in our TreeNet® FAQ.

We expect V-CART™ (vector CART) and other clustering methods to become available in early 2004.
© Copyright 2003-2004 Salford Systems