# Statistics

## Scientific Data Analysis

If you have not done much experimental scientific work, start by reading the references on scientific data analysis in the basic laboratory procedure pages. These will explain simple statistical procedures, including simple least-squares regression. These techniques may prove sufficient for your purposes. Furthermore, you will have an easier time understanding the statistics texts if you have mastered this basic background.
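
As a rough illustration of the simple least-squares regression mentioned above, the following sketch fits a straight line to a handful of points using the standard closed-form formulas. The function name `fit_line` and the sample measurements are hypothetical, chosen only for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements lying roughly along y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
slope, intercept = fit_line(xs, ys)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The fit minimizes the sum of squared vertical distances from the points to the line, which is why a single wildly wrong point can pull it far off.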

## Basic Statistics

There are many elementary introductions to statistics. For computer vision, choose a text which claims to be written for students in science and engineering. Such texts have a more practical (sometimes computational) approach than texts for mathematics and statistics majors. They have a better choice of topics than texts for students in the social sciences. A particularly readable book is

• Chatfield, Christopher (1983) Statistics for Technology: a Course in Applied Statistics, Chapman and Hall, London.

Two introductory books that require somewhat more mathematical maturity, or more effort on your part, are

• Papoulis, Athanasios (1991) Probability, Random Variables, and Stochastic Processes (third edition) McGraw-Hill, NY.
• Bickel, Peter J. and Kjell A. Doksum (1977) Mathematical Statistics: Basic Ideas and Selected Topics, Holden-Day, Oakland CA.

The book by Papoulis has been particularly popular in computer vision, so its choice of topics may be especially helpful when you are trying to understand computer vision publications.

## Robust Statistics

Researchers in "robust statistics" have extended standard statistical techniques to work (a) in the presence of outliers (occasional extremely wrong values) and (b) when the real data may be generated by a mechanism which is similar to, but does not belong to, the class of theoretical models used to analyze the data. The robust methods are better able to cope with the types of data found in real scientific data sets.
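
A small sketch of point (a), using only Python's `statistics` module: a single outlier drags the mean and standard deviation far from the bulk of the data, while the median and the median absolute deviation barely move. The helper `mad` and the sample values are hypothetical, and these two estimators are just the simplest robust examples, not a summary of the methods in the references.

```python
import statistics


def mad(values):
    """Median absolute deviation: a robust measure of spread."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


# Hypothetical repeated measurements of a quantity near 10.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]
# The same data with one extremely wrong value added.
contaminated = clean + [1000.0]

for name, data in [("clean", clean), ("contaminated", contaminated)]:
    print(
        f"{name:>12}: mean={statistics.mean(data):8.2f} "
        f"median={statistics.median(data):6.2f} "
        f"stdev={statistics.stdev(data):8.2f} "
        f"MAD={mad(data):5.2f}"
    )
```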

Good introductions to robust statistics can be found in

• Hoaglin, D.C., Mosteller, F., and Tukey, J.W., eds. (1983) Understanding Robust and Exploratory Data Analysis, John Wiley, New York.
• the first chapter of Hampel, Frank R., Elvezio M. Ronchetti, Peter J. Rousseeuw, and Werner A. Stahel (1986) Robust Statistics: The Approach Based on Influence Functions, John Wiley, New York.

Robust methods for regression are described in

• Rousseeuw, Peter J. and Annick M. Leroy (1987) Robust Regression and Outlier Detection, John Wiley, New York.

A similar method, random sample consensus (RANSAC), was apparently worked out independently, with somewhat less analysis, by Fischler and Bolles:

• Fischler, Martin A. and Robert C. Bolles (1981) "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography," CACM 24/6, pp. 381--395.
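
The random sample consensus idea can be sketched for line fitting roughly as follows: repeatedly fit a line to a minimal random sample (two points) and keep the line that the most points agree with. This is a simplified illustration, not the paper's full procedure; the function name, the parameters `n_trials`, `tol`, and `seed`, and the data are all assumptions made for the example, and the usual final refit to the consensus set is omitted.

```python
import random


def ransac_line(points, n_trials=200, tol=0.5, seed=0):
    """Fit y = m*x + b by keeping the two-point line with the most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical line; not representable as y = m*x + b
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        # Consensus set: points whose vertical residual is within tol.
        inliers = [(x, y) for x, y in points if abs(y - (m * x + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (m, b), inliers
    return best_model, best_inliers


# Hypothetical data: ten points exactly on y = 2x + 1, plus three outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(2.0, 30.0), (5.0, -20.0), (8.0, 50.0)]

model, inliers = ransac_line(points)
print("model:", model, "inliers:", len(inliers))
```

Because each candidate line is built from only two points, an outlier can spoil a trial but cannot bias the trials that happen to draw two good points.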

## Grouping and Cluster Analysis

Grouping and cluster analysis algorithms are still a black art. For a nice survey of the state of play, including problems with existing techniques, from the point of view of robust statistics, see

• Kaufman, Leonard and Peter J. Rousseeuw (1990) Finding Groups in Data: an Introduction to Cluster Analysis, John Wiley, New York.

Do not, however, expect such techniques to work miracles. In particular, if your data is low-dimensional and you can't see the cluster divisions in suitable scatterplots, you should not expect the algorithm to magically find the distinctions. Consider whether you really have a reliable cluster structure, with clear separation between the clusters!
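
One way to put a number on "clear separation" is the average silhouette width, which compares each point's mean distance to its own cluster with its mean distance to the nearest other cluster. The sketch below is only one possible check, chosen here as an illustration; the function name and the toy data are hypothetical. Values near 1 suggest well-separated clusters, and values near 0 or below suggest the labelling does not reflect real structure.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette width over all points (between -1 and 1)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of p's own cluster
        # (p's zero distance to itself is in the sum, hence len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical 2D data: two tight, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = mean_silhouette(points, [0, 0, 0, 1, 1, 1])
bad = mean_silhouette(points, [0, 1, 0, 1, 0, 1])
print(f"sensible labels: {good:.2f}, arbitrary labels: {bad:.2f}")
```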

## Multivariate Analysis

Multivariate analysis techniques analyze data with many variables per observation, including how scalar output values depend on many input variables. For example, principal component analysis finds the linear combinations of the input variables that account for most of the variance in the data. A very nice book on multivariate analysis is

• Dillon, William R. and Matthew Goldstein (1984) Multivariate Analysis: Methods and Applications, John Wiley, New York.
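
For two variables, principal component analysis can be written out in closed form, which makes a compact illustration: compute the 2x2 sample covariance matrix, then take its eigenvalues (the variances along the principal axes) and the direction of the leading eigenvector. The function name and data below are hypothetical; in higher dimensions one would normally use a numerical eigenvalue or SVD routine instead.

```python
import math


def principal_axes_2d(points):
    """PCA of 2D points.

    Returns ((var1, var2), angle): the variances along the first and second
    principal axes, and the angle in radians of the first axis from the x-axis.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    variances = (half_trace + radius, half_trace - radius)
    # Direction of the eigenvector with the larger eigenvalue.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return variances, angle


# Hypothetical data scattered closely around the line y = x.
data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
variances, angle = principal_axes_2d(data)
explained = variances[0] / (variances[0] + variances[1])
print(f"first axis at {math.degrees(angle):.1f} degrees, "
      f"explains {explained:.1%} of the variance")
```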

## Classification

Classification algorithms are given a set of model distributions and a test value, and must determine which model distribution the test value most likely belongs to. Two good references are

• Therrien, Charles W. (1989) Decision, Estimation, and Classification, John Wiley, New York.
• Dillon, William R. and Matthew Goldstein (1984) Multivariate Analysis: Methods and Applications, John Wiley, New York.
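
A minimal sketch of this setup, assuming one-dimensional Gaussian model distributions with equal prior probabilities (both assumptions made for the example, as are the class names and parameters): the test value is assigned to the model under which its likelihood is highest.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log of the normal density with the given mean and standard deviation."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def classify(x, models):
    """Return the name of the model under which x has the highest likelihood."""
    return max(models, key=lambda name: gaussian_log_pdf(x, *models[name]))


# Hypothetical pixel-intensity models, each given as (mean, standard deviation).
models = {"dark": (50.0, 10.0), "bright": (200.0, 30.0)}

for value in (60.0, 100.0, 180.0):
    print(value, "->", classify(value, models))
```

Note that the value 100 is assigned to "bright" even though it is numerically closer to the "dark" mean: the broader "bright" distribution makes that value far less surprising. Comparing distances to the means alone would give a different answer.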

## Random Number Generation

It is frequently necessary to generate random numbers or, more precisely, pseudo-random numbers. A fun, readable book on how to do this, and how not to do this, is

• Ripley, Brian D. (1987) Stochastic Simulation, John Wiley, New York.
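
To make "pseudo-random" concrete, here is a sketch of a linear congruential generator, one of the simplest classical schemes. The class name is hypothetical and the default constants are one commonly quoted choice, used here as an assumption for illustration. The example also shows a classic pitfall of this family when the modulus is a power of two: the lowest bit simply alternates.

```python
class LinearCongruential:
    """Minimal linear congruential generator: state = (a * state + c) mod m."""

    def __init__(self, seed, a=1664525, c=1013904223, m=2 ** 32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def next_int(self):
        """Advance the state and return it as an integer in [0, m)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state

    def next_float(self):
        """Return the next value scaled to a float in [0, 1)."""
        return self.next_int() / self.m


# The same seed always reproduces the same sequence.
gen = LinearCongruential(seed=42)
samples = [gen.next_float() for _ in range(5)]
print(samples)

# Pitfall: with an odd multiplier and increment and a power-of-two modulus,
# the parity of the state flips on every step.
bit_gen = LinearCongruential(seed=42)
low_bits = [bit_gen.next_int() & 1 for _ in range(8)]
print(low_bits)
```

For real work, a well-tested library generator (such as Python's `random` module) is a safer choice than a hand-rolled one.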

## Spherical Data

Most statistical techniques work only if values come from a linear space. In computer vision, we must occasionally analyze values from a circular or spherical space, such as 2D or 3D orientations. The only references I have seen on the subject, which fortunately seem to be fairly readable, are

• Fisher, N. I. (1993) Statistical Analysis of Circular Data, Cambridge University Press, Cambridge, UK.
• Fisher, N. I., T. Lewis, and B. J. J. Embleton (1987) Statistical Analysis of Spherical Data, Cambridge University Press, Cambridge, UK.
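
A short sketch of why linear techniques fail on circular values: the arithmetic mean of the orientations 350° and 10° is 180°, which points in exactly the wrong direction. Averaging the corresponding unit vectors instead gives a direction close to 0°. The function name and sample angles are hypothetical.

```python
import math


def circular_mean_deg(angles_deg):
    """Mean direction of angles in degrees, returned in the range [0, 360]."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    if math.hypot(s, c) < 1e-12:
        # The unit vectors cancel out, so no mean direction exists.
        raise ValueError("mean direction is undefined for these angles")
    return math.degrees(math.atan2(s, c)) % 360.0


angles = [350.0, 10.0]
naive = sum(angles) / len(angles)
circular = circular_mean_deg(angles)
# The circular result may print as a value a hair away from 0 or 360
# because of floating-point rounding; both describe the same direction.
print(f"arithmetic mean: {naive}, circular mean: {circular}")
```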