Background

A Bayesian network is a particular type of learning network well-suited to classification problems, based on Bayesian statistical methods. Each node represents a variable, and with each node we store the probability that the variable takes a particular value given any known information about the system. Edges between nodes indicate conditional dependence between the associated variables. Because the network encodes the probability of each variable conditioned on its parents in the graph, the network as a whole represents a joint distribution over the variables in question.
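The factorization of the joint distribution can be sketched concretely. The following is an illustrative example, not a general implementation: the three-variable network (Rain, Sprinkler, GrassWet) and all of its probability values are invented for demonstration.

```python
# A tiny Bayesian network over three binary variables. The joint
# distribution factors as the product of each node's probability
# conditioned on its parents:
#   P(R, S, W) = P(R) * P(S | R) * P(W | R, S)
# All numbers below are made up for illustration.

P_rain = {True: 0.2, False: 0.8}                      # P(Rain)
P_sprinkler = {                                       # P(Sprinkler | Rain)
    True:  {True: 0.01, False: 0.99},
    False: {True: 0.40, False: 0.60},
}
P_wet = {                                             # P(GrassWet | Rain, Sprinkler)
    (True, True):   {True: 0.99, False: 0.01},
    (True, False):  {True: 0.80, False: 0.20},
    (False, True):  {True: 0.90, False: 0.10},
    (False, False): {True: 0.00, False: 1.00},
}

def joint(rain, sprinkler, wet):
    """Joint probability of one full assignment of the variables."""
    return (P_rain[rain]
            * P_sprinkler[rain][sprinkler]
            * P_wet[(rain, sprinkler)][wet])

# Sanity check: the probabilities of all full assignments sum to 1,
# confirming the product of conditionals is a valid distribution.
total = sum(joint(r, s, w)
            for r in (True, False)
            for s in (True, False)
            for w in (True, False))
```

Note that each node stores only a table conditioned on its parents, so the full joint never has to be written out explicitly.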

We wish to use these networks to solve classification problems. To do this, we must somehow alter the node weights to reflect the true probabilities rather than the prior probabilities. Then, given the state of the input variables, we can accurately predict the values of the output variables and give a measure of confidence in our choice.

The techniques for doing this come from Bayesian statistics. Given a prior probability distribution and new information, Bayes's rule allows us to calculate updated probabilities, known as posterior probabilities, that reflect our new data. The process of updating the node weights is known as Bayesian inference.

Unfortunately, the general problem of Bayesian inference is NP-hard. Therefore, for large networks we must either approximate Bayesian inference or constrain the structure of the graph to simplify the inference process. There are many inference techniques in use today, each with its own advantages.
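One well-known example of the constrained-structure approach is the naive Bayes classifier, in which every feature node has the class node as its only parent, so inference reduces to a product of per-feature terms. The sketch below, with an invented toy dataset, shows this restricted case; it is one illustration of the strategy, not the document's specific method.

```python
# Naive Bayes: constrain the graph so each feature depends only on the
# class label. Inference then costs one multiplication per feature.
# The toy weather data is invented for illustration.
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (feature_tuple, label). Returns class counts
    and per-(feature index, label) counts of feature values."""
    class_counts = Counter(label for _, label in samples)
    feature_counts = defaultdict(Counter)
    for features, label in samples:
        for i, value in enumerate(features):
            feature_counts[(i, label)][value] += 1
    return class_counts, feature_counts

def predict(features, class_counts, feature_counts):
    """Pick the label maximizing P(label) * prod_i P(feature_i | label)."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # class prior
        for i, value in enumerate(features):
            counts = feature_counts[(i, label)]
            # Laplace smoothing (+1 per outcome; features here are
            # binary-valued) so unseen values don't zero the product.
            score *= (counts[value] + 1) / (sum(counts.values()) + 2)
        scores[label] = score
    return max(scores, key=scores.get)

samples = [
    (("sunny", "hot"),  "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "hot"),  "yes"),
]
class_counts, feature_counts = train(samples)
label = predict(("rainy", "mild"), class_counts, feature_counts)
```

The independence assumption is often false in practice, but it makes inference linear in the number of features, which is why such structural restrictions are attractive for large networks.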