Chain rule (probability)

In probability theory, the chain rule[1] (also called the general product rule[2][3]) describes how to calculate the probability of the intersection of (not necessarily independent) events, or equivalently the joint distribution of random variables, using conditional probabilities. This rule allows one to express a joint probability in terms of only conditional probabilities.[4] The rule is notably used in the context of discrete stochastic processes and in applications, e.g. the study of Bayesian networks, which describe a probability distribution in terms of conditional probabilities.

Chain rule for events

Two events

For two events $A$ and $B$, the chain rule states that

$$\mathbb P(A \cap B) = \mathbb P(B \mid A) \, \mathbb P(A),$$

where $\mathbb P(B \mid A)$ denotes the conditional probability of $B$ given $A$.

Example

Urn A contains 1 black ball and 2 white balls, and Urn B contains 1 black ball and 3 white balls. Suppose we pick an urn at random and then draw a ball from that urn. Let event $A$ be choosing the first urn, so that $\mathbb P(A) = \mathbb P(\overline{A}) = 1/2$, where $\overline{A}$ is the complementary event of $A$. Let event $B$ be drawing a white ball. The chance of drawing a white ball, given that we have chosen the first urn, is $\mathbb P(B \mid A) = 2/3$. The intersection $A \cap B$ then describes choosing the first urn and drawing a white ball from it. The probability can be calculated by the chain rule as follows:

$$\mathbb P(A \cap B) = \mathbb P(B \mid A) \, \mathbb P(A) = \frac{2}{3} \cdot \frac{1}{2} = \frac{1}{3}.$$
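The result can also be checked by enumerating the equally likely ways of choosing an urn and then a ball; a minimal Python sketch of this check (urn contents as in the example above):

```python
from fractions import Fraction

# Urn contents from the example: an urn is chosen fairly, then a ball
# is drawn uniformly at random from the chosen urn.
urns = {"A": ["black", "white", "white"],           # 1 black, 2 white
        "B": ["black", "white", "white", "white"]}  # 1 black, 3 white

p_A_and_white = Fraction(0)
for urn, balls in urns.items():
    for ball in balls:
        # P(this urn, this ball) = P(urn) * P(ball | urn)
        p = Fraction(1, 2) * Fraction(1, len(balls))
        if urn == "A" and ball == "white":
            p_A_and_white += p

print(p_A_and_white)  # 1/3, matching P(B | A) * P(A) = 2/3 * 1/2
```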

Finitely many events

For events $A_1, \ldots, A_n$ whose intersection does not have probability zero, the chain rule states

$$\mathbb P(A_1 \cap A_2 \cap \ldots \cap A_n) = \mathbb P(A_n \mid A_1 \cap \ldots \cap A_{n-1}) \, \mathbb P(A_1 \cap \ldots \cap A_{n-1}) = \ldots = \mathbb P(A_1) \, \mathbb P(A_2 \mid A_1) \, \mathbb P(A_3 \mid A_1 \cap A_2) \cdots \mathbb P(A_n \mid A_1 \cap \ldots \cap A_{n-1}).$$
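The telescoping structure of this identity can be verified numerically on any finite probability space in which the intersections have positive probability; the following Python sketch is one such check (the sample space and the events are arbitrary choices made for the test):

```python
from fractions import Fraction

# A finite probability space: six equally likely outcomes (a fair die).
omega = {w: Fraction(1, 6) for w in range(6)}

def P(event):
    """Probability of an event, given as a set of outcomes."""
    return sum(omega[w] for w in event)

def chain_rule(events):
    """P(A_1) * prod_{j>=2} P(A_j | A_1 ∩ ... ∩ A_{j-1})."""
    result = P(events[0])
    running = set(events[0])
    for A in events[1:]:
        prev = P(running)            # P(A_1 ∩ ... ∩ A_{j-1})
        running &= A                 # A_1 ∩ ... ∩ A_j
        result *= P(running) / prev  # times P(A_j | A_1 ∩ ... ∩ A_{j-1})
    return result

A1, A2, A3 = {0, 1, 2, 3}, {1, 2, 3, 4}, {2, 3, 5}
assert chain_rule([A1, A2, A3]) == P(A1 & A2 & A3)  # both equal 1/3
```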

Example 1

For $n = 4$, i.e. four events, the chain rule reads

$$\mathbb P(A_1 \cap A_2 \cap A_3 \cap A_4) = \mathbb P(A_4 \mid A_3 \cap A_2 \cap A_1) \, \mathbb P(A_3 \mid A_2 \cap A_1) \, \mathbb P(A_2 \mid A_1) \, \mathbb P(A_1).$$

Example 2

We draw 4 cards (one at a time) at random and without replacement from a deck of 52 cards. What is the probability that we have picked 4 aces?

First, we set $A_n := \{\text{we draw an ace in the } n\text{th draw}\}$. Obviously, we get the following probabilities:

$$\mathbb P(A_1) = \frac{4}{52}, \qquad \mathbb P(A_2 \mid A_1) = \frac{3}{51}, \qquad \mathbb P(A_3 \mid A_1 \cap A_2) = \frac{2}{50}, \qquad \mathbb P(A_4 \mid A_1 \cap A_2 \cap A_3) = \frac{1}{49}.$$

Applying the chain rule,

$$\mathbb P(A_1 \cap A_2 \cap A_3 \cap A_4) = \frac{4}{52} \cdot \frac{3}{51} \cdot \frac{2}{50} \cdot \frac{1}{49} = \frac{24}{6497400} = \frac{1}{270725}.$$
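This value can be reproduced exactly in a few lines, with a cross-check: since every 4-card subset of the deck is equally likely, the result must equal $1/\binom{52}{4}$. A minimal Python sketch:

```python
from fractions import Fraction
from math import comb

# Chain rule: P(A1) * P(A2|A1) * P(A3|A1∩A2) * P(A4|A1∩A2∩A3)
p = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50) * Fraction(1, 49)

print(p)                              # 1/270725
assert p == Fraction(1, comb(52, 4))  # every 4-card hand is equally likely
```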

Statement of the theorem and proof

Let $(\Omega, \mathcal A, \mathbb P)$ be a probability space. Recall that the conditional probability of an $A \in \mathcal A$ given $B \in \mathcal A$ is defined as

$$\mathbb P(A \mid B) := \begin{cases} \dfrac{\mathbb P(A \cap B)}{\mathbb P(B)}, & \mathbb P(B) > 0, \\ 0, & \mathbb P(B) = 0. \end{cases}$$

Then we have the following theorem.

Chain rule — Let $(\Omega, \mathcal A, \mathbb P)$ be a probability space. Let $A_1, \ldots, A_n \in \mathcal A$. Then

$$\mathbb P(A_1 \cap \ldots \cap A_n) = \mathbb P(A_1) \prod_{j=2}^{n} \mathbb P(A_j \mid A_1 \cap \ldots \cap A_{j-1}).$$
Proof

The formula follows immediately by recursion:

$$\begin{aligned} \mathbb P(A_1) \, \mathbb P(A_2 \mid A_1) \, \mathbb P(A_3 \mid A_1 \cap A_2) \cdots \mathbb P(A_n \mid A_1 \cap \ldots \cap A_{n-1}) &= \mathbb P(A_1 \cap A_2) \, \mathbb P(A_3 \mid A_1 \cap A_2) \cdots \mathbb P(A_n \mid A_1 \cap \ldots \cap A_{n-1}) \\ &= \mathbb P(A_1 \cap A_2 \cap A_3) \cdots \mathbb P(A_n \mid A_1 \cap \ldots \cap A_{n-1}) \\ &\ \ \vdots \\ &= \mathbb P(A_1 \cap \ldots \cap A_n), \end{aligned}$$

where we used the definition of the conditional probability, in the form $\mathbb P(B \mid A) \, \mathbb P(A) = \mathbb P(A \cap B)$, in the first step and then repeatedly.

Chain rule for discrete random variables

Two random variables

For two discrete random variables $X, Y$, we use the events $A := \{X = x\}$ and $B := \{Y = y\}$ in the definition above, and find the joint distribution as

$$\mathbb P(X = x, Y = y) = \mathbb P(X = x \mid Y = y) \, \mathbb P(Y = y),$$

or

$$\mathbb P_{(X,Y)}(x, y) = \mathbb P_{X \mid Y}(x \mid y) \, \mathbb P_Y(y),$$

where $\mathbb P_Y(y)$ is the probability distribution of $Y$ and $\mathbb P_{X \mid Y}(x \mid y)$ is the conditional probability distribution of $X$ given $Y$.
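Read constructively, this says that a joint probability table can be assembled from a marginal table and a conditional table. A minimal Python sketch (the particular distributions below are made up for illustration):

```python
from fractions import Fraction

# Marginal distribution of Y (values chosen for illustration only).
P_Y = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# Conditional distribution of X given Y: P_X_given_Y[y][x] = P(X=x | Y=y).
P_X_given_Y = {0: {0: Fraction(2, 3), 1: Fraction(1, 3)},
               1: {0: Fraction(1, 4), 1: Fraction(3, 4)}}

# Chain rule: P(X=x, Y=y) = P(X=x | Y=y) * P(Y=y).
joint = {(x, y): P_X_given_Y[y][x] * P_Y[y]
         for y in P_Y for x in P_X_given_Y[y]}

assert sum(joint.values()) == 1  # the result is a valid joint distribution
```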

Finitely many random variables

Let $X_1, \ldots, X_n$ be random variables and $x_1, \ldots, x_n \in \mathbb R$. By the definition of the conditional probability,

$$\mathbb P(X_n = x_n, \ldots, X_1 = x_1) = \mathbb P(X_n = x_n \mid X_{n-1} = x_{n-1}, \ldots, X_1 = x_1) \, \mathbb P(X_{n-1} = x_{n-1}, \ldots, X_1 = x_1),$$

and using the chain rule, where we set $A_k := \{X_k = x_k\}$, we can find the joint distribution as

$$\mathbb P(X_1 = x_1, \ldots, X_n = x_n) = \mathbb P(X_1 = x_1) \prod_{j=2}^{n} \mathbb P(X_j = x_j \mid X_1 = x_1, \ldots, X_{j-1} = x_{j-1}).$$

Example

For $n = 3$, i.e. considering three random variables, the chain rule reads

$$\mathbb P_{(X_1, X_2, X_3)}(x_1, x_2, x_3) = \mathbb P(X_1 = x_1, X_2 = x_2, X_3 = x_3) = \mathbb P(X_3 = x_3 \mid X_2 = x_2, X_1 = x_1) \, \mathbb P(X_2 = x_2 \mid X_1 = x_1) \, \mathbb P(X_1 = x_1).$$
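The factorization can be checked numerically by recovering the conditional probabilities from a given joint distribution and multiplying them back together; a minimal Python sketch (the joint table is an arbitrary choice made for the test):

```python
from fractions import Fraction
from itertools import product

# An arbitrary joint pmf over three binary variables (illustration only).
joint = {xs: Fraction(1, 8) for xs in product((0, 1), repeat=3)}
joint[(0, 0, 0)], joint[(1, 1, 1)] = Fraction(1, 16), Fraction(3, 16)

def marginal(prefix):
    """P(X_1 = x_1, ..., X_k = x_k) for prefix = (x_1, ..., x_k)."""
    k = len(prefix)
    return sum(p for xs, p in joint.items() if xs[:k] == prefix)

for xs in joint:
    # P(X1=x1) * P(X2=x2 | X1=x1) * P(X3=x3 | X1=x1, X2=x2)
    p = marginal(xs[:1])
    p *= marginal(xs[:2]) / marginal(xs[:1])
    p *= marginal(xs[:3]) / marginal(xs[:2])
    assert p == joint[xs]  # the chain-rule product recovers the joint pmf
```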

Bibliography

  • René L. Schilling (2021), Measure, Integral, Probability & Processes - Probab(ilistical)ly the Theoretical Minimum (1st ed.), Technische Universität Dresden, Germany, ISBN 979-8-5991-0488-9
  • William Feller (1968), An Introduction to Probability Theory and Its Applications, vol. I (3rd ed.), New York / London / Sydney: Wiley, ISBN 978-0-471-25708-0
  • Russell, Stuart J.; Norvig, Peter (2003), Artificial Intelligence: A Modern Approach (2nd ed.), Upper Saddle River, New Jersey: Prentice Hall, ISBN 0-13-790395-2, p. 496

References

  1. Schilling, René L. (2021). Measure, Integral, Probability & Processes - Probab(ilistical)ly the Theoretical Minimum. Technische Universität Dresden, Germany. p. 136ff. ISBN 979-8-5991-0488-9.
  2. Schum, David A. (1994). The Evidential Foundations of Probabilistic Reasoning. Northwestern University Press. p. 49. ISBN 978-0-8101-1821-8.
  3. Klugh, Henry E. (2013). Statistics: The Essentials for Research (3rd ed.). Psychology Press. p. 149. ISBN 978-1-134-92862-0.
  4. Virtue, Pat. "10-606: Mathematical Foundations for Machine Learning" (PDF).