
Probability

20 January, 2016 - 10:05

Mathematically, the probability that something will happen can be specified with a number ranging from 0 to 1, with 0 representing impossibility and 1 representing certainty. If you flip a coin, heads and tails both have probabilities of 1/2. The probabilities of all the possible outcomes have to add up to 1. This is called normalization.

media/image5.png
Figure 4.5 Normalization: the probability of picking land plus the probability of picking water adds up to 1.

So far we’ve discussed random processes having only two possible outcomes: yes or no, win or lose, on or off. More generally, a random process could have a result that is a number. Some processes yield integers, as when you roll a die and get a result from one to six, but some are not restricted to whole numbers, e.g., the height of a human being, or the amount of time that a uranium-238 atom will exist before undergoing radioactive decay. The key to handling these continuous random variables is the concept of the area under a curve, i.e., an integral.

media/image6.png
Figure 4.6 Probability distribution for the result of rolling a single die.
 

Consider a throw of a die. If the die is “honest,” then we expect all six values to be equally likely. Since all six probabilities must add up to 1, the probability of any particular value coming up must be 1/6. We can summarize this in a graph, Figure 4.6. Areas under the curve can be interpreted as total probabilities. For instance, the area under the curve from 1 to 3 is 1/6+1/6+1/6 = 1/2, so the probability of getting a result from 1 to 3 is 1/2. The function shown on the graph is called the probability distribution.

media/image7.png  
Figure 4.7 Rolling two dice and adding them up.
 

Figure 4.7 shows the probabilities of various results obtained by rolling two dice and adding them together, as in the game of craps. The probabilities are not all the same. There is a small probability of getting a two, for example, because there is only one way to do it, by rolling a one and then another one. The probability of rolling a seven is high because there are six different ways to do it: 1+6, 2+5, etc.
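
The counting argument for two dice can be checked by listing all 36 equally likely rolls. The following is a minimal sketch; the function name two_dice_distribution is chosen here for illustration and does not come from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution():
    """Return {sum: probability} for two fair six-sided dice."""
    # Count how many of the 36 ordered rolls (a, b) give each total.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(ways, 36) for total, ways in sorted(counts.items())}


dist = two_dice_distribution()
for total, p in dist.items():
    print(total, p)
```

A total of 2 can be made in one way and a total of 7 in six ways, so their probabilities come out as 1/36 and 6/36 = 1/6, and the eleven probabilities add up to 1 as normalization requires.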

If the number of possible outcomes is large but finite, for example the number of hairs on a dog, the graph would start to look like a smooth curve rather than a ziggurat.

What about probability distributions for random numbers that are not integers? We can no longer make a graph with probability on the y axis, because the probability of getting a given exact number is typically zero. For instance, there is zero probability that a person will be exactly 200 cm tall, since there are infinitely many possible results that are close to 200 but not exactly 200, for example 199.99999999687687658766. It doesn’t usually make sense, therefore, to talk about the probability of a single numerical result, but it does make sense to talk about the probability of a certain range of results. For instance, the probability that a randomly chosen person will be more than 170 cm and less than 200 cm in height is a perfectly reasonable thing to discuss. We can still summarize the probability information on a graph, and we can still interpret areas under the curve as probabilities.

media/image8.png
Figure 4.8 A probability distribution for human height.
 

But the y axis can no longer be a unitless probability scale. In the example of human height, we want the x axis to have units of centimeters, and we want areas under the curve to be unitless probabilities. The area of a single square on the graph paper is then

\textrm{(unitless area of a square)}= \textrm{(width of square with distance units)}\times \textrm{(height of square)}

If the units are to cancel out, then the height of the square must evidently be a quantity with units of inverse centimeters. In other words, the y axis of the graph is to be interpreted as probability per unit height, not probability.

Another way of looking at it is that the y axis on the graph gives a derivative, dP/dx: the infinitesimally small probability that x will lie in the infinitesimally small range covered by dx.
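
Stated as an integral, with a and b standing for the ends of whatever range is of interest (symbols chosen here for illustration), the probability of a result between a and b is the area under the curve over that range, and normalization says that the area over the whole range of possible x equals 1:

\begin{align*} P(a<x<b) &=\int_{a}^{b}\frac{dP}{dx}dx \\ \int_{\textrm{all }x}\frac{dP}{dx}dx &=1 \end{align*}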

Example

A computer language will typically have a built-in subroutine that produces a fairly random number that is equally likely to take on any value in the range from 0 to 1. If you take the absolute value of the difference between two such numbers, the probability distribution is of the form dP/dx=k(1-x). Find the value of the constant k that is required by normalization.

\begin{align*} 1 &=\int_{0}^{1}k(1-x)dx \\ &=\left ( kx-\frac{1}{2}kx^2 \right )\Big| ^1_0 \\ &=k-k/2 \\ k&=2 \end{align*}
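
The result k=2 can be checked roughly with a simulation. The sketch below draws pairs of uniform random numbers, takes the absolute value of their difference, and compares the fraction of results below a cutoff with the area under k(1-x) from 0 to that cutoff. The function names, the sample size, and the seed are assumptions made for this illustration, and the simulated fraction matches the prediction only approximately.

```python
import random


def fraction_below(cutoff, n=200_000, seed=1):
    """Simulated fraction of |r1 - r2| values that fall below cutoff."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    hits = sum(abs(rng.random() - rng.random()) < cutoff for _ in range(n))
    return hits / n


def predicted_below(cutoff, k=2.0):
    """Area under dP/dx = k(1 - x) from 0 to cutoff."""
    return k * (cutoff - cutoff ** 2 / 2)


print(fraction_below(0.5), predicted_below(0.5))
```

With k=2 the predicted area from 0 to 0.5 is 0.75, and the area from 0 to 1 is exactly 1, as normalization requires.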

Self-Check.

Compare the number of people with heights in the range of 130-135 cm to the number in the range 135-140.


Figure 4.9 The average can be interpreted as the balance point of the probability distribution.
 

When one random variable is related to another in some mathematical way, the chain rule can be used to relate their probability distributions.

Example


 

media/image9.png
Figure 4.10

 

A laser is placed one meter away from a wall, and spun on the ground to give it a random direction, but if the angle u shown in Figure 4.10 doesn’t come out in the range from 0 to \pi/2, the laser is spun again until an angle in the desired range is obtained. Find the probability distribution of the distance x shown in the figure. The derivative d(\tan^{-1}z)/dz=1/(1+z^2) will be required (see Example).

Since any angle between 0 and \pi/2 is equally likely, the probability distribution dP/du must be a constant, and normalization tells us that the constant must be dP/du=2/\pi.

The laser is one meter from the wall, so the distance x, measured in meters, is given by x=\tan u. For the probability distribution of x, we have

\begin{align*} \frac{dP}{dx}&=\frac{dP}{du}\cdot \frac{du}{dx} \\ &= \frac{2}{\pi}\cdot \frac{d(\tan^{-1}x)}{dx}\\ &= \frac{2}{\pi(1+x^2)} \end{align*}

Note that the range of possible values of x theoretically extends from 0 to infinity. Problem 6.7 deals with this.
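
The chain-rule result can be compared with a simulation of the spinning laser. Integrating 2/(\pi(1+x^2)) from 0 to a cutoff c gives (2/\pi)\tan^{-1}c for the probability that x is less than c. The sketch below checks this against randomly chosen angles; the function names, the sample size, and the seed are assumptions made for this illustration.

```python
import math
import random


def simulated_fraction_below(cutoff, n=200_000, seed=1):
    """Fraction of simulated spins whose distance x = tan(u) is below cutoff."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    hits = 0
    for _ in range(n):
        u = rng.uniform(0.0, math.pi / 2)  # angle equally likely anywhere in range
        if math.tan(u) < cutoff:
            hits += 1
    return hits / n


def predicted_fraction_below(cutoff):
    """Integral of 2 / (pi (1 + x^2)) from 0 to cutoff."""
    return (2 / math.pi) * math.atan(cutoff)


print(simulated_fraction_below(1.0), predicted_fraction_below(1.0))
```

For a cutoff of one meter the prediction is 1/2, which makes sense geometrically: x is less than 1 exactly when the angle u is less than \pi/4, which is half of the allowed range.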

If the next Martian you meet asks you, “How tall is an adult human?,” you will probably reply with a statement about the average human height, such as “Oh, about 5 feet 6 inches.” If you wanted to explain a little more, you could say, “But that’s only an average. Most people are somewhere between 5 feet and 6 feet tall.” Without bothering to draw the relevant bell curve for your new extraterrestrial acquaintance, you’ve summarized the relevant information by giving an average and a typical range of variation. The average of a probability distribution can be defined geometrically as the horizontal position at which it could be balanced if it were constructed out of cardboard (Figure 4.9). This is a different way of working with averages than the one we used earlier. Before, when we had a graph of y versus x, we implicitly assumed that all values of x were equally likely, and we found an average value of y. In this new method using probability distributions, the variable we’re averaging is on the x axis, and the y axis tells us the relative probabilities of the various x values.

For a discrete-valued variable with n possible values, the average would be
\bar{x}=\sum_{i=1}^{n}x_iP(x_i)

and in the case of a continuous variable, this becomes an integral,
\bar{x}=\int_{a}^{b}x\frac{dP}{dx}dx
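
The discrete form of the average can be tried on the honest die discussed earlier, where each face has probability 1/6. This is a minimal sketch; the helper name average and the dictionary layout are choices made for this illustration.

```python
from fractions import Fraction


def average(distribution):
    """Sum of x * P(x) over a {value: probability} mapping."""
    return sum(x * p for x, p in distribution.items())


# An honest die: each of the six faces has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(average(die))  # 7/2
```

The result, 7/2, is the balance point of the flat distribution for a single die, halfway between 1 and 6.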

Example

For the situation described in the earlier example, where dP/dx=2(1-x) on the range from 0 to 1, find the average value of x.

\begin{align*} \bar{x} &=\int_{0}^{1}x\frac{dP}{dx}dx \\ &=\int_{0}^{1}x\cdot 2(1-x)dx \\ &=2\int_{0}^{1}(x-x^2)dx \\ &=2\left ( \frac{1}{2}x^2-\frac{1}{3}x^3 \right )\Big| ^1_0 \\ &= \frac{1}{3} \end{align*}

Sometimes we don’t just want to know the average value of a certain variable, we also want to have some idea of the amount of variation above and below the average. The most common way of measuring this is the standard deviation, defined by

\sigma =\sqrt{\int_{a}^{b}(x-\bar{x})^2\frac{dP}{dx}dx}

The idea here is that if there was no variation at all above or below the average, then the quantity (x-\bar{x}) would be zero whenever dP/dx was nonzero, and the standard deviation would be zero. The reason for taking the square root of the whole thing is so that the result will have the same units as x.

Example

For the situation described in example 59, find the standard deviation of x.

The square of the standard deviation is

\begin{align*} \sigma ^2 &=\int_{0}^{1}(x-\bar{x})^2\frac{dP}{dx}dx \\ &=\int_{0}^{1}(x-1/3)^2\cdot 2(1-x)dx \\ &=2\int_{0}^{1}\left ( -x^3+\frac{5}{3}x^2-\frac{7}{9}x+\frac{1}{9} \right )dx \\ &= \frac{1}{18} \end{align*}

so the standard deviation is
\begin{align*} \sigma &=\frac{1}{\sqrt{18}} \\ &\approx 0.236 \end{align*}
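
The normalization, average, and standard deviation for dP/dx=2(1-x) can all be checked numerically by adding up thin strips of area. The sketch below uses a simple midpoint rule; the helper names integrate and pdf and the number of strips are assumptions made for this illustration, not a method taken from the text.

```python
import math


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / n  # width of each strip
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def pdf(x):
    """dP/dx = 2(1 - x) on the range from 0 to 1."""
    return 2.0 * (1.0 - x)


total = integrate(pdf, 0.0, 1.0)                    # normalization
mean = integrate(lambda x: x * pdf(x), 0.0, 1.0)    # average of x
variance = integrate(lambda x: (x - mean) ** 2 * pdf(x), 0.0, 1.0)
sigma = math.sqrt(variance)                         # standard deviation
print(total, mean, sigma)
```

The three printed values should come out close to 1, 1/3, and 0.236, in agreement with the results found by integration above.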