10.1 Probability spaces.
10.2 Stochastic variables. Distribution and independence. Distribution functions. The normal distribution.
10.3 Expectation and variance. Expectation values. Characteristic functions.
10.4 Sums of stochastic variables, the law of large numbers, and the central limit theorem.
10.5 Probability and statistics, sampling.
10.6 Probability in physics.
10.7 Document. Jacob Bernoulli on the law of large numbers.


The word probability has to do with probing the truth. We live in a world full of uncertainties which we try to master by making conjectures about the true state of affairs and about the future. This process is an important part of our analysis of the world around us. It is natural that we should prefer certainty. Normally, situations are classified as absolutely dangerous or absolutely harmless and danger is avoided. We move cautiously on rough ground and, as pedestrians and drivers, keep large margins of security. But this kind of classification involves risks. After two or three similar experiences of the same phenomenon we are inclined to think that it always occurs in the same way.

Insecurity is both a strain and a challenge. Forced to choose between alternatives whose consequences are not fully known, we may react with a feeling of pleasure and arousal, provided the choice really means something and the consequences of a wrong choice are not too serious. When they are, the situation is critical and we may perhaps have to mobilize all our mental resources, intellectual and emotional, to meet it, and failure may be destructive. The fascination of the unknown is so great that man has invented innumerable games where he can play with it under orderly conditions and without danger to his life.

Probability theory is a mathematical model of chance. It started as an analysis of games of chance and is now an extensive mathematical theory with applications to the social sciences, biology, physics, and chemistry. We shall give a short review of its foundations with an eye to the law of large numbers and the central limit theorem, and then touch upon some applications.

10.1 Probability spaces

In a textbook on the art of computing printed in Italy in 1494, its author Paciuolo says that if 6 plays are needed to win a game and two players interrupt the game when one of them has 5 plays and the other 2, then the sum at stake should be divided between the players in the proportion 5 to 2. This may seem reasonable, but the principle of sharing in proportion to the number of plays won is certainly not. Suppose, for instance, that we change the figures to 15 and 12 plays and that 16 are needed to win. The players would then get almost equal shares---but there is something wrong with this. In fact, he who has 15 plays needs just one more play to get everything while the other one needs 4 in a row.

Some years later Cardano treated a similar problem. He understood that it is the remaining plays that have to be analyzed, not the ones already played. In Paciuolo’s problem one player needs one play to take everything and the other one needs four. Hence the rest of the game has five possible outcomes: the first player may win in the first, second, third, or fourth play, or not at all. Cardano wanted the sum at stake to be divided in the proportion $(1 + 2 + 3 + 4) : 1 = 10 : 1$. His motivation for this is obscure. The correct result is 15 : 1, and it follows by applying the principles of probability theory as formulated by Pascal and Fermat 100 years later. Both of them dealt with interrupted games, arriving at the same result but with different methods. Fermat’s solution of Paciuolo’s problem was to consider all possible outcomes of 4 plays. There are $2 \cdot 2 \cdot 2 \cdot 2 = 16$ of them and the first player wins in all cases except one, namely, when all four plays go to his opponent. This kind of reasoning was immediately criticized on the ground that not all outcomes have to be played to the end, for instance, not when the first player wins the first play. This was countered by the argument that nothing changes if all outcomes are played to the end.
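Fermat’s enumeration can be reproduced in a few lines of code (a sketch; the labels A and B for the two players are my own):

```python
from itertools import product

# Fermat's count for Paciuolo's interrupted game: player A needs one
# more play, player B needs four, so consider all outcomes of 4 plays.
outcomes = list(product("AB", repeat=4))        # 2*2*2*2 = 16 outcomes
a_wins = sum(1 for u in outcomes if "A" in u)   # A wins unless B takes all 4
b_wins = len(outcomes) - a_wins

print(len(outcomes), a_wins, b_wins)            # prints 16 15 1
```

The stake is therefore divided 15 : 1, regardless of whether the unnecessary plays are actually carried out.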

These episodes from the birth of probability theory follow a pattern that was to be repeated many times: a practical question is asked, wrong answers are proposed, a simple mathematical model reduces the answer to a triviality, and the discussion turns to the applicability of the model.

Fermat did not use the word probability, but he could have defined the probability that the first player wins as 15/16, i.e., the number of favorable cases divided by the number of all possible cases. In this definition it is, of course, assumed that all cases are equally possible. This condition is very often satisfied in combinatorial problems, e.g., favorable and possible deals in games of cards, throws of dice, and drawings from urns. There are simple and also very complicated combinatorial problems of this kind, but in principle they offer no difficulties. They fit beautifully into a universally accepted mathematical model for probability, the probability space, proposed by Kolmogorov in 1933. The simplest probability space is a finite set U equipped with a function $u \to P(u) \geq 0$ such that $\sum P(u) = 1$, where we sum over all u in U. The elements u of U are called elementary events, and P stands for probability. The probability $P(A)$ of a subset A of U is defined as $\sum P(u)$, where u runs over A. The subset A is called an event and should be thought of as the occurrence of one of the elementary events of A. The function P, now extended to all subsets of U, has the property that

$$P(A\cup B) = P(A) + P(B) - P(A \cap B) \qquad (1)$$

In fact, every $P(u)$ with u in $A \cap B$ appears precisely once in both $P(A)$ and $P(B)$. The empty set $\emptyset$ is supposed to have probability 0. We have supposed that U is finite, but there is nothing that prevents U from being infinite provided $P(u) > 0$ for at most countably many u.
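As a toy illustration (a fair die, my own choice of example), property (1) can be checked numerically:

```python
# A minimal finite probability space: a fair die (illustrative example).
U = {1, 2, 3, 4, 5, 6}
prob = {u: 1/6 for u in U}                 # P(u) >= 0 and sum P(u) = 1

def P(A):
    """Probability of an event A, i.e. a subset of U."""
    return sum(prob[u] for u in A)

A = {1, 2, 3}                              # "at most three"
B = {2, 4, 6}                              # "even"

# Property (1): each P(u) with u in the intersection is counted once
# in P(A) and once in P(B), so it must be subtracted once.
lhs = P(A | B)
rhs = P(A) + P(B) - P(A & B)
```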

A function $P(A) \geq 0$ from subsets of a set U with the property (1) is called a measure and, if $P(U) = 1$, a probability measure. The general definition of a probability space is now simply a set U equipped with a probability measure P. In this definition, the elementary events have disappeared and it may happen that $P(u) = 0$ for all u in U. An example of this is U = R equipped with the measure

$$P(A) = \int_A f(x)\, dx$$

where $f(x)\geq 0$ is such that $P(R) = 1$. With this we have slid into integration theory where, unfortunately, not all functions f and sets A are permitted to appear. We really ought to have added some technical reservations to our definition of a probability space. But we did not since we trust the reader to put up with some fogginess at this point.

Most games of chance and many other things can be thought of as probability spaces:
Throws with a coin. U has two elements, heads and tails, each with the probability 1/2.
Throws with a die. U has six elements 1, 2, 3, 4, 5, 6 each with the probability 1/6.
Win or lose. U has two elements, gain and loss, with probabilities p and q, $p + q = 1$.

Roulette. U has n elements $F_1,\cdots , F_n$ with the probabilities $P_1,\cdots , P_n$. We may think of $F_1,\cdots , F_n$ as sectors of a circular spinning disk, a roulette wheel, the area of $F_k$ being $P_k$ times the area of the disk.

Bernoulli sequences. Playing win or lose n times, we get a probability space U consisting of $2^n$ sequences $u = \left(u_1,\cdots , u_n\right)$, where each $u_k$ is either a gain or a loss and $P (u) = p^rq^s$ where r is the number of gains and $s = n - r$ the number of losses of the sequence u. Note that the P(u) are the $2^n$ terms of the product $(p + q)^n$ when written as a sum. It follows that, for all k, $P \left(u_k = \text{ gain}\right) = p$ where the right side is the probability of the subset of U consisting of all u such that $u_k=\text{ gain}$ . We also get $P\left(u_k = \text{ loss}\right)= q = 1 - p$. Here the number n has disappeared and one can show that these equalities also define a probability measure on the set of infinite sequences $u = \left(u_1,u_2,\cdots \right)$ of gains and losses. The objects which we have now introduced are called Bernoulli sequences, after Jacob Bernoulli, who studied them in a book, Ars Conjectandi (1713).
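A small sketch, with n and p chosen arbitrarily, confirms that these P(u) sum to 1 and that every coordinate has gain probability p:

```python
from itertools import product

# Sketch of a finite Bernoulli sequence space; n and p are arbitrary.
p, n = 0.3, 5
q = 1 - p

space = list(product(("gain", "loss"), repeat=n))    # 2**n sequences
def P(u):
    r = u.count("gain")                              # r gains, n - r losses
    return p**r * q**(n - r)

# The 2**n values P(u) are the terms of (p + q)**n = 1.
total = sum(P(u) for u in space)

# Marginal probability: P(u_k = gain) = p for every position k.
marginals = [sum(P(u) for u in space if u[k] == "gain") for k in range(n)]
```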

Weather. U has two elements, beautiful and foul, with the probabilities 0.1 and 0.9.

Lotteries. U consists of all possible drawings of n tickets from $N > n$ numbered ones, all with the same probability. If the order between the n tickets counts, this probability is $(N - n)!/N!$, for then U has $N!/(N - n)!$ elements. If the order does not count, U has $\binom{N}{n} = N!/\big((N - n)!\,n!\big)$ elements, and the probability is 1 divided by this number.
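With illustrative numbers N = 10 and n = 3 (my own choice), the two counts are related by a factor n!:

```python
from math import comb, factorial

# Lottery counts for N = 10 tickets, n = 3 drawn (illustrative numbers).
N, n = 10, 3

ordered = factorial(N) // factorial(N - n)   # order counts: N!/(N-n)!
p_ordered = 1 / ordered                      # = (N-n)!/N!

unordered = comb(N, n)                       # order ignored: N over n
p_unordered = 1 / unordered

# Each unordered drawing corresponds to n! ordered ones.
```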

Races. U consists of the outcomes of 10 horse races with one winning horse out of 5 in each race. Every outcome has the same probability, $5^{-10}$.

Most of these examples are firmly anchored in the real world. Casinos and lotteries are stable enterprises built on reliable probability spaces. But the example with the weather is almost meaningless, and anyone betting on horse races according to the model above will go broke in no time.

The concept of a probability space is a radical axiomatization of our intuitive idea of probabilities. Finite probability spaces may seem unduly trivial from a purely mathematical point of view. But they become marvelous toys once we introduce stochastic variables and their expectation values.

10.6 Probability in physics

In the eighteenth century, Euler and Lagrange invented a simple model for the movements of fluids and gases. It combines Newtonian mechanics with simple properties of pressure and density, and it has been very successful. Somewhat later Fourier made a model of heat flow based on the fact that it is proportional to the temperature gradient. In these models the medium---a gas, a fluid, or a heat conductor---is considered to be homogeneous and its equilibrium states are simple to describe. A fluid or a gas not influenced by outer forces is in equilibrium when pressure and density are constant, and there is no heat flow when the temperature is constant. But through the progress of chemistry in the beginning of the nineteenth century it became clear that gases, fluids, and solids consist of more or less free-moving molecules and that heat is a form of mechanical energy. The movements of molecules follow the laws of Newton but it is hopeless to keep track of them one by one. On the other hand, it is possible to study them statistically, for instance, the distribution of energy at various states of equilibrium and how this distribution may change with time. This is done in statistical mechanics, founded by Clausius, Maxwell, and Boltzmann. Combining the laws of Newton with various versions of the law of large numbers, they succeeded in deducing some known macroscopic laws, for instance Boyle's law about the connection between pressure, temperature, and density of a gas and the laws of heat conduction. Statistical mechanics also has a branch in quantum mechanics which we cannot go into.
As an example of probability in physics we shall now describe a probability space with applications to heat conduction and Brownian motion, the motion of small particles in a fluid caused by impacts from the molecules of the fluid. This probability space consists of all continuous curves $t\to \xi(t)=\left(\xi_1(t),\xi_2(t),\xi_3(t)\right)$ in three-dimensional space. Here $t\geq 0$ is time and we suppose that $\xi(0)=0$. These curves correspond to all possible movements of a particle starting from the origin at time 0. The probability measure is such that the stochastic variables $\xi \to \xi_j\left(t_2\right)-\xi_j\left(t_1\right)$, where $j=1,2,3$, are independent and normally distributed with means 0 and variances $c\left(t_2-t_1\right)$, where $c > 0$ is a constant. Hence the frequency function of $\xi(t) = \xi(t) - \xi(0)$ is $f(t, x) = (2\pi c t)^{-3/2} e^{-|x|^2/2ct}$, where $|x|^2 = x_1^2+x_2^2+x_3^2$, and we can interpret c as a measure of the speed of the movement in every direction. Putting $dx = dx_1 dx_2 dx_3$, the number $P(\xi(t) \in A) = \int_A f(t, x)\, dx$ is the probability that our stochastic particle shall be in the region A at time t. In the classical macromodel of heat conduction, $f(t, x)$ is the temperature at time t and at the point x of a three-dimensional heat conductor with the heat conduction coefficient c into which, at time $t = 0$ and at $x = 0$, one unit of heat was introduced. We can also connect our stochastic variable with potential theory, for it is not difficult to verify that
$$\int_0^{\infty} dt \int_A f(t, x)\, dx = (2\pi c)^{-1}\int_A |x|^{-1}\, dx.$$
Here the left side is the time which, on an average, the particle spends in A and the right side is the Newtonian potential at the origin of a uniform mass distribution on A. This kind of a connection has sometimes been used to guess and prove results in potential theory.
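A crude Monte Carlo sketch of this model (the values of c, t, and the discretization are arbitrary choices for illustration) checks that a coordinate of $\xi(t)$ has variance close to ct:

```python
import random

# Monte Carlo sketch of the Brownian model above: each coordinate
# increment over a time step dt is normal with mean 0 and variance c*dt.
random.seed(1)
c, t = 2.0, 1.0
steps, paths = 50, 10000
dt = t / steps

disp = []
for _ in range(paths):
    x = 0.0                                      # xi_j(0) = 0
    for _ in range(steps):
        x += random.gauss(0.0, (c * dt) ** 0.5)  # independent increments
    disp.append(x)

var = sum(d * d for d in disp) / paths           # estimate of Var xi_j(t)
# var should be close to c*t = 2
```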

Finally, there is a loose connection between our probability space and quantum mechanics that tickles the imagination. The frequency function $f(t, x)$ is a solution of the equation of heat conduction, $2\partial_t f(t, x) = c\left(\partial_1^2 f(t, x) + \partial_2^2 f(t, x) + \partial_3^2 f(t, x)\right)$, where $\partial_t, \partial_1, \partial_2, \partial_3$ are the partial derivatives with respect to $t, x_1, x_2, x_3$. Changing t to it in the equation of heat conduction, we get one of the basic items of quantum mechanics, the Schrödinger equation. At the same time, the frequency function $f(t, x)$ turns into the complex function $(2\pi i c t)^{-3/2} e^{-|x|^2/2ict}$, $t>0$. In this way our probability space of curves gets a complex measure which we can try to use to construct integrals. These so-called history integrals were invented by Feynman around 1950. So far they are not a bona fide mathematical tool, but they have played a considerable part in the intuitive arguments that are an indispensable part of quantum mechanics.
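As a consistency check, the heat equation can be verified directly from the frequency function:

```latex
% f(t,x) = (2\pi c t)^{-3/2} e^{-|x|^2/2ct}
\partial_t f = \Bigl(\frac{|x|^2}{2ct^2} - \frac{3}{2t}\Bigr) f,
\qquad
\partial_j^2 f = \Bigl(\frac{x_j^2}{c^2t^2} - \frac{1}{ct}\Bigr) f
\quad (j = 1,2,3),
% summing the second identity over j:
\sum_{j=1}^{3} \partial_j^2 f
  = \Bigl(\frac{|x|^2}{c^2t^2} - \frac{3}{ct}\Bigr) f
  = \frac{2}{c}\,\partial_t f,
% which is the stated equation 2 d_t f = c (d_1^2 + d_2^2 + d_3^2) f.
```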

10.7 Document

Jacob Bernoulli (1654-1705) on the law of large numbers
In his book Ars Conjectandi (1713), Jacob Bernoulli notes that probabilities are known a priori in games of dice or drawings from urns, but goes on to say: “But, I ask you, who among mortals will ever be able to define as so many cases, the number, e.g., of the diseases which invade innumerable parts of the human body at any age and can cause our death? And who can say how much more easily one disease than another---plague than dropsy, dropsy than fever---can kill a man, to enable us to make conjectures as to what will be the future state of life or death? Who, again, can register the innumerable cases of changes to which the air is subject daily, to derive therefrom conjectures as to what will be its state after a month or even after a year? Again, who has sufficient knowledge of the nature of the human mind or of the admirable structure of the body to be able, in games depending on acuteness of mind or agility of body, to enumerate cases in which one or another of the participants will win? Since such and similar things depend upon completely hidden causes, which, besides, by reason of the innumerable variety of combinations will forever escape our efforts to detect them, it would plainly be an insane attempt to get any knowledge in this fashion.”

Noting that unknown probabilities can be determined by repeated experiments and that many repetitions seem to increase the precision, he continues as follows, announcing the law of large numbers and advocating the use of statistics in medicine and meteorology.

    "Although this is naturally known to anyone, the proof based on scientific principles is by no means trivial, and it is our duty to explain it.
However, I would consider it a small achievement if I could only prove what everybody knows anyway. There remains something else to be considered, which perhaps nobody has thought of. Namely, it remains to inquire, whether by thus augmenting the number of experiments the probability of getting a genuine ratio between numbers of cases, in which some event may occur or fail, also augments itself in such a manner as finally to surpass any given degree of certitude; or whether the problem, so to speak, has its own asymptote; that is, there exists a degree of certitude which never can be surpassed no matter how the observations are multiplied; for instance that it never is possible to have a probability greater than 1/2, 2/3 or 3/4 that the real ratio has been attained. To illustrate this by an example, suppose that, without your knowledge, 3000 white stones and 2000 black stones are concealed in a certain urn, and you try to discover their numbers by drawing one stone after another (each time putting back the stone drawn before taking the next one, in order not to change the number of stones in the urn) and notice how often a white or black stone appears. The question is, can you make so many drawings as to make it 10 or 100 or 1000 etc. times more probable (that is, morally certain) that the ratio of the frequencies of the white and black stones will be 3 to 2, as is the case with the number of stones in the urn, than any ratio different from that? If this were not true, I confess nothing would be left of our attempt to explore the number of cases by experiments. But if this can be attained and moral certitude can finally be acquired (how that can be done I shall show in the next chapter) we shall have cases enumerated a posteriori with almost the same confidence as if
they were known a priori. And that, for practical purposes, where “morally certain” is taken for “absolutely certain” by Axiom 9, Chap. II. is abundantly sufficient to direct our conjecture in any contingent matter not less scientifically than in games of chance.
    For if instead of an urn we take air or the human body, that contain in themselves sources of various changes or diseases as the urn contains stones, we shall be able in the same manner to determine by observations how much more likely one event is to happen than another in these subjects.”

There is an overflow of textbooks on probability and statistics. William Feller’s An Introduction to Probability Theory and Its Applications (Wiley, 1968) faces mathematics squarely, is well written, and contains a lot of material.