2. Axioms of Probability.

An "experiment" is something we do that reveals something we didn't
know, that lessens our uncertainty.  Sometimes we'll call them "random
experiments", especially when the "experiment" is something like
drawing a card from a well-shuffled deck or rolling a die or tossing
a coin, but that's really just an attitude thing--- we can view
presidential elections, sporting events, or even looking to see how
many pages the textbook has as "random experiments" too, where we
decrease (or "resolve") uncertainty by observing something.

Some of the main things we'll study in probability include:

Outcome: One item from an exhaustive, mutually exclusive list of
everything that might happen in a random experiment.
  Examples:
    3 coin tosses:                {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
    One die roll:                 {1,2,3,4,5,6}
    3 coin tosses (count heads):  {0,1,2,3}
    Count pages:                  {1,2,3,...,999,1000,...}   (514)
    Wait for bus:                 [0,oo)

Event: A thing that might happen, and then again might not.
  Examples:
    2 heads:       {HHT, HTH, THH}
    Even number:   {2,4,6}
    2 heads:       {2}
    < 2 minutes:   [0,120)

Random Variable: A number that depends somehow on chance.
  Examples:
    # of heads:       {3, 2, 2, 1, 2, 1, 1, 0}
    Initial heads:    {3, 2, 1, 1, 0, 0, 0, 0}
    2^die:            {2, 4, 8, 16, 32, 64}
    Pages/class day:  514/28 = 18.36   (or 514/23 = 22.35)

----------
Apology: Probability and statistics are useful and interesting for all
sorts of real-world problems--- but when we're just beginning, it's
easier to think about artificially simple situations like rolling
dice, tossing coins, or dealing cards than it is to work out the
details of realistic problems like estimating pollution, predicting
elections, or managing stock portfolios.  I'll try to be more
realistic sometimes, but please bear with me through the early stages
with unrealistic examples.  Thanks!
----------

Mathematical objects:

  Sample Space      S           Set of all possible "outcomes"
  Events            E, F, ...   Subsets of S
  Random Variables  X, Y, ...   Functions S -> |R (or -> |N or |R^2)

More about events:

  E "and" F:              Intersection   (Ross: EF)
  E "or" F:               Union          (Ross: E u F)
  "not" E:                Complement     (Ross: E^c)
  0:                      Empty Set      Impossibility
  S:                      Sample Space   Certainty
  "at least one of E_i":  (Infinite) union
  "all of E_i":           (Infinite) intersection

De Morgan's Rules (obvious from a Venn (MasterCard) Diagram):

  not (A and B)  =  (not A) or (not B)
  not (A or B)   =  (not A) and (not B)

Probability assignment rules:

  (1) 0 <= P(E) <= 1
  (2) P(S) = 1
  (3) P(E u F) = P(E) + P(F)  **IF** EF = 0...   more generally,
      P(U{i=1..oo} E_i) = Sum{i=1..oo} P(E_i)  **IF** E_i E_j = 0 for i != j

Note that there are lots of events--- if S has n elements then there
are 2^n possible events, since each of the n elements is either in a
given event or out of it, giving 2 x 2 x ... x 2 = 2^n choices.  SO,
it's usually too hard to give a probability assignment by listing the
probabilities of every event--- we need an easier way.  When S is
finite we can just give the probability p(i) of each "elementary
outcome" i in S, and then write P(E) = Sum{ p(i) : i in E }.  This can
even work for some infinite sets S (like the integers) if you can
list the elements e_1, e_2, ... of S, but not for "bigger" infinite
sets (like the real numbers): if we want to "pick a point at random
from the unit square" or to "pick a number between zero and one",
then we might want P(E) to be related to *area* or *length*, which
cannot be computed by summing up any function of the infinitely-many
points.  We'll come back to this kind of problem later (they're fun).
First let's do some examples of probability assignments following the
rules.
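Here is the finite-S recipe as a minimal sketch in R (the numerical
snippets later in these notes use S-Plus, MatLab, etc.; the names S,
p, and E below are just mine for the illustration): give p(i) = 1/8
to each of the 2^3 elementary outcomes of three fair coin tosses,
then compute P(E) by summing.

R:
 > S <- c("HHH","HHT","HTH","HTT","THH","THT","TTH","TTT")
 > p <- rep(1/8, 8)                # p(i) for each elementary outcome
 > E <- c("HHT","HTH","THH")       # the event "exactly 2 heads"
 > sum(p[S %in% E])                # P(E) = Sum{ p(i) : i in E } = 3/8
 [1] 0.375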
For each question give:
  (i)  S
  (ii) A rule for computing P(E) for every event E in S

 a) Toss a thumbtack that falls Up with probability 52% (note: 52% of
    live births are boys, so this could instead be a gender question).
 b) Roll two fair dice.
 c) Toss a coin until the first Head; count the # of tails (only):
      S = {0,1,2,3,...}
      P(E) = ????
      P(Even # of tails precede 1st head) = ????
    (a numerical check appears at the end of this section)

==============
Why rules?????????

Try defining, for subsets E of S = {1,2,3,...}:

  P(E) = lim{n->oo} #( E {1,2,...,n} ) / n

(the limiting fraction of the first n integers that lie in E).  Does
this make sense for all sets E??????  Try to find one for which it
doesn't.
---------
Suppose we define P() ONLY for the sets it DOES make sense for.  Are
we okay now???  Well, no.  Alas there are sets E, F for which P(E)
and P(F) are defined but P(EF) is not....

============ ============ Tue Ends, Thu Begins ============ ============

Consequences of the rules ("Propositions"), or
How to compute probabilities:

 4.1  P(E^c) = 1 - P(E)
 4.2  P(E) <= P(F)  if E subset F   ("E implies F", E -> F)
 4.3  P(E u F) = P(E) + P(F) - P(EF)   (extension of Rule #3)
 4.4  P(E_1 u ... u E_n) = Sum{i} P(E_i) - Sum{i<j} P(E_i E_j)
        + Sum{i<j<k} P(E_i E_j E_k) - ... + (-1)^(n+1) P(E_1 E_2 ... E_n)
      (inclusion/exclusion)

EXAMPLE (the Birthday Problem): What's the chance that NO two of
n = 40 people share a birthday?  Person 2 must miss person 1's
birthday, person 3 must miss the first two, and so on, so

  P[no ties] = (365/365)(364/365)(363/365)...(326/365)

------------------------------------------------------------
S-Plus:
 > prod((366-(1:40))/365)      or  prod(seq(365,366-40)/365)  or ...
 [1] 0.1087682
------------------------------------------------------------
MatLab:
 >> prod((366-(1:40))/365)     or  prod(linspace(365,366-40,40)/365) ...
 ans = 0.1088
------------------------------------------------------------
Mathematica:
 In[1]:= N[Product[(366-i)/365,{i,1,40}]]
 Out[1]= 0.108768
------------------------------------------------------------
c:
 #include <stdio.h>
 #include <stdlib.h>

 int main(int ac, char **av)
 {
     int i, n;
     double p;

     n = (ac > 1) ? atoi(av[1]) : 22;     /* class size; default 22 */
     for (i = 0, p = 1; i < n; i++)       /* P[all n birthdays differ] */
         p *= (365.0 - i) / 365.0;
     printf("P[no birthday tie among %d people] = %f\n", n, p);
     return 0;
 }
------------------------------------------------------------
S-Plus:
 > prod((366-40):365/365)   = 0.1087682
 > prod((366-22):365/365)   = 0.5243047
 > prod((366-23):365/365)   = 0.4927028

Note that it takes around 22 or 23 students in a class for the chance
of a birthday tie to reach 50%.

Q: In fact some b'days are more common than others (say, 9 mo. after
the 1st weekend in spring, or maybe after a 3-day power outage from a
hurricane....); how will this affect the B'day Problem answers?  Will
that make ties MORE likely or LESS likely???
------------------------------------------------------------

Limits: Sometimes we need to use approximations to compute the
probability of something--- and so we'll need to know that

  P[ at least one of the E_n occurs ] = lim P(E_n)   IF  E_n c E_{n+1}
  P[ all of the F_n occur ]           = lim P(F_n)   IF  F_{n+1} c F_n

(i.e., INCREASING unions and DECREASING intersections are okay).

EXAMPLE: What's the chance of at least one head if we toss a fair
coin forever?
ANSW: P[no head in n tosses] = (1/2)^n, so
      P[at least one head in n tosses] = 1 - 2^(-n) -> 1.

What's the chance of all tails if we toss a fair coin forever?
ANSW: P[all tails for n tosses] = (1/2)^n -> 0.

Put one brass ring and n silver ones in a hat at time n; what's the
chance we ever draw the brass one?
ANSW: P[no luck in 1st n tries] = (1/2)(2/3)(3/4)...(n/(n+1)) = 1/(n+1) -> 0,
      so P[NEVER get brass ring] = 0: we're certain to draw it eventually.

Put one brass ring and n^2 silver ones in a hat at time n; NOW what's
the chance we ever draw the brass one?
ANSW: P[no luck in 1st n tries] = (1/2)(4/5)(9/10)...(n^2/(n^2+1)) -> 0.272...,
      so P[NEVER get brass ring] = 0.272... and P[EVER get it] = 0.727...
      (see the R check at the end of this section).

==========
What IS probability???

 0. Symmetry?  Try to find equally-likely outcomes.
 1. Asymptotic Frequency:  P(E) = lim_{n->oo} (# of times E occurs in n tries)/n
 2. Degree of belief:  P(E) = the fraction of brass rings in a hat that
    makes betting on the event and betting on a brass-ring draw from
    the hat equally attractive
==========
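As promised in example (c), here's a numerical check (a sketch in R;
truncating the infinite sum at k = 200 is my arbitrary cutoff, and
costs only 2^(-201)): for independent fair tosses,
P[exactly k tails precede the 1st head] = (1/2)^(k+1), and the even-k
terms form a geometric series with sum (1/2)/(1 - 1/4) = 2/3.

R:
 > k <- 0:200                      # truncate the infinite sum
 > p <- (1/2)^(k+1)                # P[exactly k tails precede 1st head]
 > sum(p)                          # sanity check: total probability
 [1] 1
 > sum(p[k %% 2 == 0])             # P[even # of tails] = 2/3
 [1] 0.6666667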
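And the two brass-ring limits can be checked the same way (again an R
sketch; cumprod builds the running products P[no luck in 1st n tries],
and stopping at n = 10^6 is my arbitrary cutoff):

R:
 > n <- 1:10^6
 > tail(cumprod(n/(n+1)), 1)       # equals 1/(10^6+1): heading to 0
 [1] 9.99999e-07
 > tail(cumprod(n^2/(n^2+1)), 1)   # heading to 0.272... = P[NEVER]
 [1] 0.2720293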