Saturday, August 3, 2013

Two stories – Oakland First Fridays, Black Ravens and Pink Flamingoes

Only one of the following two stories is true.

Oakland First Fridays: Maya and Elsa were with me at work yesterday, August 2nd, and overheard me making plans to go to Oakland's First Friday – a street/gallery food/art/music/people-watching festival. Maya asked whether I was going to take them. I said, “No, you will be with your mom tonight.”
Elsa - “Are we with you next Friday?”
I - “Yes.”
Elsa - “Can we go with you to Oakland First Friday next week?”

Black Ravens and Pink Flamingoes:
Maya, Elsa and I were walking along the bay in Redwood Shores with Heather, a colleague. It was close to dusk and we saw a number of birds: a small number of red-winged blackbirds, crows, moorhens and cormorants, all mostly black; and a few killdeer, snowy egrets, a mourning dove nesting in a flowerpot at the entrance to Heather and Chris' home, scrub jays, great blue herons and clapper rails, all non-black. We also saw nine black ravens. I casually said, “Hmm, looks like all ravens are black,” which seemed to go unnoticed at the time. Soon after, we saw another raven, which was also black. Heather said, “Oh, that raven is black. Looks like you are right that all ravens are black.”

A short while later, Maya pointed out a pink flamingo.

Maya said, “Look Dad, a pink flamingo! Looks like you are even more right that all ravens are black!”

“What in the world do pink flamingoes have to do with black ravens?”

“Daa-ad! They do, I read it in my 'Great Philosophers' book and searched for it online! You said that 'All ravens are black.' That is logically equivalent to its contrapositive, 'All non-black objects are non-ravens.' When Heather saw an additional raven and it was black, she pointed out (and you did not disagree) that it was evidence in favor of your original proposition that 'All ravens are black.' The contrapositive I just mentioned is logically equivalent. So when we see a non-black object (it is pink) and it is not a raven (since it is a flamingo), you should agree that it is evidence in favor of 'All ravens are black.'!”

“Oh god! Can't you just read some normal book about tribes of cats or that girl Parsnip or something who shot her little sister with an arrow? I suppose you are right!”
“Dad, okay, it seems to make sense logically, but I don't know if it is true in some statistical sense, all that 'SQL, SQL, standard error and p-value' stuff you keep talking about now.”

“Let's take a crack at it. Let's denote
Proposition 1: “All ravens are black.” For the sake of simplicity, let's just say that we are talking about birds and that we are trying to see whether (the set of individual birds that are) ravens have some special property that distinguishes them from birds in general (as individuals, not species). Then, practically speaking, the contrapositive and logical equivalent of Prop. 1 is
Proposition 2: “All non-black birds are non-ravens.”
Prior to Heather's observation, the evidence we had looked like
All birds of all colors | Black | non-Black | All colors
Ravens                  |     9 |         0 |          9
non-Ravens              |    50 |         8 |         58
All birds               |    59 |         8 |         67

Then she saw a black raven, and our evidence changed to
All birds of all colors | Black | non-Black | All colors
Ravens                  |    10 |         0 |         10
non-Ravens              |    50 |         8 |         58
All birds               |    60 |         8 |         68
And we agreed that this helped verify Prop. 1, in the sense that it allowed us to be more confident that Prop. 1 is true.

Then you saw a pink flamingo, and our evidence became
All birds of all colors | Black | non-Black | All colors
Ravens                  |    10 |         0 |         10
non-Ravens              |    50 |         9 |         59
All birds               |    60 |         9 |         69
And you are saying that this also allows us to be more confident that Prop. 1 is true. Let's forget about Prop. 2 for a while. So, by your argument, for verifying that 'All ravens are black.', a pink flamingo is worth as much as a black raven. I can't think with concrete numbers, so let's use some abcedra:

All birds of all colors | Black | non-Black | All colors
Ravens                  |     a |         0 |          a
non-Ravens              |     b |         c |        b+c
All birds               |   a+b |         c |      a+b+c

On the basis of this evidence, our estimate of the probability that any bird is non-black is
p = c/(a+b+c).

How much evidence is there to reject the hypothesis that some ravens are non-black? The specific null hypothesis in this case is
H : “Some raven-birds are non-black.”,
since we are hypothesizing that raven-birds are just like any other individual bird, each of which is non-black with probability p.

Now, if the above H were true, and we make 'a' observations of birds which happen to be ravens, the probability P of seeing no non-black ones among them is the product of the probabilities that each observed raven is black (treating the observations as independent), i.e. P = (1-p)*(1-p)*... a times = (1-p)^a.
(Or you can use B[p,a](i) = C(a,i)*(1-p)^(a-i)*p^i for i = 0.)
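As a quick numerical check that the two formulas agree, here is a minimal Python sketch, assuming the counts from the second table (a = 10, b = 50, c = 8) and treating each raven as independently non-black with probability p. The function names `p_nonblack`, `p_value` and `binomial_pmf` are illustrative, not from any library.

```python
from math import comb


def p_nonblack(a: int, b: int, c: int) -> float:
    """Estimated chance that any bird is non-black: p = c/(a+b+c)."""
    return c / (a + b + c)


def p_value(a: int, b: int, c: int) -> float:
    """P = (1-p)^a: chance, under H, that all a observed ravens come out black."""
    return (1 - p_nonblack(a, b, c)) ** a


def binomial_pmf(i: int, n: int, p: float) -> float:
    """B[p,n](i) = C(n,i) * (1-p)^(n-i) * p^i: chance of exactly i non-black birds among n."""
    return comb(n, i) * (1 - p) ** (n - i) * p ** i


# Counts after Heather's black raven.
a, b, c = 10, 50, 8
print(p_value(a, b, c))                         # about 0.286
print(binomial_pmf(0, a, p_nonblack(a, b, c)))  # same value, using i = 0
```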

This is the P-value: the probability, if the null hypothesis were true, of the outcome we actually observed in this experiment (no non-black ravens), which we can use to reject the null hypothesis with a 1-P degree of confidence. Recall that the smaller P is, the more confidence we have in rejecting the null hypothesis, or 'accepting the validity of Proposition 1'.
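One way to see what this P-value means is to simulate the null hypothesis directly. The sketch below (plain Python, with a hypothetical helper `simulate_p_value` and an arbitrary fixed seed) assumes each raven is independently non-black with probability p, just like any other bird, and counts how often a batch of a ravens contains no non-black bird.

```python
import random


def simulate_p_value(a: int, p: float, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate, under H, of the chance that a ravens include no non-black bird."""
    rng = random.Random(seed)
    all_black = 0
    for _ in range(trials):
        # Each simulated raven is black when the uniform draw is >= p (probability 1-p).
        if all(rng.random() >= p for _ in range(a)):
            all_black += 1
    return all_black / trials


p = 8 / 68                     # non-black fraction with a = 10, b = 50, c = 8
estimate = simulate_p_value(10, p)
exact = (1 - p) ** 10          # the closed form P = (1-p)^a
print(estimate, exact)         # both close to 0.286
```

The simulated frequency settles near the closed form, which is all the P-value claims: how often H alone would produce a run of all-black ravens this long.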

So now we can ask about the relative merits of observing a black raven or a pink flamingo. Which additional observation reduces the P-value more? Let's use calclueless, since I am not secretive enough to do discrete math. What that means is we want to compare the marginal change (partial derivative) in the P-value when we make an additional observation of a black raven a → a + 1 vs. when we see a pink flamingo c → c + 1.

Since ln(P) = a*ln(1-p),
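(d/d a) ln(P) = ln(1-p), treating p as fixed (strictly, an extra black bird also nudges p = c/(a+b+c) down a little, which makes P shrink slightly less), so the change in the P-value from observing a black raven, adding 1 to a, is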
(d/d a)P = P * ln(1-p),
which is less than 0, meaning that observing a black raven does reduce the P-value and increases the confidence in “All ravens are black.”!
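A central finite difference is an easy way to check the raven derivative numerically. This Python sketch (helper names are hypothetical) holds p fixed, as the formula (d/d a)P = P * ln(1-p) does, and also shows the slope when p is allowed to respond to a through p = c/(a+b+c); both come out negative.

```python
import math


def pvalue_fixed_p(ravens: float, prob: float) -> float:
    """P = (1-p)^a with p held fixed and a treated as continuous."""
    return (1 - prob) ** ravens


def pvalue_moving_p(ravens: float, b: float, c: float) -> float:
    """P = (1 - c/(a+b+c))^a, letting p shift as a changes."""
    return (1 - c / (ravens + b + c)) ** ravens


a, b, c = 10, 50, 8
p = c / (a + b + c)
h = 1e-6  # step for the central finite difference

numeric = (pvalue_fixed_p(a + h, p) - pvalue_fixed_p(a - h, p)) / (2 * h)
formula = pvalue_fixed_p(a, p) * math.log(1 - p)  # (d/d a)P = P * ln(1-p)
print(numeric, formula)  # both about -0.0358

total = (pvalue_moving_p(a + h, b, c) - pvalue_moving_p(a - h, b, c)) / (2 * h)
print(total)  # about -0.0302: still negative, a bit smaller in size
```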

For a pink flamingo, adding 1 to c changes p but not a, so we need the chain rule, with
(d/d c)p = (1-p)/(a+b+c) (> 0)
(d/d p)P = -a*P/(1-p) (< 0).

Multiplying the two and simplifying, the change in the P-value from observing a pink flamingo, adding 1 to c, is
(d/d c)P = -a*P/(a+b+c),
which is less than 0, meaning that observing a pink flamingo also reduces the P-value and increases the confidence in “All ravens are black.”!
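To confirm the chain-rule step, the sketch below multiplies (d/d c)p by (d/d p)P, compares the product with the simplified form -a*P/(a+b+c), and checks both against a finite difference in c. It is plain Python at the counts a = 10, b = 50, c = 8; the helper `pvalue` is a hypothetical name.

```python
def pvalue(a: float, b: float, c: float) -> float:
    """P = (1 - c/(a+b+c))^a."""
    return (1 - c / (a + b + c)) ** a


a, b, c = 10, 50, 8
n = a + b + c
p = c / n
P = pvalue(a, b, c)

dp_dc = (1 - p) / n        # (d/d c)p, positive
dP_dp = -a * P / (1 - p)   # (d/d p)P, negative
chain = dp_dc * dP_dp      # chain rule: (d/d c)P
simplified = -a * P / n    # the simplified form -a*P/(a+b+c)

h = 1e-6
numeric = (pvalue(a, b, c + h) - pvalue(a, b, c - h)) / (2 * h)
print(chain, simplified, numeric)  # all about -0.0421
```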

The question that now remains is whether pink flamingoes are more valuable evidence than black ravens, i.e. which change decreases the P-value more:
|(d/d c)P| ?> |(d/d a)P|, which is equivalent, since P > 0, to
|(d/d c)ln(P)| ?> |(d/d a)ln(P)|
Working through the algebra, with (d/d c)ln(P) = -a/(a+b+c) and (d/d a)ln(P) = ln(1-p), this becomes
a/(a+b+c) ?> -ln(1-p).

Since 1-p = (a+b)/(a+b+c), we have -ln(1-p) = ln(1 + c/(a+b)), so exponentiating both sides, the condition we are looking for is

e^(a/(a+b+c)) ?> 1+ c/(a+b)
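The final condition is easy to evaluate directly. A small Python sketch (hypothetical function names) checking it, along with the un-exponentiated form a/(a+b+c) > -ln(1-p) it came from; as in the derivation, the raven's side treats p as fixed.

```python
import math


def flamingo_beats_raven(a: int, b: int, c: int) -> bool:
    """Condition e^(a/(a+b+c)) > 1 + c/(a+b)."""
    return math.exp(a / (a + b + c)) > 1 + c / (a + b)


def flamingo_beats_raven_logs(a: int, b: int, c: int) -> bool:
    """Same comparison before exponentiating: a/(a+b+c) > -ln(1-p), with p = c/(a+b+c)."""
    p = c / (a + b + c)
    return a / (a + b + c) > -math.log(1 - p)


print(math.exp(10 / 68), 1 + 8 / 60)         # about 1.158 vs 1.133
print(flamingo_beats_raven(10, 50, 8))       # True
print(flamingo_beats_raven_logs(10, 50, 8))  # True
```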

For a = 10, b = 50 and c = 8, it turns out that this is just marginally true: e^(10/68) ≈ 1.158, while 1 + 8/60 ≈ 1.133!
A pink flamingo is at least as valuable as a black raven in verifying that all ravens are black!
See the plot below.
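To see how the comparison shifts as more non-black non-ravens pile up, the two sides of the log-slope comparison can be tabulated for growing c with a = 10 and b = 50 held fixed. This is a small Python sketch with hypothetical helpers `log_slopes` and `crossover`; as in the derivation, the raven's slope treats p as fixed.

```python
import math
from typing import Optional, Tuple


def log_slopes(a: int, b: int, c: int) -> Tuple[float, float]:
    """Return (flamingo, raven): |(d/d c) ln P| and |(d/d a) ln P| with p held fixed."""
    n = a + b + c
    flamingo = a / n              # a/(a+b+c)
    raven = -math.log(1 - c / n)  # -ln(1-p)
    return flamingo, raven


def crossover(a: int, b: int, max_c: int = 1000) -> Optional[int]:
    """Smallest c >= 1 at which one more black raven is worth at least one more pink flamingo."""
    for c in range(1, max_c + 1):
        flamingo, raven = log_slopes(a, b, c)
        if raven >= flamingo:
            return c
    return None


a, b = 10, 50
for c in (2, 8, 9, 10, 20, 90):
    flamingo, raven = log_slopes(a, b, c)
    print(f"c={c:>3}  flamingo={flamingo:.4f}  raven={raven:.4f}")
print("crossover at c =", crossover(a, b))  # crossover at c = 10
```

With these counts the flamingo's edge disappears at c = 10, which is why the comparison at c = 8 is only marginal.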

Since most birds are actually non-black, suppose we had already seen a very large number of non-black non-ravens:

All birds of all colors | Black | non-Black | All colors
Ravens                  |    10 |         0 |         10
non-Ravens              |    50 |        90 |        140
All birds               |    60 |        90 |        150
Then the incremental value of seeing a pink flamingo would have been much less than that of seeing a black raven, for two reasons. First, since a large proportion of the birds we had seen would have been non-black, the expected proportion of non-black ravens would have been correspondingly higher, making it all the more unlikely to see no non-black ravens among the additional ravens, so each additional black raven would count for a lot. Second, as the graph above shows, once we have already seen a lot of non-black non-ravens, each additional one does very little to increase our confidence that all ravens are black. As they might say in the Rann of Kutch, 'Another pink flamingo, ho hum!'
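The derivatives are marginal (calculus) approximations. Since observations arrive one bird at a time, one can also just compute P exactly before and after one more bird. A minimal Python sketch, assuming the same model P = (1 - c/(a+b+c))^a and using a hypothetical helper `after_one_more`, for the counts (10, 50, 8) and for the flamingo-heavy counts (10, 50, 90):

```python
def pvalue(a: int, b: int, c: int) -> float:
    """P = (1 - c/(a+b+c))^a for the current counts."""
    return (1 - c / (a + b + c)) ** a


def after_one_more(a: int, b: int, c: int):
    """Return (P now, P after one more black raven, P after one more pink flamingo)."""
    return pvalue(a, b, c), pvalue(a + 1, b, c), pvalue(a, b, c + 1)


for counts in [(10, 50, 8), (10, 50, 90)]:
    now, raven, flamingo = after_one_more(*counts)
    print(counts, f"now={now:.3g}  raven={raven:.3g}  flamingo={flamingo:.3g}")
```

For (10, 50, 8) the extra flamingo lowers P a bit more than the extra raven; for (10, 50, 90) the extra raven lowers P far more, matching the two cases in the text.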
So, Maya, does it now make statistical sense that your pink flamingo sighting was just as important as Heather's black raven sighting for verifying that all ravens are black?”

“Yes, Dad. I think I want to be a doctor when I grow up.”

The next day, we went out for lunch, and suddenly Elsa piped up, “Dad! More black ravens! I just saw an orange chicken!”

“Can they fly?”
