Archive of posts filed under the Probability category.

Conditional probability is one of my favorite topics to teach.  Whereas ordinary probability calculations simply compare favorable outcomes to total outcomes, conditional probability lets us account for the impact of known information on the likelihood of those outcomes.

For example, the probability of rolling a 6 on a six-sided die is 1/6, but if it is known that the number showing is greater than 3, then the conditional probability that a 6 is rolled is 1/3.
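As a small sketch (not from the original post), the die example can be checked by counting: knowing that the roll is greater than 3 shrinks the sample space to {4, 5, 6}, and we then compare favorable outcomes to that smaller total.  The helper name `conditional_probability` is just an illustrative choice, and it assumes equally likely outcomes.

```python
from fractions import Fraction


def conditional_probability(outcomes, event, given):
    """P(event | given) for equally likely outcomes, computed by counting."""
    # Known information restricts the sample space to consistent outcomes.
    restricted = [o for o in outcomes if given(o)]
    favorable = [o for o in restricted if event(o)]
    return Fraction(len(favorable), len(restricted))


def is_six(roll):
    return roll == 6


die = range(1, 7)
print(conditional_probability(die, is_six, lambda roll: True))      # 1/6
print(conditional_probability(die, is_six, lambda roll: roll > 3))  # 1/3
```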

There are many applications of conditional probability, but a recent “Math Encounter” from the Museum of Math made me aware of an application of conditional probability that all of us see on a regular basis:  Google search autocomplete.

Suppose I type the search term “under” into Google’s search bar.

Here, Google is trying to autocomplete my search query.  In essence, Google is trying to guess the next word I’m going to type.  How does it make its guess?  It computes a conditional probability!

Google has a lot of data on when words follow other words.  When I enter “under” into the search bar, Google looks for the word or phrase with the highest conditional probability of coming next.  Here it turns out to be “armour”; the word with the second-highest conditional probability is “world”, and so on.
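Google’s actual ranking system isn’t described here and is surely far more elaborate, so the following is only a toy sketch of the idea.  It assumes the “data” is a list of past queries (the sample log below is invented), estimates P(next word | “under”) as the fraction of times each word follows “under”, and ranks the candidates.  The function names are hypothetical.

```python
from collections import Counter


def next_word_probabilities(queries, prefix_word):
    """Estimate P(next word | prefix_word) from counts of adjacent word pairs."""
    follow_counts = Counter()
    for query in queries:
        words = query.lower().split()
        for current, nxt in zip(words, words[1:]):
            if current == prefix_word:
                follow_counts[nxt] += 1
    total = sum(follow_counts.values())
    # Each count divided by how often prefix_word had a successor at all.
    return {word: count / total for word, count in follow_counts.items()}


def suggestions(queries, prefix_word, k=3):
    """Return the k most likely next words, highest probability first."""
    probs = next_word_probabilities(queries, prefix_word)
    return sorted(probs, key=probs.get, reverse=True)[:k]


# Hypothetical query log, made up for illustration only.
log = [
    "under armour shoes",
    "under armour",
    "under world movie",
    "under armour hoodie",
    "under the dome",
]
print(suggestions(log, "under"))  # ['armour', 'world', 'the']
```

Ties (here “world” and “the”) keep their first-seen order because Python’s sort is stable; a real system would need a better tie-breaker.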

A fascinating, and perhaps surprising, application of a powerful mathematical idea!

Joe Girardi, Probability, and Expected Value

During last night’s Yankees-Twins baseball game, the commentators were discussing the Yankees’ increased use of defensive shifts.

A “shift” is a defensive realignment of the infield to guard against a particular player’s hitting tendencies.  For example, if a player is much more likely to hit the ball to the right side of the infield (as, say, a strong left-handed hitter might be), a team may move an infielder from the left side to the right side to increase the chance of defensive success.

Dramatic infield shifting was once a rarity in the game, employed against only a few hitters in the league.  It is now being used with increasing frequency.  “All the data is out there,” said the announcers when discussing Yankees’ manager Joe Girardi’s explanation of why he was using it more.  (This sounded remarkably like what Rays’ manager Joe Maddon, a pioneer in increased defensive shifting, said when asked about it some time ago.)

The essential idea is that, given the reams of data now recorded on player performance, teams have a much more refined understanding of what a player will do.  No longer is the projection “The player has a 30% chance of getting a hit”; now, it’s “The player pulls 83% of ground balls to the left side of the infield”.  Naturally, teams try to use such information to their advantage.

It’s good that Joe Girardi is demonstrating an increased appreciation for, and understanding of, probability.  But as last night’s game suggests, he may need to learn more about the principle of expected value.

Early in the game, the bases were loaded with two outs, and a left-handed batter came to the plate.  Girardi put the defensive shift on, responding to data suggesting that this player was extremely likely to ground out to the right side of the infield.  But probability considerations should be only one part of the analysis.  By leaving so much of the left side of the infield undefended, the shift created a situation where a weakly hit ground ball that would usually be an easy out instead produced two runs for the Twins.

In short, although the probability of that event (ground ball to the left side) was low, the risk (giving up two runs) was high.  Considering both the probability and the payoff is essential to long-term success.
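To make the trade-off concrete, here is a sketch with entirely hypothetical numbers (the post gives no actual probabilities for this batter).  Expected runs allowed is the sum, over all outcomes, of probability times runs.  In this made-up example the shift turns more of the likely grounders to the right side into outs, yet still raises expected runs, because the rarer grounders to the left now become two-run hits.

```python
import math


def expected_runs(outcomes):
    """Expected runs allowed: sum of probability * runs over all outcomes."""
    total_prob = sum(p for p, _ in outcomes.values())
    if not math.isclose(total_prob, 1.0):
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * runs for p, runs in outcomes.values())


# Hypothetical numbers for illustration only, not real data on any batter.
# Bases loaded, two outs: assume any ground-ball single scores two runs.
# In both alignments, 80% of grounders go right and 20% go left.
no_shift = {
    "grounder right, out": (0.60, 0),
    "grounder right, hit": (0.20, 2),
    "grounder left, out": (0.18, 0),
    "grounder left, hit": (0.02, 2),
}
shift = {
    "grounder right, out": (0.72, 0),
    "grounder right, hit": (0.08, 2),
    "grounder left, out": (0.04, 0),  # few fielders left to make this play
    "grounder left, hit": (0.16, 2),
}

print(round(expected_runs(no_shift), 2))  # 0.44
print(round(expected_runs(shift), 2))     # 0.48
```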

I’d be surprised if the Yankees employ the shift again in that situation.  And if the Yankees need a special quantitative consultant, I am available during the summer.

www.MrHonner.com

Leap Day Birthdays

In my Leap Day contribution to the New York Times Learning Network, “10 Activities for Learning About Leap Year and Other Calendar Oddities,” I calculated the odds of a person having a Leap Day birthday.

Assuming each day of the year is an equally likely birthday, and noting that there is one Leap Day every four calendar years, I calculated the probability to be

$P(\text{Leap Day Birthday}) = \frac{1}{4 \cdot 365 + 1} = \frac{1}{1461} \approx 0.00068$

or around 0.07%.
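A quick way to double-check the count, under the post’s assumption of a repeating four-year cycle, is to walk through the calendar with Python’s standard `datetime` module and count how many days are February 29.  The function name `leap_day_share` is just an illustrative choice.

```python
from datetime import date, timedelta
from fractions import Fraction


def leap_day_share(start_year, num_years):
    """Fraction of calendar days that are Feb 29 over a span of whole years."""
    day = date(start_year, 1, 1)
    end = date(start_year + num_years, 1, 1)
    total = leap_days = 0
    while day < end:
        total += 1
        leap_days += (day.month == 2 and day.day == 29)
        day += timedelta(days=1)
    return Fraction(leap_days, total)


# One four-year cycle, matching the post's assumption.
print(leap_day_share(2012, 4))  # 1/1461
```

Over a full 400-year Gregorian cycle, in which century years not divisible by 400 skip Leap Day, `leap_day_share(2000, 400)` returns 97/146097, which is slightly smaller, so the four-year figure is a close approximation.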

So how many people with Leap Day birthdays do you know?

A One-in-a-Million Baseball Play

As the 2011 MLB season winds down, there is a slim chance of something very unusual happening:  a three-way tie for the wild card playoff berth!

http://goo.gl/ECUIT

It seems highly unlikely that the Red Sox, Rays, and Angels will actually all finish in a dead heat, but if they do, it will pose a lot of problems for playoff scheduling.

This is a fun, if complicated, math question to think about:  what are the chances that after a 162-game season, three of the eleven teams ultimately vying for the wild card end up with identical records?

To investigate, the first thing I’d do is simplify the situation.  I’d reduce the number of teams and the number of games, give every team a 50/50 chance to win every game, and then see what happens.  After I’d explored a bit, I’d then consider complicating matters by using more teams, more games, and more realistic winning percentages.
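Following that plan, a simplified Monte Carlo sketch might look like this.  It makes loud simplifying assumptions: every team wins each game independently with probability 1/2, teams never play one another (real schedules violate this), and a “three-way tie” is interpreted as at least three teams sharing the best record.  The function name and parameters are hypothetical.

```python
import random


def three_way_tie_probability(num_teams, num_games, trials, seed=0):
    """Estimate P(at least three teams share the best record) by simulation.

    Assumes independent 50/50 games and no head-to-head matchups,
    so each team's win total is an independent binomial count.
    """
    rng = random.Random(seed)
    ties = 0
    for _ in range(trials):
        wins = [sum(rng.random() < 0.5 for _ in range(num_games))
                for _ in range(num_teams)]
        best = max(wins)
        if wins.count(best) >= 3:
            ties += 1
    return ties / trials


# Small case first, then scale up teams and games as the post suggests.
print(three_way_tie_probability(num_teams=4, num_games=10, trials=10_000))
```

Starting small makes it easy to sanity-check: with three teams and one game each, all three tie exactly when they all win or all lose, which happens with probability 1/4.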

A math challenge that any Strat-O-Matic player could love!


Un-Random Shufflers

This is a great story about how statisticians at Stanford audited a new automatic shuffling machine and determined that the cards weren’t distributed randomly enough.

http://goo.gl/sVU4b

If a deck of cards is dealt one at a time, a knowledgeable observer should, in theory, be able to correctly predict the next card around 4.5 times per 52-card deck.  For example, by remembering which cards have been dealt, the observer will definitely know the final card, as it’s the only one that hasn’t been dealt.  Similarly, the observer will have a 1 in 2 chance of guessing the second-to-last card, a 1 in 3 chance of guessing the third-to-last card, and so on.  Adding up these chances gives the expected number of correct predictions: $\frac{1}{52} + \frac{1}{51} + \cdots + \frac{1}{2} + 1 \approx 4.54$.
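The 4.5 figure can be checked two ways in a short Python sketch: exactly, by summing the chances $1/r$ for $r = 1, \ldots, 52$, and by simulation.  The simulation assumes a perfectly random shuffle and an observer who always guesses some card not yet dealt (the smallest remaining card here; any remaining card works equally well).  The function names are illustrative, not taken from the Stanford paper.

```python
import random
from fractions import Fraction


def expected_correct_guesses(deck_size=52):
    """Exact expectation: with r cards left, any remaining guess hits with prob 1/r."""
    # Linearity of expectation: 1/52 + 1/51 + ... + 1/1.
    return sum(Fraction(1, r) for r in range(1, deck_size + 1))


def simulate_correct_guesses(deck_size=52, trials=2000, seed=0):
    """Average number of correct next-card guesses over simulated deals."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        deck = list(range(deck_size))
        rng.shuffle(deck)  # assume a perfectly random shuffle
        remaining = set(deck)
        for card in deck:
            guess = min(remaining)  # any not-yet-dealt card is equally good
            total += (guess == card)
            remaining.remove(card)
    return total / trials


print(float(expected_correct_guesses()))
print(simulate_correct_guesses())
```

The exact value is about 4.538, and the simulated average should land near it.  The flawed shuffler in the paper let an observer do far better (9.5), which this sketch makes no attempt to model.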

For this particular shuffler, however, the statisticians from Stanford determined that an observer should be able to predict the next card 9.5 times per 52-card deck!  The shuffling machine manufacturer that hired them must have been pretty upset to hear this, but redesigning the machine is probably not as costly as selling casinos hundreds of predictable shufflers and then dealing with the consequences.

It should come as no surprise that Persi Diaconis is the lead author on the paper.  Diaconis is a living legend in the world of mathematics: he left home at an early age to become a sleight-of-hand artist, then returned to earn a PhD from Harvard in mathematical statistics.  One of Diaconis’s first major results was proving that seven riffle shuffles are necessary to “randomize” a standard 52-card deck.

The full paper from Stanford can be found here:

http://statistics.stanford.edu/~ckirby/techreports/GEN/2011/2011-08.pdf