Scattered Popularity in Baseball

I enjoyed reading through this marketing document from Harris Interactive about the popularity of Major League Baseball.  Lots of interesting facts about who watches baseball (higher percentage of people in the East versus other regions; percentages rise with income), and a nice ranking of the league’s most popular teams.

Putting the popularity rankings together with team salary information, I made myself a nice little scatter plot.

MLB Popularity Regression

Team payroll along the horizontal is in millions, and the popularity is out of 30 teams, with 30 being “most popular”.

Not too hard to guess the red triangle in the upper right:  first in popularity and team payroll, your New York Yankees!  The World Series Champion San Francisco Giants are the big red circle around (93,23).  And kudos to the Atlanta Braves, the nice red square in the top middle, as they seem to be getting the best popularity return for their payroll dollars.  The sad yellow diamond in the bottom left is for one particular reader:  maybe next year!

There does seem to be a positive relationship between the amount spent on salary and the team’s overall popularity, but there are probably a lot of reasons for that.

Lots of other interesting ways to slice and dice this data.  Take a look at the document and try it yourself!

Facebook Formulas

This graph of peak breakup times, on the right, represents breakups per day, as determined by an analysis of Facebook status changes.  The data suggests that breakups seem to occur most frequently in mid-February and late November.

Drawing conclusions from data is always dicey, and there are probably a lot of holes to poke in the methodology here, but it certainly is fun trying to attach meaning to these numbers!

This graph was featured in a TED Talk given by David McCandless, who runs the wonderful website www.informationisbeautiful.net.

The whole talk can be found here; this chart comes up at around the 6:50 mark.

The amount of data available through social networking sites is mindblowing, and it can’t be long before it will be used in some significant way.  Indeed, a group of MIT students has already devised a system, cleverly titled Project Gaydar, that, with some accuracy, identifies the sexual orientation of a Facebook user based on friends, likes, and other connections.

What will they compute about us next?

Proofiness

This is a short interview in the NYT with Charles Seife, the author of “Proofiness:  The Dark Art of Mathematical Deception”.

http://well.blogs.nytimes.com/2010/10/29/the-dark-art-of-statistical-deception/

Trading on Colbert’s clever coinage–Truthiness–Seife’s book apparently addresses the myriad ways that the misrepresentation and misinterpretation of statistics negatively affect medicine, economics, politics, justice, and other aspects of society.

It’s not clear that this book is covering any ground that hasn’t already been covered in, say, How to Lie With Statistics (an amusing classic!) or the engaging and readable work of John Allen Paulos, but hopefully the more the issue is raised, the more seriously it will be taken.  The consequences of innumeracy, and general scientific illiteracy, are profound and far-reaching, and they affect us all.

Benford’s Law

This is an article about the discovery of new sets of data that seem to obey Benford’s Law–a curious mathematical characteristic of the numbers we collect from the world that is really more conjecture than law.

http://www.newscientist.com/article/mg20827824.700-curious-mathematical-law-is-rife-in-nature.html

It seems that in scores of data sets collected from natural phenomena, the numbers we see tend to start with the digit 1 far more often than, say, with the digit 6.  Indeed, statistical analysis shows that when you look at population numbers, death rates, street addresses, lengths of rivers, stock prices, and more recently, depths of earthquakes and brightness of gamma rays, the observed numbers start with the digit 1 about 30% of the time.  The occurrences of other digits as the leading digit fall as you go up the scale.
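That 30% figure lines up with the formula usually given for Benford’s Law, in which the chance of seeing leading digit d is log10(1 + 1/d).  The formula is not quoted above, so take this as a minimal sketch of the standard statement of the law rather than anything from the article; the function name is just for illustration.

```python
import math

def benford_probability(d):
    """Chance that the leading digit is d (1 through 9) under the
    usual statement of Benford's Law: log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)

# Expected share of each leading digit, 1 through 9.
probs = {d: benford_probability(d) for d in range(1, 10)}

for d, p in probs.items():
    print(f"leading digit {d}: {p:.1%}")
```

The nine shares add up to 1, with the digit 1 at the top and each later digit less common than the one before it.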

Apart from being a natural curiosity, Benford’s Law has proven to have some very useful applications.  Scientists can use Benford’s Law to help predict phenomena and look for trends in data, as the rule gives number-crunchers an idea of what they might be looking at from the start.

Additionally, Benford’s Law has been successfully used to identify all kinds of numerical fraud–tax fraud, voter fraud–because when people are faking numbers, they tend to evenly distribute leading digits.  Benford’s Law tells the data-police that if approximately 1/9 of the numbers they are looking at start with 1, then something fishy is going on.
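Here is a rough sketch of what that kind of check might look like: tally the leading digits in a list of numbers and measure how far the tally sits from the Benford shares.  Everything here is an assumption for illustration, not a real fraud test.  The function names are made up, the "gap" is just a sum of absolute differences, and the two sample lists (powers of 2, and every whole number from 100 to 999) are stand-ins rather than real tax or voting data.

```python
import math
from collections import Counter

def leading_digit(x):
    """First nonzero digit of a nonzero number, read off its
    scientific-notation form (e.g. 4200 -> '4.2...e+03' -> 4)."""
    return int(f"{abs(x):.15e}"[0])

def leading_digit_shares(values):
    """Fraction of the nonzero values starting with each digit 1-9."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

def benford_gap(values):
    """Sum of absolute differences between the observed shares and the
    Benford shares log10(1 + 1/d).  A hypothetical, informal measure."""
    shares = leading_digit_shares(values)
    return sum(abs(shares[d] - math.log10(1 + 1 / d)) for d in range(1, 10))

# Illustrative stand-in data only.
powers_of_two = [2 ** n for n in range(1, 101)]  # tends to track Benford
evenly_spread = list(range(100, 1000))           # each leading digit 1/9 of the time

print("powers of 2 gap:  ", round(benford_gap(powers_of_two), 3))
print("evenly spread gap:", round(benford_gap(evenly_spread), 3))
```

The evenly spread list, where each leading digit turns up 1/9 of the time, lands much farther from Benford than the powers of 2 do, which is the kind of red flag described above.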

Keep that in mind next April.

Who Tests the Testers?

It’s tricky business, curving state exams.

An audit by Harvard researchers compared student results on NY State exams (Regents, et al) with corresponding national exams, and it seems that much of the “progress” made by NY students over the past few years was probably illusory.

There are several telling statistics in the report, but none clearer than this:  in 2007, the minimum score counted as proficient on the NY state math exam corresponded to the 36th percentile nationwide.  In 2009, the minimum score counted as proficient corresponded to the 19th percentile nationwide.  This effectively defined proficiency as “do better than 19 percent of students across the country”.

In theory, curves for tests can drop if exams get harder, but no one with any knowledge of NY State math exams would make that argument.  Indeed, these exams have been getting easier and easier to pass.  For example, to pass the Integrated Algebra Regents Exam in 2009, a student only needed 30 raw points out of 88.  A passing score of 34% seems pretty low to begin with, but keep in mind that a student guessing randomly on the multiple choice questions alone should get about 1/4 of the questions right, which amounts to 15 points.  Halfway to proficiency.
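For anyone who wants to check the arithmetic, here is a quick sketch.  The 30 and 88 come from the numbers above; the multiple-choice setup (30 questions, 2 points each, 4 choices per question) is my assumption, chosen because it reproduces the 15-point figure.

```python
# Figures from the post.
raw_points_to_pass = 30
total_raw_points = 88

# Assumed multiple-choice section: 30 questions, 2 points each,
# 4 answer choices per question (an assumption, not stated in the post).
mc_questions = 30
points_per_mc_question = 2
choices_per_question = 4

# Share of the total raw points needed to pass.
passing_share = raw_points_to_pass / total_raw_points

# Expected points from guessing randomly on every multiple-choice question.
expected_guess_points = mc_questions * points_per_mc_question / choices_per_question

# How far random guessing alone gets a student toward passing.
share_of_passing_from_guessing = expected_guess_points / raw_points_to_pass

print(f"passing share of raw points: {passing_share:.0%}")
print(f"expected points from guessing: {expected_guess_points}")
print(f"fraction of the passing score: {share_of_passing_from_guessing:.0%}")
```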
