Here is another installment in my series reviewing the NY State Regents exams in mathematics.

Elementary statistics plays an increasing role in high school math curricula, but the ways these concepts are often tested raise some concerns. After all, the manner in which ideas are tested can reflect how the ideas are being taught.

Here’s a question from the 2014 Integrated Algebra exam: which of the following is not a *causal* relationship?

Causality is notoriously difficult to establish, but I’ll set aside my philosophical objections for the time being. My primary concern here is with (2) being the correct answer.

First, correlation is a relationship between two quantities. What quantity is population correlated with in answer choice (2)? “The taking of the census” is an event, not a quantity. This may seem like nitpicking, but what quantity are we supposed to assume in its place? It seems natural to assume “the census taken” to mean “the number of people recorded on the census”, but then how could there be no causal relationship? What causes a number to be written down for “population”, if not the actual population?

Here’s another question from the 2014 Integrated Algebra exam.

It’s important to talk about bias in surveys, but no substantial thought is required to answer this question: three of the answer choices have absolutely nothing to do with campsites. And for the record, the question should really be phrased like “which group is *most likely* to be biased against the increase?”.

And this is a problem typical of the Algebra 2 / Trig exam.

I know it’s pretty much standard usage, but no finite data set can be *normally distributed*. The correct terminology here would be something like “the heights can be approximated by the normal distribution”.

I’m aware that some may see these complaints as minor, but as I’ve argued before, I think it is extremely important to model precision and rigor in mathematical language for students. We expect this from our teachers and our textbooks; we should expect it, too, from our tests.

That bias question really bugs me. I don’t understand what it’s doing on an algebra exam, and the wording is pretty terrible. The question uses “bias” in the colloquial sense when it is unnecessary (they could simply ask “which group is most likely to be opposed to the increase”). I typically think of “bias” in a statistical sense, where the goal is to measure something, and the results are skewed in some particular systematic way. In the context of the question, one could make the argument that the goal is to survey the general population, but it would make far more sense to survey campers when deciding whether to implement a policy that affects campsites. Why do we even care about what teachers, soccer players, and postal workers think about campsite fees?

Data analysis is often too tricky for high school. It is even too tricky for the writers of state assessments. Are writers of state assessments required to pass a Math exam?

I will see your New York State statistics questions and raise you the following assessment item on Data Analysis from the Maryland High School Assessment [MD HSA] on [Some concepts from] Functions, Algebra, Probability and Data Analysis.

Item. “In a small town, 250 randomly sampled registered voters were asked to state whether they would vote “Yes” or “No” on Measure A in the next local election. The table below shows the results of the survey.

VOTER SURVEY RESULTS

| Yes | No | Undecided |
| --- | --- | --- |
| 96 | 34 | 120 |

There are 5,500 people expected to vote in the next election. Based on the data, how many people will vote “No” on Measure A in the next election?”

(This is 2007 Public Release Algebra/Data Analysis Item #38 of the Maryland High School Assessment [MD HSA] on [Some concepts from] Functions, Algebra, Probability and Data Analysis, at http://mdk12.org/assessments/high_school/look_like/2007/algebra/ftri38.html ; it is also Item #37 at http://www.mdk12.org/assessments/high_school/look_like/2007/algebra/hsaAlgebra.pdf)

To obtain the expected “correct” answer of 2,112, students are expected to make a number of unwarranted and usually incorrect assumptions. Students who answer 2,112 on a college political science exam will likely be marked wrong.
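For anyone who wants to check the arithmetic, here is a minimal sketch of the straight proportional scaling the item appears to expect. It assumes, as the item implicitly does, that the 5,500 voters will split exactly as the sample did; the function name `scale_to_electorate` is just an illustrative label. With the table as transcribed in this comment, 2,112 is what the 96 “Yes” responses scale to, while the 34 “No” responses scale to 748.

```python
# Survey counts as transcribed in the item above.
sample = {"Yes": 96, "No": 34, "Undecided": 120}
sample_size = sum(sample.values())  # 250 respondents
expected_voters = 5500


def scale_to_electorate(count, sample_size, electorate):
    """Scale a sample count up to the electorate, assuming the electorate
    splits exactly as the sample did (the item's unstated assumption)."""
    return count * electorate / sample_size


projected = {
    answer: scale_to_electorate(count, sample_size, expected_voters)
    for answer, count in sample.items()
}

for answer, votes in projected.items():
    print(f"{answer}: {votes:.0f}")
```

Note that the scaling has to do something with the 120 undecided respondents; here they are simply carried along as a third group, which is itself one of the assumptions the item never states.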

I’ll stipulate the causality and bias comments. They are notoriously difficult in statistical studies at any level, and such nuances can NEVER be effectively captured in ANY standardized multiple choice test.

I’m most bothered by your last normal distribution item & Jerome’s survey question. I agree, Patrick, that there is a fundamentally flawed assumption by the NY question in its claim that the finite, discrete set of high school girls’ heights were normally distributed. Jerome is also spot on with his warning about the “number of unwarranted and usually incorrect assumptions.”

Independent of the actual intended answers, the bigger problem, I think, is the assumption by both the NY and MD questions that exact, definite outcomes (“how many of the girls ARE shorter?” and “how many people WILL vote ‘No’?”) can be derived from survey results and an assumed normal distribution. It’s likely a small and subtle point, but no questions like these can ever give absolute, definitive predictions. The language may be standard, but statistical questions are by their nature probabilistic (duh). We should be stunned if any prognostications for any population of reasonable size were met by experimental outcomes exactly mirroring theoretical predictions.

I interpreted the normal distribution question as stating that there is a normal distribution from which the students are drawn. The 450 student heights will not themselves be normally distributed, but they could be a random sample from a normal distribution.

I don’t like the question because it assumes that one can say with certainty how many students will have particular heights. The question should ask for an estimate, or a best estimate, or a confidence interval.
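To illustrate what an interval answer might look like, here is a rough sketch using the voter survey figures quoted earlier in the thread (34 “No” responses out of 250), since the numbers from the heights question are not reproduced here. It uses the simple normal-approximation (Wald) interval for a proportion with z = 1.96 for roughly 95% coverage; that choice of method, and the function name `wald_interval`, are my assumptions rather than anything either exam specifies.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a sample proportion.

    z = 1.96 corresponds to roughly 95% coverage. This is a textbook
    approximation, not the only (or best) interval for a proportion.
    """
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - z * standard_error, p + z * standard_error


# Figures from the voter survey quoted earlier in the thread.
no_votes, sample_size, expected_voters = 34, 250, 5500

low, high = wald_interval(no_votes, sample_size)
print(f"'No' share: {low:.3f} to {high:.3f}")
print(f"'No' votes: {low * expected_voters:.0f} to {high * expected_voters:.0f}")
```

Even under these generous assumptions the range is wide, roughly 9% to 18% of voters, which is the point: a single number stated with certainty hides all of that.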

In everyday language, words like correlation & bias often have a different meaning than they do in statistics. The test writer seems to be using the everyday meaning, which does not apply in these questions.

#27: It would be better to use the word association. In some more formal settings (AP Stat for example), correlation is reserved for linear relationships. 27 (4) would not be linear.

Agree that the wording is terrible. Also, are they talking about multiple countries or multiple censuses for one country? Is (2) considered non-causal because the census is a snapshot that does not capture fluctuations in the intervening years? If so, sigh…

#7: Yikes. It doesn’t make sense to say a GROUP is biased in this context. Bias would mean the survey is done in a way that is likely to err in one direction. If the survey is done in a way that will tend to over-represent a group (like campers), then there is bias (assuming campers’ opinions differ from those of the population as a whole). Bias in this context does not mean “tend not to like” as in everyday language.

The wording is poor as well. Again, it doesn’t make sense to ask about the bias of a group in this case. But, what is the point of the “if” part: “If a survey were taken, which group…”. Does taking a survey affect the “bias” of the group?

#28 doesn’t bother me so much, though it would be nice to include the word “approximately”.

Patrick, you know I take issue with virtually all the stats problems on the NYS Regents. It’s painfully obvious the question-writers know only a meager amount of statistical vocabulary. That correlation question… I don’t even know where to begin.