The following problem appeared on the June 2016 Common Core Algebra 2 Regents exam.

This is a straightforward and reasonable problem. What’s unreasonable is that it is only worth two points.

The student here is asked to construct a representation of a mathematical object with six specific properties: it must be a cosine curve; it must be a single cycle; it must have amplitude 3; it must have the specified period; it must have midline *y = -1*; and it must pass through the point (0, 2).
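The exam's specific period value did not survive reproduction here, but the remaining conditions can be checked directly. Below is a minimal sketch in Python with the period left as a parameter T (an assumption, since the exam's value is not shown above): amplitude 3 and midline *y = -1* give a maximum of 2, so a standard cosine starting at its maximum passes through (0, 2) for any period.

```python
import math

# Sketch of a function satisfying the listed conditions; the period T is a
# parameter here because the exam's specific value is not reproduced above.
def f(x, T):
    """Cosine curve with amplitude 3, midline y = -1, and period T."""
    return 3 * math.cos(2 * math.pi * x / T) - 1

# At x = 0 the cosine factor is 1, so f(0) = 3 - 1 = 2: the graph passes
# through (0, 2) regardless of T. The minimum, -4, occurs half a period later.
print(f(0, math.pi))            # -> 2.0
print(f(math.pi / 2, math.pi))  # -> -4.0
```

Checking each of the six conditions against such a formula is exactly the work the problem demands of students, which underscores how much is packed into two points.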

That seems like a lot to ask for in a two-point problem, but the real trouble comes from the grading guidelines.

According to the official scoring rubric, a response earns one point if “*one graphing error is made.*” Failure to satisfy any one of the six conditions would constitute a graphing error. So a graph that satisfied five of the six required properties would earn one point out of two. That means a response that is 83% correct earns 50% credit.

It gets worse. According to the general Regents scoring guidelines, a combination of two graphing errors on a single problem results in a two-point deduction. That means a graph with four of the six required properties, and thus two graphing errors, will earn zero points on this problem. A response that is 67% correct earns 0% credit!
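The mismatch between the fraction of the work done correctly and the credit awarded can be tabulated directly. This is a sketch, assuming (as the rubric's language suggests) that each unmet condition counts as one graphing error:

```python
# Credit under the rubric versus fraction of the six conditions satisfied,
# treating each unmet condition as one graphing error (an assumption drawn
# from the rubric's language).
def rubric_credit(errors):
    return max(2 - errors, 0)  # one point off per error, floored at zero

for errors in range(3):
    satisfied = 6 - errors
    pct_correct = 100 * satisfied / 6
    pct_credit = 100 * rubric_credit(errors) / 2
    print(f"{satisfied}/6 correct ({pct_correct:.0f}%) -> {pct_credit:.0f}% credit")
# 6/6 correct (100%) -> 100% credit
# 5/6 correct (83%) -> 50% credit
# 4/6 correct (67%) -> 0% credit
```

The table makes the cliff visible: credit falls from full to nothing over a span in which the underlying work drops by only a third.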

The decision to make this six-component problem worth two points creates a situation where students are unfairly and inconsistently evaluated. It makes me wonder if those in charge of these exams actually considered the scoring consequences of their decision, especially since there are two obvious and simple fixes: reduce the requirements of the problem, or increase its point value.

This is another example of how tests that are typically considered objective are significantly impacted by arbitrary technical decisions made by those creating them.

**Related Posts**

- Regents Recaps
- Regents Recap — June 2016: Scale Maintenance
- Regents Recap — June 2014: Common Core Scoring
- 9th Grade Questions on 10th and 11th Grade Exams

I think you’re absolutely right that this is a problem, and even that it is based on a decision. However, it’s not a purely technical decision (though the psychometric devotion to reliability over validity is certainly at play here) — it’s also political, in at least three ways.

First, I detect a lingering effect of the “Mathematically Correct” side of the “math wars”. What’s awarded points is answers, and there’s a big emphasis on “objectivity.” I’ve noticed that Regents rubrics sometimes take a tentative step towards more holistic evaluation of the degree of students’ understanding, but seem to come back to this kind of language that’s very black-and-white — perhaps more “reliable”, but not very sensitive to the understanding being assessed.

Second, items that admit more nuanced scoring are more expensive to score. (In the case of the Regents, the budgetary impact may be more direct, but this is very explicit for large-scale assessments typical in other states, including PARCC, SBAC, and SAT, among others.) Testing is unpopular enough already (see below) and so is generally under-budgeted. This means we get less nuanced assessment items or tasks — we get what we pay for.

Third, the primary motivation of these assessments is to evaluate students (and teachers, and schools, and districts), not to enhance students’ learning. (In the language that Dylan Wiliam uses, this is all about quality control rather than quality assurance.) This has led to assessment items that look like more of the same (like the Regents and other state exams, or the SAT, ACT, etc.) rather than what some of us were hoping for with PARCC and SBAC (something — at least in part — more like the performance assessments from the MARS and Shell Centre teams). This test-and-punish framework makes assessment (understandably) unpopular with many students, parents, and educators.

Thanks for the comment, Sendhil. I appreciate your insights on these matters.

I meant that the decision was “technical” in a broad sense, as in, a decision about structure and administration as opposed to content. Perhaps “logistical” would have been a better word. I seriously doubt that a technical “expert” made this decision, and frankly, I have doubts about such technical expertise in general.

You raise an interesting point about Regents rubrics, which are typically quite vague, often to the point of meaninglessness. When scoring is done locally, that means there is lots of room for interpretation. As a teacher/grader, I appreciated that. But now in NYC scoring is done centrally (unless, of course, you’re a charter school), and the design, organization, and implementation of the centralized grading may actually be more problematic than the exams themselves. And part of the problem, I think, is that there is an attempt to make grading more black-and-white, which is not supported by the rubrics or the exams themselves.

I understand your remarks about cost, but I don’t accept cost as an excuse. It is the state’s decision to attach high-stakes outcomes to these exams, so it is the state’s obligation to ensure their quality. This is perhaps the primary point of my ongoing analysis.

And I think it’s fantasy to expect high-stakes, standardized testing that looks like high-quality performance tasks. No aspect of that — design, implementation, grading — seems realistic to me. I, personally, never expected anything more from our high-stakes testing program than what we’ve got.

The pre-standards exams did not have open-ended questions worth so few points. The very concept of a two-point open-ended question is exasperating. The best rubric in the world cannot fix this.

Jonathan