Are These Tests Any Good?

When it comes to educational testing, the stakes are higher than ever.  For students, tests might determine which public schools they can attend, if and when they graduate, and which colleges are available to them.   For schools and districts, aggregate test scores and the “progress” they show might determine what kind of state and federal aid is available.

Student test scores are playing an increasing role in the evaluation of teachers.  Indeed, state laws have been re-written to mandate the use of standardized test data as a substantial factor in rating teacher performance.

There is controversy regarding the value of standardized tests, even as measures of student achievement (in most cases, their purported purpose).  A very public debate has emerged as politicians attempt to make education more “data-driven” and hold teachers and schools more “accountable”.   But one fundamental question is rarely raised in this conversation:  are these tests any good?

If the tests we use to evaluate students, schools, and now teachers, are ill-conceived, sloppy, and erroneous, how legitimate a measure of teaching and learning could they possibly be?  The issue of test quality and relevance seems like an important one, but it gets very little attention.

In this series, I address the question “Are These Tests Any Good?” by looking at a collection of questions from the 2011 New York State Math Regents Exams.  My cursory analysis reveals many significant issues with how these tests are created:  mathematical errors; poorly-worded questions; the de-emphasis of knowledge; and misalignment with course curricula.

If we can’t create legitimate, relevant, appropriate tests, should we really be using them to evaluate teachers?  Or students?

Are These Tests Any Good?

Part I:  Mathematically Erroneous Questions

Part II:  Ill-Conceived Questions

Part III:  Underrepresented Topics

Part IV:  The Worst Math Regents Question of All Time

Part V:  9th Grade Questions on 10th and 11th Grade Exams

Here are some other resources on this topic.

This blog post by JD2718 offers a similar critique of the Regents exams from 2009.

Here’s some fun I had with one of this year’s Regents questions involving the famous 13-14-15 triangle.


Are These Tests Any Good? Part 5

This is the fifth entry in a series examining the 2011 NY State Math Regents exams.  The basic premise of the series is this:  if the tests that students take are ill-conceived, poorly constructed, and erroneous, how can they be used to evaluate teacher and student performance?

In this series, I’ve looked at mathematically erroneous questions, ill-conceived questions, under-represented topics, and what is perhaps the worst question in Regents history.  In this entry, I’ll use questions from two exams to discuss duplication, lowered expectations, and poor test construction.

Number 37 from the 2011 Geometry Regents exam is a 4-point question which asks students to solve the following system of equations graphically:

2x^2 - 4x = y + 1

x + y = 1

Number 39 from the 2011 Algebra 2 / Trigonometry Regents exam is a 6-point question which asks students to solve the following system of equations algebraically:

5 = y - x

4x^2 = -17x + y + 4

These two systems of equations are roughly equivalent in terms of difficulty.  Why is a question suitable for the Geometry exam appearing on the Alg 2/Trig exam, and as the highest-valued question (6 points) to boot?  In New York state, the Alg 2/Trig course follows Geometry in the standard sequence, so it is strange to see the same kind of problem on two state exams that are designed to be taken at least a year apart.
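To see just how comparable they are, here is a sketch of the algebra on the Alg 2/Trig system.  Solving the first equation for y gives y = x + 5; substituting into the second:

4x^2 = -17x + (x + 5) + 4

4x^2 + 16x - 9 = 0

(2x + 9)(2x - 1) = 0

So x = -9/2 or x = 1/2, and the solutions are (-9/2, 1/2) and (1/2, 11/2).  The identical substitution on the Geometry system reduces it to 2x^2 - 3x - 2 = 0, with solutions (-1/2, 3/2) and (2, -1).  Same technique, same amount of work.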

It’s true that the Alg 2/Trig test question asks for an algebraic solution, as opposed to a graphical one, but that is essentially the only difference between the two, and that speaks to a serious problem in how these tests are conceived and designed.

Looking at these two tests, one might conclude that learning to solve this kind of system algebraically is an important part of the Alg 2/Trig course:  why else would the official exit exam require the use of this technique in solving a problem that could have been solved last year?

Solving systems algebraically is definitely a fundamental skill; so fundamental, in fact, that it is part of the Integrated Algebra curriculum (see the Integrated Algebra Pacing guide on the official schools.nyc.gov website).  Integrated Algebra is the course students take before they take Geometry!  Since many students take IA in 9th grade and take Alg 2/Trig in 11th or 12th grade, this means that a 6-point question on the Alg 2/Trig exam is testing the student’s ability to solve a problem they should have been able to solve two math courses ago.

Students should be able to solve this kind of problem at all mathematical levels, but why is material from two courses ago playing such a prominent role on an advanced exit exam?  What Alg 2/Trig course material is being shortchanged in order to re-test more elementary skills?  And to the point, how can this be considered a legitimate assessment of what a student learned in an Alg 2/Trig course?

Furthermore, in each case the scoring guide allows for half credit if the problem is solved using a method different from the one specified.  This is a reasonable policy, but what then is the purpose of a question specifically designed to test knowledge of a technique?  On the Alg 2/Trig test, a student can earn half credit for solving the system graphically; that means a student can get 3 of the 6 points by doing exactly what they did on an essentially identical problem on last year’s Geometry exam.

This example highlights how some questions on these exams aren’t directly connected to the content of their respective courses.  If a test isn’t legitimately designed around the curricula and content of the course, how can teachers and students effectively prepare?  How could such tests be valid assessments of what a student learns in that class?  Or how effectively a teacher teaches?  These are all questions that aren’t asked enough in the debate about standardized tests, student performance, and teacher accountability.


Are These Tests Any Good? Part 4

This is the fourth entry in a series examining the 2011 NY State Math Regents exams. The basic premise of the series is this: If the tests that students take are ill-conceived, poorly constructed, and erroneous, how can they be used to evaluate teacher and student performance?

In this series, I’ve looked at mathematically erroneous questions, ill-conceived questions, and under-represented topics. In this entry, I’ll look at a question that, when considered in its entirety, is the worst Regents question I have ever seen.

Meet number 32 from the 2011 Algebra 2 / Trigonometry Regents exam:

If f(x) = x^2 - 6, find f^{-1}(x).

This is a fairly common kind of question in algebra: Given a function, find its inverse. The fact that this function doesn’t have an inverse is just the beginning of the story.

In order for a function to be invertible it must, by definition, be one-to-one. This means that each output must come from a single, unique input. The horizontal line test is a simple way to check if a function is one-to-one. In fact, this test exists primarily to determine if functions are invertible or not.

The above function f(x) fails the horizontal line test and thus is not invertible: the horizontal line y = -2, for example, crosses the graph twice, since f(2) = f(-2) = -2. Therefore, the correct answer to this question is “This function has no inverse”. And now the trouble begins.

Let’s take a look at the official scoring guide for this two-point question.

[2]   \pm \sqrt{x+6}, and appropriate work is shown.

This is a common wrong answer to this question. If a student mindlessly followed the algorithm for finding the inverse (swap x and y, solve for y) without thinking about what it means for a function to have an inverse, this is the answer they would get. According to the official scoring guide, this wrong answer is the only way to receive full credit.
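To make this concrete, here is that algorithm carried out step by step:

y = x^2 - 6

x = y^2 - 6

y^2 = x + 6

y = \pm \sqrt{x+6}

The \pm is precisely where the procedure betrays itself: for every x > -6, it assigns two outputs to a single input, so the result isn't a function at all, much less the inverse of f.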

It gets worse. Here’s another line from the scoring guide.

[1]  Appropriate work is shown, but one conceptual error is made, such as not writing \pm with the radical.

In summary, you get full credit for the wrong answer, but if you omit the worst part of that wrong answer (the \pm sign), you only receive half credit! So someone actually scrutinized this problem and determined how this wrong answer could be less correct. The irony is that this “conceptual error” actually produces a more sensible answer: without the \pm, the result \sqrt{x+6} is at least a function (it is the inverse of f restricted to x \ge 0). The further we go, the less the authors seem to know about functions.

And it gets even worse. Naturally, teachers immediately complained about this question, and a long thread emerged at JD2718’s blog. Math teachers from all over New York state called in to the Regents board, which initially refused to make any changes. A good narrative of the whole process can be found at JD2718’s blog, here.

The next day, the state gave in and issued a scoring correction: full credit was to be awarded for the correct answer, the original incorrect answer, and two other incorrect answers. With four acceptable answers, three of them incorrect, you might think the Regents board would have no choice but to own up to their mistake. Quite the opposite.

Here’s the opening text of the official Scoring Clarification from the Office of Assessment Policy:

Because of variations in the use of f^{-1} notation throughout New York State, a revised rubric for Question 32 has been provided.

There are no variations in the use of this notation, unless they wish to count incorrect usage as a variation. I understand that it would be embarrassing to admit the depth of this error, which speaks to a lack of oversight in the process, but this meaningless explanation looks even worse. It is a transparent attempt to sidestep responsibility, or, dare I say, accountability, in this matter.

It’s not just that an erroneous question appeared on a state exam. First, someone wrote this question without understanding its mathematical consequences. Next, someone who didn’t know how to solve the problem created a scoring rubric for it, and in doing so demonstrated even further mathematical misunderstanding. Then, all of this material made it through quality control and into the hands of tens of thousands of students in the form of a high-stakes exam. And in the end, facing a chorus of legitimate criticism and complaint, those in charge of the process offered up the lamest of excuses in an attempt to save face and eschew responsibility.

It might not seem like such a big deal. But what if your graduation depended on it? Or your job? Or your school’s very existence? Then it’s a big deal. At least, it should be.


Fun With a Favorite Triangle

In a post examining the quality of New York State Math Regents exams, I considered the following problem from the 2011 Algebra 2 / Trigonometry exam:

In triangle ABC, we have a = 15, b = 14, and c = 13.  Find the measure of angle C.

This problem is designed to test the student’s knowledge of the Law of Cosines.  The Law of Cosines is an equation relating the three sides and one angle of the triangle; knowledge of any three of those four quantities allows you to determine the fourth.  Substitute the three sides into the equation, perform some algebra and simple trigonometry, and you’ll get the angle.
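For reference, the intended computation looks like this.  The Law of Cosines says c^2 = a^2 + b^2 - 2ab \cos C, so

\cos C = (a^2 + b^2 - c^2) / (2ab) = (225 + 196 - 169) / 420 = 252/420 = 3/5

and C = \cos^{-1}(3/5), which is approximately 53 degrees.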

This isn’t just any triangle, though:  this is the famous 13-14-15 triangle.  The 13-14-15 triangle has some special properties that allow you to solve this problem without using the Law of Cosines!

For example, when you drop the altitude to the side of length 14, something amazing happens.

[Figure: 13-14-15 triangle with altitude]

Altitudes are perpendicular to bases, so two applications of the Pythagorean Theorem and a little algebra show that the foot of the altitude, H, divides AC into segments of length 5 and 9.  This means that triangle AHB is a right triangle with sides 5, 12, and 13 and triangle CHB is a right triangle with sides 9, 12, and 15.  As it turns out, our 13-14-15 triangle is just two famous right triangles glued together along a common side!
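Here is that algebra, for the record.  Let AH = x, so that HC = 14 - x, and let h be the length of altitude BH.  The two right triangles give

x^2 + h^2 = 13^2

(14 - x)^2 + h^2 = 15^2

Subtracting the first equation from the second yields 196 - 28x = 56, so x = 5; then h^2 = 169 - 25 = 144, so h = 12.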

This makes finding the measure of angle C easy:  since C is an angle in a known right triangle, just use right triangle trigonometry!  Much easier than using the Law of Cosines.
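Concretely, in right triangle CHB, tan C = BH / HC = 12/9 = 4/3, so C = tan^{-1}(4/3), which is approximately 53 degrees, in agreement with the Law of Cosines computation above.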

And for the record, this was a multiple choice question.  A clever student had yet another opportunity to eschew the Law of Cosines.

In any triangle, the smallest angle is opposite the shortest side.  This allows us to immediately conclude that angle C is less than 60 degrees and thereby eliminate two of the four answer choices, 67 and 127.  Similarly, the longest side of a triangle is opposite the largest angle, which means that angle A is greater than 60 degrees.  Using a straight-edge and compass, we can construct the following equilateral triangle with side AB.

[Figure: 13-14-15 triangle with equilateral triangle constructed on side AB]

The two remaining choices for the measure of angle C are 53 and 59.  Our diagram suggests that angle B is very close to 60 degrees, which means that A and C must split the remaining 120 degrees between them:  C falls below 60 by roughly the same amount that A exceeds it.  So the question is now “Is the measure of angle A 7 degrees more than 60, or 1 degree more than 60?”  If the diagram is to scale (mine is; I’m not sure about the diagram included in the Regents exam), a 7-degree difference seems more likely.  It’s admittedly not a rigorous solution, but it’s not a bad way to navigate to the correct answer.

It’s ironic that there are two reasonable ways to approach this problem without using the Law of Cosines, as this was the only problem on this Trigonometry exam that tested the student’s knowledge of this important relationship.

Lecturing and Teaching

This article by David Bressoud from the Mathematical Association of America summarizes some interesting research about “lecture-style” teaching.

https://www.maa.org/columns/launchings/launchings_07_11.html

An experiment conducted in an introductory physics course at the University of British Columbia compared students taught by traditional lecture with students taught via a clicker-based peer instruction system.  The two groups of students were closely matched at the beginning of the semester, both receiving lecture-style instruction.  Then, after 12 weeks, the instructional approach for one group changed dramatically.

While the control group continued to receive traditional instruction, the experimental group began receiving clicker-based peer instruction.  The experienced professor was replaced by two graduate students who were knowledgeable in physics and trained in this particular instructional methodology, but otherwise lacking in teaching experience.  The results were striking:  by the end of the semester, the average test score in the experimental group was 2.5 standard deviations above the average in the control group.

The peer instruction relied heavily on student-to-student and whole-group discussion of material during class, which is largely credited for the gains in performance.  Bressoud has some interesting things to say about what this means for math instruction, inviting us to read more about how to shut up and teach.
