When I moved to Nova Scotia in 1999, I had ten years of experience in Special Education. Ten. But I was told that I could not work in Special Education because the neediest students deserved the best-educated teachers, and I didn’t have an MEd. As a sub, I could not afford a fourth degree. I subbed for several years and eventually was able to enrol in a Master of Education in Diverse Learners at Acadia.
I like learning, so I actually enjoyed my degree, tuition aside. One of my favourite courses was the one on assessment. I’m into statistics (raise your hand if you’re a nerd), so I was REALLY into this one. One of our assignments was to write a pretend article for the kind of women’s magazine you find in waiting rooms. Mine was about large-scale assessment. Here is my submission, with some updates:
When I was in my mid-30s, I got mono. Silly me. I was one year past the age after which, according to the Centre for Disease Control, you’re not supposed to get mononucleosis. When the doctor told me I had mono, I croaked my protestations (“I haven’t been kissing any teenagers!”), and she sent me off for a blood test.
While sitting in the waiting room during my next visit (one gland the size of a tennis ball, the other the size of a golf ball), I asked the receptionist about the test results.
“Negative,” she said.
“What does that mean?” I asked.
“You don’t have mono.” And she went back to her filing.
When I finally saw the doctor, I wailed “What’s wrong with me?” (Mono affects one’s emotions. Really.)
“You have mono.”
“No, I don’t!”
“Who told you that?”
“The receptionist! She said the blood test was negative, and that means I don’t have mono.”
She jumped up. “Wait right here.”
I never knew what she said to the receptionist, but she wasn’t working there much longer. When the doctor came back, she said these words that have resonated with me ever since:
“Tests don’t diagnose. Doctors do.”
Turns out that mono has little to do with kissing teenagers. And the blood test, a “mono-spot,” doesn’t always work.
I’ve included this anecdote from my medical history to illustrate how far we have come as a society in trusting tests, and what the pitfalls of that trust might be. We trust tests because, aware as we are of the shortcomings of human subjectivity, we turn to objective measures for confirmation. But this strategy is flawed at its core: no test created by humans avoids human judgement. Someone invented the mono-spot. People manufacture the microscopes and glass slides, and people conduct the tests.
In the world of education, the teacher is the doctor. The teacher has opportunities every day to “diagnose” how well your child reads, writes, speaks, listens, figures, predicts, synthesizes, revises and questions. Many forms of assessment are used, such as projects, journals, tests and oral presentations. Let’s say your child loves cheetahs; they research (read), take notes (write), and make a presentation (speak). They might even write a song about cheetahs to accompany their slide show. The teacher reinforces their strengths and, when they make errors, provides corrections. Not only are students demonstrating multiple skills; they are also using them the way people actually do in the real world. And isn’t that why we want literate children? Because we want literate adults.
During the school year, your child has many opportunities to demonstrate their skills. Maybe they were away the day their classmates chose their animals, and your kid got stuck with ants. Let’s say they’re afraid of ants and weren’t very motivated to do a good job on that topic. The next project is around the corner; maybe they’ll shine during the readers’ theatre about the expulsion of the Acadians. Maybe they forget periods at the ends of sentences. The teacher will reteach that skill, and the child will be marked on what they eventually learn, not on the mistakes they made along the way. As students work through these activities, teachers provide ongoing feedback. Research shows that these practices, known as formative assessment, are among the most effective ways to raise academic achievement.
Let’s contrast this process with large-scale testing. We have been led to believe that these tests are the most objective, and therefore the most accurate, measurements of student achievement. Unlike classroom assessments, they are not “tainted” by human subjectivity in the form of the teacher. Centralized authorities (people, in fact) compose the tests, which are sent to schools to be administered according to precise instructions, complete with scripts. (Almost every teacher apologizes to the kids for the dorky scripts.)
However consistent these procedures may be, countless things can go wrong when a test is given to thousands of children. Empty stomachs, bad moods and coughs are just the obvious ones. If a teacher has taught a concept using examples and vocabulary that differ from the test’s, or if students aren’t comfortable with multiple-choice questions, they may perform poorly on material they actually know. A standardized test administered one-on-one by a qualified psychologist takes these factors into account; large-scale testing doesn’t.
The mono-spot is interpreted by a human, the highly educated doctor, who uses their skills to account for the factors that affect a testing situation. A person competent to make these judgments in a school, the highly educated teacher, is present on testing day, but they can’t use those skills. Even if they want to. And they really, really want to, because however much teachers encourage students to become independent, they hate to leave them hanging, bereft, confused and ashamed. Which many are. During provincial assessments, I’ve seen kids cry, throw books and leave the room. One year I had six Gr. 6 kids crying over a math word problem. And I couldn’t help them. Not if I followed the rules. All I could do was say, “You’re done; it’s over,” and take away the offending booklets.
Teachers find it very difficult not to provide support during provincial assessments, and not always out of pure compassion: the teachers who administer the assessments are stakeholders. The mono-spot is a “low-stakes test” for the testers; their jobs don’t depend on the results, so they have no incentive to “fudge.” But large-scale testing is almost always “high-stakes.” What are those stakes? Jurisdictions around the world that use large-scale assessments attach different consequences to them, but here are some of the possibilities. For the student: program placement or not, extra tutoring or not. For the teacher: reprimands or accolades, pay raises or not. For the school: funding cuts or extra funding. For the community: your school’s ranking in the newspaper and on the Internet. On real estate agents’ websites, no less.
We like to think that all teachers are professional, but the pressures on them on testing day are enormous. So I’ll leave you with this anecdote. I worked in a school (in another province) where a rather nasty principal used test results to punish teachers he didn’t like. On the day of the provincial literacy tests, one of the Gr. 3 teachers stood up in the staffroom and announced that she intended to read the test out loud to her students. Any change to a testing situation affects the results, but reading a literacy test out loud renders them pretty useless. Let me reassure you that this behaviour is unusual, although not unknown. While the rest of the staff were picking their jaws up off the floor, I hurried down to the other Gr. 3 classroom, where my best friend sat at her desk weeping. She knew the principal didn’t like her; now her class’s results would almost certainly be lower than her colleague’s, and that would be reflected in her evaluation. But as we talked it through, we realized that she could not bring herself to do something unprofessional; she would administer the test as instructed. And she did.
Although this may seem to prove the point that you can’t trust teachers, look at it this way: because we don’t trust teachers, because we don’t trust the many tasks that teachers and students spend all year working on and documenting, we establish a single summative evaluation point, in which we have so much confidence that we’re happy to publish the results on the Internet. But do large-scale, high-stakes tests deserve that trust? Or have we created a situation in which results are unreliable, some teachers act unethically, other teachers get shafted, and students are frazzled and misunderstood? Fallible or not, we’ve got to trust humans; they’re all we’ve got. Your child’s teacher spends five hours a day, five days a week, ten months a year with your child. Who better to diagnose and treat your child’s academic health?
By the way, I got an A on that paper.
But now I want to put the provincial assessment program to the test of consequential validity. Consequential validity looks at how a test affects the people involved: does it, so to speak, do more harm than good? Does the data gathered justify the effects of testing? What, indeed, are the effects of testing?
Cost: Printing. I have run my hands over the lovely paper in the testing booklets many times and wished my school could afford that quality of paper. Or that we could print in colour. Postage. Each student’s name, provincial number and birthdate are printed on the front. Guess who checks all of those? Me! The booklets printed for children who have left the school must not be used for new children who have arrived. Instead, we order new ones. Postage. And we return the unused ones, I assume, to be shredded. Postage.
I’m a resource teacher at my school. One afternoon I sat with one of the brightest students in the school because she had missed the morning administration of the assessment for religious reasons. She needed a quiet place to write it and access to a member of the teaching staff. Me. Those are the rules. During the math assessment, she asked me to check the translation of a question (allowed). The question was so ridiculous she couldn’t believe it. We both fell over laughing when we realized that, yup, that’s what it said! (The confidentiality agreement I signed prevents me from telling you what it was.)
As I mentioned, I’m a resource teacher. My job is to support students with special needs, and their teachers. This year, because of the many required assessments and other documentation, I did not start doing my actual job until mid-October. I cannot think of a better example of consequential validity. And it will all start again in May and will probably run deep into June. Our school year is ten months long. I can already wave goodbye to three of them.