Wednesday, October 7, 2009

My Dissertation Study

A lot of people have wondered what kind of dissertation someone does in English composition, or specifically what my dissertation is about. So here goes. I gave mine a nice lofty title, mostly as a way of positioning myself for a job someday—I know, but it's just a title. “Applying Modern Validity Theory to College Writing Assessment” or something like that.

The simple version:
I am trying to understand the experience students go through after they have been placed in Basic Writing (the lower-level first-year composition class). So I have gathered about 10 participants, and I'll be interviewing them, reading some of their work—including the essay that got them placed into BW—and just trying to get inside the overall experience, the more global effects of being placed into BW.

The larger implications:
The larger issues here are educational tracking and assessment (or test) validity. People all over the world are tracked according to their “abilities” or “aptitudes,” ideally in their best interest, but the bigger question is whether tracking people this way is actually good or bad for them—and of course, if it's bad for them, who is benefiting? In other words, if a student “belongs” in BW, we say this because we think they will be better off in the long run for having taken the course; we believe they are not yet ready for College Writing (the mainstream composition class), either because they don't have certain “skills” or perhaps because they just need more time writing as they develop and mature. Also, we fear that putting them in CW will be bad because they will get low grades, or maybe fail; they will get discouraged because they will feel behind, all that kind of stuff. In this way, the placement could be the best decision.

But what if not? There's also the chance that this person needed that little boost saying, You're a better writer than you may realize or than your schools have let you know; you belong in the more advanced class. Maybe they are the type of learner who rises to a challenge and responds more to a little added belief in themselves than to a kick-in-the-pants type of placement. We don't know. And the placement method itself won't tell us. What's more, the students themselves might not really know. To make it worse, this course may separate them from their friends and is usually non-credit-bearing (but still costs money), which means they will be a semester behind for graduation credits; often the course will not transfer—and since there may be a high correlation between BW placement and low socioeconomic status (SES), this can put undue burdens on someone who has to transfer because of financial constraints and now has to pay for a class that they can't even use toward graduation... The list goes on. One of the larger points is that this is a high-stakes decision. Yes, there are educational matters at hand—being ready, “ready,” for CW, etc.—but there are much more global issues of social identity, perception of self as a student, economic burden—things that may greatly outweigh the educational benefits of the placement.

So how do we find out? And why did I call this a matter of validity theory?

Validity, what it is.

Validity is “scientific inquiry into score meaning” (Samuel Messick, 1989). The old way of looking at it (pre-mid-1950s) was as follows: a test score is reliable to the extent that the same results would happen over and over again (whether the student took the same test over, or whether several graders graded it); but it's only valid if the test actually measures the thing it was supposed to measure. We no longer quite see it this way. But just for an example, if you measure water temperature with a thermometer and it reads 80 degrees, it might be right, but lots of things could be wrong. So you could try it a few more times with the same thermometer, and you could try several different thermometers—if they all say 80 degrees, you have a very reliable measure. But if the water is forming ice crystals, or is about to boil over, then clearly the thermometer did not give a valid measure of the temperature. So reliability is the extent to which measures are consistent, and validity is (or used to be) the extent to which they are “correct.”
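To act out the thermometer example concretely, here's a toy sketch in Python (the thermometer, its bias, and the noise numbers are all my own invention, just to stage the example above):

import random

# A miscalibrated thermometer: tiny random noise, huge constant bias.
TRUE_TEMP_F = 32.0  # the water is actually forming ice crystals

def broken_thermometer(true_temp, bias=48.0, noise=0.3):
    """Returns a reading that is consistent from trial to trial but wrong."""
    return true_temp + bias + random.gauss(0, noise)

readings = [broken_thermometer(TRUE_TEMP_F) for _ in range(5)]
print([round(r, 1) for r in readings])
# -> something like [80.1, 79.8, 80.3, 80.0, 79.7]
# The five readings agree with one another almost perfectly: highly RELIABLE.
# But every one is ~48 degrees off the true temperature: not a VALID measure.

Consistency among the readings buys you reliability and nothing more; it says nothing about whether the score means what you think it means.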

But the educational sciences outgrew this definition of validity for a few reasons. First, when you're talking about a psychological/mental construct—can this kid write well enough to take CW?—the issue of a “correct” answer just may not exist. In these matters, such as BW placement, there really is no correct or incorrect answer; the answers are only more or less plausible. But there's a bigger issue at stake. The measurement tool—the assessment procedure—is here to help us obtain evidence for the students' placement; it is not here to place them. This is huge. WE place them. The test is a tool that, if designed well (if it measures what it's supposed to measure), will give us good evidence, but we the humans are the ones who make the decision. Validity, therefore, resides not in the measurement tool, but in the decision that we the humans make. So if the thermometer reads 32.5 degrees but we still go swimming and freeze to death? It wasn't the thermometer's fault.

It has been about fifty years since we took validity out of the hands of the test itself and put the burden upon ourselves. What this means is that, in order to make the best possible decision based on an assessment, we need to accrue all possible evidence for and against the decision; thus, validity is now seen as a type of argument, not just as a matter of given fact. In terms of my study, I am investigating possible reasons for or against the placement of these particular students in order to examine possible issues in the larger matter of educational tracking.

One other major shift in validity theory is that the validity of the decision made and the social consequences of the decision are now seen as one imperative, not two. (This was a major contribution of Samuel Messick, 1989; the validity-as-argument idea is credited to Cronbach, 1988, and Kane, 1992, for those interested.) Again, my study seeks to fulfill this imperative. I have read tons of literature about the need to incorporate social consequences in validation inquiry, but I have never seen a study that actually does it. (Again, for the nerdier, Shepard, 1993, talks about it, but I don't think she goes far enough.)

So what happens now is that we've really opened a debate about science and scientific method in educational and social research. If we are to examine validity as an interpretive argument, with the assessment scores being the evidence and the decisions made being the conclusion—and if we are to take in social consequences as part of the evidence for or against the validity of this argument—we have graduated from the scientific paradigm that looks for the “correct” score as the valid score. There just is no such thing in these matters. Our scientific inquiry into the score meaning of these placement essays must—if it is to follow the major ideas of validity theorists—examine phenomena that are not representable through quantitative means, and certainly not through a paradigm that even believes there is objectivity in the measurement of this type of phenomena. Objectivity and correctness just don't come into play here because there just is no objectively correct answer to the big questions in life, such as, Did we place that student in the right class? The traditional view of validity would allow for a glance at transcripts and retention rates, perhaps, and be satisfied that the assessments were valid. But no longer. We know too much more now, and modern theorists do not buy into the notion of excluding any possible evidence from the validity argument. (Pamela Moss is another major theorist to check out.)

Another reason for this study is that I see a lot of wondering in composition literature about what happens to basic writers as a result of their placement, but again, I don't recall ever seeing a study that sought to find out. That's what I'm trying to do. And I'm approaching this from a paradigm of phenomenology, which means that rather than giving surveys to thousands of BW students and analyzing the data I receive on that level, I am attempting to get inside (and even help co-construct) the “lived experience” of these few students. This type of paradigm seeks insight, rather than objectivity or generalizability. And this is something I would love to fight about! The question is, Is the aggregate of common experiences of thousands of people more telling than the vivid life stories (in relation to our question) of a handful? Clearly the two inquiries are after different things, and each is good for something that the other isn't. But anyone who has read this and who sees no value in a phenomenological approach, as I've outlined here—let's fight it out! Maybe you're right. But maybe I'm right...

So far I have interviewed ten participants, and I've found some really interesting themes emerging—themes I could not have predicted ahead of time. I'll let you know as I gather more data. The interviews have been very open-ended, and I've decided to find times to let them know what I think I'm hearing thematically, to let them reflect and even construct experience right there with me. I'll be glad to talk more about this method and methodology in another post! It's solid...

JL

DOE hates Educational Research

Having spent a few years now in schools of education/educational research, I have often bemoaned the fact that all the work done within this group of fields seems to go entirely uncommunicated to the public and entirely ignored by those in educational policy—the ones who make the decisions. I remember being at a talk by David Berliner at BC where he spoke of the evils of No Child Left Behind (NCLB), some of which I hadn’t even thought of before. It struck me that he might travel all over the country, to schools of education, and meet similar audiences—all in agreement with him and outraged by the injustices being perpetrated on our youth, all in the name of saving them. But what’s up with that? Why is the field of educational research so out of communication with the policymakers who actually effect change, and with the public at large who elect the policymakers? Maybe it’s because, like any field of scientific research, the researchers tend to write to and for one another, not for the consumption of the public. But then what are we doing in educational research?

Yesterday I had the chance to attend a talk by Elizabeth Williamson, educational program specialist from the Department of Education (DOE). This would be interesting. I’ll give some highlights. (As I relive the horror, please excuse progressively worsening typos.)
The overall frustration is that there is educational research, and then there are the decision makers who don’t know or care to know what knowledge has been created by that field. It seems as if they would rather make decisions based on instinct, and just start spending the $100 billion stimulus dollars (Williamson’s number) without any foundation upon which to weigh their decisions. What’s more ironic is that they constantly call for more "scientific" educational research*, but they don’t seem to want to consult what’s already there. Somehow, people have begun to see the word “theory” as a bad word, no, an “unscientific” word, which is about the craziest thing I can imagine. I’ll get to the guy who told me I was “just talking theory” in a minute…

There is this constant use of the phrase “XXXs that work,” which is completely bereft of meaning and definition. Williamson even mentioned that they are looking for programs that work, and that if you come up with one, they’ll give you money. Sounds good. But again, they want things that work but they hate research? I don’t even think they have a research department—though they must—but Williamson went into a blank stare any time I asked who was looking for (researching) these programs. So, in their search for what works, they seem content to spend $100 billion on aspirin without ordering an MRI first—that’s just theory. We know that aspirin helps headaches; it works. Don’t confuse us with theory. And the analogy goes from there. Is this unfair? Let’s fight. I think it’s not.

One professor of school psychology brought up this issue of what works: he said something like, Look, we know what works; it’s just that teachers aren’t doing it—they have too much freedom to do whatever they want, and not just what works. Oh yes, he did. Williamson seemed to agree. But this is crazy. I’m sorry. It’s crazy. The word on the street about teachers and their freedom to choose what and how to teach? Yeah, that’s not happening. This is roughly the opposite of the truth. In fact, from all I gather, the brightest, most well-meaning teachers are downright handcuffed by the onslaught of shallow assessment procedures and impoverished accountability standards, to the point where, no matter what they know about their students and how they learn, they are left with no choice but to train them for this decontextualized curriculum and its equally abstract assessments. The “theory” that teachers have too much freedom these days is, from all I have gathered, bullshit. And I would love to fight about it if I’m wrong!

So that’s the first mistake in what this guy said. The other is this notion that we “know what works.” This has a few problems laced within it: first, that there is even a defined notion of “works” for us to know what is doing it; and second, that we have adequate assessment methods that can tell us if this “what works” is being achieved—which of course is impossible if we don’t even know what “what works” means. Bad assessments can tell you anything you want them to; all you need is a desired answer and no regard for whether or not it has any truth value (i.e., no ethics). For example, if we want an easy way to assess standards for literacy, we could just set up a super shallow construct—say, how fast a student can read 1,000 words—and then we’d have a very easy measure for who is and who isn’t literate. And we could rank them, and we could report our stats to the DOE, AND we could easily figure out which techniques “work” to get kids to perform this stupid task. But it’s all because we are assessing bullshit; we have arbitrarily decided upon a definition of one of the many, many things that make a person literate, and, no coincidence, we have decided upon one that is quickly and concretely measurable (that is, NON-scientific). So if we’re okay with that, then yes, we do know what works.
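To act out just how easy this kind of bullshit measurement is, here's a toy sketch in Python (the cutoff, the timings, and the whole “measure” are hypothetical, my own invention):

def words_per_minute(words_read: int, seconds: float) -> float:
    """The entire 'literacy construct': raw reading speed, nothing else."""
    return words_read / (seconds / 60.0)

def is_literate(wpm: float, cutoff: float = 150.0) -> bool:
    # An arbitrary cutoff: precise, rankable, easy to report to the DOE,
    # and silent about comprehension, composition, or any real literacy
    # practice.
    return wpm >= cutoff

students = {"Student A": (1000, 330.0), "Student B": (1000, 480.0)}
for name, (words, secs) in students.items():
    wpm = words_per_minute(words, secs)
    print(f"{name}: {wpm:.0f} wpm, 'literate' = {is_literate(wpm)}")
# We can rank students, report clean stats, and find techniques that "work"
# to raise wpm; all we have actually measured is the construct we chose.

The numbers come out crisp and the rankings come out clean, which is exactly the seduction.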

But we’re not okay with this. While we’re selecting one abstract skill—out of all the things that make up literacy—that is easy to measure with precision, why not choose students’ ability to create rhymes out loud for the longest amount of time, while still making sense? (If you’ve ever seen people freestyle live, you can’t deny that it’s a highly skilled literacy practice, though not one prized in school.) That sounds like a strong indicator of literacy, easily as strong as the silly example above. It’s just as measurable too. We can get into this later, but this clearly is not a valued literacy practice in most middle-class communities, though it is a very highly valued practice in many nonmainstream communities. The fact that this issue of “what works” is laced with ethnocentrism is another point that would be fun to fight about. I’ll leave it at that. But it is no coincidence that, in the pursuit of stripping abstract, decontextualized skills out of the web of practices that is a person’s literacy, the skills chosen are going to reflect the community values of the people making the decision. But also that they won’t know it; they will be convinced that what they have isolated is literacy. Thus the problems begin…

So at one point I needed to interrupt this guy and ask him, DO we know what works? What do you even mean by “works”? And he gave me that Bill O’Reilly, Oh come on look. He replied, “reading instruction works.” Aghast, I asked him, what type of reading instruction? There is no agreement upon what works in terms of the different approaches**, to which he replied that I was “just talking theory [read: bullshit].” But… what? Really, what did he mean? What does work? And what does it work for? We can ignore these things, but why would we? For whose benefit? Ours or the students’? We have all this money, and we have this whole field (or fields) of educational research that has for years been way ahead of even the most current popular conceptions of good educational programs. Why the disconnect? Can anyone speak to this? Why would we ignore all this and just make decisions based on the way the world appears rather than on research that investigates the way it really operates (read: scientific)?

I specifically asked Williamson to speak to this, but I think I was too diplomatic, because she totally missed my point. After the talk, I shook her hand and thanked her for speaking with us; I told her how, so often in educational research contexts, we speak amongst ourselves but never get to hear from the policy side (a la the Berliner talk). She secretly called me over for a private word. But what she said was that she had heard the “same thing” from her son, who had just received his master’s in education. He “agreed” that all people in education do is bitch and whine… Uh… So I said something like, yeah, I guess it can seem like whining when all the research and theory we have developed stays within our own circles and never reaches the people who can make changes. But I don’t think we were having the same conversation. Yet this speaks SO very loudly to the problem at hand! People in the policy business see researchers as whiners, bitchers, complainers! Of course they don’t listen to us. If they, like prof. school psych, can politicize “theory” into a buzzword for left-wing, liberal, intellectual, complaining, etc., then of course they have established cause to just make $100 billion decisions without consulting us! What is going on here? Why do we even have entire bodies of research if nothing we learn or discover is considered real because it’s “just theory,” whereas they want “science”? It’s only now occurring to me what’s behind the current call for more “scientific” educational research! What they mean by scientific isn’t scientific; they mean concrete and predictable and objective, not scientific. This will be a topic for another post. Wow.
Ok, a little more. NCLB was (!) known for, among all of its ridiculous misconceptions of scientific educational measurement, conflating the assessment of the individual with that of the school/system, which is not sound. Validity must be established through different means for each application of an assessment procedure, and using one test to attempt to establish (with validity) both the progress of a student and the progress of his or her school is just not scientifically sound.*** Yes, this is just theory. Just science. That is all. You can’t do it. If your study has no validity, it… get ready… has no validity. I know, what an egghead, eh? But the fact is that Williamson said that the “new” act—the renewal of the Elementary and Secondary Education Act****—would seek new and “flexible” ways of assessing both student and school progress. Really good to hear.
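Here's one narrow statistical facet of that conflation point, as a toy sketch in Python (all the numbers are invented, just for illustration): the same scores behave very differently depending on whether you read them one student at a time or aggregated to the school level.

import random
import statistics

# Each student's observed score is "true" ability plus a lot of
# individual measurement error.
random.seed(1)
TRUE_ABILITY = 500.0
ERROR_SD = 60.0    # large error on any single student's score
N_STUDENTS = 100   # students in one school

scores = [random.gauss(TRUE_ABILITY, ERROR_SD) for _ in range(N_STUDENTS)]

print(f"school mean: {statistics.mean(scores):.1f}")  # errors average out
print(f"one student: {scores[0]:.1f} (true ability is {TRUE_ABILITY})")
# The standard error of the school mean is ERROR_SD / sqrt(N) = 6 points,
# while a single student's score carries the full 60-point error. A test
# that might support an inference about the school is far too noisy to
# support a high-stakes inference about any one student: two different
# inferences, each needing its own validation.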

But when I asked her how they were looking for such procedures, I got whatever is less than a blank stare. When I told her that we in the field of educational assessment had been working on such procedures and had been roughly fifty years ahead of the current methods, I got nothing. Who is in charge of research for the DOE? How can they be looking for new assessment procedures when we have had “new” procedures all this time? I put new in quotes because the procedures in use are probably a full generation behind the “theory” and research of the field. But again, the point is, what is going on with this disconnect? Where are they looking?? WHY aren’t they looking in the place where the research already exists? Do they like reinventing wheels? I mean, invention is wonderful, but when the research is right there, when the wheel has already been invented, what is the possible reason for not using it? The only thing I can think is that it’s partly the fault of educational researchers themselves, and it’s the problem of the ivory tower, of every academic field. In order to establish credibility, scientists are expected to speak only to an audience of their colleagues; if they speak to a lay audience, they are traditionally looked down upon. This isn’t new. Thomas Kuhn recounts this throughout the history of modern science in The Structure of Scientific Revolutions. But it’s a problem for a field that should be having an impact on policy. I think I’ll leave it at that. For now.
*See Howe, K. R. (2009). Epistemology, methodology, and educational sciences: Positivist dogmas, rhetoric, and the education science question. Educational Researcher, 38(6), 428–440.

**There are many sources that go over the two major conflicting views of reading instruction. Basically, there is a “top-down” approach, which suggests that reading is about meaning, so an emphasis on meaning will lead to greater eventual proficiency in word recognition; and there is a “bottom-up” approach that says the opposite, namely, that students need “phonics,” the letter-to-sound connection, in order to recognize words and eventually get to the meaning of the text.

***A recent dissertation by Sharon Rosenberg centers on this very validity issue: Rosenberg, S. L. (2009). Multilevel validity: Assessing the validity of school-level inferences from student achievement test data. Unpublished doctoral dissertation, University of North Carolina at Chapel Hill.

****NCLB itself was a reauthorization of ESEA, so I don’t know how much spin is involved when the DOE says that we don’t have NCLB anymore; we now have a reauthorized ESEA.