Assessment bias is a subject in which all American educators should be well versed, given the diverse populations of students that American schools are tasked with educating (De La Cruz, 1996). For example, if an assessment task is framed in the context of playing computer games, then students who know about computer games have an advantage. The context of the assessment item becomes a biasing factor when one group of students knows less about computer games than other groups do. The purpose of this post is to review the literature on assessment bias, to present a synthesis of an interview conducted with the Director of Assessments at the San Diego County Office of Education together with peer-reviewed research on the subject, and to offer some reflections and findings regarding assessment bias in general.
What's In the Books?
Assessment bias is part of the larger debate about intelligence, race, and inequality in society. Specifically, assessment bias exists “when test items offend or unfairly penalize students for reasons related to students’ personal characteristics, such as their race, gender, ethnicity, religion, or socioeconomic status” (Popham, 2003, p. 55). Much has transpired regarding the subject over the last 40 years. To begin, let’s consider litigation that has been influential in the field.
Several court cases have influenced how assessment bias is dealt with in American educational systems. For example, as a result of a 1970 court case arising from a complaint that Mexican-American culture was never incorporated into standardized assessments administered throughout the state, the California Department of Education agreed to test bilingual children in both English and their primary language. The case led the state to delete unfair verbal items from tests and to reevaluate all Mexican-American and Chinese students enrolled in classes for individuals with limited English proficiency or educable mental retardation (Diana v. State Board of Education, 1970).
Another example may be found in Lora v. Board of Education of the City of New York. In the late 1970s, it was determined that a disproportionate number of African-American and Hispanic-American students were enrolled in classes for individuals with emotional disturbance. As a result, due process rights related to students’ linguistic, cultural, or ethnic background differences were incorporated into the standards and procedures for nondiscriminatory assessment and decision making (Lora v. Board of Education of the City of New York, 1977). Because decisions about the existence of emotional and behavioral disorders were made subjectively, professional responsibility was emphasized and a special advisory panel of experts on placement procedures was appointed.
The most significant case law influencing how assessment bias is dealt with in California was that of Larry P. v. Riles (McLoughlin & Lewis, 1994). From 1968 to 1969, African-American children made up 27% of the students classified with mental retardation in California, even though they constituted only 9% of the state’s child population (Underwood & Mead, 1995). Six students of African-American descent in the San Francisco Unified School District complained that unconstitutional standardized intelligence tests placed a disproportionate number of African-American students in classes for students with special needs (Larry P. v. Riles, 1984). As a result of the case, the State of California was ordered to stop using any standardized intelligence test to identify African-American students with mental retardation unless the test was proven to be free of racial or cultural bias.
As a result of the aforementioned historically significant legal cases, as well as a slew of others, the social context of assessment has changed. At the heart of these issues is culture. For example, intelligence quotient (IQ) tests have been criticized as biased against students from deprived and culturally different backgrounds (De La Cruz, 1996). Non-biased assessments limit their content to material common to all cultures, but such assessments are difficult to develop. Murphy and Davidshofer (1991) distinguished the major types of test bias based on the two purposes of tests, which they defined as “to measure a particular characteristic or attribute and to predict scores on some criterion or outcome measure” (Murphy & Davidshofer, 1991, p. 258). They therefore proposed that bias occurs when tests make systematic errors either in measuring a specific characteristic or attribute or in predicting outcomes.
In more recent years, developers of large-scale tests have employed rigorous bias-detection procedures. A typical approach calls for the creation of a bias-review committee, usually 15-25 members, almost all of whom are themselves members of minority groups. These committees review assessment items and, if a certain percentage of the committee members believe an item might be biased, the item is eliminated from the assessment (McNeil, 2000). At an abstract level, bias in educational assessments is related to cultural malpractice. Researchers focusing on the topic of assessment bias use terms like cultural malpractice or cultural negligence because such language focuses attention on historic and current inequities in the educational and psychological services available to multicultural populations (Dana, 2005).
Additionally, research has shown that assessment bias corrupts assessment validity (Popham, 2010). When test items unfairly penalize certain groups of students, so that those students score less well than they otherwise would have, educators will almost certainly arrive at invalid inferences about how well those students have mastered whatever is being assessed. A preponderance of research suggests that two detection strategies have been the most viable and widely used in recent years.
Judgmental bias-detection strategies are straightforward ways to determine whether a test item might be biased against a particular group of students (McMillan, 2008). An assessment specialist, recognizing the need to reduce potential assessment bias in a test, appoints a group of reviewers composed of experienced teachers and representatives of the groups most likely to be adversely impacted by biased test items. Empirical bias-detection strategies may be employed once there are sufficient numbers of items to work with. Empirical strategies include field-testing with large samples; based on these field tests, simple comparisons may be made between the performances of majority students and any sizable minority groups (Linn, Miller, & Gronlund, 2008).
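The empirical strategy just described, comparing item-level performance across groups on field-test data, can be sketched in a few lines of code. This is a minimal, hypothetical illustration: the function names, the 0.10 flagging threshold, and the response data are all assumptions for demonstration, and operational testing programs rely on more sophisticated differential-item-functioning statistics (such as the Mantel-Haenszel procedure) rather than raw difficulty gaps.

```python
def item_difficulty(responses):
    """Proportion of examinees answering the item correctly (1 = correct)."""
    return sum(responses) / len(responses)

def flag_items(reference, focal, threshold=0.10):
    """Flag items whose difficulty gap between two groups exceeds a threshold.

    reference/focal map item names to lists of scored responses for the
    majority (reference) group and a minority (focal) group, respectively.
    """
    flagged = []
    for item in reference:
        gap = item_difficulty(reference[item]) - item_difficulty(focal[item])
        if abs(gap) > threshold:
            flagged.append((item, round(gap, 2)))
    return flagged

# Made-up field-test responses: 1 = correct, 0 = incorrect
reference_group = {"item1": [1, 1, 1, 0, 1], "item2": [1, 0, 1, 1, 0]}
focal_group     = {"item1": [1, 1, 0, 1, 1], "item2": [0, 0, 1, 0, 0]}

print(flag_items(reference_group, focal_group))  # → [('item2', 0.4)]
```

Items flagged this way are not automatically deemed biased; they are typically forwarded to a judgmental review committee, since a raw performance gap may reflect a genuine achievement difference rather than bias in the item itself.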
Opinions from the Field
The Director of the Assessment Unit in the Learning Resources and Educational Technology division of the San Diego County Office of Education in California was interviewed regarding assessment biases in California’s Standardized Testing and Reporting (STAR) assessment suite. As Director of the Assessment Unit, she provides information, consultation, and technical assistance in all areas of statewide, local, and classroom assessment for districts in San Diego County. She is widely considered the go-to person for the most current information, resources, and key contacts that support the work of educators interested in gathering and using assessment data to benefit teaching and learning.
Over a 45-minute period, she was asked a series of questions regarding assessment bias, including:
In your opinion, do tests in the STAR system include assessment biases?
Is it fair to say that English learners in California are unfairly penalized based on their language proficiency when it comes to statewide tests?
Does the California Standards Test have disparate or adverse impacts on certain ethnic groups?
How can assessment biases be identified prior to the first installation of a test?
What accommodations may be made for students with special needs as they take statewide tests?
During the interview, the director stated clearly that the STAR system does include some bias inasmuch as it cannot contextualize assessment items for more than 40 ethnicities and hundreds of differing cultures across the state. However, each year the state conducts assessment-bias identification exercises through a committee of reviewers assembled by nomination from various school districts. The committee meets in Monterey, California each May and, over a two-week period, reviews the upcoming version of the assessments to determine which items may be biased.
What was most striking about the interview was how vehemently the Director expressed her belief that English learners are unfairly penalized. She stated that the No Child Left Behind Act of 2001 (NCLB) was primarily responsible for this penalization, and that were it not for the need for the monies that accompany compliance with NCLB, the state would long ago have taken action to adequately assess limited-English-proficient students without casting them as a subgroup that continuously causes schools to be stigmatized as needing improvement. In response to the final question, however, she did point out that the state has taken extensive measures to provide students with special needs with accommodations that may help them perform better on STAR tests, including primary-language glossaries, extended test-taking periods, large-print assessment forms, and a completely modified version of the standards test written entirely in Spanish. Despite these accommodations, the Director was adamant in expressing her concerns about disparate effects on populations such as students with disabilities and English learners. By extension, because nearly 80% of the English learner population in California is of Hispanic descent, Latino students are also adversely affected. According to her, the accommodations do not make up for a total lack of English skills or for cognitive impairments that prevent some students from succeeding on given assessments. For example, the modified assessment presented in Spanish is undercut by the fact that the state does not count its results as valid measures precisely because of the modification, rendering the accommodation useless.
The context and content of an assessment task should be familiar to the students expected to complete the task if an assessment item is to be considered free of bias. The values and experiences of the particular group of students taking a test should be reflected in the assessment items. Although stereotypes based on gender, culture, or ethnicity originate in the broader society, assessment items must be free of these biases. This can be achieved through the thoughtful judgmental and empirical bias-detection strategies previously discussed. Additionally, assessment bias is sometimes a function of how a rater evaluates students’ performances, not just of how assessment items are developed and administered. As such, evaluators must continually reflect on their values and attitudes toward certain students and groups of students, because beliefs held unconsciously by educators may cause them to rate students based on preconceived notions. As stated previously, the purpose of assessing students is to determine what course of action should be put in place to better educate them. If assessment biases are present, the validity of the assessments is rendered weak, if not nil. These biases therefore pose a significant threat to the entire purpose of assessment.
Dana, R. H. (2005). Multicultural assessment: Principles, applications, and examples. Mahwah, NJ: Lawrence Erlbaum Associates.
De La Cruz, R. E. (1996). Assessment bias in special education: A review of literature (Information Analysis ED 390 246). Washington, D.C.: ERIC.
Diana v. State Board of Education, No. C-70-37 RFP (N.D. Cal. 1970).
Larry P. v. Riles, 793 F.2d 969 (9th Cir. 1984).
Linn, R. L., Miller, D., & Gronlund, N. E. (2008). Measurement and assessment in teaching (10th ed.). Upper Saddle River, NJ: Prentice-Hall.
Lora v. Board of Education of the City of New York, 74 F.R.D. 565 (E.D.N.Y. 1977).
McLoughlin, J. A., & Lewis, R. B. (1994). Assessing special students (4th ed.). New York: Merrill.
McMillan, J. H. (2008). Assessment essentials for standards-based education (2nd ed.). Thousand Oaks, CA: Corwin.
McNeil, L. M. (2000, June). Creating new inequalities: Contradictions of reform. Phi Delta Kappan, 81(10), 728-734.
Murphy, K. R., & Davidshofer, C. O. (1991). Psychological testing: Principles and applications (2nd ed.). Englewood Cliffs, NJ: Prentice Hall.
Popham, W. J. (2003). Test better, teach better: The instructional role of assessment. Alexandria, VA: Association for Supervision and Curriculum Development.
Popham, W. J. (2010). Everything school leaders need to know about assessment. Thousand Oaks, CA: Corwin.
Underwood, J. K., & Mead, J. F. (1995). Legal aspects of special education and pupil services. Boston: Allyn and Bacon.