Reading Assessment: Principles and Practices for Elementary Teachers, Second Edition
Students who take a norm-referenced test have their performance compared to that of students from the norm sample to make meaning of the score.
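To make the idea of a norm-referenced comparison concrete, here is a minimal sketch in Python. The norm-sample mean and standard deviation are invented for illustration and are not taken from any published test; the point is only to show how a raw score is expressed relative to a norm sample.

```python
from statistics import NormalDist

# Hypothetical norm-sample statistics for a decoding subtest
# (invented numbers, not from any real instrument).
norm_mean = 42.0   # average raw score in the norm sample
norm_sd = 8.0      # standard deviation in the norm sample

def norm_referenced_interpretation(raw_score: float) -> dict:
    """Express a raw score relative to the norm sample."""
    z = (raw_score - norm_mean) / norm_sd          # distance from the mean in SD units
    percentile = NormalDist().cdf(z) * 100         # percent of norm sample scoring at or below
    standard_score = 100 + 15 * z                  # common standard-score scale (mean 100, SD 15)
    return {"z": round(z, 2),
            "percentile": round(percentile, 1),
            "standard_score": round(standard_score)}

# A raw score of 30 falls well below the norm-sample mean.
print(norm_referenced_interpretation(30))   # {'z': -1.5, 'percentile': 6.7, 'standard_score': 78}
```

A score reported this way only tells us where a student falls relative to the norm sample; as the following paragraphs explain, understanding why the score is low requires additional information.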
An extremely low score may indicate a learning problem, or it may signal a lack of motivation on the part of the student while taking the test. A low score could even be due to a scoring error made by the tester. Even though a score from a diagnostic assessment may be quite precise, understanding why a student scored at a particular level requires additional information.
Did observations during testing show that the student was distracted, uncooperative, or squinting at items? It is often a combination of assessment information that helps identify why a student scored a certain way, which is why testers often use their observations during testing to interpret the meaning of scores.

Like screeners, some assessments are administered to all students at a particular grade level, but unlike most screeners, they take more time to complete and are administered to entire classrooms rather than having at least some sections administered individually.
Like diagnostic tests, they tend to produce scores that are norm-referenced.

For many diagnostic literacy tests, reviews are available through sources such as the Mental Measurements Yearbook (MMY). Versions of the MMY are available in hard copy at many libraries, as well as online at no cost to students at colleges and universities whose libraries pay a fee for access.
Reviews are typically completed by experts in various fields, including literacy and measurement experts. Reviews also include complete descriptions of the test or assessment procedure, who publishes it, how long it takes to administer and score, a review of psychometric properties, and a critique of the test in reference to decisions people plan to make based on findings.
It is important for teachers and other educators who use tests to understand the benefits and problems associated with selecting one test over another, and resources such as the MMY offer reviews that are quick to locate, relatively easy to comprehend for readers with some background knowledge in assessment, and written by people who do not profit from the publication and sale of the assessment.
Alternatively, a single low score does not necessarily signal a lack of ability to learn, since with a change in instruction, the student might begin to progress much faster and eventually catch up to his or her typical age-based peers. These screeners are not only designed to measure the extent to which students are at risk for future literacy-related problems at the beginning of the school year but also to monitor changes in progress over time, sometimes as often as every one or two weeks, depending on individual student factors.
Being able to work with key details in a text could also be informally assessed by observing students engaged in classroom activities where this task is practiced.

Unlike assessments that are completed only one time, progress-monitoring assessments such as DIBELS Next and AIMSweb feature multiple, equivalent versions of the same tasks, such as having 20 oral reading fluency passages that can be used for reassessments.
Using different but equivalent passages prevents artificial increases in scores that would result from students rereading the same passage. Progress-monitoring assessments can be contrasted with diagnostic assessments, which are not designed to be administered frequently. Administering the same subtests repeatedly would not be an effective way to monitor progress. Some diagnostic tests have two equivalent versions of subtests to monitor progress infrequently—perhaps on a yearly basis—but they are simply not designed for frequent reassessments.
This limitation of diagnostic assessments is one reason why screeners like DIBELS Next and AIMSweb are so useful for determining how students respond to intervention and why diagnostic tests are often reserved for making other educational decisions, such as whether a student may have an educational disability. Progress-monitoring assessments have transformed how schools determine how a student is responding to intervention. Consider Jaime, a second grader who was given oral reading fluency passages from a universal literacy screener; his progress was then monitored to determine his response to a small-group literacy intervention started in mid-October.
Data points show the number of words Jaime read correctly on each of the one-minute reading passages. If his progress continued at this same rate, by the end of the school year, he would be even farther behind his peers and be at even greater risk for future reading problems. The graph in Figure 2 makes clear that intensive reading intervention was needed. Based on this information, Jaime is not likely to reach the level of reading 90 words correctly by the end of second grade and will probably only reach the benchmark expected for a student at the beginning of second grade.
It is also likely that Jaime will need to continue receiving intervention into third grade, and progress monitoring can determine, along with other assessment information, when his oral reading fluency improves to the point where intervention may be changed, reduced, or even discontinued.
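The kind of reasoning applied to Jaime's graph can be illustrated with a small sketch. The weekly scores and number of weeks remaining below are invented for illustration (the 90-word figure simply echoes the benchmark mentioned above); real decisions would rely on the screener's published norms and decision rules.

```python
# Hypothetical weekly progress-monitoring data: week number and
# words read correctly per minute (WCPM) on equivalent one-minute passages.
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
wcpm  = [18, 19, 21, 20, 23, 24, 26, 27]

def least_squares_slope(x, y):
    """Ordinary least-squares slope: average WCPM gained per week."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    den = sum((xi - mean_x) ** 2 for xi in x)
    return num / den

slope = least_squares_slope(weeks, wcpm)             # ~1.3 WCPM gained per week
intercept = (sum(wcpm) / len(wcpm)) - slope * (sum(weeks) / len(weeks))

weeks_remaining = 28                                  # assumed number of weekly probes left this year
projected = intercept + slope * (weeks[-1] + weeks_remaining)
benchmark = 90                                        # end-of-second-grade benchmark cited above

print(f"Slope: {slope:.2f} WCPM/week, projected end-of-year: {projected:.0f} WCPM")
print("On track" if projected >= benchmark else "Intervention likely needs to be intensified")
```

The fitted slope is the student's rate of improvement in words correct per minute per week; projecting it forward shows whether growth at the current rate is likely to close the gap or whether the intervention needs to be intensified.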
You may wonder how the intervention team would determine whether Jaime is progressing at an adequate pace when he is in third grade. If his slope shows a lack of adequate progress, his teachers can revisit the need for intervention to ensure that Jaime does not fall behind again.

Computer-adapted assessments are increasing in popularity in schools, in part because they do not require a lot of time or effort to administer and score, but they do require schools to have an adequate technology infrastructure.
Or it could be that the student does not know the meaning of many vocabulary words and needs to build background knowledge to read fluently (Adams), which would require the use of different assessment procedures specifically designed to assess and monitor progress related to these skills.
Even more vexing is when low oral reading fluency scores are caused by multiple, intermingling factors that need to be identified before intervention begins. When the problem is more complex, more specialized assessments are needed to disentangle the factors contributing to it.
For example, a student who reads 10 correct words per minute on an oral reading fluency measure and whose growth is at the 5th percentile is improving much more slowly than the other children who also started out reading only 10 words correctly per minute.
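The sketch below shows one way such a comparison might be computed: the student's weekly growth rate is ranked against the growth rates of peers who started at the same baseline. The peer slopes are invented for illustration and are not drawn from any published growth norms.

```python
# Hypothetical weekly-growth slopes (WCPM gained per week) for students who
# all started around 10 words correct per minute; values are invented.
peer_slopes = [0.4, 0.6, 0.7, 0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.3,
               1.4, 1.5, 1.5, 1.6, 1.7, 1.8, 2.0, 2.1, 2.3, 2.5]

def growth_percentile(student_slope, peers):
    """Percent of comparison students growing at or below the student's rate."""
    at_or_below = sum(1 for s in peers if s <= student_slope)
    return 100 * at_or_below / len(peers)

# A student gaining only 0.5 WCPM per week is growing more slowly than
# nearly all peers who began at the same starting point.
print(growth_percentile(0.5, peer_slopes))   # 5.0 -> roughly the 5th percentile of growth
```

Ranking growth against peers with the same starting point avoids penalizing students simply for starting low.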
Preliminary research shows some promise in using growth percentiles to measure progress as an alternative to slope, and teachers should be on the lookout for more research related to improving ways to monitor student progress. How can teachers figure out the details of what a student needs in terms of intervention?

You may be starting to recognize some overlap among different types of assessments across categories. For example, state tests are usually both formal and summative.
Literacy screeners and progress-monitoring assessments are often formal and formative. And some assessments, such as portfolio assessments, have many overlapping qualities across the various assessment categories.
Bringing up portfolio assessments takes us back to points raised at the beginning of this chapter about the authenticity of literacy assessments.
So why do multiple-choice tests exist when far more authentic options, such as portfolio assessment, are available? High-quality multiple-choice tests tend to have stronger psychometric properties (discussed in the next section) than performance assessments like portfolios, which makes multiple-choice tests desirable when assessment time is limited and scores need to have strong measurement properties.
Multiple-choice test items are often easy to score and do not require a great deal of inference to interpret (i.e., an answer is simply correct or incorrect). Portfolio assessments often take longer to complete but also reflect the use of many important literacy skills that multiple-choice items simply cannot assess. Based on this discussion, you may wonder if portfolio assessments are superior to multiple-choice tests, or if the reverse is true.
As always, an answer about a preferred format depends on the purpose of the assessment and what kinds of decisions will be made based on findings.

A chapter about literacy assessment would not be complete without some discussion of the psychometric properties of assessment scores, such as reliability and validity (Trochim). Reliable assessment means that the information gathered is consistent and dependable: the same or similar results would be obtained if the student were assessed on a different day, by a different person, or using a similar version of the same assessment (Trochim). If these same inconsistencies in ratings arose across other items on the reading behavior scale or with other students, you would conclude that the scale has problems.
These problems could include a poorly constructed scale, or simply inter-rater reliability problems stemming from a lack of training or experience among the people doing the ratings. Reliability of formal assessment instruments, such as tests, inventories, or surveys, is usually investigated through research that is published in academic journal articles or test manuals.
This kind of research involves administering the instrument to a sample of individuals, and findings are reported based on how those individuals scored. The more stable the reliability estimates are across multiple diverse samples, the more teachers can count on scores or ratings being reliable for their students. When reliability is unknown, decisions made based on assessment information may not be trustworthy. The need for strong reliability must often be weighed against the need for authenticity (i.e., assessment tasks that reflect real classroom reading and writing).
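To show what quantifying reliability can look like in practice, here is a small sketch of the kind of inter-rater agreement check mentioned above: it computes percent agreement and Cohen's kappa (a chance-corrected agreement index) for two raters. The ratings are invented for illustration; published instruments report statistics like these in their manuals.

```python
from collections import Counter

# Hypothetical ratings of ten reading behaviors by two raters on a 1-3 scale.
rater_a = [1, 2, 2, 3, 1, 2, 3, 3, 2, 1]
rater_b = [1, 2, 3, 3, 1, 2, 3, 2, 2, 1]

def percent_agreement(a, b):
    """Proportion of items where the two raters assigned the same rating."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    p_o = percent_agreement(a, b)                      # observed agreement
    counts_a, counts_b = Counter(a), Counter(b)
    categories = set(a) | set(b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

print(f"Percent agreement: {percent_agreement(rater_a, rater_b):.0%}")
print(f"Cohen's kappa:     {cohens_kappa(rater_a, rater_b):.2f}")
```

Low agreement across many items or students would point to the kinds of rating-scale problems described earlier.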
In addition to assessments needing to be reliable, information gathered from assessments must also be valid for making decisions. A test has evidence of validity when research shows that it measures what it is supposed to measure (Trochim). A weekly spelling test score may lack evidence of validity as a measure of applied spelling ability because some students may simply be good memorizers who cannot spell the same words accurately later or use them in their writing.
When assessment information is not reliable, it cannot be valid, so reliability is a keystone for the evaluation of assessments. Sometimes a test that seems to measure what it is supposed to measure will have validity issues that are not apparent. For example, if students are tested on math application problems to see who may need math intervention, a problem arises if the children cannot read the words in the problems.
In this case, the students may get many items incorrect, making the math test more like a reading test for these students. It is research on validity and observations by astute educators that help uncover these sorts of problems and prevent the delivery of a math intervention when what may actually be needed is a reading intervention.
The validity issue described above is one reason why some students may receive accommodations (e.g., having test items read aloud). If students with reading disabilities had the above math test read to them, then their resulting scores would likely be a truer indicator of math ability because the accommodation removed their reading difficulties as a barrier.
This same logic applies to English language learners (ELLs) who can understand spoken English much better than they can read it. If a high school exam assessing knowledge of biology is administered and ELL students are unable to pass it, is it because they do not know biology or is it because they do not know how to read English? If the goal is to assess their knowledge of biology, then the test scores may not be valid.
Another example of a validity issue arises if a student with a visual impairment is assessed using a reading task featuring standard-size, non-enlarged print. If the student scored poorly, would you refer him or her for reading intervention?
Hopefully not. The student might actually need reading intervention, but there is a validity problem with the assessment results, so in reality you would need more information before making any decisions. On the other hand, if the student still scored low even with appropriately enlarged print, you might conclude that the student has both a visual impairment and a reading problem, in which case reading intervention, along with the accommodation of large-print materials, would be needed.
While there is little controversy surrounding literacy assessments that are informal and part of normal classroom practices, formal assessments generate considerable controversy in schools, in research communities, on Internet discussion boards, and in textbooks like this one. When considering the scope of educational assessment, one thing is clear: many school districts give far too many tests to far too many students and waste far too many hours of instruction gathering data that may or may not prove to have any value (Nelson). The overtesting problem is especially troubling when so much time and effort go into gathering data that do not even end up being used.
Not every school is overwhelmed with testing, however. School districts have a great deal of influence over the use of assessments, but all too often when new assessments are adopted, they are added to a collection of previously adopted assessments, and the district becomes unsure about which assessments are still needed and which should be eliminated. Assessments are also added in response to policy changes at the federal and state levels.
For example, the passage of the No Child Left Behind Act of 2001 (NCLB) expanded state testing to all of grades three through eight, a much more stringent requirement than previous mandates. Some tests, such as state tests, are mandated for schools to receive funding; however, the use of other assessments is largely up to school districts. It is important for educators and school leaders to periodically inventory the assessment procedures being used, discuss the extent to which they are needed, and make decisions that will provide answers without overtesting students.
In other words, the validity of assessments is not limited to how they are used with individual students; it must also be evaluated at a larger system level, where benefits to the whole student body are considered.
When assessments provide data that are helpful in making instructional decisions but also take away weeks of instructional time, educators and school leaders must work toward solutions that maximize the value of assessments while minimizing potential negative effects.
Not liking test findings is a different issue from test findings not being valid. For example, if a test designed to identify students who are behind in reading is used to change instruction, then it may be quite valuable, even if it is unpleasant to discover that many students are having difficulty. As a society, we tend to want indicators of student accountability, such as evidence that a minimum standard has been met before students earn a high school diploma. Often, earning a diploma requires students to pass high-stakes exit exams; however, this seemingly straightforward use of test scores can easily lead to social injustice, particularly for students from culturally and linguistically diverse backgrounds.
Because high-stakes tests may not provide complete information about what many students know and can do, the International Reading Association (IRA) released a position statement with recommendations for the appropriate use of high-stakes tests.
There is no easy answer for how to use assessments to precisely communicate how well students are prepared for college, careers, and life, and we are likely many reform movements away from designing a suitable plan.
Literacy assessments can only be used to improve outcomes for students if educators have deep knowledge of research-based instruction, assessment, and intervention and can use that knowledge in their classrooms.
For this reason, information from this chapter should be combined with other chapters from this book and other texts outlining the use of effective literacy strategies, including strategies for students who are at risk of developing reading problems or who are English language learners.
Although literacy assessment is often associated with high-stakes standardized tests, in reality, literacy assessments encompass an array of procedures to help teachers make instructional decisions. Knowing about the different kinds of assessments and their purposes will allow you to be a valuable addition to these important conversations. Literacy assessments can be informal or formal, formative or summative, screenings or diagnostic tests.
They can provide data at single points in time or be used to monitor progress over time. Regardless of their intended purpose, it is important that assessment information be trustworthy. It is also important that teachers who use assessments understand the benefits and difficulties associated with different procedures.
An assessment that is ideal for use in one circumstance may be inappropriate in another. For this reason, teachers who have a background in assessment will be better equipped to select appropriate assessments that have the potential to benefit their students, and they will also be able to critique the use of assessments in ways that improve system-wide assessment practices.
Literacy assessments are an important part of educational decision making, and therefore, it is essential that teachers gain a thorough understanding of their uses and misuses, gain experience interpreting information obtained through assessment, and actively participate in reform movements designed not just to eliminate testing but to use assessments in thoughtful and meaningful ways.
Adams, M. American Educator, 34.
Afflerbach, P. The classroom assessment of reading. In Kamil, Pearson, & Afflerbach (Eds.). New York, NY: Routledge.