Glossary of Common Assessment Terms

Assessment - Assessment comes from the Latin word assidere, which means to sit beside. In education, this "sitting beside" includes observing, collecting information, coaching, and otherwise supporting the learning process. Assessment is an integral part of learning. It invites the learner to also take a close look at his/her own learning process, to reflect on it, to build on strengths, and to work on improving.

Alternative Assessment - Alternative assessment is any assessment in which the learner creates a response to a question rather than choosing from responses that have been provided. Alternative assessments might include short answer questions, essays, performance assessments, oral presentations, exhibitions and portfolios.

Achievement Test - Achievement tests are standardized tests designed to measure the amount of skill or knowledge students in a school or district have gathered with respect to a very focused area. The "standardized" component of these tests has nothing to do with how good or complete the test is. It simply refers to the fact that all the tests are administered and scored the same way (essentially by machine), and also that the tests are designed to measure content that has (presumably) been taught to students in a fairly standardized way.

Analytic Trait Scoring - Analytic scoring identifies traits essential to success in a given performance and requires trained raters to score those traits individually. In six-trait analytic writing assessment, for instance, instead of getting just one score for "overall effectiveness," a paper receives six separate scores - for ideas, organization, voice, word choice, fluency, and conventions. Together, these scores create a profile of performance. The use of a scoring guide in which traits are defined in writing helps ensure consistency in the way writing (or any kind of performance) is assessed.
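
As a rough illustration of the "profile" idea, the sketch below records six separate trait scores for one paper instead of a single overall score. The 1-5 scale, the sample scores, and the name build_profile are assumptions made for illustration; they do not come from any published scoring guide.

```python
# Hypothetical six-trait profile for one paper.
# The 1-5 scale and the sample scores are illustrative assumptions.
TRAITS = ["ideas", "organization", "voice", "word choice", "fluency", "conventions"]


def build_profile(scores):
    """Pair each trait with its own score so the scores stay separate."""
    if len(scores) != len(TRAITS):
        raise ValueError("expected one score per trait")
    return dict(zip(TRAITS, scores))


profile = build_profile([4, 3, 5, 4, 3, 2])

# The profile shows where the paper is strong and where it needs work,
# which a single overall score would hide.
strongest = max(profile, key=profile.get)
weakest = min(profile, key=profile.get)

print(profile)
print("strongest:", strongest, "| weakest:", weakest)
```

Keeping the scores separate, rather than averaging them, is what lets a teacher or student see which trait to work on next.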

Authentic Assessment - Authentic assessment is based on tasks that mimic real life as closely as possible. A good example is a driving test, in which a would-be driver is asked to cope with many of the situations he/she will encounter in everyday driving.

Competency Test - This is a test intended to demonstrate that a student has met established standards of skills or knowledge.

Criteria - When you hear the word criteria, think language. In performance assessment - of which writing is one example - we do not have "right" or "wrong" answers as we would on, say, a multiple-choice test. Instead, we have a continuum of performance that ranges from beginning levels through developing right on up to proficient. Criteria are the language or the descriptors that define levels of performance for each trait assessed.

Criterion-Referenced Test - In a criterion-referenced test, students are not compared to each other. Instead, each student's performance is measured against criteria that define success. If every student meets the criteria (standards) considered important, each student will be regarded as successful.

Essay Test - An essay test requires students to respond to a question (prompt) by writing original text.

Evaluation - An evaluation is a judgment about whether a behavior, product, or program is or is not producing the desired results. Evaluations are usually based on multiple sources of information that might include surveys, test scores, observations, and many other sources.

High Stakes Testing - High stakes testing occurs whenever a major decision with significant consequences is made on the basis of test results. Examples include promotion, certification, graduation, and access to or denial of learning opportunities.

Multiple-Choice Testing - A multiple-choice test is one in which students select the correct or best answer from several alternatives.

Norm-Referenced Test - In a norm-referenced test, a student's or group's performance is compared to that of students who are like them - a peer group known as the "norm group."
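
The difference between norm-referenced and criterion-referenced interpretation can be sketched with one score read two ways. Everything below is a hypothetical example: the norm-group scores, the cutoff of 70, and the function names are assumptions, and "percent of the norm group scoring below the student" is only one of several ways a percentile rank can be defined.

```python
# One score, two interpretations. All numbers are illustrative assumptions.

def meets_criterion(score, cutoff):
    """Criterion-referenced: compare the score to a fixed standard."""
    return score >= cutoff


def percentile_rank(score, norm_group):
    """Norm-referenced: percent of the norm group scoring below this score."""
    below = sum(s < score for s in norm_group)
    return 100 * below / len(norm_group)


norm_group = [55, 60, 62, 70, 75, 80, 85, 90, 92, 95]  # hypothetical peer scores
student_score = 80

met_standard = meets_criterion(student_score, cutoff=70)
rank = percentile_rank(student_score, norm_group)

print("met the standard:", met_standard)
print("percentile rank in norm group:", rank)
```

The first result depends only on the standard, so every student could meet it; the second depends on how the peer group performed.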

Objective Test - The term objective is a little misleading. It is often taken to mean "fair" or "free of human judgment." Actually, both impressions are a little off the mark. An objective test is one in which scoring procedures do not depend on human judgment (quite unlike, say, a writing assessment, in which human judgment of performance is the whole point). Usually, objective tests are multiple-choice and are machine scored. It is important to keep in mind, though, that while human judgment does not influence the scoring of objective (e.g., multiple-choice) tests, it is a very large factor in test construction - that is, in determining test content or test design. On a multiple-choice test, each item must be written by someone who decides (1) which content is worth testing, (2) how it will be tested, and (3) how both "correct" and "incorrect" responses will be worded. If we take a close, scrutinizing look at a multiple-choice test, we'll likely find that some questions were important to ask and were worded clearly with well-defined correct answers. In other cases, though, we might wonder whether a given question was worth asking in the first place - or a careful reading may show that it was worded in a confusing manner or that more than one option could be considered "correct". In short, "objective" has nothing to do with fairness or quality, but only with the way in which the test is scored.

Performance Assessment - Performance assessment is based on direct observation of a student's work (a writing sample) or process (the performance itself - say, a dive or an oral presentation). The quality of the performance is judged on the basis of clearly specified criteria that define what the given performance looks like at the beginning, developing, and proficient levels. Sound performance assessment is characterized by clear targets; a well-defined sense of purpose (how will we use results?); sound, thoroughly tested criteria that are known to everyone (including students); and quality tasks that are engaging, challenging (without demanding the impossible), and relevant to what we really want students to be able to do.

Portfolio - A portfolio is a purposeful collection of significant work, carefully selected, dated, and presented to tell the story of a student's achievement or growth in well-defined areas of performance (writing, reading, math, etc.). A portfolio usually includes personal analysis in which the student explains why each piece was chosen and what it shows about his/her growing skills and abilities.

Prompt - A prompt is a picture, word, phrase, sentence, or paragraph intended to generate ideas and give a student a starting point for writing. A prompt is just that - a stimulus. In most writing assessments, therefore, students are scored on the quality of the writing, not on meticulous attention to following the directions of the prompt. For instance, a prompt might ask a student to write about a favorite or memorable place. A student might, as one did, write about the inside of his own mind - his imagination. Some people might argue that this is not a "place" in the sense that "New York" is a place. But such literal interpretation rarely seems as important as giving students every opportunity to respond creatively to a prompt and to show what they can do as writers.

Rater - A rater is a person who is trained to use criteria consistently and skillfully in assessing performance. Raters are most often teachers, but can also be professionals whose work is relevant to the area being assessed (e.g., editors or journalists for writing performance) or parents with teaching or content area experience.

Reliability - Reliability is a measure of consistency - over time, over similar performances, or over raters. We would not want the scores on any performance to be simply a matter of chance! Good training and sound criteria help ensure that comparable performances will receive comparable scores - regardless of when the scoring occurs or which rater does the scoring. Sound performance assessments should guarantee reliability; otherwise, results are neither meaningful nor useful.
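
One simple way to look at consistency across raters is to count how often two raters give the same paper the same score (exact agreement) or scores within one point of each other (adjacent agreement). The sketch below assumes a shared 1-5 scale and made-up scores; it is an illustration, not a method this glossary prescribes.

```python
# Exact and adjacent agreement between two raters.
# The scores and the 1-5 scale are illustrative assumptions.

def agreement_rates(rater_a, rater_b):
    """Return (exact, adjacent) agreement as fractions of all papers scored."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of papers")
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    # Adjacent agreement counts scores that match or differ by one point.
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, adjacent


rater_a = [4, 3, 5, 2, 4]  # hypothetical scores on five papers
rater_b = [4, 2, 5, 4, 4]

exact, adjacent = agreement_rates(rater_a, rater_b)
print("exact agreement:", exact)
print("adjacent agreement:", adjacent)
```

Low agreement would suggest that the criteria or the rater training need attention before the scores are used.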

Rubric - Rubric is another word for scoring guide.

Scoring Guide - A scoring guide is a set of written criteria used to judge a particular kind of performance: e.g., writing, public speaking, math problem solving. Criteria are the language that defines how performance looks at various levels: beginning, developing, and proficient.

Task - A task is simply the activity the student is required to do as part of an assessment. Sample tasks include completing a chemistry lab, preparing an argument for debate, writing a paper, or solving an open-ended math problem.

Task-Specific Scoring Guide - A task-specific scoring guide is designed for use in judging performance on a particular assignment - e.g., a literary analysis of The Helen Keller Story. (Compare this highly specific, focused approach to assessing, say, performance in writing.) Such scoring guides are not time efficient since a separate one must be developed for every task assessed. Generalizable scoring guides (guides that can be used with almost any assignment in a given content area, such as math or writing) are preferred by most teachers.

Validity - Validity is an indication of how well an assessment actually measures what it is intended to measure. For example, a valid measure of writing focuses primarily on the writing, not the student's ability to read and interpret a difficult prompt.

*Reproduced with permission from Northwest Regional Educational Laboratory