Testing
1. Tests are developed or selected, administered to the class, and scored.
2. Test results are then used to make decisions about a pupil (to assign a grade, recommend for an advanced program), instruction (repeat, review, move on), curriculum (replace, revise), or other educational factors.
Assessment
1. Information is collected from tests and other measurement instruments (portfolios and performance assessments, rating scales, checklists, and observations).
2. This information is critically evaluated and integrated with relevant background and contextual information.
3. The integration of critically analyzed test results and other information results in a decision about a pupil (to assign a grade, recommend for an advanced program), instruction (repeat, review, move on), curriculum (replace, revise), or other educational factors.
Types of Written Tests
1. Verbal - emphasizes reading, writing, or speaking. Most tests in education are verbal tests.
2. Nonverbal - does not require reading, writing, or speaking ability. Tests composed of numerals or drawings are examples.
3. Objective - refers to the scoring of tests. When two or more scorers can easily agree on whether an answer is correct or incorrect, the test is an objective one. True-false, multiple-choice, and matching tests are the best examples.
4. Subjective - also refers to scoring. When it is difficult for two scorers to agree on whether an item is correct or incorrect, the test is a subjective one. Essay tests are examples.
5. Teacher-made - tests constructed entirely by teachers for use in the teachers' classrooms.
6. Standardized - tests constructed by measurement experts over a period of years. They are designed to measure broad, national objectives, and have a uniform set of instructions that are adhered to during each administration. Most also have tables of norms to which a student's performance may be compared to determine where the student stands in relation to a national sample of students at the same grade or age level.
7. Power - tests with liberal time limits that allow each student to attempt each item. Items tend to be difficult.
8. Speed - tests with time limits so strict that no one is expected to complete all items. Items tend to be easy.
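The objective/subjective distinction above turns on how easily scorers agree. The following is a minimal sketch of that idea, not a prescribed procedure: it assumes simple percent agreement as the measure (the notes name no particular statistic), and the function name and all marks are hypothetical.

```python
def percent_agreement(scorer_a, scorer_b):
    """Return the fraction of items on which two scorers gave the same mark."""
    if len(scorer_a) != len(scorer_b):
        raise ValueError("Both scorers must mark the same number of items.")
    if not scorer_a:
        raise ValueError("At least one item is required.")
    matches = sum(a == b for a, b in zip(scorer_a, scorer_b))
    return matches / len(scorer_a)


# Hypothetical marks (1 = correct, 0 = incorrect) from two scorers.
# Multiple-choice items: both scorers apply the same answer key.
multiple_choice_a = [1, 0, 1, 1, 0, 1, 1, 0]
multiple_choice_b = [1, 0, 1, 1, 0, 1, 1, 0]

# Essay items: each scorer uses personal judgment (invented data).
essay_a = [1, 0, 1, 1, 0, 1, 1, 0]
essay_b = [1, 1, 0, 1, 0, 0, 1, 1]

objective_agreement = percent_agreement(multiple_choice_a, multiple_choice_b)
subjective_agreement = percent_agreement(essay_a, essay_b)

print(f"Multiple-choice agreement: {objective_agreement:.0%}")
print(f"Essay agreement: {subjective_agreement:.0%}")
```

In this invented data the scorers agree on every multiple-choice item but on only half of the essay items, which is the pattern the definitions of objective and subjective tests describe.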
Comparing NRTs and CRTs
Dimension: Average number of students who get an item right
NRT: 50%
CRT: 80%
Dimension: Compares a student's performance to
NRT: the performance of other students
CRT: standards indicative of mastery
Dimension: Breadth of content sampled
NRT: broad, covers many objectives
CRT: narrow, covers a few objectives
Dimension: Comprehensiveness of content sampled
NRT: shallow, usually one or two items per objective
CRT: comprehensive, usually three or more items per objective
Dimension: Variability
NRT: Since the meaningfulness of a norm-referenced score basically depends on the relative position of the score in comparison with other scores, the more variability or spread of scores, the better.
CRT: The meaning of the score does not depend on comparison with other scores: it flows directly from the connection between the items and the criterion. Variability may be minimal.
Dimension: Item construction
NRT: Items are chosen to promote variance or spread. Items that are "too easy" or "too hard" are avoided. One aim is to produce good distractor options.
CRT: Items are chosen to reflect the criterion behavior. Emphasis is placed upon identifying the domain of relevant responses.
Dimension: Reporting and interpreting considerations
NRT: Percentile rank and standard scores used (relative rankings).
CRT: Number succeeding or failing, or range of acceptable performance, used (90% proficiency achieved; 80% of class reached 90% proficiency).
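The item-difficulty and reporting rows of the comparison can be illustrated with a few lines of arithmetic. This is a rough sketch under stated assumptions: percentile rank is taken here as the percentage of scores strictly below a given score (other definitions exist), and the function names, the 20-item test, and all scores are hypothetical.

```python
def item_difficulty(responses):
    """Proportion of students who answered one item correctly (1 = right, 0 = wrong)."""
    return sum(responses) / len(responses)


def percentile_rank(score, all_scores):
    """NRT-style report: percentage of scores in the group strictly below `score`."""
    below = sum(s < score for s in all_scores)
    return 100 * below / len(all_scores)


def proportion_reaching(all_scores, max_score, cutoff_percent=90):
    """CRT-style report: fraction of the class at or above the proficiency cutoff."""
    # Integer comparison avoids floating-point surprises exactly at the cutoff.
    reached = sum(100 * s >= cutoff_percent * max_score for s in all_scores)
    return reached / len(all_scores)


# Hypothetical responses of ten students to a single item.
item_responses = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

# Hypothetical total scores of the same ten students on a 20-item test.
class_scores = [12, 14, 15, 16, 18, 18, 19, 19, 20, 20]

difficulty = item_difficulty(item_responses)
rank_of_18 = percentile_rank(18, class_scores)
mastery_share = proportion_reaching(class_scores, max_score=20)

print(f"Item difficulty: {difficulty:.0%} answered correctly")
print(f"Percentile rank of a score of 18: {rank_of_18:.0f}")
print(f"Share of class reaching 90% proficiency: {mastery_share:.0%}")
```

The percentile rank says where a score of 18 sits relative to classmates, while the mastery share says how many students met a fixed standard regardless of how the others performed. The same set of scores supports both reports; the two test types differ in which report gives the score its meaning.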