Sunday, June 30, 2013

THE DISTINCTION BETWEEN TESTING AND ASSESSMENT

Testing
1. Tests are developed or selected, administered to the class, and scored.
2. Test results are then used to make decisions about a pupil (to assign a grade, recommend for an advanced program), instruction (repeat, review, move on), curriculum (replace, revise), or other educational factors.

Assessment
1. Information is collected from tests and other measurement instruments (portfolios, performance assessments, rating scales, checklists, and observations).
2. This information is critically evaluated and integrated with relevant background and contextual information.
3. The integration of critically analyzed test results and other information results in a decision about a pupil (to assign a grade, recommend for an advanced program), instruction (repeat, review, move on), curriculum (replace, revise), or other educational factors.

Types of Written Tests
1. Verbal - emphasizes reading, writing, or speaking. Most tests in education are verbal tests.
2. Nonverbal - does not require reading, writing, or speaking ability. Tests composed of numerals or drawings are examples.
3. Objective - refers to the scoring of tests. When two or more scorers can easily agree on whether an answer is correct or incorrect, the test is an objective one. True-false, multiple-choice, and matching tests are the best examples.
4. Subjective - also refers to scoring. When it is difficult for two scorers to agree on whether an item is correct or incorrect, the test is a subjective one. Essay tests are examples.
5. Teacher-made - tests constructed entirely by teachers for use in the teachers' classrooms.
6. Standardized - tests constructed by measurement experts over a period of years. They are designed to measure broad, national objectives, and have a uniform set of instructions that are adhered to during each administration. Most also have tables of norms, to which a student's performance may be compared to determine where the student stands in relation to a national sample of students at the same grade or age level.
7. Power - tests with liberal time limits that allow each student to attempt each item. Items tend to be difficult.
8. Speed - tests with time limits so strict that no one is expected to complete all items. Items tend to be easy.

Comparing NRTs and CRTs

Average number of students who get an item right
NRT: 50%
CRT: 80%

Compares a student's performance to
NRT: the performance of other students
CRT: standards indicative of mastery

Breadth of content sampled
NRT: broad, covers many objectives
CRT: narrow, covers a few objectives

Comprehensiveness of content sampled
NRT: shallow, usually one or two items per objective
CRT: comprehensive, usually three or more items per objective

Variability
NRT: since the meaningfulness of a norm-referenced score basically depends on the relative position of the score in comparison with other scores, the more variability or spread of scores, the better
CRT: the meaning of the score does not depend on comparison with other scores; it flows directly from the connection between the items and the criterion. Variability may be minimal.

Item construction
NRT: items are chosen to promote variance or spread. Items that are "too easy" or "too hard" are avoided. One aim is to produce good distractor options.
CRT: items are chosen to reflect the criterion behavior. Emphasis is placed upon identifying the domain of relevant responses.

Reporting and interpreting considerations
NRT: percentile rank and standard scores used (relative rankings)
CRT: number succeeding or failing, or range of acceptable performance used (90% proficiency achieved; 80% of class reached 90% proficiency)
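
The two reporting styles above can be made concrete with a small sketch. The scores below are made up for illustration: the NRT-style report ranks one pupil against the rest of the class (percentile rank), while the CRT-style report checks every pupil against a fixed mastery cutoff, independent of how the others scored.

```python
# Hypothetical raw scores for a class of 10 pupils, out of 20 items.
scores = [12, 15, 18, 9, 20, 14, 17, 11, 19, 16]
total_items = 20

def percentile_rank(score, all_scores):
    """NRT-style report: percent of class scores falling below this score."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

def reached_criterion(score, total, cutoff=0.80):
    """CRT-style report: did the pupil reach the mastery cutoff (80% here)?"""
    return score / total >= cutoff

# NRT: the same raw score means more or less depending on the class.
pupil_score = 17
print(f"Percentile rank of {pupil_score}: {percentile_rank(pupil_score, scores):.0f}")

# CRT: the criterion is fixed, so variability in the class is irrelevant.
masters = [s for s in scores if reached_criterion(s, total_items)]
print(f"{len(masters)} of {len(scores)} pupils reached 80% proficiency")
```

Note how the CRT report stays the same no matter how the rest of the class performs, whereas the percentile rank would shift if the other scores changed, which is exactly the variability point in the table above.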