Transforming holistic assessment and grading into a vehicle for complex learning

Abstract

One of the themes running through my work since 1980 has been that students need to develop the capacity to monitor the quality of their own work during its actual production. For this to occur, students need to appreciate what constitutes work of higher quality, to compare the quality of their emerging work with that higher quality, and to draw on a store of tactics to modify their work as necessary. In this chapter, this theme is extended in two ways. The first is an analysis of the fundamental validity of using preset criteria as a general approach to appraising quality. The second is a teaching design that enables holistic appraisals to align pedagogy with assessment.

For the purposes of this chapter, a course is a unit of study that forms a relatively self-contained component of a degree program. A student response to an assessment task is referred to as a work. The assessed quality of each work is represented by a numerical, letter or verbal mark or grade, which may be accompanied by detailed feedback from the teacher. For the types of works of interest in this chapter, grades are mostly produced in one of two ways.

In analytic grading, the teacher makes separate qualitative judgments on a limited number of properties or criteria. These are usually preset, that is, nominated in advance, and each criterion is used to appraise every student's work. The teacher may prescribe the criteria, students and teachers may negotiate them, or the teacher may require students to develop their own criteria as a means of deepening their involvement in the assessment process. How the criteria are decided is not important in this chapter. Once the separate judgments on the criteria are made, they are combined using a rule or formula and converted to a grade. Analytic grading is overtly systematic. By identifying the specific elements that contribute to the final grade, it provides the student with explicit feedback.
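The combination step of analytic grading can be sketched in Python as follows. This is a minimal illustration, not a method prescribed in this chapter: the four criteria, their weights, the 1 to 5 judgment scale, the weighted-mean rule and the grade cut-offs are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical preset criteria and weights; the chapter prescribes none.
WEIGHTS: Dict[str, float] = {
    "argument": 0.4,
    "evidence": 0.3,
    "organisation": 0.2,
    "expression": 0.1,
}

# Hypothetical grade cut-offs on the 1-5 scale, checked from highest to lowest.
CUTOFFS: List[Tuple[float, str]] = [
    (4.5, "A"),
    (3.75, "B"),
    (3.0, "C"),
    (2.5, "D"),
]


def combine(judgments: Dict[str, float]) -> float:
    """Combine separate criterion judgments (1-5) using a weighted mean."""
    if set(judgments) != set(WEIGHTS):
        raise ValueError("every preset criterion must be judged")
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[c] * judgments[c] for c in WEIGHTS) / total_weight


def to_grade(score: float) -> str:
    """Convert the combined score to a grade using the cut-offs."""
    for threshold, grade in CUTOFFS:
        if score >= threshold:
            return grade
    return "F"


def analytic_grade(judgments: Dict[str, float]) -> Tuple[float, str]:
    """Apply the fixed rule: judge each criterion, combine, then convert."""
    score = combine(judgments)
    return score, to_grade(score)
```

Under these assumptions, a work judged 4, 3, 4 and 5 on argument, evidence, organisation and expression combines to about 3.8, which the cut-offs convert to a B. Because every work is appraised on the same preset criteria, the rule cannot register any quality that none of them names.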
The template used in implementing the process may be called a rubric, or by any combination of 'scoring', 'marking' or 'grading' with 'scheme', 'guide', 'matrix' or 'grid' (for example, a marking scheme or a scoring grid). As a group, these models are sometimes referred to as criterion-based assessment or primary trait analysis.

In holistic or global grading, the teacher responds to a student's work as a whole, then maps its quality directly to a notional point on the grade scale. Although the teacher may note specific features that stand out while appraising, arriving directly at a global judgment is foremost. Reflection on that judgment gives rise to an explanation, which necessarily refers to criteria. Holistic grading is sometimes characterised as impressionistic or intuitive.

The relative merits of analytic and holistic grading have been debated for many years, at all levels of education. The criterion most commonly used for comparison has been scorer reliability. This statistic measures the consistency with which grades are assigned to the same set of works by different teachers (inter-grader reliability), or by the same teacher on separate occasions (temporal reliability). Scorer reliability is undoubtedly useful, but on its own it is too narrow. It does not take into account other factors, such as the markers' skill in each method, or the extent to which each method captures all the dimensions that matter.

The use of analytic grading schemes and templates is now firmly established in higher education, and their popularity has grown rapidly worldwide since about 1995. Nevertheless, the basic ideas are not new. Inductively decomposing holistic appraisals goes back at least to 1759, when Edmund Burke set out to identify the properties that characterise beautiful objects in general. In the forward direction, the theory and practice of assembling overall judgments from discrete appraisals on separate criteria has developed mostly over the last 50 years.
It has given rise to an extensive literature touching many fields. Key research areas have been clinical decision making (Meehl, 1954/1996) and human expertise of various types (Chi, Glaser & Farr, 1988; Ericsson & Smith, 1991). The terminology used is diverse, and includes 'policy capturing' and 'actuarial methods'. In educational assessment specifically, Braddock, Lloyd-Jones, and Schoer (1963) reported early developmental work on analytic approaches to grading English composition, together with its rationale; primary trait scoring is described in Lloyd-Jones (1977). In recent years, researchers in higher education assessment have explored the use of criteria and rubrics, particularly in self- and peer-assessment activities involving students (Bloxham & West, 2004; Orsmond, Merry & Reiling, 2000; Rust, Price & O'Donovan, 2003; Woolf, 2004). Many books on assessment in higher education advocate analytic grading and provide practitioners with detailed operational guidelines; examples are Freeman and Lewis (1998), Huba and Freed (2000), Morgan, Dunn, Parry, and O'Reilly (2004), Stevens and Levi (2004), Suskie (2004), and Walvoord and Anderson (1998). For the most part, both the underlying principles and the various methods of implementation have been accepted uncritically.

In this chapter, the sufficiency of analytic grading as a general approach for relevant classes of student works is called into question on both theoretical and practical grounds. The basic reason is that it sets up appraisal frameworks that are, in principle, sub-optimal. Although they work adequately for some grading decisions, they do a disservice to others by unnecessarily constraining the scope of appraisals. The assumption that using preset criteria is unproblematic has had two inhibiting effects. First, teachers typically have not felt free to acknowledge, especially to students, the existence or nature of certain limitations they encounter.
Second, there has been little or no imperative to explore and develop alternative ways forward.

The theme of this chapter is developed around five propositions. The first four are dealt with relatively briefly; the fifth is given a section of its own. The driving principle is that if students are to achieve consistently high levels of performance, they need to develop a conceptualisation of what constitutes quality as a generalised attribute (Sadler, 1983). They also need to be inducted into evaluating quality without necessarily being bound by tightly specified criteria. This approach mirrors the way experienced teachers typically make multi-criterion judgments. It is also an authentic representation of the ways many appraisals are made in a host of everyday contexts, by experts and non-experts alike. Equipping students with evaluative insight and skill therefore develops an important graduate capability. All five propositions are taken into account in the second half of the chapter, which outlines an approach to the assessment of complex student productions.
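Scorer reliability, as described in the abstract, compares two gradings of the same set of works: by two different teachers (inter-grader reliability) or by one teacher on two occasions (temporal reliability). The abstract does not name a particular statistic, so the Python sketch below assumes two simple, commonly used indices, the proportion of exact agreement and the Pearson correlation; the function names and the grade data are hypothetical.

```python
from math import sqrt
from typing import Sequence


def exact_agreement(a: Sequence[float], b: Sequence[float]) -> float:
    """Proportion of works given the identical grade in both gradings."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty gradings")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two gradings of the same works."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length gradings of at least two works")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Covariance and variances, left unnormalised because the factors cancel.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when a grading has no spread")
    return cov / sqrt(var_a * var_b)
```

For two hypothetical gradings of five works, [7, 5, 6, 4, 7] and [7, 5, 5, 4, 6], exact agreement is 0.6 and the correlation is about 0.91. A high value on either index shows only that the grades are consistent; it says nothing about the markers' skill or whether the method captures all the dimensions of quality that matter.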

Citation (APA)

Sadler, D. R. (2009). Transforming holistic assessment and grading into a vehicle for complex learning. In Assessment, Learning and Judgement in Higher Education (pp. 45–63). Springer Netherlands. https://doi.org/10.1007/978-1-4020-8905-3_4