On Grading Exams: Tips for Large Courses in the Humanities

(Based on "On Grading Exams: Procedural Suggestions for Large Courses" by William Bikales.) Revised 2009.

Grading exams is both a very important and an extremely time-consuming part of most courses, especially large courses with correspondingly large teaching staffs. It raises significant pedagogical issues, among them grading standards (what qualities of work instructors are looking for in exams, what constitutes A work, B work, C work) as well as fairness and consistency. In most courses instructors look for ways to arrive at consistent standards across the course while also validating the work of the Teaching Fellows (TFs) and respecting their judgment as teachers.

Each course sets its own standards and methods for grading exams. Grading works best when:

  • Course heads involve themselves in setting standards and in establishing with the TFs the importance of grading to the course. This should be done early in the term.
  • Course heads and Head TFs establish grading procedures early and announce them both in TF meetings and in writing. These procedures cover whether TFs grade their own or others' exams, whole exams or parts of exams, and whether there should be grading meetings, grading teams, curves, and so forth.
  • Meetings to establish grading standards are held before exams, so that everyone understands what constitutes an A, a B, or a C essay, what the expected grade distributions are, and how much emphasis should be placed on originality, style, and facts. A good way to accomplish this is to look at last year's exams and see how this year's TFs would grade them.
  • Either the professor or grading teams establish specific criteria for each question.
  • Grading meetings are held in the middle of the exam grading process to check on problematic questions and on consistency across the course. At these meetings TFs can compare their own grading with others' and raise questions about particular responses they are uncertain about. It is often useful to bring in four or five ungraded bluebooks and ask each TF to read them and write down the grade he or she would give each one. It is also helpful to assign new and experienced TFs to the same team, and to remind TFs to grade in pencil before the meetings in case they wish to change grades afterwards.

Despite its inherent difficulties, grading in large courses can be made fairer for the students and less stressful for TFs and course heads by planning ahead. Grading does not "take care of itself." It does, however, respond well to measures such as those outlined in this tipsheet.

Important Issues in Grading


Consistency, ensuring that students are treated roughly the same throughout the course, is one of the most important issues in grading. Inconsistency can arise from natural and inevitable differences among TFs: in their approaches to the material, in the weight they give to different aspects of the work, and in their views on whether a B+ is a good or a mediocre grade. Courses will deal differently with this issue, some emphasizing above all else the need for consistency, others emphasizing the autonomy of TFs and thus tolerating different grading outcomes.

Grade inflation is a related problem, and is most likely to occur when Teaching Fellows grade their own students. The Head TF will have to keep an eye out for TFs' tendencies to grade their own students higher than others'.

Problem TFs are rare but can nonetheless be disruptive, particularly when they fail to work with the team and abide by group decisions. Examples of problems include not putting comments on exams, failing to attend TF meetings, grading much higher or lower than others and refusing to adjust, and handing back bluebooks late. Problems occur for several reasons: a TF may feel too strongly about her own standards, may not care about consistency and so disregard the group's guidelines, or may simply give teaching too little priority relative to his other responsibilities. The Head TF should try first to reason with these TFs, but may need to call in the professor in difficult cases. In all cases the Head TF should document each step taken in dealing with such a TF.

Regrading policies need to be determined in advance. Large courses tend to be less flexible about regrading than smaller courses because of the sheer logistics of regrading many exams. One strategy is to tell students that if one question on the exam is reviewed, all answers will be reviewed, and that the grade may go down as well as up.

Section participation grades may be viewed as an individual TF's business to be used as an adjustment mechanism in the final grade, or as a subject to be discussed among the TFs and administered consistently.


Grading Specifics: Pros and Cons


Three grading questions that a large course must consider are:

  • Should Teaching Fellows grade their own students' work?
  • Should each TF grade whole exams or only individual questions?
  • Should there be grade distribution guidelines, and if so, how detailed should they be?

None of these questions permits a clear-cut and unambiguous answer, but the advantages and disadvantages listed below can help guide your decisions on each.

Question 1: Should Teaching Fellows grade their own students' work?


Advantages

  1. TFs are better acquainted with their own students' work, and so potentially assess the work more fairly and completely.
  2. If TFs have been allowed some leeway in the content of sections, they are likely to better understand and be more able to evaluate their own students' work.
  3. It's much simpler administratively.
  4. It strengthens TFs' feelings of responsibility for their sections.

Disadvantages

  1. May lead to inconsistent grading between sections, even when grading guidelines are given.
  2. It is more difficult for TFs to grade objectively when they know the students.
  3. Requests for regrades will involve the students' own TF, which can lead to awkwardness.
  4. There is a possible tendency toward grade inflation. Some TFs find it harder to be strict with their own students, or may unconsciously give their students higher grades to reflect better on their teaching. (If TFs grade others' students, then one section's grades coming out higher than the average can more confidently be attributed to a difference in the quality of the work.)

Overall comments and recommendations

It is always possible to let TFs grade their own students in some parts of their work and not in others; for example, in many courses they grade their own students' papers but not exams. But for any one piece of work (e.g., for exams), it is better to choose one of the two systems.

If TFs grade their own students' work, fairly strict grading guidelines are called for and grading meetings are crucial: a curve alone does not provide the leverage needed to ensure that everyone is grading in roughly the same fashion. Asking TFs to make every effort not to look at students' names before grading is not foolproof, but it can help.

Question 2: Should TFs grade whole exams or specialize in parts of the exam?


Advantages of TFs Grading Whole Exams

  1. TFs can compensate for what they feel is harsh grading in one part of the exam by being a bit generous in another. More generally, they may develop an overall sense of the quality of the work and balance the partial grades to add to the appropriate total.
  2. It's administratively easier.
  3. It's possible to use a final-grade distribution guideline to get a desired distribution. (In contrast, if different TFs grade each part, then variations in grading styles and criteria produce a tendency for overall grades to clump in the middle-B range.)
  4. Because they have graded every portion of the exam, TFs will be better able to answer students' questions about any part of it.

Disadvantages

  1. TFs may not see enough blue books to compare answers carefully, particularly if students are offered a choice of questions in some sections. For example, with four different choices on an essay question, someone grading 25 whole exams may get only two or three students writing about one particular choice.
  2. It is more difficult for TFs to prepare carefully for grading the entire exam than for grading a specific part. It is also more difficult for TFs to split into groups and discuss grading criteria and standards for the whole exam than for just a few questions.
  3. Each student's grade is completely dependent on one grader.

Overall comments and recommendations

Many courses opt for having groups of TFs each grade a specific part of the exam. If you follow this route, make sure that each group informs the others about its criteria and the issues that arose in grading its question, so that each TF can explain the criteria to students when exams are handed back. One way to do this is to have each group prepare a brief written summary of what it looked for, common errors, and so on.

Question 3: Should there be grade distribution guidelines? If so, how detailed and how strict should they be: simply a suggested mean, or a full distribution?


Advantages of Having Grade Distribution Guidelines

  1. This is one way to avoid inconsistency in grading standards. Otherwise some TFs may "aim" at an A- average, others at a B-; some may give no C's at all. (And faced with such inconsistency after the fact, it can be difficult to avoid arbitrarily raising some TFs' grades, which results in grade inflation.)
  2. It may force graders to focus on the difficult matter of distinguishing carefully between B and B+, between B+ and A-, between C+ and B-, and so forth.

Disadvantages

  1. If applied too rigidly, distribution guidelines may unfairly force students' grades to fit a distribution which is not appropriate.
  2. TFs may resent a constraint on their freedom in grading.
  3. It is more complex administratively.

Overall comments and recommendations

Nobody likes grade distributions, but grading guidelines are essential if you wish to avoid inconsistency and grade inflation. Guidelines should be no stricter than necessary, since TFs are more likely to observe guidelines they see as reasonable. If everyone appears to understand and accept the need to grade according to common standards, you might try a very rough guideline: a suggested mean and a reminder that there ought to be some spread around it. If not, you might try a full curve.

Guidelines should come from the professor. Some professors may say that they do not believe in curves: they would be delighted if everyone earned an A and disappointed if everyone earned a C, but neither outcome should be ruled out ahead of time. Such remarks are not very helpful as guidelines for TFs. In this case the Head TF should try to elicit from the professor a sense of what the grades mean: what sort of work earns an A, and what distinguishes an A- from a B+, a B- from a C+, and so on.

One solution is to start with a suggested mean, generally a B or B+, and agree that this will be the basic grade for students who did adequately, with no egregious errors, but were not particularly impressive. Even this solution has problems, however: some graders give almost everyone grades near the mean just to avoid trouble. The problem is exacerbated when grading by committee, since graders tend to avoid high or low grades so that they do not have to justify them to the other graders.

A course that uses distribution guidelines in grading each part of the course and/or each section of the exam should be prepared to assign final course grades according to distribution expectations as well. Otherwise, the composite grades are likely to be bunched around the mean. The key to grading during the term should be to evaluate all students accurately relative to each other. Once that is done you can easily adjust the grades at the end.
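For courses that keep numerical composite scores, the end-of-term adjustment described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure any course in this tipsheet is known to use: the function name assign_by_distribution, the student names, and the example quotas are all hypothetical, and the quotas stand in for whatever distribution guideline the professor sets. Students are ranked by composite score and the guideline's proportions are applied from the top down.

```python
def assign_by_distribution(scores, quotas):
    """Assign letter grades by rank according to a distribution guideline.

    scores: dict mapping each student to a composite score (higher is better).
    quotas: list of (letter, fraction) pairs ordered from highest grade down.
            The fractions are a hypothetical guideline and should sum to 1.
    Tied scores may be split at a grade boundary, so students near each
    boundary should still be reviewed by hand.
    """
    # Rank students from highest to lowest composite score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    grades = {}
    start = 0
    cumulative = 0.0
    for letter, fraction in quotas:
        cumulative += fraction
        # Number of students who should be at or above this grade.
        end = min(n, round(cumulative * n))
        for student in ranked[start:end]:
            grades[student] = letter
        start = end
    # Students left over because of rounding get the lowest listed grade.
    for student in ranked[start:]:
        grades[student] = quotas[-1][0]
    return grades


# Hypothetical example: four students and a rough 25/50/25 guideline.
scores = {"Ana": 91, "Ben": 85, "Cal": 78, "Dee": 72}
guideline = [("A", 0.25), ("B", 0.5), ("C", 0.25)]
print(assign_by_distribution(scores, guideline))
# prints {'Ana': 'A', 'Ben': 'B', 'Cal': 'B', 'Dee': 'C'}
```

Because the sketch uses only rank order, it works only if grading during the term has placed students accurately relative to each other, which is the point made above. A course that wants a looser guideline could treat the quotas as targets rather than hard limits and adjust students near each boundary by judgment.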

Conclusion


Despite the many difficulties that may arise, grading in large courses can be made fairer for the students and less stressful for TFs and professors alike by planning ahead. Grading does not take care of itself. It does, however, respond well to measures that Head TFs can undertake with the support of the faculty.

The word consistency shows up throughout this tipsheet because it is such an important goal. Undergraduates are extremely sensitive to issues of fairness and will be justifiably upset at even a hint that mere chance (that is, which section they happen to be in) has more influence on their grades than their own efforts. Making grading consistency a high priority in any large course is a matter both of equity for the students and of peace of mind for the Head TF.

Despite the best preparations, things can still go wrong. If consultations with other members of the course staff or the course head fail to resolve a problem, two resources you can turn to are the Head Teaching Fellow Network supported by the Derek Bok Center for Teaching and Learning, and the Bok Center itself. At the very least, Head TFs will find that they are not alone in facing the challenge of coordinating the difficult enterprise of grading.

Letter Grades: (Information for Faculty Offering Instruction in Arts and Sciences 2009-2010)

  • A, A- Earned by work whose excellent quality indicates a full mastery of the subject and, in the case of the grade of A, is of extraordinary distinction.
  • B+, B, B- Earned by work that indicates a good comprehension of the course material, a good command of the skills needed to work with the course material, and the student's full engagement with the course requirements and activities.
  • C+, C, C- Earned by work that indicates an adequate and satisfactory comprehension of the course material and the skills needed to work with the course material and that indicates the student has met the basic requirements for completing assigned work and participating in class activities.
  • D+, D, D- Earned by work that is unsatisfactory but that indicates some minimal command of the course materials and some minimal participation in class activities that is worthy of course credit toward the degree.
  • E Earned by work which is unsatisfactory and unworthy of course credit towards the degree.

Appendix: Grading Procedures in Various Courses


Foreign Cultures 48: China's Cultural Revolution

(35-40 sections)


Prior to the Exam: The Professor and Head TF make up the exam; other TFs don't see it in advance, so they're not biased when interacting with students. The staff agrees on a mean grade. The Head TF sets up grading groups, each assigned to grade one part of the exam: an I.D. group, a short essay group, and two long essay groups. Within each group, TFs grade a number of exams commensurate with the number of sections each taught. The Professor prepares outline answers for the groups, meant as guidelines and not as model answers.

Grading Process: The groups meet to go over the exam questions, the professor's suggested answers, and likely problems; then the TFs each take bluebooks to grade. Partway through grading they meet as a group to look at common mistakes, unanticipated interpretations of the questions, and patterns that emerge from the answers (e.g., most students emphasizing one part of the material and neglecting another). TFs share problem bluebooks and pass around graded bluebooks to compare A's, B's, C's, and overall mean grades for consistency.

Compiling Grades: After all questions on all bluebooks have been graded, each TF compiles his or her own students' exams, adds up their grades, and fills out a sheet reporting the total grades to the Head TF. Each TF keeps a written record of the exams he or she graded, in case a bluebook is lost or a total is miscalculated.

Literature and Arts C-43: The Medieval Court

(10-15 sections)

  1. Each TF is responsible for grading all the work of his or her students: papers, midterms and finals.
  2. The professor makes up the paper topics and exams from TF suggestions. After the exam has been prepared, the teaching staff meets to go over each question, discussing both the exceptional and the minimal answers. The TF who designed a particular question provides the range of expected answers. Anticipated variations due to differences in what was covered in sections are discussed by the group. All TFs contribute to a discussion of how they expect their students to perform on the exam.
  3. TFs who have had experience teaching the course previously discuss past grade averages and the group establishes a general idea of where the grades are likely to fall. The Head TF invites section leaders who are having difficulty with a particular exam to discuss it with him or her.
  4. TFs grade their own students' exams and hand in grades to the Head TF, who completes the grade sheet and prepares a grade distribution sheet for the instructor.
  5. After grading has been completed, the group discusses the grades given. If there is some indication of great variation within the group (an inordinate number of high or low grades), the instructor or the Head TF discusses this with the TF in question.
  6. The Head TF remains available to TFs for problems in grading. If the staff feels it would be beneficial, an associate from the Bok Center attends or facilitates a staff meeting on particular grading concerns.

Historical Study A-13: China

(10-18 sections)

  1. On the final exam students write three essays, out of five possibilities. They write their essays in individual bluebooks, which are sorted by essay number after the exam.
  2. TFs divide up the essays, generally according to the number of sections they have taught. TFs usually grade one essay question, although occasionally a batch of essays may cover two questions.
  3. TFs are paired by essay question. After they have read over their batch of essays once, they meet to discuss criteria for the essay.
  4. After initial grading, TFs exchange essays with their partner for a second grading. In the vast majority of cases, the grades are the same or only slightly different. If there is a major discrepancy in the grading of any of the essays, the TFs discuss the particular essay and resolve the grade.

Literature & Arts C-14: The Concept of the Hero in Hellenic Civilization

(30-40 sections)
  1. Before the exam has been created, TFs meet to discuss how they might grade potential questions: what criteria they might use, what they would be looking for, what an A- or a B answer might look like.
  2. A committee of TFs helps the professor to devise the exam. They provide all the TFs with a basic outline for grading procedures: specific point values, correct answers for the objective section of the exam, and general expectations for student essays.
  3. TFs grade their own sections, in order to provide a continuity of expectations for their students.
  4. A grading meeting is held midway through the grading process. TFs discuss what kinds of answers they are getting for particular questions, what their average or mean grades are, difficulties they are having with particular bluebooks. Uniform expectations for each question become clearer at this meeting; a coherent policy is formulated to apply to all sections.
  5. A grading meeting is held at the end of the grading process. TFs discuss individual cases, read each other's problem exams, and hand in their grade lists. The Head TF notes if anyone's grades are out of line with the others' and talks with that TF about the difference. The meeting is open-ended and to some degree social, thus allowing people to read answers that exemplify particular points and to work through difficulties.