Grading Practices and Considerations of Graduate Students at the Beginning of their Teaching Assignment

Research shows that expert-like approaches to problem-solving can be promoted by encouraging students to explicate their thought processes and follow a prescribed problem-solving strategy. Since grading communicates instructors' expectations, teaching assistants' grading decisions play a crucial role in shaping students' approaches to problem-solving in physics. We investigated the grading practices and considerations of 43 graduate teaching assistants (TAs). The TAs were asked to grade a set of specially designed student solutions and explain their grading decisions. We found that in a quiz context, a majority of TAs noticed, but did not grade on, solution features that promote expert-like approaches to problem-solving. In addition, TAs graded differently in quiz and homework contexts, partly because of how they considered time limitations in a quiz. Our findings can inform professional development programs for TAs.


INTRODUCTION
Problem-solving (PS) plays a central role in physics teaching. Research has shown that instruction can promote expert-like approaches to PS by encouraging students to follow a prescribed PS strategy that explicates the tacit PS processes of an expert [1], including: 1) describing the problem situation in physics terms; 2) planning the construction of a solution; and 3) evaluating the result. Research has also shown that instruction can foster learning domain knowledge through PS by encouraging students to articulate their reasoning, reflect on and self-explain how domain concepts and principles were applied to solve a problem, acknowledge differences between their own and others' approaches to a problem, and attempt to resolve arising conflicts [2]. Thus, within an instructional approach based on formative assessment [3], grading should reward explication of reasoning and the use of a prescribed PS strategy.
A central way to influence grading practices in a physics classroom is through graduate TAs, both because TAs are often responsible for grading students' work and because TAs are often required to participate in professional development (PD) programs. PD should be based on research about the beliefs and practices of TAs. As one piece of this research, we studied 43 graduate TAs enrolled in a PD program at the University of Pittsburgh. In this context, we investigated: What are TAs' grading practices? Which features do they consider when grading? What are their reasons for weighing solution features?

METHODOLOGY
Data collection took place at the beginning of the TAs' teaching career, within the first month of a PD program conducted by a PER faculty member during the fall semester. TAs filled out a worksheet designed to encourage introspection regarding instructional choices related to grading [4,5]. The worksheet asked TAs to make judgments about a set of solutions designed to reflect both common student responses to a context-rich physics problem (see Fig. 1) as well as expert-like and novice approaches. Here we focus on two of the five solutions (see Fig. 2). Clearly incorrect aspects of the solutions are indicated by boxed notes. The TAs graded the student solutions for both homework and quiz contexts. For each solution, they were asked to list characteristic features and explain how and why they weighed those features to obtain a score (see Fig. 3).

FIGURE 1. Problem Statement
We suggest that the reader examine the student solutions and think about how to grade them. The student solutions were designed to reflect expert and novice approaches to PS and to trigger conflicting instructional considerations in assigning a grade. For example, in comparing solution SSD to SSE, note that both share the feature of a correct answer. However, only SSD includes a diagram, articulates the principles used to find intermediate variables, and provides clear justification for the final result. In contrast, SSE is brief, with no explication of reasoning. The elaborated reasoning in SSD, however, reveals two canceling incorrect calculations, involving a misreading of the problem situation as well as a misuse of energy conservation that implies circular motion with constant speed. SSE, being very brief, gives away no evidence of mistaken ideas, even though the student might be guided by a thought process similar to Student D's. Thus, TAs' grading of SSE and SSD could reveal the extent to which they encourage the use of a prescribed PS strategy and the explicit showing of reasoning.
Data analysis involved coding the solution features listed by TAs in the worksheets (see Fig. 3) into a combination of theory-driven and emergent categories. The features were also coded for whether they were merely mentioned or weighed in grading. For example, the sample TA listed "no figure" as a feature in SSE, but when assigning a grade, did not refer to this feature when explaining how s/he obtained a score. We identified 21 features that were grouped into 5 clusters.
As shown in Table 1, cluster 1 (C1) includes features related to initial problem analysis as well as evaluation of the final result: visual representation; articulation of the target variables and known quantities (e.g., "knowns/unknowns"); and evaluation of the reasonability of the final answer (e.g., "check"). C2 involves features related to explication of reasoning (i.e., articulation and justification of principles). We consider that TAs who grade on C1 and C2 encourage students to follow a prescribed PS strategy. C3 includes domain-specific features, such as invoking relevant physics concepts and principles and applying them properly. C4 includes features related to elaboration which emerged during the coding process. These features were not assigned to the "explication" category because they were imprecise (e.g., "written statements" could mean either articulation of principles or simply a written description of the physical setup). Features in C4 could be productive, counterproductive, or neutral in encouraging expert-like PS approaches (assigned +, -, or 0, respectively). For example, grading for conciseness could transmit a message to the students that physics problems should be solved with little detail (assigned (-) for being counterproductive), while grading for written statements could transmit a message that explication of the thought process is important for learning from PS (assigned (+) for being productive). Finally, C5 focuses on correctness of the algebra and final answer. TAs who give a large weight to these features may transmit a message to the student that the final result is acceptable without justification.

Grading Practices
We found that in a quiz context, TAs graded a solution which provides minimal reasoning while possibly obscuring physics mistakes (SSE) higher than a solution which shows detailed reasoning and includes canceling physics mistakes (SSD). In the quiz context, many more TAs graded SSE>SSD (N=28, 65%) compared to SSD>SSE (N=10, 23%), transmitting a message that is counterproductive to promoting the use of prescribed PS strategies and providing explication of reasoning. We found a similar gap in the HW context, although the gap is somewhat softened: 58% of TAs (N=25) graded SSE>SSD while 35% (N=15) graded SSD>SSE. In a quiz context, TAs graded SSE significantly higher than SSD (<SSE>=8.3 compared to <SSD>=7.1, p-value calculated by a t-test: 0.010) while in a homework, the averages are comparable (<SSE>=7.1 and <SSD>=6.7).
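The comparison of average quiz grades above rests on a two-sample t test. As a minimal sketch of the underlying arithmetic (the score lists below are hypothetical illustrations, not the study's data), the Welch t statistic can be computed as:

```python
# Sketch of the two-sample (Welch) t statistic used to compare average grades.
# The grade lists are hypothetical, NOT the data collected in this study.
from math import sqrt
from statistics import mean, variance  # variance() is the sample variance

def welch_t(a, b):
    """t = (mean_a - mean_b) / sqrt(s_a^2/n_a + s_b^2/n_b)."""
    na, nb = len(a), len(b)
    return (mean(a) - mean(b)) / sqrt(variance(a) / na + variance(b) / nb)

sse = [8, 9, 8, 9, 8, 9]  # hypothetical quiz grades assigned to SSE
ssd = [7, 6, 7, 6, 7, 6]  # hypothetical quiz grades assigned to SSD
t = welch_t(sse, ssd)     # positive t: SSE graded higher on average
```

A positive t with a small associated p-value, as reported above for the quiz context, indicates that the higher average for SSE is unlikely to be a chance fluctuation.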

Features Considered in Grading
In order to quantitatively represent the features weighed by groups of TAs likely to have differing grading considerations, we display the distribution of features mentioned and graded on by TAs who graded SSE>SSD and by TAs who graded SSD>SSE, excluding TAs who graded SSE=SSD. These distributions for the quiz (Q) and homework (HW) contexts vary by solution, as shown in Table 2.
We found a significant gap between the percentage of TAs who mentioned features from the clusters that promote prescribed PS strategies and the percentage who graded on these features. This gap is more evident in the SSE>SSD group, in the quiz as well as the HW context. Regarding cluster C1 (problem description and evaluation), 20% or fewer of the TAs stated that they grade on these features in both SSE and SSD. Also, slightly more TAs who graded SSD>SSE than TAs who graded SSE>SSD considered C1 when grading (13%-20% compared to 4%-8%). Many more TAs mentioned this cluster without considering it in their grading (46% in the SSE>SSD group, 80% in the SSD>SSE group). We conclude that even though TAs mentioned the cluster of problem description and evaluation, they refrained from grading on it regardless of whether it was missing (as in SSE) or present (as in SSD).
Regarding cluster C2, which involves explication, the SSD>SSE and SSE>SSD groups are largely similar: for SSE, both refrained from grading on this cluster in the quiz context (~10%) and in the HW context (~25%). A larger portion of TAs stated that they grade on this cluster in SSD (25%-30%) than in SSE (10%-11%) on a quiz. As with C1, many more TAs mentioned this cluster than considered it in their grading.
Cluster C4+ also relates to explication, though in an ill-defined manner (see Table 1). As with C1 and C2, many more TAs noticed features from C4+ than graded on them. Here, however, the difference between the two groups is more prominent. In the SSD>SSE group, more than half of the TAs graded on this cluster in SSE in both the quiz (60%) and HW (53%) contexts, while far fewer did so in the SSE>SSD group (18%-28%). Fewer TAs graded on this cluster in SSD.
This last result suggests that TAs may use a subtractive grading scheme: deducting points from SSE for missing explanations (C4+) but not weighing this feature when grading SSD, where it is present. A subtractive grading scheme is also evident in the clusters most prominent in TAs' grading: domain knowledge (C3) and correctness (C5) (see italicized percentages in Table 2). Over 70% of all TAs graded on physics knowledge in SSD, where physics concepts and principles are inadequately applied; fewer TAs said that they grade on domain knowledge in SSE. Additionally, ~50% of all TAs graded on correctness (errors) in SSD, and fewer than 20% did so in SSE.
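The asymmetry of a subtractive scheme can be made concrete with a small sketch. The feature lists, error lists, weights, and point values below are hypothetical illustrations, not the rubric or data from the study:

```python
# Hypothetical illustration of why a subtractive scheme, which deducts points
# only for visible errors, cannot penalize features that are simply absent,
# whereas an additive scheme rewards their presence.
FULL_MARKS = 10

def subtractive(errors, deduction=2):
    """Start from full marks and deduct a fixed amount per visible error."""
    return max(FULL_MARKS - deduction * len(errors), 0)

def additive(features, weights):
    """Award points only for features explicitly present in the solution."""
    return sum(weights[f] for f in features)

weights = {"diagram": 2, "principles justified": 3,
           "correct answer": 3, "result checked": 2}

# "SSD-like": elaborated solution with two canceling visible errors.
ssd_features = {"diagram", "principles justified", "correct answer"}
ssd_errors = ["misread situation", "misused energy conservation"]

# "SSE-like": brief solution, correct answer, no visible errors.
sse_features = {"correct answer"}
sse_errors = []

print(subtractive(ssd_errors), subtractive(sse_errors))          # 6 10: SSE wins
print(additive(ssd_features, weights), additive(sse_features, weights))  # 8 3: SSD wins
```

Under the subtractive scheme the brief solution is untouchable because nothing visible is wrong; under the additive scheme the elaborated solution earns credit for the expert-like features it displays.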

Reasons for Grading
We noted previously a difference in TAs' grading practices between the HW and quiz contexts: TAs are more inclined to insist on explication of reasoning in the HW context than in the quiz context. However, we did not find significant differences in the clusters TAs graded on in the two contexts. To understand this discrepancy, we examined TAs' reasons for weighing different solution features (listed in the right-hand column of the worksheet they completed; see Fig. 3). We focus here on TAs' grading of SSE in a quiz context. The reasons were coded in a bottom-up manner, resulting in the four categories shown in Table 3. Table 3 shows that the difference in grading may stem from TAs' consideration of evidence of students' thought processes, their consideration of time limitations in a quiz, or their preference for aesthetics (a view that physics problems should be solved in a brief, condensed way).

DISCUSSION AND SUMMARY
An analysis of TAs' grading practices and considerations at the beginning of their teaching assignment reveals the following:
- In a quiz context, a majority of TAs gave a higher grade to a solution that provides minimal reasoning while possibly obscuring physics mistakes than to a solution whose reasoning reveals canceling mistakes. Their grading did not encourage students to use prescribed PS strategies or to explicate their reasoning process. This tendency softened somewhat in a homework context.
- While TAs' grading differed in the quiz and HW contexts, there was little difference in the solution clusters they considered in the two contexts. Many TAs were aware of features related to explication and prescribed PS strategies, but few graded on those features.
- TAs' grading approaches indicate that they used a subtractive scheme. Most TAs considered domain knowledge (C3) and correctness (C5) to a larger extent than other clusters, and when using a subtractive scheme, they often recognized only errors rather than missing justifications. In turn, their grading may transmit a message that explication of the problem description, planning of the solution, and evaluation are not required in students' solutions.
- The difference between HW and quiz grading may stem from TAs treating time limitations in a quiz as a reason to accept brief answers as adequate evidence of students' thought processes.
The results of this study concerning TAs' grading practices and underlying considerations are consistent with prior work on TAs' practices and considerations when designing example solutions for students [5]. In Ref. [5], the majority of TAs' own solutions included neither explication of reasoning nor a reasonability check of the final answer. Similarly, in the grading study, very few TAs graded on articulation and justification of principles (C2) or checking of the final answer (C1). This suggests that TAs neither design example solutions nor grade student solutions in a manner which promotes the use of expert-like PS strategies to help students learn from PS.
Since this investigation took place at the beginning of the TAs' teaching career, the results can serve to inform PD activities that prepare TAs for their grading responsibilities. As in other learning environments, PD should elicit TAs' ideas and allow them to reflect on and try to resolve conflicting ideas and approaches to physics instruction. The conflicts between the features that TAs are aware of and the features that they grade on, together with the differences in what TAs consider adequate evidence of students' thought processes in different settings, could serve as fruitful starting points for such discussions. In this way, TAs can be guided to implement grading practices that promote the development of expert-like approaches to PS.