Achieving Human Level Partial Credit Grading of Written Responses to Physics Conceptual Question using GPT-3.5 with Only Prompt Engineering
Large language models (LLMs) have great potential for auto-grading student written responses to physics problems due to their capacity to process and generate natural language. In this exploratory study, we use a prompt engineering technique, which we name "scaffolded chain of thought (COT)", to instruct GPT-3.5 to grade student written responses to a physics conceptual question. Compared to common COT prompting, scaffolded COT prompts GPT-3.5 to explicitly compare student responses to a detailed, well-explained rubric before generating the grading outcome. We show that, when compared to human raters, the grading accuracy of GPT-3.5 using scaffolded COT is 20% - 30% higher than with conventional COT. The level of agreement between AI and human raters can reach 70% - 80%, comparable to the level between two human raters. This shows promise that an LLM-based AI grader can achieve human-level grading accuracy on a physics conceptual problem using prompt engineering techniques alone.
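As a rough illustration of the idea described in the abstract, the sketch below assembles a prompt that asks the model to compare a response against each rubric item before stating a grade, and computes simple percent agreement between two sets of grades. The function names, the prompt wording, the binary per-item grading format, and the agreement measure are all assumptions made for illustration; they are not taken from the paper, and no model API is called.

```python
from typing import List, Sequence


def build_scaffolded_prompt(question: str, rubric: Sequence[str], response: str) -> str:
    """Assemble a grading prompt that walks through each rubric item.

    Hypothetical wording: this is not the prompt used by the authors.
    """
    lines: List[str] = [
        "You are grading a student's written response to a physics question.",
        f"Question: {question}",
        "Rubric items:",
    ]
    # Number the rubric items so the model can refer to them one by one.
    lines += [f"{i}. {item}" for i, item in enumerate(rubric, start=1)]
    lines += [
        f"Student response: {response}",
        "For each rubric item, first compare the response to the item and "
        "explain whether the item is satisfied.",
        "Only after all comparisons, give the grade as a comma-separated "
        "list of 1 (satisfied) or 0 (not satisfied), one entry per item.",
    ]
    return "\n".join(lines)


def percent_agreement(grades_a: Sequence[int], grades_b: Sequence[int]) -> float:
    """Percentage of positions where two graders assigned the same grade."""
    if len(grades_a) != len(grades_b) or len(grades_a) == 0:
        raise ValueError("grade lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(grades_a, grades_b))
    return 100.0 * matches / len(grades_a)
```

The string returned by `build_scaffolded_prompt` would be sent to whichever model is being evaluated. `percent_agreement` can then compare the parsed model grades with a human rater's grades on the same responses. A chance-corrected statistic may be preferable when grades are imbalanced.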
Physics Education Research Conference 2024
Part of the PER Conference series Boston, MA: July 10-11, 2024 Pages 97-101
<a href="https://www.compadre.org/portal/items/detail.cfm?ID=16878">Chen, Zhongzhou, and Tong Wan. "Achieving Human Level Partial Credit Grading of Written Responses to Physics Conceptual Question using GPT-3.5 with Only Prompt Engineering." Paper presented at the Physics Education Research Conference 2024, Boston, MA, July 10-11, 2024.</a>