A critical examination of “expert-like” in physics education research

The growing push to address the lack of diversity in physics has come with an array of curriculum reforms and interventions. There has been work in Physics Education Research (PER) that has supported these reforms, including studying the experiences and identity development of students from minoritized backgrounds. However, there has been a lack of critical reflection on the core methodologies and constructs used in PER. Here, we present a critical analysis of qualitative and quantitative work used to define and measure “expert-like” thinking, beliefs, and practices in physics. We show that this work has largely omitted any consideration of race or cultural backgrounds of participants, instead defining “experts” as either physics faculty or Ph.D. holders. Research in critical theory demonstrates that failing to intentionally address potential biases tends towards reinforcing those biases. Thus, work in PER on expert-like thinking may unintentionally replicate, rather than challenge, existing biased structures in physics. We conclude with recommendations for constructing more inclusive views of what it means for students to develop “expert-like” thinking.


I. INTRODUCTION
Alongside growing calls to make physics more inclusive at both classroom and structural levels [1,2], Physics Education Research (PER) is building a growing body of work on the status of minoritized groups in physics classrooms and how to implement positive institutional change. This work has identified that individuals from minoritized groups are often leaving physics because they are made to feel as though they don't belong, despite demonstrating academic performance as good as (or even better than) that of their peers from dominant groups [3]. We call for critical reflection within the PER community on how our methods may perpetuate this marginalization. Critical reflection is important to identify and question deeply held assumptions (e.g., [4][5][6][7]); without critical reflection, we risk (unintentionally) reinforcing systems of oppression and bias, even within efforts to intentionally improve equity and remove barriers to participation.
While there are efforts to use critical pedagogy [8] to transform physics and astronomy instruction, there have been few efforts to apply principles of critical reflection or critical analysis to PER methods themselves. What little exists has focused on critiquing PER-informed pedagogy and practices [9,10] and the demographics of students included in PER studies [11], introducing QuantCrit approaches in data analysis [12], and applying intersectional approaches to understanding physics identity [13][14][15].
In this paper, we bring a critical lens to how we measure the success of our educational interventions. In particular, we focus on the construct of "expert-like" within qualitative and quantitative work in PER. We will argue that many existing measures of the success of both individual students and instructional practices are built upon narrow definitions of expertise that undervalue the diverse ways in which progress in student learning and teaching practice takes place. To do so, we begin by summarizing work on bias in physics and efforts to improve inclusivity and equity in physics education. Next, we discuss important PER literature that aims to define expertise, and we show that these works have omitted demographic data for the physicists they identify as experts. We close with a discussion of how PER can move forward to address our role in diversity, equity, and inclusion efforts, including methods to more equitably study expertise.

A. Bias in physics, physics epistemology, and physics education
Comprehensively addressing social justice, equity, diversity, and inclusion will require work along many avenues. We need to eliminate direct sources of bias (e.g., explicit actions like racist name-calling and implicit actions like microaggressions [16,17] and unconscious/dysconscious racism [18]) and to recognize the roles of positionality and lived experiences. An individual's social, political, and cultural contexts shape their experiences and perspectives, which in turn influence how they perceive and respond to the world around them. Since existing societal systems place dominant (e.g., white) groups at the apex of the social hierarchy, characteristics like whiteness become normalized and invisible, especially to people who are from those groups and benefit the most (e.g., [19][20][21][22][23]). As a result, without explicitly taking steps to acknowledge and subvert systems of bias, actions by individuals (especially those from dominant groups) tend to reinforce social hierarchies and favor those who already have more power in the status quo (e.g., [24]).
Physics is largely composed of white male individuals, including among both college students [25][26][27] and faculty [27,28]. Without critical reflection and actions to address positionalities within the field, the implication is that physics research, education, and epistemology, which have largely been constructed by white and male individuals, inherently (yet often unintentionally) reproduce biases that favor white men. While physics and physicists often claim the field is an objective search for the "truths" of the Universe, physics research and education are ultimately done by people, meaning that physics is a social process, subject to societal systems and biases, and thus never honestly "objective" (e.g., [29][30][31][32]).
Bug [33] discussed a feminist critique of physics's approach to diversity: merely creating spaces for women (e.g., through mentoring or other niche programs) does not necessarily mean that physics communities and spaces are diverse. However, diverse teams are needed to create unbiased analyses and more innovative solutions for physics research questions. For women who are present in physics, they may also have to contend with (de)mentoring that results in the "untraining and retraining" of women so that they may better assimilate into prevailing physics cultures [34].
Individuals whose intersectional identities include multiple marginalized groups [35,36] contend with entanglements from all axes of bias. Prescod-Weinstein's Black feminist critique of physics epistemology [37] proposed a model of white empiricism to explain physics's bifurcated logic: while Black women's stories and evidence of exclusion are often dismissed, physics constructs like string theory receive much credence and attention despite a lack of empirical evidence. Thus, white empiricism reinforces systemic norms about who has the authority to make claims about physics and to change the field's direction. As Cochran et al. noted, "[f]ailing to attend to the lived experiences of individuals with multiple and intersecting minoritized identities is consequential. At best, favorable outcomes from research will not be sustainable, and at worst, the result will exacerbate the problems of inequity and injustice in STEM, resulting in a cul-de-sac of research in STEM education, either hitting a dead end or going in circles" [38, p. 258].

B. Inclusivity and equity in physics education
In the past decade, there has been increased pressure to transform physics education to improve diversity, equity, and inclusion. Students who leave physics and/or have negative educational experiences, especially those from minoritized groups, often do so because they struggle to negotiate between the culture of physics (and of STEM, more broadly) and their own cultures, even when they perform as well as (or better than) their peers (e.g., [3,39]). A major factor in students' (physics) experiences is the classroom environment. During a student's K-16 career, the majority of their time is spent in classrooms. Students' persistence in science is strongly predicted by social interactions (e.g., between instructors and students, and between students themselves), poor quality teaching, the absence of a sense of belonging, and weed-out effects [3,40].
In response to these challenges, some groups have developed classroom interventions. At the K-12 level, teachers can implement modules or similar units that are designed to address topics like recruiting women into physics careers [41,42] as well as systemic racial biases and microaggressions [43]. At the college level, Gutmann et al. [44] created an intervention to engage students in discussions about physicists' social responsibilities. Students engaged in scaffolded dialogue about the responsibilities of the "expert scientists" who designed the nuclear bomb.
In parallel to these efforts, since the 1990s, several models have promoted building courses (and thus classroom environments) on a foundation that embraces culturally relevant [45], culturally responsive [46], and/or culturally sustaining education [47]. In general, these approaches seek to transform education in a way where students achieve high expectations of academic performance through methods that actively engage and value students' existing knowledge and assets. Additionally, these approaches guide students to develop a critical consciousness where they are empowered to problematize and critique the world around them to reflect on and address inequities in their communities [48]. Several studies have focused on the implementation of these mindsets within physics and astronomy. For example, Lee [49] compared student learning outcomes between a culturally responsive and a typical college-level ASTRO 101 course and found that all students experienced larger learning gains (as measured with the Astronomy Diagnostic Test 2.0 [50]), higher course grades, and higher engagement (as measured by classroom observations and student surveys) in the culturally responsive course, with more significant increases among students from historically marginalized groups. However, these approaches are currently still more common at the K-12 level than at the college level, and transforming a course with these mindsets may require more instructor effort than classroom interventions.

II. THE HISTORY OF "EXPERT-LIKE" IN PER
Throughout PER's history, researchers have worked to identify both (1) the canonical conceptual knowledge by which to measure student success and (2) the attitudes, beliefs, and behaviors of physicists that we would like students to develop. In this paper, we focus on constructs designed for the latter purpose.

A. Defining "expert-like" in problem-solving literature
One of the oldest areas of PER involves the study of expert-like and novice-like problem solving strategies. Early work by Reif and Heller [51,52] aimed to identify expert approaches to problem solving by characterizing "effective human problem solving." Their universal "human" approach included no discussion of race, gender, or cultural backgrounds. In the late 1980s, work by Hardiman and others focused on interviewing both "experts" (physics Ph.D.s) and "novices" (introductory physics students) [53], and they also overlooked any description of the demographics of either the experts or novices studied. More modern work [54][55][56] has continued to omit demographics of either group, including studies published as recently as 2020 [55]. While some work in PER has examined gender differences in problem solving approaches [57], there appears to be no such PER-specific literature addressing cultural differences.

B. Ideas from physics epistemology research
Early qualitative work by DiSessa [58], Hammer [59], and Elby [60] focused on understanding student beliefs that are productive for developing an understanding of physics. Their methods largely involved observing and interviewing students about their approaches to learning physics and their epistemologies (e.g., [58][59][60][61]). These researchers did not explicitly focus on how experts thought, nor did they seek to define expertise. The focus on student thinking offered a potentially more inclusive approach if the students come from a broad range of backgrounds. We address this work here as it was fundamental in building assessments that explicitly aim to measure expert-like beliefs in thinking.
In these works on epistemology, the authors generally did not specify the demographics of the students they interviewed (e.g., [58]). If they did, the demographic breakdown was generally limited to gender, and the small number of women was described as a limitation of the work (e.g., [59]). Hammer acknowledged that his results may have differed slightly if he had been successful in recruiting more female participants. He did not address the race or ethnicity of his participants. Research in the 2000s explicitly included female students' perspectives (e.g., [62]), while more recent work has specifically analyzed gender differences in epistemological views [63,64]. This is part of a broader pattern in PER: gender differences are often analyzed, but without similar analysis based on cultural background, race, or ethnicity.
The resources framework [61,65,66] directly built on work on student epistemologies. In this theoretical framework, Hammer and colleagues argued that instead of having a fixed coherent set of epistemological views, students (and people in general) have a set of epistemological resources that may be activated under certain conditions. However, critiques of this framework included the lack of an acknowledgement of the role played by cultural differences [67]. Nonetheless, researchers in science [67,68] and mathematics [69] education have explicitly used ideas from the resources framework for understanding the role of non-Western epistemologies in classrooms. However, that work is largely confined to K-12 STEM education research.

C. Defining "expert-like" in surveys of attitudes, beliefs, and skills
Building explicitly on work on students' epistemology, a range of tools were developed to measure students' attitudes and beliefs about physics. These tools generally defined a specific "expert-like" response that was used as a measure of success (e.g., for whether a course helped students' thinking become more "expert-like").
One of the earliest such surveys was the Views About Science Survey (VASS), which sought to distinguish between "expert" and "folk" views of physics [70]. The expert perspective was constructed by the authors themselves, and they administered the survey to "high school and college teachers" to examine how closely their responses matched the authors'. The generally high agreement was taken as evidence for the validity of the survey. However, the authors never specified demographic information for either the "expert" teachers or the students at whom they targeted their survey.
Shortly afterwards, the Maryland Physics Expectation Survey (MPEX) [71] was designed to quantitatively study the phenomena qualitatively described by Hammer [59]. The authors' initial survey validation began with student interviews to ensure that students reliably understood and could answer the questions. They then administered their finalized survey to five groups for further validation: introductory physics students, the US Physics Olympiad team, high school teachers in a professional development program, college teachers in a professional development program, and faculty involved in a project to implement "workshop" physics at multiple institutions. They chose this last group as the "experts" by which the "expert-like" responses would be defined, placing a bias not only towards physics faculty but also towards those involved in curricular reform efforts. Thus, these individuals were interpreted as experts in teaching physics. The authors did not report demographic data for these five groups [71].
One of the most widely used surveys is the Colorado Learning Attitudes about Science Survey (CLASS) [72]. Its authors aimed to measure similar beliefs as MPEX, and they validated their prompt groupings with item response theory to more accurately measure underlying student attitudes. Unlike the authors of VASS and MPEX, they did not initially give their survey to a sample of "experts"; in the words of the authors, "[t]he 'expert' and 'novice' responses to each statement were unambiguous so scoring of the responses was simple and obvious" [72, p. 2]. As with VASS and MPEX, demographic data for the student participants were not reported.
Building on the CLASS, the PER group at the University of Colorado Boulder developed the Colorado Learning Attitudes about Science Survey for Experimental Physics (E-CLASS) [73]. This survey was designed to measure students' attitudes and beliefs specifically related to experimental physics and laboratory work. The authors were explicit about the criteria they used: "Like any tool for assessment of instruction, the E-CLASS must meet the triple criteria of (1) measuring something that experts and instructors care about (i.e., it should be aligned with widely accepted course goals), (2) targeting areas where students may not be meeting instructors' goals, and (3) accurately capturing some aspects of student thinking and learning." [73, p. 2]. Like the authors of MPEX and VASS, they administered their survey to "experts," defined as college instructors. They offer more detailed information about these experts: "To date, we have collected 23 expert responses (3 full-time instructors and 20 with a blend of teaching and ongoing research in experimental physics) from both primarily undergraduate serving institutions (N=7) and Ph.D. granting institutions (N=16)" [73, p. 6]. As with VASS, MPEX, and CLASS, any further demographic data was omitted. Thus, while they recognized potential differences in attitudes between faculty from primarily undergraduate versus Ph.D. granting institutions, they did not consider other types of institutions (e.g., two-year colleges), nor the role of other dimensions such as gender, race, or ethnicity.
One of the newest assessments is the Physics Laboratory Inventory of Critical Thinking (PLIC). PLIC was designed to measure a broader set of student experimental decision making [74]. In constructing their survey, they began with student interviews and an open-response form, which they then used to construct the closed-response version. Unlike other authors, they did report student demographic data (gender and race). Based on their sample demographics relative to that of their institution (Cornell University), it appeared as though they made an attempt to recruit a diverse sample of participants. However, their approach for expert participants did not share this focus. They administered their survey to "expert physicists (faculty, research scientists, instructors, and postdocs)" [74, p. 6]. However, while their "experts" represented a broader professional range, as with the other authors, they did not report any demographic data on these experts.
There is a clear pattern across all of these tools: beyond knowing about the professional identities of the "experts" used to define the "expert-like" views, we are given no information about their identities outside of physics, which shapes how they engage with the field and their attitudes and beliefs about expertise (e.g., [75,76]). The authors of the PLIC offered a step forward (constructing their closed response survey based on the words and responses of a diverse student sample) that should be applied more consistently.

III. DISCUSSION
Addressing the lack of diversity in physics (and, more broadly, in STEM) has been a topic of national conversations since at least the 1970s [77]. However, inequities remain that limit the effectiveness of efforts to change this situation.
For example, one focus is the physics classroom. Arguably, classroom environments are a major factor in students' physics experiences (e.g., [3]). However, this almost-exclusive focus has several shortcomings. Many of these efforts focus on classroom interventions (e.g., [41,43,44]). While these interventions can impact students' beliefs (and should continue in reform efforts), they may not do enough to eliminate the obstacles that students encounter when trying to navigate between physics' cultures and their own personal backgrounds. These kinds of barriers can cause students from marginalized backgrounds to leave science even when their academic performance is as good as (or even better than) that of their peers [3]. Thus, we need to incorporate strategies that take a more critical look at classroom environments and cultures. Such approaches have been proposed over the past 30 years (e.g., culturally relevant, culturally responsive, and culturally sustaining education [45][46][47]), but they are not widely discussed within physics education or PER. Furthermore, much of the research on these strategies focuses on the K-12 level. While many studies show that high school experiences are strong predictors of performance in college physics courses (e.g., [78]), these trends do not absolve college-level physics courses of responsibility for the lack of diversity, equity, and inclusion in physics.
However, this responsibility does not belong to instructors alone. Physics education researchers also need to contribute to and participate in these difficult discussions. PER is promoted as evidence for how instructors can improve their courses. But if PER is done without explicitly reflecting on, acknowledging, and addressing how our positionalities may reinforce the biased status quo, then PER will be another barrier to achieving diversity, equity, and inclusion in physics.
There has been some progress in recent years towards identifying biases in PER. Kanim and Cid have shown that PER is largely conducted at institutions that are disproportionately populated by white students relative to the college population as a whole [11]. This bias affects the work discussed here: few authors have attempted to ensure their definitions for "expert-like" physics views are valid for students from different backgrounds. The gender and race of the participating "experts" are ignored. It is possible that the authors of these works made attempts to include physics Ph.D.s from different backgrounds in their samples, but if they did, they did not state so in their papers. We agree with Kanim and Cid that this is a critical omission that allows biases to persist.
Work in PER needs to improve on sampling from diverse populations in a non-tokenizing manner to ensure our work reflects multiple ways of thinking. The authors of the PLIC have provided one such approach, as have researchers of K-12 STEM Education who have applied the resources framework to leverage multiple epistemologies in classrooms.
Our work here is influenced in several ways. In this short paper, we focus primarily on the most commonly used and cited qualitative and quantitative studies of physics expertise. There is far more work in this area beyond our scope that we hope to address in future work. We would also be remiss to omit our own positionalities [79]: both authors are postdocs at "R1" institutions. The first author is a white woman from a highly educated family. Both her parents and all four of her grandparents held graduate degrees, and her paternal great grandfather was a physicist. The second author is a multiracial though white-passing cisgender woman, identifies as part of the LGBT+ community, and grew up in a privileged upper-middle class family. We have both come to this work by studying critical theory and culturally responsive teaching. However, our primary backgrounds are in physics and astronomy, and we are not critical theorists by training.

IV. HOW DO WE MOVE FORWARD?
We are far from the first scholars to call for more inclusive methodologies in PER. Cochran and colleagues have criticized the methods used to evaluate student success in physics courses, such as concept inventories and traditional exams:
Introductory college-level physical science course evaluation structures are typically embedded with the idea that a good STEM student is one who has a "natural" spatial intuition (Hsi, Linn, & Bell, 2013). Critical race theory and intersectionality both urge us to consider the power of this underlying assumption and to make manifest how structural inequalities shape who succeeds and who fails under this paradigm. [38, p. 260]
We similarly call into question the underlying assumptions behind how we identify expertise in attitudes and beliefs about physics. If the experts we use to define "expert-like" are as predominately white and male as the typical R1 physics department, are we missing different ways in which people might be experts? If PER truly wants to embrace diversity, equity, and inclusion, we cannot choose to continue to re-use "good-enough" surveys and deny the resulting complicity in replicating biases.
A small step forward is to revisit prior findings about expertise through studying more diverse samples. Explicitly recruiting additional non-cis-male and non-white research participants in samples of both experts and students may help; however, it is important to consider the impacts of overburdening faculty who already shoulder disproportionate non-research responsibilities, as is the case for faculty from these groups [80,81]. Ensuring that tools are validated with diverse student populations in a non-tokenizing manner will help as well; as with Kanim and Cid, we encourage physics education researchers at R1 institutions to partner with instructors at two-year colleges and other predominately undergraduate serving institutions.
Small steps, though, may not be enough. Efforts to make PER more diverse and equitable may not adequately serve a goal of justice. How do we, as researchers, support goals of dismantling systemic oppression? This may require a deeper reimagining of what success and expertise in science look like. Here, work on critical science agency [82,83] may be helpful. What if, instead of measuring the success of students or of educational interventions in terms of developing "expert-like" thinking, we instead create tools for measuring when students have ownership, recognize each other's authority, and see themselves as experts [82]?
If PER is to help promote justice, equity, diversity, and inclusion in physics, we must turn a critical lens to our own research practices and constructs. We must recognize our own history that has reinforced oppression and reinvent our tools to build a more just future for physics education.

ACKNOWLEDGMENTS
We thank the three anonymous reviewers for their helpful comments. We also acknowledge the physics students who motivate us to do this work.