Inconsistent gender differences in self-efficacy and performance for engineering majors in physics and other disciplines: A cause for alarm?

Prior research has shown that self-efficacy can be a critical factor in student learning and performance in different STEM disciplines. Moreover, although past research has documented self-efficacy differences between female and male students in some STEM disciplines, there has not been research comparing these relations across disciplines. In order to better understand these relations and how self-efficacy and academic performance are related, we analyzed undergraduate engineering students’ physics, mathematics, engineering, and chemistry grades using large-scale institutional data and their self-reported self-efficacy using a validated survey in each of these disciplines to examine gender differences in engineering students’ self-efficacy and course grades. We find discipline-dependent trends in the relationship between self-efficacy and course grades, including a self-efficacy gender gap in physics which does not close by the fourth year in engineering along with a gender gap in physics course grade that favors men despite women engineering majors outperforming men in every other discipline. The troubling trends reported here should be addressed in order to make STEM learning equitable and inclusive.


I. INTRODUCTION
There is a significant under-representation of women in many Science, Technology, Engineering, and Mathematics (STEM) majors and careers including engineering [1]. Prior research suggests that many interrelated factors influence women's decisions to pursue an education in engineering, their choice of subfield to study, and even whether to remain in engineering after choosing it as their major [2][3][4][5][6]. These include sociocultural and motivational factors as well as various aspects of prior education such as quality of teaching and the level of mentorship and support [5][6][7][8][9][10][11][12][13][14]. In particular, sociocultural bias and stereotypes about who belongs in a discipline and who has what it takes to succeed in the discipline can negatively and disproportionately impact the self-efficacy and academic performance of women in various STEM subjects such as engineering, mathematics, and physics [6][7][8][9].
Self-efficacy is the belief in one's capability to succeed in a particular task or subject [15][16][17], and in the context of STEM education it can both affect and be affected by academic performance [18][19][20][21]. Due to this recursive nature, measuring self-efficacy and STEM academic performance longitudinally is vital to understanding their relationship. In particular, this feedback loop between self-efficacy and performance has the potential to promote or hinder student learning, and if left unchecked it can produce growing inequities, especially for those traditionally under-represented in a STEM discipline and subject to stereotype threats. Moreover, this feedback loop can be particularly damaging to women since success in mathematics and science courses has a positive impact on students' choice of and persistence in an engineering major [22,23] and longer-term career goals [21,24].
Prior research suggests that women tend to have lower self-efficacy than men in subjects such as physics [25][26][27], chemistry [28], mathematics, and engineering [3,4,10,29,30], all of which are required in the curricula of most engineering programs in the US. However, there has been little research comparing self-efficacy in various STEM disciplines for the same student population. Since self-efficacy is implicated in student performance and retention, self-efficacy issues pertaining to any of these foundational science or mathematics courses in the first-year engineering curriculum can have long-lasting effects on student decisions to pursue engineering degrees and careers.
Another notable gap in the research literature is the lack of studies on the change in student self-efficacy over time, particularly across a full engineering program. Some studies have investigated these changes over the course of two semesters and generally found little change over that period [26,31,32]. Another study found that over two years, the engineering self-efficacy of women showed a positive trend [33]. However, little is known about the evolution of self-efficacy and its gender differences from the first to the fourth year of an engineering program. The self-efficacy differences in place at the end of the fourth year will be more relevant to student career choices than those in the first few years of undergraduate education. One may hypothesize that since the course grades are generally higher in the more advanced engineering courses (compared to first-year courses) and experience with success accumulates, gender differences in self-efficacy initially observed in the earlier years [26,31,32] may disappear. The alternative hypothesis is equally plausible: the negative effects of lower self-efficacy on exam performance may compound over time, and the feedback loop could magnify both the self-efficacy and performance differences across gender.
There are multiple factors that lead to the alignment of self-efficacy beliefs and actual efficacy in academic performance: 1) self-efficacy beliefs are informed by prior performance feedback [18], 2) future academic performance is influenced by self-efficacy [18][19][20][21], and 3) many motivational factors simultaneously influence self-efficacy and academic performance. In particular, stereotype threats that women experience in many STEM disciplines due to societal stereotypes and biases can increase their anxiety, rob them of cognitive resources while solving problems, and lead to reduced test scores [19]. These same stereotype threats can lower self-efficacy, which can result in reduced interest and engagement during learning [21,34]. However, these factors such as stereotype threat may influence performance and self-efficacy disparately by gender.
Longitudinal measurements of students' self-efficacy and academic performance in different STEM disciplines can provide insight into the long-term trends in these two intertwined factors. Engineering students are an ideal population for such a study since they engage in coursework from many STEM disciplines simultaneously, allowing for comparisons among trends in different disciplines using the same population. Here we describe an investigation of longitudinal gender differences in self-efficacy and academic performance by analyzing these two measures simultaneously. Our research questions (RQs) to investigate these trends in undergraduate engineering education are as follows:
RQ1. Do men's and women's self-efficacy in various disciplines change along different trajectories as they progress from their first to their fourth year?
RQ2. Do gender differences in course grade vary by course discipline?
RQ3. Is there a match in sign and magnitude between gender differences in self-efficacy and course grades?

II. METHODOLOGY
Using the Carnegie classification system, the university at which this study was conducted is a public, high-research doctoral university, with balanced arts and sciences and professional schools, and a large, primarily residential undergraduate population that is full-time and reasonably selective with low transfer-in [35]. De-identified demographic information and grade data were provided by the university on all first-year engineering students who had enrolled from Fall 2009 through Spring 2018, a sample totaling 3,928 students. We note that gender is not a binary construct; however, the demographic data include "gender" as a binary categorical variable, so that is how the data are represented in these analyses. The full sample of students was 28% female and had the following race/ethnicities: 80% White, 9% Asian, 5% African American, 2% Latinx, and 5% Other. The mean age at the beginning of the students' first year was 18. A subset of this sample totaling 2,089 students also participated in surveys administered by the School of Engineering from Spring 2013 through Spring 2018 to students at the end of their first, second, and/or fourth years. Self-efficacy data were collected as part of an online survey given to all engineering students at the end of the spring semester of their first, second, and fourth years. Students were given a few reminders to complete the survey and were told that this survey is important for evaluating the effectiveness of the engineering program, resulting in a completion rate averaging 79%. The items used in this study consisted of responses to four prompts asking students to "Please rate your level of confidence in the following knowledge and skill areas: My ability to use my knowledge of [mathematics/engineering/physics/chemistry] to solve relevant engineering problems." The students were given five options ("poor," "fair," "good," "very good," and "excellent"), coded on a Likert scale from 1 to 5.
FIG. 1. The means and (small) standard errors of self-efficacy scores of engineering students at the end of their first, second, and fourth years in each of the foundational subjects in undergraduate engineering curricula are plotted. Self-efficacy was measured on a Likert scale from 1 to 5. The vertical range of self-efficacy scores has been restricted to better show the gender differences. Above each pair of points is Cohen's d (d < 0 and d > 0 indicate a higher mean for women and men, respectively) and the statistical significance of the gender difference according to a t-test, with *p < 0.05, **p < 0.01, ***p < 0.001, and ns p > 0.05. Lines are added as guides to the eye.
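As an illustration of the 1-to-5 coding of the survey responses and of the means and standard errors that are plotted for each group, a minimal sketch follows. The function names and the sample responses are hypothetical and are not taken from the study's data or analysis code; the standard error is assumed here to be the sample standard deviation divided by the square root of the group size.

```python
from math import sqrt
from statistics import mean, stdev

# Coding of the five response options on a 1-to-5 Likert scale.
LIKERT = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}


def code_responses(responses):
    """Convert text responses to their 1-to-5 Likert codes."""
    return [LIKERT[r.strip().lower()] for r in responses]


def mean_and_standard_error(scores):
    """Return the mean and the standard error of the mean (assumed: s / sqrt(n))."""
    return mean(scores), stdev(scores) / sqrt(len(scores))


# Made-up responses for illustration only (not study data).
responses = ["good", "very good", "fair", "excellent", "good"]
scores = code_responses(responses)
m, se = mean_and_standard_error(scores)
print(scores, round(m, 2), round(se, 2))
```

In practice the same two functions would be applied separately to men and women for each subject and each survey year.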
The survey was originally validated by the School of Engineering, and constructed so each item is in the context of engineering. The Department of Physics and Astronomy also administers its own validated survey in introductory physics courses which includes five items relating to self-efficacy in the context of physics specifically. A subset of engineering students (N = 446) completed both the physics survey within their Physics 2 class and the engineering survey around the same time. The physics self-efficacy measures on the two surveys were highly correlated (r = 0.60). Moreover, the physics self-efficacy survey showed a similar gender difference to the engineering-context physics self-efficacy (Cohen's d = 0.76 in a physics context vs. d = 0.84 in an engineering context, both with higher means for men). This result is consistent with prior research which suggests that self-efficacy judgments for a science discipline that refer to different performance contexts (e.g., a lab setting, test-taking, working on projects) tend to cohere as a single construct [26].
In order to test for statistically significant differences in self-efficacy scores, we performed t-tests comparing the scores of men and women. These tests were run separately for the self-efficacy scores in mathematics, engineering, physics, and chemistry. Similarly, using t-tests, we investigated gender differences in course grades earned by men and women in various courses. Effect sizes for differences in self-efficacy and course grade were calculated in standard deviation units via Cohen's d, with d < 0 and d > 0 indicating higher means for women and men, respectively. The courses we investigated were the foundational courses taken by the largest number of students in the School of Engineering, namely all of the common first-year courses in engineering, physics, chemistry, and mathematics as well as a selection of advanced mathematics courses taken by students in a variety of engineering departments. The first-year engineering courses teach students to use computational tools such as MATLAB and C++ in an engineering context.
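The effect-size convention described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes the pooled-standard-deviation form of Cohen's d and an equal-variance (Student) t statistic, neither of which is specified in the text, and the scores are made up. Obtaining a p-value additionally requires the t distribution, for example via `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, variance


def pooled_variance(men, women):
    """Pooled sample variance of the two groups (assumed form)."""
    n_m, n_w = len(men), len(women)
    return ((n_m - 1) * variance(men) + (n_w - 1) * variance(women)) / (n_m + n_w - 2)


def cohens_d(men, women):
    """Cohen's d; d > 0 means a higher mean for men, d < 0 for women."""
    return (mean(men) - mean(women)) / sqrt(pooled_variance(men, women))


def student_t(men, women):
    """Equal-variance t statistic and its degrees of freedom."""
    n_m, n_w = len(men), len(women)
    standard_error = sqrt(pooled_variance(men, women) * (1 / n_m + 1 / n_w))
    return (mean(men) - mean(women)) / standard_error, n_m + n_w - 2


# Made-up Likert scores for illustration only (not study data).
men = [4, 5, 3, 4, 4]
women = [3, 4, 2, 3, 3]
d = cohens_d(men, women)
t, dof = student_t(men, women)
print(round(d, 2), round(t, 2), dof)
```

Swapping the two arguments flips the sign of d, which is why the sign convention has to be fixed once and used for both self-efficacy and course grades.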

III. RESULTS
In order to understand the perceptions of these engineering students about their foundational coursework and answer RQ1, we plot in Fig. 1 the mean self-efficacy scores of men and women in each of the four foundational subjects (mathematics, engineering, physics, and chemistry) at each time point (end of the first, second, and fourth years). Looking at the first-year data in Fig. 1, there is a statistically significant gender gap favoring men in self-efficacy scores for applying mathematics, engineering, and physics to their work in engineering, and no difference in chemistry. Mathematics and engineering follow similar trajectories in that the initial gap remains in the second year and is eliminated by the fourth year. In sharp contrast, the initially large gap in physics shrinks but remains significant even up to the fourth year, with a difference as large as the initial gap in engineering. Although not a focus of this study, we note that self-efficacy of both men and women appears to grow over time, as expected; the lack of growth in chemistry self-efficacy may reflect the relatively small role chemistry plays in the most populous engineering majors (electrical and mechanical engineering).
TABLE I. Reported are the performance differences between engineering men and women for grades earned in introductory courses in engineering, physics, chemistry, and mathematics as well as advanced courses in mathematics. Along with summary statistics (N, mean µ, and standard deviation σ) separated for men and women, we also report the p-value from a t-test comparing the grades earned as well as effect size (Cohen's d, sign convention matching Fig. 1).
To answer RQ2, gender differences in grades were investigated across the entire first-year engineering curriculum. We also investigated these performance differences for advanced mathematics courses commonly taken to supplement engineering curricula. Table I reports the summary statistics (N, mean µ, and standard deviation σ) for each course along with a p-value from a t-test comparing the grades earned by men and women in that course and the effect size (Cohen's d).
For all but one course, there were statistically significant gender differences. Of particular note, the signs of the differences varied by discipline, with women receiving higher grades than men in every course except introductory physics (all statistically significant except for Engineering 2). Moreover, although physics had the lowest mean grades, the gender patterns in physics cannot be explained by physics being the most difficult course because Calculus 2 and 3 had similarly low grades but the opposite gender differences from physics (with women on average performing better than men in all mathematics courses). Similarly, the differences could not be explained in terms of the stronger role of mathematics in physics versus chemistry because women had higher grades in every mathematics course. It should be acknowledged, however, that none of the gender differences in course grade were large. Instead, what is surprising is the pattern of medium to large gender differences in self-efficacy despite small differences in performance. Furthermore, in mathematics and engineering, we see opposite trends in the two measures, with women on average having higher grades but lower self-efficacy than men.
FIG. 2. Effect sizes (Cohen's d, sign convention matching Fig. 1) of gender differences in self-efficacy and course grades earned are plotted for each of the introductory courses as well as advanced mathematics courses. Lines have been added for $d = 0$ on both axes (dashed) as well as where effect sizes in self-efficacy ($d_{SE}$) and course grade ($d_{CG}$) are equal, namely $d_{SE} = d_{CG}$ (dotted). Ellipses have been drawn to group all of the courses in each subject. Each point contains the data of only those students for whom both a grade and self-efficacy were available. The self-efficacy used was the point closest to the year in which the student took the corresponding course. Vertical lines have been added showing the self-efficacy deviation ($\Delta d_{SE}$) of each subject (measured from the average position of the constituent courses) to the $d_{SE} = d_{CG}$ line.
In order to answer RQ3 and investigate the relationship between self-efficacy and performance, we combined the two previous analyses to simultaneously plot the effect sizes of gender differences in both self-efficacy and course grades (Fig. 2). For each point in the plot, we used only the population (N varying from 579 for Linear Algebra to 1,163 for Engineering 1) for which we had both a course grade and a reported self-efficacy in the nearest survey: first year for the introductory courses, and second year for the advanced mathematics courses. This restriction of the population may alter the effect sizes from those in Table I or Fig. 1. In addition to dashed lines along $d = 0$ on both axes, there is a dotted line along $d_{SE} = d_{CG}$ (where the effect size of self-efficacy, $d_{SE}$, equals that of course grade, $d_{CG}$), which represents where the data might fall if there were a one-to-one relationship between the effect sizes of self-efficacy and course grade. In addition, a vertical line is shown from the center of each discipline to the $d_{SE} = d_{CG}$ line, which represents the deviation of self-efficacy differences from academic performance differences.
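The geometry of this comparison can be sketched in a few lines. The sketch below is illustrative only: the subject names and effect sizes are made up, the function names are hypothetical, and the deviation is assumed to be the mean self-efficacy effect size minus the mean course-grade effect size of a subject's courses (the vertical distance from the subject's center to the line of equal effect sizes). A subject is labeled "match" when the two gender differences have the same sign.

```python
from statistics import mean


def subject_center(points):
    """Average position of a subject's courses; points are (d_cg, d_se) pairs."""
    d_cg = mean([cg for cg, _ in points])
    d_se = mean([se for _, se in points])
    return d_cg, d_se


def deviation_from_line(points):
    """Vertical distance from the subject center to the d_SE = d_CG line
    (assumed sign: positive when the self-efficacy gap exceeds the grade gap)."""
    d_cg, d_se = subject_center(points)
    return d_se - d_cg


def quadrant(points):
    """'match' if both effect sizes share a sign, otherwise 'mismatch'."""
    d_cg, d_se = subject_center(points)
    if d_cg == 0 or d_se == 0:
        return "on axis"
    return "match" if (d_cg > 0) == (d_se > 0) else "mismatch"


# Made-up (d_cg, d_se) pairs per course, for illustration only (not study data).
example = {
    "Subject A": [(0.10, 0.50), (0.20, 0.70)],
    "Subject B": [(-0.20, 0.30), (-0.10, 0.40)],
}
for subject, points in example.items():
    print(subject, quadrant(points), round(deviation_from_line(points), 2))
```

A subject can sit in a "match" quadrant and still lie far from the line, which is the distinction between agreement in sign and agreement in magnitude.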

IV. SUMMARY AND IMPLICATIONS
The upward trend of physics self-efficacy scores in Fig. 1 looks similar to mathematics and engineering (consistent with prior research [33]), but notably the means are lower and the gender gap has not closed entirely in physics by the fourth year. In addition to women having a much lower physics self-efficacy, physics is the only discipline in which women are not performing as well as men. One interpretation is that the root of the problem lies in societal stereotypes about physics being a field for brilliant men, resulting in a reduction in the physics self-efficacy of these women which then negatively impacts their performance. We note that these women are performing better in mathematics and engineering courses despite having lower self-efficacy than men, so it is reasonable to ask why the same is not happening in physics courses.
Looking at the overall trends in self-efficacy and performance, there are three distinct patterns that emerge: one for both mathematics and engineering, one for physics, and one for chemistry. Self-efficacy scores in both mathematics and engineering begin in the first year with similar gender gaps, which remain in the second year and are finally eliminated by the fourth year. The mean self-efficacy scores within these disciplines for men and women, respectively, remain similar at every point in time, with mathematics scores only slightly higher, possibly due to more familiarity with the subject from high school. On the other hand, mathematics and engineering appear to behave slightly differently in course grade differences according to Table I, with women consistently outperforming men in mathematics courses and only outperforming men in one of the two introductory engineering courses. However, when we look only at the population for which we have survey data (Fig. 2), the trends in mathematics and engineering on both measures are very similar.
Chemistry is the only discipline in Fig. 2 that falls along the $d_{SE} = d_{CG}$ line, where the effect sizes of gender differences in self-efficacy are aligned with the effect sizes of gender differences in performance. Both chemistry and physics are in 'match' quadrants of the figure, where the gender differences in self-efficacy and course grade have the same sign, but physics is the furthest from the line. Thus, men have higher self-efficacy than women in applying physics to engineering and also have higher performance, which is a matching trend, but the effect sizes are mismatched: the self-efficacy gap in physics is much more pronounced than the course grade gap, causing physics to lie further from the $d_{SE} = d_{CG}$ line than any other discipline.
Since the effect sizes of self-efficacy differences outweigh those of performance differences, we should strive towards increasing students' beliefs about their abilities in a subject in order to more accurately match their performance. To that end, we can consider the vertical distance from each subject to the $d_{SE} = d_{CG}$ line, representing a deviation from a realistic perception of performance. Mathematics has a short distance to this line, followed by engineering. Finally, again, physics has the largest deviation from a realistic perception, driven by a gender gap in self-efficacy that overwhelmingly favors men despite only small differences in performance. Given the similarities in the trends of these disciplines in Fig. 1, reducing the largest deviation, namely in physics, could also serve to reduce the deviation for the others.
Engineering is a field that is generally male dominated, with variation across different majors within engineering. Further research could probe the reasons why women are less likely to follow certain paths in engineering, especially by focusing on the relationship between perceptions of mathematics, physics, and engineering. Our data suggest that if women are making enrollment decisions influenced by self-efficacy, then they may choose against disciplines seen as more related to mathematics and physics even if they perform relatively well in those disciplines. In particular, there may be women whose high school course performance in STEM is on par with men and who may be interested in engineering, but whose self-efficacy in mathematics and especially in physics, decreased by stereotype threats due to societal stereotypes and biases, prevents them from choosing an engineering major.
Finally, although these analyses do not show the same connection between physics and engineering as we find between mathematics and engineering, we hypothesize that the extreme gender gap we observe in physics self-efficacy may also play a role in the low ratio of women in disciplines such as mechanical and electrical engineering. Since the self-efficacy gender gap in physics is much larger than in mathematics, and self-efficacy and performance can feed on each other, this gap likely has a larger negative impact on the grades earned by women in their introductory physics courses. Thus, improving the physics self-efficacy of women and other students traditionally underrepresented in physics is essential. In particular, systemic efforts should be made by instructors, advisors, and physics, mathematics, and engineering departments as a whole to create equitable and inclusive learning environments that reduce the influence of accumulated societal stereotypes and biases that can discourage women from pursuing and excelling in these disciplines.