Tracking student engagement with open educational resources (OER) and online homework

As education increasingly uses online and digital learning tools and resources, an opportunity arises to study students’ learning behaviors and outcomes through data analytics. In this study, we perform correlational data mining of individual students’ click-streams on both an Open Educational Resource site, BoxSand.org, and an online homework platform, Mastering Physics. We combined exploratory and statistical analyses with a long-term goal of creating inferential and predictive models.


I. INTRODUCTION
To improve student engagement and success, the Physics Department at Oregon State University implemented transformative pedagogical changes to its introductory algebra-based physics sequence. These changes transitioned in-class time from lecture-based instruction to more student-centered, actively engaged peer instruction [1,2]. We adopted a flipped classroom approach [3][4][5] by creating more than 300 pre-lecture videos and a host of other learning resources on BoxSand.org (BoxSand), an Open Education Resource (OER) site designed to facilitate out-of-class study. We accompanied these videos with pre- and post-lecture assignments from Mastering Physics, an online homework platform [6]. In this paper, we use data from interactions with BoxSand and online homework to shed light on the effects of these new resources on student learning. This research serves as a starting point for assessing the full impact of pedagogical changes on student success.

A. Research questions
The students in this course are non-physics science majors, primarily in their junior and senior years. Their interest in physics varies widely. Will they prepare for class? Will they watch the pre-lecture videos? What began as a concern about students' motivation has presented itself as an opportunity to study engagement behavior. To measure student engagement, we tracked their interaction with BoxSand and online homework, and we used this data to inform our understanding of how increased engagement affects student success. Generally, we want to explore the usefulness of learning analytics [7] to better understand student engagement with OER and online homework and how this engagement correlates with grades in the class. Specifically, we asked the following research questions. (1) Which engagement behaviors are correlated with each other, and which engagement behaviors are predictors of student success, as measured by course grade? (2) When are students most engaged with the course, and does cramming before an exam help performance? (3) How does interaction with the online homework influence course grade? (4) Can changing engagement behaviors influence exam grades? (5) Can we create a statistical model to quantify and predict student final grades based on their engagement with resources and exercises outside of the classroom?
Our work addresses the PERC 2016 call for methodical approaches to PER, including predicting and generalizing learning outcomes [8][9]. Previous work has explored correlations between video engagement and student performance, showing little to no correlation [10][11]. We hypothesize that our results may differ because our videos are customized and tightly integrated into our curriculum. We have been using this class structure for multiple years, so institutional memory of the videos' place and value has had time to form. As suggested by Lin et al. [11], we use external drivers, such as Learning Assistants, to remind students of the importance of watching the videos, which we hope increases the value students place on them.

II. METHODS
Though this study is ongoing, the data presented in this paper is from the 2017-2018 academic year, unless stated otherwise. We presented the study to students on the second day of class in the fall term. We acquired consent to track engagement via in-class signatures, and 317 students agreed to participate for the entire sequence. In the fall, winter, and spring terms, this resulted in 55.6%, 49.5%, and 47.7% of the class consenting to the study, respectively. The average grade (71%) and female-to-male ratio (1.4) of the consenting students are nearly identical to those of the entire class, so we believe these consenting students constitute a representative sample with respect to these metrics.

A. Data collected
We created BoxSand [12] to host traditional course-specific materials, including the syllabus, calendar, assignments, class templates, problem solutions, and more. BoxSand is also filled with thousands of OER, including videos from both OSU and YouTube, textbooks such as OpenStax, infographics, concept maps, simulations, practice problem sets, and more. We designed BoxSand to provide curated sets of primary and supplementary learning resources. These resources are organized by topic in a menu-driven system, and students are guided to the resources we believe are best for their learning, such as the pre-lecture videos created specifically for our course. We tracked student engagement through click-stream interactions with content on BoxSand via Google Analytics and custom Drupal modules. During the 2017-2018 academic year, we collected 1.4 million interactions from BoxSand.
Pearson, a publishing company, owns and hosts Mastering Physics, the online homework system. Pearson provided us with all click-stream interactions from our students on their site. Examples of these interactions included site navigation, problems attempted, and access to hints. During the 2017-2018 academic year, we collected 1.3 million clicks on Mastering Physics.

B. Data analysis
To begin to understand student engagement behaviors, we explored their interactions with subsets of the BoxSand and online homework click-stream data, such as pre-lecture videos, exam solutions, the syllabus, the OpenStax textbook, practice problems, and more. We used this data to answer research questions (1) through (4), and provide a summary of results in Section III. To answer research question (5), we created a statistical regression model to quantify what we believed to be the most important online engagement behaviors related to student success: watching custom BoxSand lecture videos and working through the online homework system. This model is given by

$$y = \beta_0 + \beta_1 V + \beta_2 A + \beta_3 C + \beta_4 W + \beta_5 S + \delta + \varepsilon, \tag{1}$$

where $y$ is the course grade, $V$ is the proportion of total BoxSand video quartiles a student watched in a term, $A$ is the proportion of all online homework problems attempted in a term, $C$ is the proportion of attempted online homework problems answered correctly in a term, $W$ indicates winter term, and $S$ indicates spring term. For example, if a student completes half of the online homework problems and gets all of them correct, their $A$ and $C$ values are 0.5 and 1, respectively. Each $\beta$ term represents the average change in course grade for a unit increase in the respective predictor variable, holding the other variables constant. In equation (1), $y$ is calculated after removing each student's unique online homework contribution to the final grade, which is a maximum of 5%. Therefore, the maximum value for $y$ is 95%. We did this in order to remove the direct influence of the online homework on course grade, allowing us to characterize the effect that engagement with online homework has on the other graded components in the course.
Equation (1) is a linear mixed model [14] because it incorporates multiple error components, $\delta$ and $\varepsilon$, with distinct variances, $\sigma_\delta^2$ and $\sigma_\varepsilon^2$, respectively. The variance $\sigma_\delta^2$ captures student-to-student variability, and $\sigma_\varepsilon^2$ captures term-to-term variability within students.
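As an illustration only (this is not the code used in the study), the predictor definitions and the homework adjustment described above can be sketched in Python. The function names, argument names, and example counts are hypothetical, and treating the adjustment as a simple subtraction of the earned homework percentage points is an assumption.

```python
def engagement_predictors(quartiles_watched, quartiles_available,
                          problems_attempted, problems_assigned,
                          problems_correct):
    """Return (video, attempted, correct) proportions for one student in one term."""
    video = quartiles_watched / quartiles_available
    attempted = problems_attempted / problems_assigned
    # The proportion correct is relative to attempted problems, not assigned ones.
    # Assumption: a student with no attempts gets 0 for this proportion.
    correct = problems_correct / problems_attempted if problems_attempted else 0.0
    return video, attempted, correct


def adjusted_grade(course_grade_pct, homework_points_pct):
    """Remove the direct homework contribution (at most 5 points) from the grade."""
    if not 0.0 <= homework_points_pct <= 5.0:
        raise ValueError("homework contribution must be between 0 and 5 points")
    return course_grade_pct - homework_points_pct


# Example from the text: half of the problems attempted, all of those correct.
# The video counts are made up for illustration.
video, attempted, correct = engagement_predictors(
    quartiles_watched=120, quartiles_available=400,
    problems_attempted=50, problems_assigned=100, problems_correct=50)
print(attempted, correct)           # 0.5 1.0
print(adjusted_grade(100.0, 5.0))   # 95.0
```

A student with a perfect course grade and full homework credit therefore has an adjusted grade of 95%, which matches the stated maximum of the response variable.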
We estimated the parameters in equation (1) using restricted maximum likelihood [16][17] with the lme4 package [15] in the statistical software program R [20]. In Section III, we assess the impact and statistical significance of our predictor variables on final grade. We performed diagnostic checks on the model assumptions, including normality of the residuals, using standard guidelines covered thoroughly in Mixed-Effects Models in S and S-PLUS [14]. We also investigated the effect of multicollinearity on our model, which occurs when predictor variables are highly correlated [19]. If multicollinearity is present in a model, it is challenging to uniquely identify the contribution of each individual predictor variable.

III. RESULTS
In the following subsections, we present evidence regarding our research questions from Section I. Research questions (1) through (5) correspond to the following subsections A through E, respectively.

A. Correlations Between Engagement Behaviors
We calculated the Pearson correlation between several variables (see Fig. 1). The variables most correlated with final grade were Video Quartiles Watched, Online HW Attempted, and Online HW Correct. The positive sign of these correlations indicates that as these variables increase, so does the average final grade. There are several other interesting patterns in the data that we believe can be used to inform future research. For example, there is high correlation between Fundamental Examples and Practice Problems. We hypothesize that there are groups of students who learn well by working through examples and problems. Using these correlations to identify common patterns would help us categorize students into groups, which we could then target with personalized feedback. Group characterizations would also be helpful in identifying "at-risk" students early in the term.
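For readers unfamiliar with the statistic, a minimal Python sketch of the Pearson correlation between one engagement variable and course grade follows. The data values are invented for illustration and are not study data, and the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-student values, made up for illustration only.
video_quartiles = [10, 40, 55, 80, 120]
course_grade = [58.0, 66.0, 71.0, 75.0, 88.0]
r = pearson_r(video_quartiles, course_grade)
print(round(r, 3))
```

Applying such a function to every pair of variables yields a correlation matrix of the kind summarized in Fig. 1.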

B. Cramming Behaviors
We found that students are most engaged in the course during the middle of the week (see Fig. 2) and are especially engaged during exam weeks. Furthermore, student engagement with BoxSand lecture videos changes on a weekly basis. To explore how this engagement relates to performance in the class, we calculated a "current grade" based on their running exam performance. In the weeks leading up to the first midterm, we used that exam score as their current grade. In the weeks between the first and second midterms, we calculated current grades as the average of the two midterm grades. In the weeks after the second midterm, we calculated current grades as a weighted average of their midterm and final exam grades. We found that during most non-exam weeks, students who watched more videos tended to have higher grades. During exam weeks, however, the opposite was true: students who watched more videos tended to have lower grades (see Fig. 3). The BoxSand videos, which constitute the traditional content delivery lecture portion of the class, are intended to be viewed early in the learning cycle. Therefore, it is not surprising that watching videos right before exams does not adequately prepare the students to perform well. This evidence suggests that students who consistently watch videos tend to perform best in the course.
In contrast, when looking at online homework practice vs. current grades on a per-week basis, this flip in the sign of the correlation during exam weeks does not typically occur. The takeaway is that students should watch videos early, when first introduced to a topic, but by the time exam preparation rolls around, they should be practicing problems.
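The running "current grade" described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the function name, the exam weeks, the inclusion of each exam week in the period leading up to that exam, and the weights in the final weighted average are all hypothetical, since the text does not specify them.

```python
def current_grade(week, midterm1_week, midterm2_week,
                  midterm1, midterm2, final_exam,
                  weights=(1.0, 1.0, 2.0)):
    """Running exam-based grade assigned to a student for a given week."""
    # Up to (and, by assumption, including) the first midterm week:
    # use the first midterm score.
    if week <= midterm1_week:
        return midterm1
    # Between the two midterms: plain average of the two midterm scores.
    if week <= midterm2_week:
        return (midterm1 + midterm2) / 2
    # After the second midterm: weighted average of all three exams.
    # The default weights are placeholders, not the course's actual weights.
    w1, w2, wf = weights
    return (w1 * midterm1 + w2 * midterm2 + wf * final_exam) / (w1 + w2 + wf)


# Hypothetical student with midterms in weeks 4 and 8.
for week in (2, 6, 10):
    print(week, current_grade(week, 4, 8, 70.0, 80.0, 90.0))
# 2 70.0
# 6 75.0
# 10 82.5
```

Pairing each week's current grade with that week's video or homework activity gives the per-week relationships discussed in this subsection.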

C. Online Homework Engagement
Online homework has been shown to be an effective tool when learning physics [9,18]. To visualize the distribution of student engagement with online homework and course grade, we made a 3-D bar plot (see Fig. 4). There is a clear pattern: students who engage more with Mastering Physics tend to perform better in the course. Due to the large number of problems assigned, full credit was set at 66.6% completion. Few students chose to complete more than that.

D. Changing Engagement Behaviors
We studied whether students who changed their engagement behaviors saw an effect in their exam grades by analyzing the relationship between changes in the percentage of videos watched and changes in exam grade between two subsequent exams. We found that as students increased their engagement with videos from one exam period to the next, they tended to see increases in their exam scores. If a student watched none of the videos during one period and then all of them the next, their average score increased by ~5% on the next exam. This association between increasing video engagement and increasing exam scores continues throughout the year, but the effect gets smaller. We believe this is due to students changing their behavior less near the end of the term.
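One way to quantify such a relationship is the least-squares slope of the change in exam score against the change in the fraction of videos watched. The Python sketch below is an illustration only: it assumes a simple linear fit, which may differ from the analysis actually performed, and the data and names are invented.

```python
def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("slope is undefined when x is constant")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Hypothetical students: fraction of videos watched and exam score (%)
# in two consecutive exam periods. These are not study data.
video_period1 = [0.0, 0.5, 1.0, 0.2]
video_period2 = [1.0, 0.5, 0.5, 0.6]
exam1 = [60.0, 70.0, 85.0, 65.0]
exam2 = [66.0, 70.0, 82.0, 67.0]

delta_video = [b - a for a, b in zip(video_period1, video_period2)]
delta_score = [b - a for a, b in zip(exam1, exam2)]

# The slope estimates the change in exam score (percentage points) associated
# with going from watching none of the videos to watching all of them.
slope = least_squares_slope(delta_video, delta_score)
print(round(slope, 2))
```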

E. Linear Mixed Model Analysis
Engagement with pre- and post-lecture videos and online homework had the strongest correlations with final grade (see Fig. 1), and we believed these variables to be the most reliable online engagement predictors of course grade. Summaries of the $\beta$ estimates (Table 1) from equation (1) yielded several promising findings. First, a student who attempted all online homework problems had a 9% higher average grade than students who attempted none. Second, a student who answered all of their attempted online homework problems correctly had a ~10% higher average grade than students who answered none of their attempted problems correctly. For example, a student who attempted all online homework problems and answered half of them correctly saw an average increase of ~14% in their final grade (9% + 0.5 × 10%). This is nearly a three-fold return on investment, as the online homework accounts for a maximum of 5% in the final grade (recall that this 14% is the raw increase in course grade after removing the direct contribution of online homework). This evidence suggests that online homework facilitates success in the other graded portions of the course, such as exams, and we think that students will find this empirical result encouraging when deciding whether to complete the online homework. Third, students who watched all pre- and post-lecture videos had a ~2.5% higher average grade than students who watched none. Though these videos do not impact final grade as drastically as online homework, they are still important and can be the difference maker between letter grades. Fourth, average grades decreased in the winter and spring terms, likely due to the increasing complexity of the course material through successive terms or variance in exam difficulty. All p-values associated with the $\beta$ estimates were small (Table 1), indicating that the corresponding variables are statistically significant predictors of course grade. Lastly, between-student variability ($\sigma_\delta^2$) is roughly five and a half times larger than term-to-term variability within students ($\sigma_\varepsilon^2$). This is unsurprising because it is common for a student to receive similar grades in successive terms, while grades among different students can vary wildly.
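The arithmetic behind the homework example can be checked with a short Python sketch. The coefficients below are the rounded values quoted in the text, not the exact estimates in Table 1, and the constant and function names are hypothetical. Term effects are omitted because the comparison is between students in the same term.

```python
# Rounded fixed-effect estimates quoted in the text (percentage points).
HOMEWORK_ATTEMPTED_EFFECT = 9.0
HOMEWORK_CORRECT_EFFECT = 10.0
VIDEO_EFFECT = 2.5


def grade_difference(video, attempted, correct):
    """Average grade difference relative to a same-term student with all
    three engagement proportions equal to zero."""
    for proportion in (video, attempted, correct):
        if not 0.0 <= proportion <= 1.0:
            raise ValueError("proportions must lie between 0 and 1")
    return (VIDEO_EFFECT * video
            + HOMEWORK_ATTEMPTED_EFFECT * attempted
            + HOMEWORK_CORRECT_EFFECT * correct)


# All homework attempted, half of it correct, no videos watched.
print(grade_difference(video=0.0, attempted=1.0, correct=0.5))  # 14.0
```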
None of the regression assumptions appeared violated after performing the appropriate diagnostic checks, which reinforces the validity of our model. We assessed the overall impact of multicollinearity through the condition number of the model [19]. Condition numbers greater than 30 indicate the potential presence of problems associated with multicollinearity. In our model, however, the condition number was 17.66, which is evidence that multicollinearity did not affect our model results.
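To illustrate the diagnostic, the Python sketch below computes a condition number for a small design matrix. It assumes one common definition: columns are scaled to unit length, and the condition number is the square root of the ratio of the largest to smallest eigenvalue of the scaled cross-product matrix. The definition used in the study may differ. The eigenvalues are obtained with a basic Jacobi rotation routine so that the example needs only the standard library; the function names and the design matrix are hypothetical.

```python
import math


def symmetric_eigenvalues(matrix, max_sweeps=100, tol=1e-24):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2
                           for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                diff = a[q][q] - a[p][p]
                theta = (math.pi / 4 if diff == 0.0
                         else 0.5 * math.atan(2.0 * a[p][q] / diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def condition_number(design_matrix):
    """Condition number of a design matrix after unit-length column scaling."""
    n_cols = len(design_matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in design_matrix))
             for j in range(n_cols)]
    scaled = [[row[j] / norms[j] for j in range(n_cols)]
              for row in design_matrix]
    gram = [[sum(r[i] * r[j] for r in scaled) for j in range(n_cols)]
            for i in range(n_cols)]
    eigenvalues = symmetric_eigenvalues(gram)
    return math.sqrt(max(eigenvalues) / min(eigenvalues))


# Hypothetical design matrix: intercept, video proportion, homework proportion.
design = [
    [1.0, 0.10, 0.20],
    [1.0, 0.40, 0.50],
    [1.0, 0.60, 0.55],
    [1.0, 0.80, 0.90],
    [1.0, 0.95, 0.70],
]
kappa = condition_number(design)
print(round(kappa, 2), "above 30" if kappa > 30 else "not above 30")
```

Perfectly uncorrelated, equally scaled predictors give a condition number of 1, and the value grows as predictors become closer to linearly dependent.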

IV. CONCLUSIONS
In this paper, we found that student success, measured by grades, is strongly tied to engagement with the course's online components. Specifically, we found evidence that pre- and post-lecture videos and online homework are strong predictors of course grade, cramming behavior tends to be unhelpful, and changing engagement behaviors influences exam performance. These results reinforce what we believe most teachers already know: student engagement with the resources provided by the instructor encourages learning physics. This research is important because it gives empirical evidence for this conventional wisdom, and we can use these results to encourage our students to follow the flipped classroom model. Furthermore, these results have been strong motivators for learning assistants, teaching assistants, and instructors, and are used to show students that engagement is crucial to their success. After all, we are teaching future scientists, and we would prefer they use these findings to draw their own conclusions about successful engagement behaviors rather than defaulting to the subjective opinions of their instructor.

A. Future work
We are working to capture more than just online activities to provide a clearer picture of which overall engagement behaviors correlate with success in physics. Additionally, we want to include other metrics of student success by incorporating aspects of learning not well represented in grades. While our exams and homework are written around our learning objectives, they don't effectively measure outcomes in areas such as collaborative learning and knowledge transfer to other fields. We hope future work will include a more holistic analysis of student success.
We also plan to include demographic information to understand when analyses generalize to all subgroups of students and when they do not. Another goal of future research is to create predictive models to identify struggling students and create interventions that help them get back on track. We plan to use deep learning to identify patterns hidden in the data that will help classify types of student behaviors. This can then serve as a platform on which to base true adaptive learning practices and provide students with individualized learning paths. Lastly, we want to use these results to inform qualitative PER scientists of interesting questions so that they can delve deeper into the root causes of these effects.
FIG 1. Pearson correlation between several BoxSand and online homework click-stream data sources and course grades. Each square represents the correlation between the corresponding variables on the x and y axes for 2017-2018.

FIG 3. Linear regression slope for number of quartiles of BoxSand videos watched vs. current grade within each week for fall 2017 and fall 2018.

FIG. 4. Percentage of students in each online homework engagement range vs. course grade for fall 2016.

Table 1: Model output for the 2017-2018 academic year.