The impact of social integration on student persistence in introductory Modeling Instruction courses

Increasing student retention and persistence, whether in particular classes or in a major area of study, is a challenge for universities. Students' academic and social integration into an institution seems to be vital for student retention, yet research on the effect of interpersonal interactions is rare. Social network analysis is an approach that can be used to identify patterns of interaction that contribute to integration into the university. We analyze how students' position within a social network in a Modeling Instruction (MI) course that strongly emphasizes interactive learning impacts their persistence in taking a subsequent MI course. We find that students with higher centrality at the end of the first semester of MI are more likely to enroll in a second semester of MI. While the correlation with increased persistence is an ongoing study, these findings suggest that student social integration influences persistence.


I. INTRODUCTION
The publication of Vincent Tinto's student integration model marks the start of the dialogue on undergraduate retention. In his model, Tinto introduced the notion of "external" (e.g., families, neighborhoods, work settings) and "internal" (e.g., learning groups within a classroom, residence halls) communities that affect student integration into the social and academic environment of the university [1,2]. Increasing the retention of students in a particular course, and their persistence in continuing through their major area of study and finishing their degree, is a major challenge for universities. Based on the work of Tinto and others, increasing social and academic integration is one of the prime targets for increasing persistence.
Social network analysis (SNA) is well suited to studying student academic and social integration. SNA can be used to identify patterns of interaction that contribute to integration into the university. It provides a methodology to assess the effect of interpersonal interactions on students' persistence. While students' academic and social integration into an institution seems to be essential to student retention, effective implementation of measures to prevent losing students is sparse. Developing network methodologies for studying retention and persistence among university students is a newly forming research area [5,6].
We use SNA techniques to address questions of retention and persistence of students at Florida International University (FIU), a large Hispanic Serving Institution. In particular, we analyze how a student's position within a social network in an introductory mechanics Modeling Instruction (M-MI) course impacts their persistence in taking a subsequent electricity and magnetism MI (EM-MI) course. Modeling Instruction is a guided-inquiry, interactive-engagement method of teaching that organizes instruction around building, testing and applying a handful of scientific models that represent the content core of physics. Instead of relying on lectures and textbooks, the MI program emphasizes active student construction of conceptual and mathematical models in a strongly interactive learning community. It is therefore an important case for studying the effects of building student communities on promoting persistence.

II. METHODOLOGY
To collect social network data we developed a pencil-and-paper survey that was administered in the introductory mechanics MI course in Fall 2014. Every four weeks throughout the semester students were asked the following question: "Name the individual(s) (first and last name) you had a meaningful classroom interaction* with today, even if you were not the main person speaking or contributing. (You may include names of students outside of the group you usually work with.) *A classroom interaction includes but is not limited to people you worked with to solve physics problems and people that you watched or listened to while solving physics problems."
In the Spring semester, we presented students in the electricity and magnetism MI course with a modified version of the survey, containing a roster with the names of all students enrolled in the course and a weighted version of the question about interactions, as shown in Fig. 1.
The SNA data were collected over one semester of the M-MI course (Fall 2014) and one semester of the EM-MI course (Spring 2015). Both courses were taught by the same instructor, accompanied by teaching assistants (the same two TAs in both semesters) and learning assistants (three LAs in the Fall semester and two LAs in the Spring semester, with one person overlapping). In each semester we collected SNA data five times over the duration of the course. A total of 73 students were enrolled in M-MI and 74 in EM-MI. Forty students took both MI courses, and 10 students from M-MI took a second semester of physics in a more traditional arrangement. The response rates on all surveys but one were over 75%; we excluded from the analysis the one survey with an unusually low return (43%), which was the last survey of the Fall semester. Our analysis therefore uses the last valid survey from the Fall semester, that is, SNA4.
SNA uses the notion of nodes (in our case, students enrolled in M-MI) and edges (the interactions identified by students in the survey) to represent the network. From a graph-theoretic perspective, the relative importance of a node within a graph is determined using centrality measures. To answer the question "Who are the most important nodes in a network?" one has to determine how central each node is [7]. Evaluating the relative position of nodes in the network helps to understand the network and its participants.
There are various measures of centrality that quantify the importance of nodes and edges. In this paper we will focus on the four most commonly used measures: degree, eigenvector, betweenness and closeness (see Fig. 2).
The degree centrality of a node $i$, $C_D(i)$, is the number of edges connected to it:

$$C_D(i) = \sum_{j=1}^{n} x_{ij},$$

where $x_{ij}$ is the value of the edge from node $i$ to node $j$ (the value being either 1 if the tie is present or 0 otherwise) and $n$ is the number of nodes in the network. In the case of a directed network, that is, a network that takes into account the origin of an edge, one can define two additional measures of degree centrality: indegree (the number of ties directed to the node, which can be interpreted as popularity) and outdegree (the number of ties that the node directs to others, which can be interpreted as sociability):

$$C_D^{\mathrm{in}}(i) = \sum_{j=1}^{n} x_{ji}, \qquad C_D^{\mathrm{out}}(i) = \sum_{j=1}^{n} x_{ij}.$$

The eigenvector centrality is the sum of a node's connections to other nodes weighted by their degrees, and it measures the influence of a node in a network. It is given by an eigenvector, $C_E$, of the adjacency matrix, $A$, corresponding to the greatest eigenvalue, $\lambda_{\max}$, that is,

$$A\, C_E = \lambda_{\max}\, C_E,$$

where $A$ is a matrix related to the graph by $a_{ij} = 1$ if node $i$ is connected to node $j$ by an edge and 0 if it is not, and $C_E$ is a vector containing the centralities of all nodes in the network.
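To make the definitions above concrete, the following minimal Python sketch computes indegree, outdegree, degree and eigenvector centrality for a toy network. It is an illustration only, not the code used in this study: the student names and ties are invented, the undirected network is assumed to contain a tie whenever either student named the other, and the eigenvector is approximated by power iteration on $A + I$ (the identity shift leaves the eigenvectors unchanged).

```python
# Toy directed survey network (hypothetical names): (a, b) means "a named b".
named = [("Ana", "Ben"), ("Ben", "Ana"), ("Ana", "Cruz"),
         ("Dee", "Ana"), ("Cruz", "Dee")]
students = sorted({s for pair in named for s in pair})
n = len(students)
idx = {s: k for k, s in enumerate(students)}

# Directed adjacency matrix: x[i][j] = 1 if student i named student j, else 0.
x = [[0] * n for _ in range(n)]
for a, b in named:
    x[idx[a]][idx[b]] = 1

# Outdegree sums a row (ties sent); indegree sums a column (ties received).
outdegree = {s: sum(x[idx[s]]) for s in students}
indegree = {s: sum(x[j][idx[s]] for j in range(n)) for s in students}

# Undirected version (assumption: a tie exists if either student named the other).
a_und = [[1 if x[i][j] or x[j][i] else 0 for j in range(n)] for i in range(n)]
degree = {s: sum(a_und[idx[s]]) for s in students}


def eigenvector_centrality(adj, iterations=500):
    """Approximate the leading eigenvector of adj by power iteration on (adj + I)."""
    size = len(adj)
    c = [1.0] * size
    for _ in range(iterations):
        nxt = [c[i] + sum(adj[i][j] * c[j] for j in range(size))
               for i in range(size)]
        norm = sum(v * v for v in nxt) ** 0.5
        c = [v / norm for v in nxt]
    return c


c_e = dict(zip(students, eigenvector_centrality(a_und)))
print(indegree, outdegree, degree, c_e)
```

In this toy network Ana has the largest degree and the largest eigenvector centrality, while Ben, whose only tie is to Ana, has the smallest.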
The (in/out)degree and eigenvector centralities are very intuitive and relatively easy to calculate. However, they are all local measures, and the network outside of the immediate vicinity of a node (i.e., outside the "ego network") has no influence on them.
The betweenness quantifies the number of times a node acts as a bridge along the shortest path linking two other nodes. It captures the importance of a position within a whole network and can be interpreted as a measure of how much control over the flow of information a node has. It is given by

$$C_B(k) = \sum_{i \neq j \neq k} \frac{\sigma_{ij}(k)}{\sigma_{ij}},$$

where $\sigma_{ij}(k)$ is the number of shortest paths linking node $i$ to node $j$ that pass through node $k$, and $\sigma_{ij}$ is the number of shortest paths linking node $i$ to node $j$.
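The betweenness definition above can be sketched in a few lines of Python. This is an illustrative brute-force version, not the implementation used in this study: the network (two triangles joined through a single node "d") is invented, ties are treated as undirected, and each unordered pair of endpoints is counted once. The number of shortest paths through $k$ is obtained as the product of the path counts from $i$ to $k$ and from $k$ to $j$ whenever $k$ lies on a shortest path.

```python
from collections import deque
from itertools import combinations

# Invented undirected network: triangles {a, b, c} and {e, f, g} bridged by d.
ties = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
        ("d", "e"), ("e", "f"), ("e", "g"), ("f", "g")]
neighbors = {}
for u, v in ties:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)


def bfs_counts(source):
    """Return (dist, sigma): distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in neighbors[u]:
            if w not in dist:          # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


info = {s: bfs_counts(s) for s in neighbors}


def betweenness(k):
    """Sum over unordered pairs {i, j} of the share of shortest paths through k."""
    dist_k, sigma_k = info[k]
    total = 0.0
    for i, j in combinations(sorted(neighbors), 2):
        if k in (i, j):
            continue
        dist_i, sigma_i = info[i]
        if j not in dist_i or k not in dist_i or j not in dist_k:
            continue  # unreachable pair contributes nothing
        if dist_i[k] + dist_k[j] == dist_i[j]:
            total += sigma_i[k] * sigma_k[j] / sigma_i[j]
    return total


c_b = {s: betweenness(s) for s in sorted(neighbors)}
print(c_b)
```

Node "d" has only two ties yet the highest betweenness, because every shortest path between the two triangles passes through it.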
The closeness is the inverse of the sum of distances from all other nodes. It emphasizes a node's independence: a node that is close to many other nodes can easily reach others without having to rely much on intermediaries, thus gaining easy access to information in the network. It is a measure of how near an individual is to all other nodes in a network. Closeness is defined as

$$C_C(i) = \left[ \sum_{j=1}^{n} d_{ij} \right]^{-1},$$

where $d_{ij}$ is the shortest distance connecting node $i$ to node $j$. The network from survey SNA4, using closeness as the measure of importance, is visualized in Fig. 3.
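As an illustration of the verbal definition above (the inverse of the sum of shortest distances), the following Python sketch computes closeness on a small invented network using breadth-first search. It assumes an undirected, connected network and applies no normalization; some software packages rescale the result, for example by multiplying by $n - 1$, so absolute values may differ between tools.

```python
from collections import deque

# Invented undirected network: c is tied to a, b and d; d is also tied to e.
ties = [("c", "a"), ("c", "b"), ("c", "d"), ("d", "e")]
neighbors = {}
for u, v in ties:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)


def distances_from(source):
    """Shortest-path distances from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in neighbors[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(node):
    """Inverse of the summed distances to all other nodes (connected network assumed)."""
    dist = distances_from(node)
    return 1.0 / sum(d for other, d in dist.items() if other != node)


c_c = {s: closeness(s) for s in sorted(neighbors)}
print(c_c)
```

Here node "c" is at total distance 5 from the others and has the largest value (0.2), while the peripheral node "e" is at total distance 9 and has the smallest.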
To investigate correlations between the students' centralities, gender, ethnicity, major of study, final grade and their persistence in MI, a logistic regression model (LRM) was used. To avoid confounding factors we performed multivariate logistic regression. All variables significant in the univariate analysis were incorporated into the multivariate model. The goodness of fit of the multivariate and univariate models was compared using the likelihood ratio test, with the null hypothesis that the multivariate model fits no better than the univariate model. The variance inflation factor (VIF) was calculated to estimate how much the variance of a coefficient was inflated because of linear dependence on other predictors. Finally, the mutual information approach was used to find the most significant split into the predicting/non-predicting categories for each of the centrality measures, and the chi-square test was used to verify the significance of this split [8]. For the statistical analysis we used the R program [9]. We considered results with p < 0.05 as significant.
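The logic of a univariate logistic regression and of a likelihood ratio comparison against a simpler model can be sketched as follows. This is a Python illustration on made-up data, not the R analysis performed in this study: the scores and outcomes are invented, the fit uses a plain Newton-Raphson iteration for an intercept and one slope, and the reduced model is assumed to be intercept-only.

```python
import math

# Made-up data: x is a centrality score, y = 1 if the student persisted.
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]


def prob(b0, b1, xi):
    """Predicted probability of persistence under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))


def log_likelihood(b0, b1):
    return sum(yi * math.log(prob(b0, b1, xi))
               + (1 - yi) * math.log(1.0 - prob(b0, b1, xi))
               for xi, yi in zip(x, y))


def fit_logistic(steps=25):
    """Maximize the log-likelihood with Newton-Raphson updates."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = prob(b0, b1, xi)
            w = p * (1.0 - p)
            g0 += yi - p              # gradient, intercept
            g1 += (yi - p) * xi       # gradient, slope
            h00 += w                  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_logistic()
ll_full = log_likelihood(b0, b1)

# Intercept-only model: every student gets the overall persistence rate.
rate = sum(y) / len(y)
ll_null = sum(math.log(rate) if yi else math.log(1.0 - rate) for yi in y)

# Likelihood ratio statistic; compared with a chi-square distribution whose
# degrees of freedom equal the number of extra parameters (here 1).
lr_stat = 2.0 * (ll_full - ll_null)
print(b0, b1, lr_stat)
```

A positive slope `b1` means that a higher score is associated with higher odds of persisting; a large `lr_stat` indicates that adding the predictor improves the fit.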

III. FINDINGS
We analyze how a student's position within a social network in an M-MI course, which strongly emphasizes interactive learning, impacts their persistence in taking a subsequent EM-MI course. We consider two cases: (1) students' persistence in physics, i.e., taking any form of second-semester physics, and (2) students' persistence in MI. We are interested in interactions between students and therefore we excluded all instructional staff from the network. Using the Wilcoxon rank-sum test we found no evidence of a statistically significant difference between the two population medians (i.e., with and without instructors) for all centralities but closeness. Thus, for this last measure we considered two cases: without instructors (closeness) and with instructors (closenessINS).
As shown in Table I, for persistence in physics we found statistically significant positive correlations with degree, indegree and closeness. For MI we found statistically significant correlations only for measures that consider the entire social network, that is, betweenness and closeness, and no statistically significant correlations for measures aimed at the students' ego networks.
To determine whether our univariate models can be improved we considered nested multivariate models for all the statistically significant centrality measures, with a student's gender, ethnicity, academic plan (declared major) and final grade considered as additional predictors of persistence. We found that only the grade made a statistically significant difference in the model fits. Table II summarizes the results of the logistic regression for both the persistence-in-physics and persistence-in-MI cases. However, when we compared the fit of the multivariate models to the fit of the models reduced to grade as the sole predicting variable, we found a significantly better fit only for the full betweenness model (χ²(1) = 7.89, p = 0.005). The variance inflation factor indicates a lack of collinearity between betweenness and grade (VIF = 1.03).
Finally, to optimize the correlation and to determine the predictability threshold for the centralities we used mutual information. Table III shows the threshold value for each centrality measure and its significance level.
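As a quick consistency check on the two reported numbers, the Python sketch below converts the likelihood ratio statistic χ²(1) = 7.89 into a p-value and converts VIF = 1.03 into the implied R² between the two predictors. It relies only on standard identities: for one degree of freedom the chi-square upper tail equals erfc(√(χ²/2)), and VIF = 1/(1 − R²). The function name is ours.

```python
import math


def chi2_upper_tail_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Reported likelihood ratio statistic for the full betweenness model.
p_value = chi2_upper_tail_df1(7.89)

# Reported variance inflation factor; VIF = 1 / (1 - R^2).
vif = 1.03
r_squared = 1.0 - 1.0 / vif

print(round(p_value, 4), round(r_squared, 4))
```

The resulting p-value rounds to 0.005, in agreement with the reported value, and the VIF corresponds to an R² of about 0.03 between betweenness and grade.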
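One way to read the thresholding step is sketched below in Python: each candidate threshold splits students into a low and a high group, the mutual information between that split and persistence is computed from the resulting 2×2 table, and the split with the largest value is kept and then tested with a chi-square statistic. This is our illustration under stated assumptions, not the procedure of Ref. [8] verbatim: the data are invented, candidate thresholds are taken as midpoints between consecutive scores, and the p-value uses the one-degree-of-freedom chi-square tail without any small-sample correction.

```python
import math

# Made-up data: centrality scores and persistence (1 = persisted).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
persisted = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]


def table(threshold):
    """2x2 counts: rows = score <= / > threshold, columns = persisted 0 / 1."""
    counts = [[0, 0], [0, 0]]
    for s, y in zip(scores, persisted):
        counts[int(s > threshold)][y] += 1
    return counts


def mutual_information(counts):
    """Mutual information (in nats) between the row and column variables."""
    n = sum(map(sum, counts))
    mi = 0.0
    for a in range(2):
        for b in range(2):
            if counts[a][b] == 0:
                continue  # empty cells contribute zero
            p_ab = counts[a][b] / n
            p_a = sum(counts[a]) / n
            p_b = (counts[0][b] + counts[1][b]) / n
            mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def chi_square(counts):
    """Pearson chi-square statistic for a 2x2 table."""
    n = sum(map(sum, counts))
    stat = 0.0
    for a in range(2):
        for b in range(2):
            expected = sum(counts[a]) * (counts[0][b] + counts[1][b]) / n
            stat += (counts[a][b] - expected) ** 2 / expected
    return stat


ordered = sorted(scores)
candidates = [(lo + hi) / 2 for lo, hi in zip(ordered, ordered[1:])]
best = max(candidates, key=lambda t: mutual_information(table(t)))
chi2 = chi_square(table(best))
p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail, 1 degree of freedom
print(best, chi2, p_value)
```

For these invented data the split falls at 0.55, where four of the five low-score students did not persist and all five high-score students did.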

IV. DISCUSSION
We find that students with higher values of certain centrality measures at the end of the first semester of MI are in fact more likely to enroll in a second semester of physics. For the MI sequence, we found that students with low closeness seem to be more likely to enroll in a second semester of MI, while (in/out/total) degree has no effect on their decision. On the other hand, students with a high betweenness score tend either to switch to the traditional curriculum or to leave physics altogether. Moreover, higher grades strengthen this negative correlation; that is, students with both higher final grades and high betweenness are the most likely to leave MI but remain in physics.
To explain this discrepancy one needs to understand what these two measures mean in practice. Closeness can be thought of as strong embeddedness within the entire network. Students with low closeness scores are close to all the other students in the network and thus have easy access to information from many sources. They are also, by the sheer nature of this measure, connected to many students. This can help them appreciate all the benefits of having a strong network of connections within a classroom. Betweenness, on the other hand, depends mainly on the position within the network. In practice, to have high betweenness it suffices to connect clusters that would otherwise be separate. Thus, students with a high betweenness score are not necessarily connected to many other students.
For physics in general we found a statistically significant positive correlation of persistence with (in)degree and closeness. However, because of the small sample of students who took a non-MI physics course, this finding requires further study.
It should be kept in mind that a centrality which is appropriate for one category will often "get it wrong" when applied to a different category. More importantly, while centralities identify the most important vertices in a given network, this ranking cannot be generalized to the remaining vertices with lower scores: centrality does not indicate the relative importance of all vertices.
While the correlation with increased persistence is an ongoing study, these findings suggest that student social integration influences persistence.