Students’ dynamic engagement with experimental data in a physics laboratory setting

Increased emphasis has been given to students’ engagement with experimental data as reform efforts have continued to transform the landscape of introductory physics laboratory courses by providing greater opportunities for authentic scientific inquiry and student agency. As a result, students become the primary driving forces of their own experimentation, and the manner in which they engage with experimental data becomes more dynamic and nonlinear. This study presents ongoing efforts to illuminate the nuanced ways students enact various data-based actions when engaging in physics laboratory experiments. In this paper, we present a single case-study analysis of a student group engaging in an inquiry-based physics laboratory to highlight the dynamic and iterative ways the group shifts between multiple data-based actions when expected to be engaging in a single laboratory task. Research data comes from audio and video files of students’ computers while they engaged in lab experimentation, coded using a constructivist grounded theory approach to identify multiple data-based actions performed by the students. Results of this case study show that students oftentimes shift between multiple data-based actions on short timescales and that these shifts can take place with implicit iterative patterns, even when the instructional setting is structured for a single experimental task.


I. INTRODUCTION
As the STEM workplace becomes more reliant on data and data analytics, stakeholders invested in undergraduate students have begun advocating for future STEM professionals to obtain greater proficiency with data-based practices (e.g. managing and analyzing data sets, differentiating anomalous data points from normal statistical scatter) during their undergraduate careers [1,2]. As a result, experimental data has received greater attention in undergraduate STEM curriculum. Within the undergraduate physics community, introductory physics laboratory courses serving physics and non-physics majors alike have increasingly incorporated experiments that prompt students to collect and analyze complex quantitative data to extract meaning about authentic scientific phenomena [3,4]. Many of these shifts have also resulted in more opportunities for students to engage in authentic scientific inquiry [5,6]. Opening physics lab courses to inquiry-based instruction can, in turn, heighten student agency, allowing students opportunities to make more decisions during experimentation rather than follow prescribed steps [7][8][9].
Authentic scientific inquiry is more dynamic and nuanced than the rigid scientific method commonly presented in traditional K-12 and undergraduate instruction [10], especially when students have more agency in their inquiry. This shift towards more authentic scientific inquiry is in line with recent trends at the K-12 level which advocate for a "science-as-practice" approach [11][12][13][14][15][16]. As students make decisions about how to collect, analyze, and interpret their experimental data, they oftentimes diverge from the traditional linear progression of experimental activities. Students in traditional physics labs are commonly expected to engage with experimental data in a linear fashion (from data collection, to analysis, to interpretation, to extracting results, with lab time allocated to each experimental step) [17,18]. In contrast, these shifts toward more authentic scientific experiences, where students can act experimentally with their own agency, result in student engagement with experimental data that is less procedural, less linear, and more authentic.
To more effectively develop curriculum for physics courses that provides students authentic experiences with experimental data, a more robust understanding is needed of the data-based actions students utilize and how this utilization occurs. In this paper, we consider data-based actions to be the processes and practices, either physical (e.g. manipulating lab equipment) or verbal (e.g. engaging in dialogue to prepare the group's collection of data), that students engage in when working with experimental data. This understanding is valuable both because it can inform curricular efforts to further implement authentic experiences with experimental data and because it may shed light on student inquiry more broadly when students are engaged in authentic data-based actions.
To this end, we ask the research questions: 1) What data-based actions do students engage in when working with experimental data in an introductory physics laboratory course? and 2) In what ways do students shift between data-based actions while working with experimental data in this setting?
To answer these research questions, this paper presents a case study of a student group's dynamic engagement with data-based actions while working with experimental data in a reformed introductory physics laboratory course. The results highlight the dynamic and nonlinear manner of students' utilization of data-based actions during a short period of laboratory activities when students had been ostensibly instructed to engage in a single experimental task. That is, the results capture that even when a single data-based action was expected from an instructional standpoint, students engaged in multiple data-based actions in dynamic and non-linear ways.

A. Setting
This study was conducted in a reformed Introductory Physics for Life Sciences (IPLS) laboratory course [4,5,19] at a large research-intensive university in the western United States. This course was chosen intentionally due to a) its incorporation of 21st-century equipment and software, which foster complex quantitative experimental data; and b) its pedagogical focus on prompting students to utilize their own agency during experimentation. Students enrolled are primarily life science majors (e.g. pre-medical, biology) in their final two years of undergraduate studies.
In this course, groups of four (or three) students design their own experiment based on an open-ended guiding prompt with support from teaching and learning assistants. Students have many opportunities to make decisions during their experiments, including what data they collect and in what ways, how they analyze, represent, and interpret that data, how they go about answering their research question, etc. While students maintain considerable agency in their experimentation, the class time is semi-structured such that students have periods of time specifically allocated for different experimental tasks: practice with lab equipment and techniques, experimental design planning, experimentation, group presentations, and writing individual lab reports. For example, in Lab 1 students follow a semi-structured timeline as follows: Each lab experiment contains similar structured timing with slight variations. Lab experiment guiding prompts include topics such as biological kinematics, diffusion in living systems, hemodynamics, and modeling of axon signal transport. Students primarily work on lab computers, with complementary equipment such as optical microscopes and spectrometers, to collect and analyze data.

B. Methods
We utilized a constructivist grounded theory approach to identify data-based actions students commonly enact when engaging with experimental data [20,21]. Video screen capture and associated audio data were collected from lab computers used by fourteen groups across all four of the course's multi-week lab sequences, resulting in roughly one hundred hours of video screen capture and audio data. Video screen capture and audio data were used for this study because lab computers are the primary means by which students engage with experimentation and data in this course.
To start, a selection of screen capture and audio data was open coded to identify fine-grained instances that contained students' different types of engagement with their group's experimental data. During open coding, particular attention was given to student actions on the computer or with lab equipment, as well as their verbalization of experimental steps. These fine-grained instances were iteratively added upon through coding of additional selections of observational data until broader core categories became salient; the fine-grained instances were then grouped into core categories. These categories were iteratively cross-referenced with additional selections of observational data to reach saturation and minimize potential overlap between categories [21,22].
The resultant code book consists of Data Collection (DCo), Data Cleaning (DCl), Data Organization (DO), Data Manipulation (DM), Data Representation (DR), Hypothesizing (Hyp), Experimental Planning (EP), and Interpretation (Int). These are the broad data-based actions that students are observed to engage in when working with experimental data in this setting; the code book is presented in Table I with definitions and examples. Because of the broad nature of these data-based actions and the population from which this research data comes, this list is not meant to be exhaustive, but is rather meant to provide a useful instrument to support our analysis and answer our research questions.
A feature of this set of data-based actions is its hierarchical structure; students engage in some of the data-based actions in reference to other actions. When students engage in the data-based actions Hyp, EP, and Int, they typically do so while referencing another data-based action that is either forthcoming or has previously occurred. Thus, when Hyp, EP, and Int are coded in reference to another data-based action, the code is listed as Primary Action-XX, where XX is the secondary action. For example, when a student engages in EP while planning steps their group will take to collect experimental data (DCo), this is coded as EP-DCo.
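The two-level code labels described above can be represented as a small data structure. The following Python sketch is illustrative only and is not the instrument used in the study: the names `Code`, `parse_code`, `STANDALONE`, and `REFERENTIAL` are hypothetical, and the rule that only Hyp, EP, and Int may carry a secondary action is an assumption drawn from the description in the text.

```python
from dataclasses import dataclass
from typing import Optional

# Abbreviations from the code book. The split into two groups is an
# assumption based on the text: only Hyp, EP, and Int are described as
# being coded in reference to another action.
STANDALONE = {"DCo", "DCl", "DO", "DM", "DR"}
REFERENTIAL = {"Hyp", "EP", "Int"}
ALL_CODES = STANDALONE | REFERENTIAL


@dataclass(frozen=True)
class Code:
    """A coded data-based action, optionally referencing a secondary action."""
    primary: str
    secondary: Optional[str] = None

    def __str__(self) -> str:
        if self.secondary is None:
            return self.primary
        return f"{self.primary}-{self.secondary}"


def parse_code(label: str) -> Code:
    """Parse a label such as 'DCo' or 'EP-DCo'; spaces around the hyphen are tolerated."""
    parts = [part.strip() for part in label.split("-")]
    if len(parts) == 1:
        primary, secondary = parts[0], None
    elif len(parts) == 2:
        primary, secondary = parts
    else:
        raise ValueError(f"Unrecognized label: {label!r}")
    if primary not in ALL_CODES:
        raise ValueError(f"Unknown primary action: {primary!r}")
    if secondary is not None:
        if primary not in REFERENTIAL:
            raise ValueError(f"{primary} is not coded with a secondary action")
        if secondary not in ALL_CODES:
            raise ValueError(f"Unknown secondary action: {secondary!r}")
    return Code(primary, secondary)


planning = parse_code("EP -DCo")  # Experimental Planning in reference to Data Collection
collecting = parse_code("DCo")    # Data Collection on its own
```

Storing the primary and secondary actions separately makes it straightforward to tally, for instance, which actions students refer to while planning.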
Upon completion of this code book, we conducted a single case-study analysis of a student group to better understand how students utilize and shift between data-based actions when engaging with experimental data. The case involved is a group of four students (Students A, B, C, and D) engaging in an experiment to analyze zebrafish kinematics; they focused on the social dynamics of zebrafish in a tank by analyzing distances and speeds while the fish swam to then deduce social behaviors (they were provided videos of zebrafish swimming in a small tank). The group was chosen specifically because: a) their discussions regarding the experimentation were equitable (one student did not dominate the conversation or decision making); and b) they were not considered an exemplary group in the course, in that their lab grades, engagement in class, and manner of experimentation were not noteworthy compared to other groups. Based on the screen capture and audio data and the group's interactions with instructional team members, it appears that the group was on task and doing what was expected of them during class time.
The focus of our analysis is a roughly fifteen-minute segment of class time devoted to collecting experimental data. This episode of the group's experimentation was specifically chosen because it occurs at the beginning of the "Initial Data Collection" phase of class time, and from preliminary analysis, this episode appeared particularly dense with many data-based actions apparent. During this portion of class time, students are expected to have finished developing their design plans (an overview of their research question and methods to collect and analyze data) and to begin collecting data for their experiment. We coded student utterances and physical actions on the computer using the enumerated data-based actions to develop a deeper understanding of the ways students engage with experimental data, specifically during a segment of class time that aligns with a single data-based action. For our study, an utterance can range from a single word to a series of statements that explicitly encompass the same data-based action. Once a student's comment extends beyond multiple statements, the comment's underlying data-based action shifts, or the speaker changes, a new utterance is defined.
As not all students were working on the lab computer at the same time, inferential coding took place based on student utterances to code multiple students' actions happening simultaneously, or when students' data-based actions were taking place away from the lab computer. For example, in the episode Student D states, "And now two-point-five divided by seventy-one-..." while Student A is using the lab computer to interpret the group's data collection parameters. We interpreted Student D's statement as their enactment of DM because it was apparent from the group's dialogue that they were performing a calculation of the group's data using alternate means (e.g. handheld calculator, personal computer).

TABLE I. Code book of data-based actions with definitions and examples.

Data Cleaning (DCl)
Definition: Identifying and mitigating artifacts (e.g. glitches) in the data that are misaligned with expectations or hypotheses of the experiment or experimental step, often due to equipment or software bugs.
Example: Students recognize that the zebrafish video glitches by observing the zebrafish "bounces back" instead of swims forward.

Data Manipulation (DM)
Definition: Performing calculations to transform experimental data from a raw state to a state appropriate for further manipulation, representation, or interpretation.
Example: Student A performs a calculation to determine the distance-to-pixel ratio for the zebrafish to aid in scaling the position-tracking data from pixels to centimeters. Student A: "So twenty-five thousand divided by seventy-one-point-one equal three-hundred and fifty-one-point-six-two."

Data Representation (DR)
Definition: Creating representations (numerical, graphical, tabular, etc.).
Example: Students create a scatterplot of fish swimming versus distance between fish to interpret the relationship between two variables.

Hypothesizing (Hyp)
Definition: Developing and discussing initial hypotheses of the experiment as a whole or future data-based actions (coded in conjunction with other actions, represented as "Hyp-XX").
Example: Student A hypothesizes that there are going to be thirty data points that result from collecting position-tracking data of swimming zebrafish.

Experimental Planning (EP)
Example: Students discuss what data they should collect for their experiment (EP-DCo). Student D: "I think, okay, I think we should track, that's a good point, I think we should also record the distance from the nearest ..." Student C: "'Cause, like, distance is, like, relat, like, I don't know, dist, our distance measurement should be from the closest fish."

Interpretation (Int)
Definition: Assessing the result of previously enacted data-based actions to test experimental hypotheses, generate experimental claims, extract meaning and results to explain phenomena, etc. (coded in conjunction with other actions, represented as "Int-XX").
Example: Students A and D assess the resulting raw data from position-tracking a zebrafish to verify that data collection was carried out properly.

Figure 1 shows the sequence of students' enacted data-based actions during the episode. There are several noteworthy aspects of this student group's enactment of data-based actions. First, though this episode comes from a segment of class time devoted to "Initial Data Collection," the group is primarily engaged with EP rather than DCo; they were planning for future enactment of data-based actions rather than carrying out those actions themselves. Before this episode, the group completed and submitted their experimental design plan prior to moving onto the "Initial Data Collection" phase of class time. However, they spent considerable time during this episode planning for multiple future data-based actions (see Figure 2). While this extensive planning during the "Initial Data Collection" phase could be due to several factors, it may suggest that students iteratively plan throughout experimentation rather than developing a formal plan beforehand, even when time is explicitly given for such formal planning.
Also noteworthy is an emerging pattern of co-occurrences of some data-based actions and not others. For instance, when DCl is coded from utterances 229 to 261, the group is identifying and mitigating a glitch in the zebrafish video that they consider has potential to cause significant issues in their data collection and analysis; during this segment of time, the group minimally engages in other data-based actions. This also occurs from utterances 263 to 293 (DCo) and utterances 110 to 125 (DM), among other instances. Generally, this longer-lasting, stand-alone feature is common for DCo, DCl, DO, and DM in these data. This is in contrast to Hyp, EP, and Int, which commonly occur with shorter lengths of time per action, as well as being more significantly intermixed with other actions. Because DR was not enacted in this episode, we refrain from stating that it falls into either categorization.
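One way to make the contrast between longer-lasting, stand-alone actions and shorter, intermixed actions concrete is to measure how many consecutive utterances each action spans. The Python sketch below is a hedged illustration rather than the analysis performed in this study: it assumes each utterance carries a single code label, the function name `run_lengths` is hypothetical, and the example sequence is invented for demonstration and is not data from the episode.

```python
from itertools import groupby
from statistics import mean


def run_lengths(codes):
    """Map each primary action to the lengths of its consecutive runs.

    Labels such as 'EP-DCo' are reduced to their primary action ('EP')
    before consecutive utterances are grouped.
    """
    runs = {}
    for primary, group in groupby(codes, key=lambda c: c.split("-")[0].strip()):
        runs.setdefault(primary, []).append(sum(1 for _ in group))
    return runs


# Invented sequence for illustration only; not data from the study.
example = [
    "EP-DCo",
    "DCl", "DCl", "DCl", "DCl",
    "Int-DCl",
    "DCo", "DCo", "DCo",
    "EP-DM",
    "Int-DCo",
]

lengths = run_lengths(example)
mean_lengths = {code: mean(values) for code, values in lengths.items()}
```

In this invented sequence, DCl and DCo form single long runs while EP and Int appear only as isolated utterances, which is the kind of contrast described above.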
Finally, the student group engaged in EP and Int in preparation for and in reference to different data-based actions, not solely DCo as implied by the class time's intended structure. Because of the data-based actions' hierarchical structure, EP and Int were coded with secondary actions (see Sec. II B). Figure 2 shows the whole-group breakdown of other data-based actions students referred to when enacting EP, Int, and Hyp. By comparing Figures 1 and 2, we begin to see preliminary iterative patterns of EP in preparation for a future data-based action, then enactment of the data-based action, then Int of the results of the data-based action. For instance, during utterances 1 to 50, the group is primarily enacting EP-DCo (see Figure 2) by discussing what parameters they need to adjust on the computer software to collect accurate position-tracking data for the zebrafish. Then, from utterance 88 until utterance 121, they engage in DCo (see Figure 1) by measuring their parameters and entering them into the computer software with support from the teaching assistant. During this timeframe, the group shifts several times between DCo and Int-DCo, assessing their measurements and the resultant parameters entered into the software. This cycle repeats later when, beginning at utterance 155, the group again enacts EP-DCo (see Figure 2) by discussing each group member's responsibilities during future DCo steps, what data they need to collect to answer their research question, and what features of the software are necessary to capture this data. Then, from utterances 217 to 293 the group is primarily engaged in DCo (see Figure 1), when Student A is using the computer software to manually track a zebrafish to obtain its x-y coordinate data as a function of time.
During and after this DCo, the group engages in Int-DCo (utterances 284 to 323, see Figure 2) by verifying that the resultant data was in the expected format and units, and that the amount of data collected matched the length of the zebrafish video. Overall, emerging from this case is an iterative pattern of student engagement in EP in preparation for another data-based action, then enactment of that action, then Int of the results of that action; this occurs multiple times during this episode at short timescales.
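The iterative pattern described above (EP in reference to an action, then the action itself, then Int in reference to that action) can be searched for in a coded utterance sequence. The Python sketch below is one possible operationalization under stated assumptions, not the procedure used in this study: the function name `find_cycles` is hypothetical, labels are assumed to be written without spaces (e.g. "EP-DCo"), each utterance is assumed to carry one label, intervening utterances of any kind are allowed between the three steps, and matches are greedy and non-overlapping. The example sequence is invented for illustration.

```python
def find_cycles(codes):
    """Find plan -> enact -> interpret cycles in a list of code labels.

    Returns a list of (plan_index, enact_index, interpret_index) tuples for
    each 'EP-X' that is later followed by 'X' and then by 'Int-X'.
    """
    cycles = []
    n = len(codes)
    i = 0
    while i < n:
        if codes[i].startswith("EP-"):
            target = codes[i][len("EP-"):]
            # First later utterance that enacts the planned action.
            enact = next((j for j in range(i + 1, n) if codes[j] == target), None)
            if enact is not None:
                # First utterance after enactment that interprets its result.
                interpret = next(
                    (k for k in range(enact + 1, n) if codes[k] == f"Int-{target}"),
                    None,
                )
                if interpret is not None:
                    cycles.append((i, enact, interpret))
                    i = interpret + 1  # resume the search after this cycle
                    continue
        i += 1
    return cycles


# Invented sequence for illustration only; not data from the study.
example = [
    "EP-DCo", "EP-DCo", "DCo", "Int-DCo", "DCo",
    "EP-DM", "EP-DCo", "DCo", "DCo", "Int-DCo",
]
cycles = find_cycles(example)
```

A stricter variant could cap the number of utterances allowed between steps, which would tie the search to the short timescales noted in the text.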

IV. DISCUSSION AND CONCLUSIONS
This study suggests that students dynamically engage in data-based actions when working with experimental data in introductory physics labs. The case study described above sheds light on the dynamic manner in which students engage in data-based actions, even within a predefined time period in the classroom devoted to a single aspect of experimentation. Notably, the group represented in this case study was observed to iteratively shift between multiple data-based actions in a recurring fashion.
This study is not without its limitations. For example, while the choice to focus on a 15-minute episode during the "Initial Data Collection" phase of class time was intentional, it limits the scope of the results and does not allow for generalization of the dynamic nature of student enactment of data-based actions to other segments of physics laboratories, such as when students complete data analysis and begin formulating their experimental results.
These results also raise new questions that could potentially serve as motivation for future research endeavors. Do these dynamic shifts in student enactment of data-based actions occur throughout an entire physics experiment? What other iterative patterns are prominent during student enactment of data-based actions? What causes students or groups to shift between data-based actions? What timescales are pertinent to these shifts? How do data-based actions align with their summative assessment submissions (i.e. lab reports)? We contend that further exploration of the complexities of students' engagement with experimental data in these settings can elicit further knowledge that can positively influence continued reform efforts in physics classrooms and STEM classrooms more broadly (e.g. [5,6,23,24]).