These exercises are not tied to a specific programming language. Example implementations are provided under the Code tab, but the Exercises can be implemented in whatever platform you wish to use (e.g., Excel, Python, MATLAB, etc.).
##Exercise 1 - LINEST
Plot the data of capacitance versus distance. Does this match your expectation based on the theory developed in the course?
The derivation for an infinite parallel plate capacitor leads to the following relationship
$$C = \frac{\epsilon_{0}A}{d}$$
where $\epsilon_{0}$ is the electric permittivity of free space, $A$ is the area of the plates and $d$ is the separation between the plates. We can use the data set to confirm the inverse relationship between capacitance and separation distance, but also extract a value for the electric permittivity.
As the experiment was performed there is no guarantee that there is not a systematic offset to the separation distance nor an offset to the capacitance as read on the meter. We can add in these potential offsets with parameters $d_{0}$ and $C_{0}$ respectively as seen here
$$C+C_{0} = \frac{\epsilon_{0}A}{d+d_{0}}$$
1. In your lab notebook, start by describing what $C_{0}$ and $d_{0}$ may represent. For instance, how could the separation distance measured always be offset by the same amount each time regardless of the amount of separation?
Any attempt to linearize this relationship with both $C_{0}$ and $d_{0}$ will be unsuccessful. So instead one must start by clearly making an assumption about one of the offset parameters. While it was just acknowledged that neither of these values is likely zero, it may be that one of them is comparatively smaller than its corresponding measurement (either $C_{0} << C$ or $d_{0} << d$). This allows one to assume the value is small enough to ignore (assume it is zero after all)!
2. You may not have a good reason to choose one over the other at this point, so just choose one offset to ignore and then linearize this new relationship.
3. To further investigate this issue, apply the other assumption regarding the offset parameters and linearize this slightly different relationship.
4. Identify the interpretation of the slope and intercept for each scenario.
5. Plot the data both ways, one with $C_{0}$=0 and one with $d_{0}$=0. What, if anything, do these plots indicate? Can looking at the data both ways justify which assumption is likely better? Discuss this clearly in your lab notebook.
*Note: While it is preferred to include the uncertainties in our measurements throughout the analysis, we will be assuming the uncertainty is the same for each measurement and omitting this in order to keep the process simple and focus on the overall method of least squares fitting.*
6. Run the LINEST function for the data plots for both methods and record the slope and intercepts with uncertainties. (Recall that in EXCEL one can obtain the slope and intercept with uncertainty by highlighting any empty four cells and enter =linest(y_data,x_data,true,true) then press Ctrl+Shift+Enter).
Make certain to include the interpretations of these slopes and intercepts in your lab notebook. Do the fitted values match expectations for the electric permittivity? How do the relative values of the systematic offsets of $d_{0}$ and $C_{0}$ compare to the measured values of d and C, and does this support your earlier conclusion of which assumption is better?
Even the most precisely measured data can be plagued with systematics, so it is important to consider these possibilities carefully! Now is there a way to avoid assuming some of them are ignorable - YES! But you will have to use non-linear fitting methods such as introduced in Exercise 3.
##Exercise 2 - Manual Iterative Minimization Methods
We will now repeat this analysis using different tools within EXCEL. The purpose behind this is two-fold. One reason is to open the black box containing the LINEST function so we know what it is doing, so to speak. The second reason is so that we can extend this technique in the future to do non-linear fitting of data. It is not always possible to linearize a relationship to accommodate experimental work. Some experiments have more than one variable or contain mathematical functions (e.g., trig functions) that cannot be treated by log plots and linear fits.
1. Click on the + symbol next to Sheet 1 at the bottom of the spreadsheet to create a new sheet, and copy the original data sets onto this sheet starting in row 5 or 6. Column A should contain the separation distance, Column B should calculate the 1/d values (independent variable, x), and Column C should contain the capacitance measurements (dependent variable y).
2. Create a plot of C vs 1/d. We will add to this plot later.
Now create the parameter space:
3. Type the following labels into Row 1 starting with Column A ending with Column E: Plate area (m^2), epsilon_0, C_0, d_0, and n (the number of data points).
Note: There is a function in EXCEL to count the number of cells in a range. You can use “=COUNT(A6:A21)” to calculate the value of n for the last cell. Either type in the range or after typing “=COUNT(“ you can click and drag to highlight the cells to count.
4. Below each of these labels, in Row 2, input the corresponding values. For now, let C_0 and d_0 be equal to 0.
It is possible to rename cells in EXCEL. This makes it easier to write and interpret equations in cells for calculations.
5. Do the following for each of the cells containing values in Row 2 where you use an appropriate name or abbreviation for each cell. This should be something short but recognizable! The instructions below are for a generic example to help you with the process
1. Click on the cell to rename
2. Click on the cell name box to the left of the input bar where it currently says A2. This should highlight the default cell name.
3. Type in your chosen name for that cell. For instance, “area” for the value of the area of the plates. Immediately hit Enter!
4. If you did not already hit the Enter key, do so now.
5. The new name should now show in the box! EXCEL now knows this cell to be both A2 and/or area.

This is effectively the same as naming a variable in a computer program. EXCEL will input whatever number is stored in that cell when you call A2 or “area” in a function or routine.
Create a column of predicted theoretical values of capacitance in Column D (predicted dependent variable $C_{th}$).
6. Use Eq. 16 and the parameters in the named cells to input the equation in the first cell next to the capacitance data and label it clearly.
Tip: Once the equation is typed into the first cell, click on that cell again and double click the small square at the bottom right of the cell as indicated to by the arrow below. This fills the column down until the last filled cell in the column to the left.

7. Add the predicted dependent variable vs independent variable ($C_{th}$ vs 1/d) to your first plot.
To do this, right click on your plot area and choose Select data, then click on Add and select/highlight the ranges of data you need to add. Format this data series so it appears as a line instead of individual data points.
Create a column for the Residuals, the difference of the data point from the predicted dependent variable $(C_{i} - C_{th})$. These values are great to plot versus the independent value to evaluate the quality of the fit!!
8. Input the equation, (=C6-D6) and fill down, to calculate the Residuals into Column E and label it clearly.
9. Plot these Residuals versus 1/d on a separate graph and place it near the first plot.
Create a column for the squared deviation, $(C_{i} – C_{th})^{2}$.
10. Input the equation for the squared deviation of the fit, fill it down the column and label it clearly.
11. Next to the last parameter label of n, in Column F, insert the label chi-squared/d.o.f. The abbreviation d.o.f. stands for degrees of freedom and is the number of data points minus the number of parameters being fit. Here this will be n-3, because we have 3 parameters ($\epsilon_{0}$, $C_{0}$ and $d_{0}$)
12. Below this label input the sum of the squared deviation column divided by the number of data points minus the number of parameters. =SUM(F5:F23)/(n-3).
13. Now adjust epsilon_0, C_0 and d_0 in cells B2, C2 and D2 to see how small you can make this value of chi-squared/d.o.f.
While playing with these values (iterating) also look at your plots of
- C vs 1/d to see how the fit changes compared to the data
- the residuals to see how they change.
Record some of the values you tried and the changes to the outcome in your lab notebook.
How do these values compare with the LINEST results from earlier?
##Exercise 3 - Built-in Minimization Methods
EXCEL has a built-in program to run an iterative minimization process. We will now utilize this Solver function to minimize the chi-squared per degree of freedom value.
1. Click to select the cell that contains the chi-squared per degree of freedom value (likely cell F2).
2. Click on the Data tab in EXCEL. There should be a section labeled Analysis at the far right. The Solver function should appear in this section. (If it is not there, then you will need to go through the Add-in process under the File menu.) Ask it to MINIMIZE the chi-squared/d.o.f. cell by checking the Min radio button, and inputting the parameter cells into the “By Changing Variables Cell” . Click on the box to Uncheck the “Make Unrestricted Variables Non-Negative”. If one or more of your parameters that you are trying to fit is very small, then click on “Options” and adjust the “Precision” to three orders of magnitude smaller than your smallest number. Finally hit the solve button at the bottom of the popup window.

In a minute or less it will hopefully converge to a solution (it usually does but convergence is not guaranteed). If it doesn’t converge or it gives a nonsense solution, your starting values may be way off. Try again with different starting values. You can use your plot and previous knowledge to zero in on good parameter values.
*NOTE: This method of minimizing chi-squared works for non-linear fits as well as our linear fit here!*
Record your results in you lab notebook. How do these values compare with LINEST and your previous changes made by “hand”? Comment.
In principle, a good fit should have chi-squared/d.o.f. around unity, though a value of 2 or 3 is acceptable, especially if you are uncertain of your error estimate. Values above 5 would be evidence for rejecting a fit, if you were fairly certain of the error estimates. You can look up probabilities in a statistics book under chi-squared distribution which will tell you the probability that the data could by chance be this bad and still fit the function.
##Exercise 4 – General Linear Regression
Now we will add to our EXCEL spreadsheet to directly use the equations derived in the Introduction to fit a slope and intercept to our data with values for the uncertainty of each.
1. Enter the text label x_ave [1/m] in cell B3, and input the value of the average of all the independent variables in cell B4 below it. You can use the equation =AVERAGE(B6:B21) for your specific range. Your independent variable here is 1/d.
2. Similarly, use cell C3 for the text label y_ave [nanoF] and C4 for the calculated value of the average of the dependent variable y. The dependent variable here is C.
3. Now create a column for the values of x^2 (1/d^2). This will likely be your Column G.
4. Use cell G3 for the text label “Sum x^2” and input =SUM(G6:G21) in cell G4. Rename cell G4 as sumx2
5. Similarly, use cell H3 for the text label “Sum xy” and input =SUM(H6:H21) in cell H4. Rename cell H4 as sumxy.
6. A couple additional columns may prove useful, such as the product of x and y $(C_{i}(1/d_{i}))$ and the difference in $x_{i} - x_{ave}$ squared $((1/d_{i}-1/d_{ave})^2)$
7. Use cells J2 and L2 as text labels for the slope and uncertainty of the slope. Input the equation for the slope, Eq. 11, and uncertainty, Eq. 13, in cells K2 and M2 respectively. Use the renamed labels for cells in your EXCEL equation.
Record your equation and results in your lab notebook. How so these values compare to LINEST?
8. Use cells J1 and L1 as text labels for the intercept and uncertainty of the intercept. Input the equation for the intercept, Eq. 12, and uncertainty, Eq. 14, in cells K1 and M1 respectively. Use the renamed labels for cells in your EXCEL equation.
Record your equation and results in your lab notebook. How so these values compare to LINEST?