Which line is better for fitting the data? A horizontal line that cuts through the average y value of our data is probably the worst fit of all. However, it gives us a good starting point for talking about how to find the optimal line to fit our data.
We can measure how well this line fits the data by seeing how close it is to the data points.
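However, if we simply add up the differences between the data points and the line, the negative differences (from points below the line) cancel out the positive ones.
So each term is squared instead. Squaring ensures that each term is positive.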
Here is the equation that shows the total distance the data points have from the horizontal line.
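Written out in symbols, the total squared distance from the horizontal line can be sketched as follows. The notation is an assumption made here for illustration: $y_i$ is the i-th observed value, $\bar{y}$ is the average y value, and $n$ is the number of data points.

```latex
\text{sum of squares} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2
```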
Since we want the line that will give us the smallest sum of squares, this method for finding the best values for “a” (the slope) and “b” (the intercept) in the line y = a*x + b is called “Least Squares”.
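As a minimal sketch of this idea, the snippet below computes the sum of squared residuals for two candidate lines. The data points, the function name `ssr`, and the tilted line's values are all made up for illustration; they are not the data behind the figures in this text.

```python
import numpy as np

# Hypothetical data, made up for illustration (not the data from the text).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.3, 2.9, 4.1, 4.8])


def ssr(a, b, x, y):
    """Sum of squared residuals for the line y = a*x + b."""
    residuals = y - (a * x + b)
    return float(np.sum(residuals ** 2))


# Horizontal line through the average y value: slope 0, intercept mean(y).
ssr_horizontal = ssr(0.0, y.mean(), x, y)

# A tilted line that follows the upward trend in the points.
ssr_tilted = ssr(0.9, 0.36, x, y)

print("horizontal line:", ssr_horizontal)
print("tilted line:    ", ssr_tilted)
```

The tilted line has a much smaller sum of squares than the horizontal one, which is what "fitting better" means under this measure.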
If we plot the sum of squared residuals against each rotation, we get this function. How do we find the optimal rotation for the line? We take the derivative of this function; the derivative tells us the slope of the function at every point. The slope at the best point (the “least squares”) is zero. Remember, the different rotations are just different values for “a” (the slope) and “b” (the intercept). Taking the derivatives with respect to both the slope and the intercept tells us where the optimal values are for the best fit.
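The rotation idea can be tried numerically. The sketch below assumes the line pivots on the point (average x, average y), so each rotation is one slope value with a matching intercept; that pivot, the data, and the name `ssr_for_slope` are assumptions made for illustration. It scans many slopes, records the sum of squared residuals for each, and estimates the derivative of the resulting curve with finite differences.

```python
import numpy as np

# Hypothetical data, made up for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.3, 2.9, 4.1, 4.8])


def ssr_for_slope(a):
    """SSR of the line with slope a that pivots on (mean x, mean y)."""
    b = y.mean() - a * x.mean()
    return float(np.sum((y - (a * x + b)) ** 2))


# Try many rotations (slopes) and record the SSR of each one.
slopes = np.linspace(0.0, 2.0, 201)
ssr_values = np.array([ssr_for_slope(a) for a in slopes])

# Approximate the derivative of the SSR curve with finite differences.
d_ssr = np.gradient(ssr_values, slopes)

# The best rotation is where the SSR is smallest.
best = int(np.argmin(ssr_values))
best_slope = slopes[best]
curve_slope_at_best = d_ssr[best]

print("best slope:", best_slope)
print("derivative of SSR curve there:", curve_slope_at_best)
```

At the slope with the smallest sum of squares, the estimated derivative of the curve is essentially zero, which is the condition described above.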
In practice, almost no one solves this problem by hand; it is done on a computer, so for most people it is not essential to know how to take these derivatives. Big important concept: We want to minimize the square of the distance between the observed values and the line.
We do this by taking the derivative and finding where it is equal to 0. The final line minimizes the sum of squares (it gives the “least squares”) between it and the real data. In this case, the line is defined by the following equation: y = 0.77*x + 0.66.
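Here is a minimal sketch of how a computer can find such a line. It uses the standard closed-form least-squares formulas for a straight line and cross-checks them against NumPy's `np.polyfit`. The data points are made up for illustration, so the result differs from the y = 0.77*x + 0.66 line above, whose underlying data is not listed here.

```python
import numpy as np

# Hypothetical data, made up for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.3, 2.9, 4.1, 4.8])

# Closed-form least-squares solution for the line y = a*x + b.
x_mean, y_mean = x.mean(), y.mean()
a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - a * x_mean

# Cross-check with NumPy's polynomial fit (degree 1 is a straight line).
# polyfit returns coefficients from the highest power down: [slope, intercept].
a_np, b_np = np.polyfit(x, y, deg=1)

# Both derivatives of the SSR should be (numerically) zero at the fitted line.
residuals = y - (a * x + b)
d_ssr_da = -2.0 * np.sum(x * residuals)
d_ssr_db = -2.0 * np.sum(residuals)

print(f"y = {a:.2f}*x + {b:.2f}")
print("derivatives at the fit:", d_ssr_da, d_ssr_db)
```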
The key to understanding why we set both the derivative with respect to a (the slope) and the derivative with respect to b (the intercept) to zero when minimizing the Sum of Squared Residuals (SSR) in linear regression lies in the principles of multivariable optimization.
In linear regression, SSR is a function of two variables, a and b:

$$SSR(a, b) = \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right)^2$$
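For readers who want to see the derivatives, here is a short worked sketch. It assumes the usual form of the sum of squared residuals for the line y = a*x + b, namely $\sum_i (y_i - (a x_i + b))^2$ over $n$ data points $(x_i, y_i)$, and applies the chain rule with respect to each variable.

```latex
\frac{\partial\, SSR}{\partial a} = -2 \sum_{i=1}^{n} x_i \left( y_i - (a x_i + b) \right) = 0
\qquad
\frac{\partial\, SSR}{\partial b} = -2 \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right) = 0
```

Solving these two equations together gives the slope and intercept of the least-squares line:

```latex
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - a\,\bar{x}
```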
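Imagine we have a set of data points and a line y = a*x + b. If the derivative of SSR with respect to a is not zero at the current slope, then a small change in a, in the direction opposite to the sign of the derivative, lowers SSR, so the current slope is not yet optimal.
Similarly, if the derivative of SSR with respect to b is not zero, a small change in the intercept lowers SSR, so the current intercept is not yet optimal.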
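In summary, a non-zero derivative at a point indicates that moving in the direction opposite to the sign of the derivative will decrease the function value (SSR in this case). In the context of linear regression, we need both derivatives (with respect to a and with respect to b) to be zero at the same time, because only then can no small change in the slope or the intercept reduce SSR further.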
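This reasoning can be illustrated with a small gradient descent loop: starting from the horizontal line through the average y value, repeatedly nudge a and b in the direction opposite to the sign of each derivative until both derivatives are essentially zero. This is a sketch for building intuition, not how linear regression is usually solved; the data, the step size `learning_rate`, and the iteration count are assumptions chosen for this example.

```python
import numpy as np

# Hypothetical data, made up for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.3, 2.9, 4.1, 4.8])


def ssr(a, b):
    """Sum of squared residuals for the line y = a*x + b."""
    return float(np.sum((y - (a * x + b)) ** 2))


def gradients(a, b):
    """Derivatives of SSR with respect to the slope a and the intercept b."""
    residuals = y - (a * x + b)
    d_a = -2.0 * np.sum(x * residuals)
    d_b = -2.0 * np.sum(residuals)
    return float(d_a), float(d_b)


# Start from the horizontal line through the average y value.
a, b = 0.0, float(y.mean())
ssr_start = ssr(a, b)

learning_rate = 0.01  # assumed step size, small enough to converge on this data
for _ in range(5000):
    d_a, d_b = gradients(a, b)
    # Move opposite to the sign of each derivative to decrease SSR.
    a -= learning_rate * d_a
    b -= learning_rate * d_b

ssr_end = ssr(a, b)
final_d_a, final_d_b = gradients(a, b)

print(f"fitted line: y = {a:.2f}*x + {b:.2f}")
print("SSR at start and end:", ssr_start, ssr_end)
print("derivatives at end:", final_d_a, final_d_b)
```

The loop stops improving only once both derivatives are close to zero; if either one were still non-zero, another step in the opposite direction would lower the SSR further.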