Linear Regression

Fitting a line to a dataset of observations

Simple linear regression

$$Y \approx \beta_0 + \beta_1 X$$

This is sometimes described as regressing $Y$ onto $X$.

Estimating the coefficients

We estimate the coefficients by least squares, which chooses $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimize the sum of squared errors.

Let $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ be the prediction for $Y$ based on the $i$th value of $X$. Then $e_i = y_i - \hat{y}_i$ represents the $i$th residual: the difference between the $i$th observed response value and the $i$th response value predicted by our linear model. We define the residual sum of squares (RSS) as

$$\mathrm{RSS} = e_1^2 + e_2^2 + \cdots + e_n^2$$
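
As a concrete illustration, here is a minimal sketch that computes the residuals and the RSS for a small made-up dataset and an arbitrary candidate line. The data values and the coefficients `b0 = 0.5`, `b1 = 1.2` are placeholders chosen for illustration, not estimates from any real fit.

```python
import numpy as np

# Made-up data (placeholder values, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# An arbitrary candidate line, not the least-squares fit
b0, b1 = 0.5, 1.2

y_hat = b0 + b1 * x      # predictions y_hat_i = b0 + b1 * x_i
e = y - y_hat            # residuals e_i = y_i - y_hat_i
rss = np.sum(e ** 2)     # RSS = e_1^2 + e_2^2 + ... + e_n^2
print(f"RSS = {rss:.3f}")
```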

Minimizing the RSS with calculus (setting its partial derivatives with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$ to zero and solving) gives

$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

where $\bar{y} \equiv \frac{1}{n}\sum_{i=1}^n y_i$ and $\bar{x} \equiv \frac{1}{n}\sum_{i=1}^n x_i$ are the sample means.
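
These closed-form estimates translate directly into code. The sketch below (reusing the placeholder data from the RSS example) computes $\hat{\beta}_1$ and $\hat{\beta}_0$ with NumPy and sanity-checks the result against NumPy's built-in degree-1 polynomial fit.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

x_bar, y_bar = x.mean(), y.mean()

# Closed-form least-squares estimates
b1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0_hat = y_bar - b1_hat * x_bar
print(f"beta0_hat = {b0_hat:.3f}, beta1_hat = {b1_hat:.3f}")

# np.polyfit returns coefficients highest degree first: [slope, intercept]
b1_ref, b0_ref = np.polyfit(x, y, deg=1)
assert np.allclose([b0_hat, b1_hat], [b0_ref, b1_ref])
```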

Assessing the accuracy of the model

R-Squared

R-squared, also known as the coefficient of determination, is a statistical measure of how close the data are to the fitted regression line. It is defined as $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$, where $\mathrm{TSS} = \sum_{i=1}^n (y_i - \bar{y})^2$ is the total sum of squares, and it gives the proportion of the variance in $Y$ explained by the model.
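
In code, $R^2$ follows directly from the RSS and TSS. The sketch below continues the same placeholder dataset used in the earlier examples.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# Least-squares fit (closed form, as derived above)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

rss = np.sum((y - (b0 + b1 * x)) ** 2)  # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
r_squared = 1 - rss / tss
print(f"R^2 = {r_squared:.3f}")
```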

Real world applications

  1. Predict temperature
  2. Predict stock prices
