class: center, middle

## IMSE 440
## Applied Statistical Models in Engineering
## Linear regression

[ISL with Python book](https://www.statlearning.com): Chapter 3.1

---

```
import pandas as pd

ads = pd.read_csv("./data/advertising.csv")
```

.center[]

---

```
import seaborn.objects as so

(
    so.Plot(data=ads, x='TV', y='sales')
    .add(so.Dot())
)
```

.center[]

---

# Questions we may ask

- .red[Is there a relationship] between advertising budget (TV, radio, and newspaper) and sales?

--

- If yes, .red[how strong] is the relationship?

--

- Is the relationship .red[linear]?

--

- .red[How accurately] can we estimate the effect of each channel on sales?

--

- Given the budget, how accurately can we .red[predict] future sales?

--

- Is there a [synergy](https://en.wikipedia.org/wiki/Synergy) effect among the media?

---

# [Linear regression](https://en.wikipedia.org/wiki/Linear_regression)

Study the relationship between two or more variables

$$\text{sales} = f(\text{TV}) = a + b \cdot \text{TV}$$

--

.green[Independent variables] (x): Input variables, also called .green[predictors] or .green[features].

--

.red[Dependent variable] (y): Output variable, also called .red[response] or .red[target].

--

$$
\begin{aligned}
\text{sales} &= f(\text{TV}, \text{radio}, \text{newspaper}) \\\
&= \beta_0 + \beta_1 \cdot \text{TV} + \beta_2 \cdot \text{radio} + \beta_3 \cdot \text{newspaper} \\\
\end{aligned}
$$

---

# [Simple linear regression](https://en.wikipedia.org/wiki/Simple_linear_regression)

The model takes the form

$$\require{color}\colorbox{yellow}{$Y= \beta_0 + \beta_1 \cdot x + \epsilon$}$$

$$\text{where }\; \epsilon \sim \text{N}(0, \sigma^2)$$

--

x is any given input to the model.

--

epsilon is the error term.

- modeled as a [normally-distributed](https://probstats.org/normal.html) random variable

--

beta_0 and beta_1 are unknown parameters in the model.

--

.red[Question: Is Y a random variable?]

---

$$\require{color}\colorbox{yellow}{$Y= \beta_0 + \beta_1 \cdot x + \epsilon$}$$

$$\text{where }\; \epsilon \sim \text{N}(0, \sigma^2)$$

- What is the .red[expected value] of Y?

- What is the .red[variance] of Y?

- What .red[distribution] does Y follow?

---

$$Y= \beta_0 + \beta_1 \cdot x + \epsilon$$

$$\text{where }\; \epsilon \sim \text{N}(0, \sigma^2)$$

$$Y \sim \text{N}(\beta_0 + \beta_1 x, \sigma^2)$$

beta_0: .red[intercept]: expected value of Y when x=0.

--

beta_1: .red[slope]: expected change in Y per unit change in x.
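---

A quick numerical check of this result, as a sketch: simulate many draws of Y at a fixed x and compare the sample mean and variance against beta_0 + beta_1 * x and sigma^2. (The "true" parameter values below are made up for illustration.)

```
import numpy as np

rng = np.random.default_rng(440)

# Assumed "true" parameters, chosen only for this illustration
beta_0, beta_1, sigma = 7.0, 0.05, 2.0
x = 100.0

# Y = beta_0 + beta_1 * x + epsilon, with epsilon ~ N(0, sigma^2)
eps = rng.normal(loc=0.0, scale=sigma, size=100_000)
y = beta_0 + beta_1 * x + eps

print(y.mean())  # close to beta_0 + beta_1 * x = 12.0
print(y.var())   # close to sigma^2 = 4.0
```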
---

# Visualizing beta_0 and beta_1

.center[]

---

# Estimate the parameters from data

$$\text{E}[\text{sales}] = \beta_0 + \beta_1 \cdot \text{TV}$$

.center[]

---

What would be our "best" estimates of beta_0 and beta_1? (What is the model that .green[fits the data the best]?)

.center[]

--

$$e_i=y_i-\hat{y}_i=y_i-(\beta_0+\beta_1x_i), \;\; i=1, 2, \cdots, n.$$

---

Can we use the sum of the residuals as a measure?

$$e_1+e_2+\cdots+e_n$$

--

Bad idea, as the positive and negative residuals would cancel each other out.

--

Instead, we square the residuals before summing them up.

$$\begin{aligned}
\text{RSS}&=e\_1^2+e\_2^2+\cdots+e\_n^2 \\\
\\\
&=\sum\_{i=1}^n e\_i^2
=\sum\_{i=1}^n \big[y\_i-(\beta\_0+\beta\_1 x\_i)\big]^2 \\\
\end{aligned}
$$

This sum is called the [Residual Sum of Squares (RSS)](https://en.wikipedia.org/wiki/Residual_sum_of_squares).

---

Given n pairs of data

$$(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$$

we choose beta_0 and beta_1 such that the RSS is minimized:

$$
\min\_{\beta\_0, \beta\_1}f(\beta\_0, \beta\_1)=\text{RSS}=
\sum\_{i=1}^n \big[y\_i - (\beta\_0 + \beta\_1 x\_i)\big]^2
$$

This method is called [Ordinary Least Squares (OLS)](https://en.wikipedia.org/wiki/Ordinary_least_squares).

---

# Ordinary Least Squares (OLS)

$$
\min\_{\beta\_0, \beta\_1}f(\beta\_0, \beta\_1)=\sum\_{i=1}^n \big[y\_i - (\beta\_0 + \beta\_1x\_i)\big]^2
$$

--

Take the partial derivatives w.r.t. beta_0 and beta_1 and set them to 0:

$$\begin{aligned}
\frac{\partial f(\beta_0, \beta_1)}{\partial \beta_0} &=0 \\\
\\\
\frac{\partial f(\beta_0, \beta_1)}{\partial \beta_1} & =0
\end{aligned}
$$

---

$$\begin{aligned}
\frac{\partial f(\beta_0, \beta_1)}{\partial \beta_0} &=0 \\\
\\\
\frac{\partial f(\beta_0, \beta_1)}{\partial \beta_1} & =0
\end{aligned}
$$

--

$$\begin{aligned}
\frac{\partial f(\beta_0, \beta_1)}{\partial \beta_0} &= -\sum 2\big[y_i - (\beta_0 + \beta_1x_i)\big]=0 \\\
\\\
\frac{\partial f(\beta_0, \beta_1)}{\partial \beta_1} &= -\sum 2\big[y_i - (\beta_0 + \beta_1x_i)\big]x_i=0
\end{aligned}
$$

--

These two equations are linear in beta_0 and beta_1, so we can use linear algebra to solve for them (see the sketch on the next slide).
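---

A minimal sketch of that last step with numpy: rearranging the two equations gives the normal equations, a 2x2 linear system. (The function name `ols_fit` is just for this illustration; `x` and `y` are assumed to be numpy arrays.)

```
import numpy as np

def ols_fit(x, y):
    # Rearranging the two equations above gives the normal equations:
    #   n      * b0 + sum(x)   * b1 = sum(y)
    #   sum(x) * b0 + sum(x^2) * b1 = sum(x * y)
    n = len(x)
    A = np.array([[n, x.sum()],
                  [x.sum(), (x**2).sum()]])
    b = np.array([y.sum(), (x * y).sum()])
    beta_0, beta_1 = np.linalg.solve(A, b)
    return beta_0, beta_1
```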
---

# Calculus refresher

$$\begin{aligned}
\frac{d}{dx}(c)&=0 \\\
\\\
\frac{d}{dx}(kx)&=k \\\
\\\
\frac{d}{dx}\big(x^2\big)&=2x
\end{aligned}
$$

The chain rule:

$$\frac{d}{dx}\bigg(f\big(g(x)\big)\bigg)=f'\big(g(x)\big)\cdot g'(x)$$

---

# OLS solution

$$
\large{
\begin{aligned}
\hat{\beta}\_1&=\frac{n\sum{x\_i y\_i}-\sum{x\_i}\sum{y\_i}}{n\sum{x\_i^2}-(\sum{x\_i})^2} \\\
\\\
&=\frac{\sum{(x\_i-\bar{x})(y\_i-\bar{y})}}{\sum{(x\_i-\bar{x})^2}}
=\frac{S\_{xy}}{S\_{xx}} \\\
\\\
\hat{\beta}\_0&=\bar{y}-\bar{x}\hat{\beta}\_1 \\\
\end{aligned}
}
$$

---

$$(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$$

[Sample covariance](https://en.wikipedia.org/wiki/Covariance#Calculating_the_sample_covariance):

$$\text{cov}(x, y)=\frac{\sum{(x\_i-\bar{x})(y\_i-\bar{y})}}{n-1}$$

[Sample variance](https://imse317.github.io/lectures/chap-1/measure-of-variability):

$$\text{var}(x)=\frac{\sum{(x\_i-\bar{x})^2}}{n-1}$$

OLS slope estimate:

$$\hat{\beta}\_1 = \frac{\sum{(x\_i-\bar{x})(y\_i-\bar{y})}}{\sum{(x\_i-\bar{x})^2}}
=\frac{\text{cov}(x, y)}{\text{var}(x)}$$
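---

A sketch of the closed-form solution applied to the advertising data loaded earlier, verifying that the S_xy / S_xx slope matches the cov/var ratio:

```
import numpy as np
import pandas as pd

ads = pd.read_csv("./data/advertising.csv")
x, y = ads['TV'].to_numpy(), ads['sales'].to_numpy()

# Slope: S_xy / S_xx
S_xy = ((x - x.mean()) * (y - y.mean())).sum()
S_xx = ((x - x.mean())**2).sum()
beta_1_hat = S_xy / S_xx
beta_0_hat = y.mean() - beta_1_hat * x.mean()

# Same slope via the sample covariance and sample variance (both use n-1)
assert np.isclose(beta_1_hat,
                  np.cov(x, y, ddof=1)[0, 1] / x.var(ddof=1))
```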
---

# Visualizing OLS

$$\small{\min_{\beta_0, \beta_1}f(\beta_0, \beta_1)=\sum\big[y_i - (\beta_0 + \beta_1x_i)\big]^2}$$

.center[]

---

# Visualizing OLS
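---

One way to draw the fitted line over the scatter plot, as a sketch reusing the seaborn.objects interface from earlier:

```
import seaborn.objects as so

(
    so.Plot(data=ads, x='TV', y='sales')
    .add(so.Dot())
    # Overlay the least-squares fit; so.PolyFit(order=1) fits a
    # degree-1 polynomial, i.e. the OLS line computed above
    .add(so.Line(), so.PolyFit(order=1))
)
```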
---

# Resources on probability and statistics
.large[.center[
[probstats.org](https://probstats.org)
]]

.large[.center[
[IMSE 317 lecture notes](https://imse317.github.io)
]]