Regression
Regression is the process of finding a model that best fits a set of data.
For example, regression can be used for:
Prediction
Predict a response as accurately as possible, such as the price of a car.
Inference
Understand the association between variables.
Simple linear regression
A model for predicting a quantitative response (Y) based on a single predictor variable (X).
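A minimal sketch of fitting a simple linear regression in R with `lm()`. The four data points are borrowed from the practice table later in these notes, so the fitted coefficients will not match the 21.312 and -0.133 quoted there, which presumably come from a different dataset. The new car with `miles = 60` is hypothetical.

```r
# Four Camrys from the practice table in these notes
# (the notes do not state the units of miles or price).
camry <- data.frame(
  miles = c(50, 100, 75, 20),
  price = c(14.5, 6.1, 11.0, 18.9)
)

# Fit price = b0 + b1 * miles by least squares.
fit <- lm(price ~ miles, data = camry)

# Estimated intercept (b0) and slope (b1).
coef(fit)

# Predicted price for a hypothetical car with miles = 60.
predict(fit, newdata = data.frame(miles = 60))
```

The slope should come out negative: each additional unit of mileage lowers the predicted price.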
R^2
The proportion of total variation in the response (y) explained by the least-squares regression line. For example, how much of the variation in a Camry's price can be explained by its mileage alone.
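One way to see this definition in action is to compute R^2 by hand as 1 - RSS/TSS and compare it with the value R reports. This is a sketch: the name TSS (total sum of squares) is an assumption, since the notes do not use it, and the data are the four Camrys from the practice table later in these notes.

```r
camry <- data.frame(
  miles = c(50, 100, 75, 20),
  price = c(14.5, 6.1, 11.0, 18.9)
)
fit <- lm(price ~ miles, data = camry)

# Variation left unexplained by the regression line.
rss <- sum(residuals(fit)^2)

# Total variation of price around its mean.
tss <- sum((camry$price - mean(camry$price))^2)

# Proportion of total variation explained by the line.
r_squared <- 1 - rss / tss
r_squared

# Should agree with the value stored in the model summary.
all.equal(r_squared, summary(fit)$r.squared)
```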
Dummy variables
A technique to convert categorical data into numerical indicators (0s and 1s) for a model.
| Model | Is_camry | Is_tacoma |
|---|---|---|
| Camry | 1 | 0 |
| Tacoma | 0 | 1 |
| Volt | 0 | 0 |

Using combinations of 0s and 1s, we can represent categories mathematically in a regression model. Volt, with 0 in both columns, serves as the baseline category.
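The table can be reproduced in R as a short sketch. The first version builds each indicator by hand. The second uses `model.matrix()`, with the factor levels ordered so that Volt is the baseline; the variable names `model`, `dummies`, `model_f`, and `mm` are illustrative only.

```r
# Vehicle models from the dummy-variable table.
model <- c("Camry", "Tacoma", "Volt")

# By hand: 1 if the row belongs to the category, otherwise 0.
dummies <- data.frame(
  Model     = model,
  Is_camry  = as.integer(model == "Camry"),
  Is_tacoma = as.integer(model == "Tacoma")
)
dummies

# Same encoding via model.matrix(); the first factor level (Volt)
# becomes the baseline and gets no column of its own.
model_f <- factor(model, levels = c("Volt", "Camry", "Tacoma"))
mm <- model.matrix(~ model_f)
mm
```

`model.matrix()` also adds an intercept column of 1s. When a factor is passed to `lm()`, R builds these indicator columns automatically.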
Multiple linear regression
A model that uses more than one predictor variable (X1, X2, …, Xp) to predict a quantitative response (Y).
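Written out, the model takes the following general form. The coefficient symbols $\beta_0, \dots, \beta_p$ and the error term $\epsilon$ are standard notation assumed here; the notes do not define them.

$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \epsilon $$

Setting $p = 1$ gives back simple linear regression, $Y = \beta_0 + \beta_1 X + \epsilon$.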
Clustering
An unsupervised learning technique that involves finding subgroups in a dataset when there is no supervising output (response variable).
Prediction problems, by contrast, fall under the category of supervised learning.
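As a rough illustration of clustering, the sketch below uses k-means via R's `kmeans()`. The notes do not name a specific algorithm, so k-means is an assumption chosen because it is common, and the six data points are made up. Note that no response variable is supplied, only the feature itself.

```r
# Hypothetical one-feature dataset with two visibly separated groups.
x <- matrix(c(1.0, 1.2, 0.8, 10.0, 10.3, 9.7), ncol = 1)

# k-means starts from random centers, so fix the seed for repeatability.
set.seed(1)
clusters <- kmeans(x, centers = 2, nstart = 10)

# Cluster label for each observation (the label numbers are arbitrary).
clusters$cluster

# Center of each discovered subgroup.
clusters$centers
```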
Steps to get MSE
The necessary order of operations for calculation is:
1. Obtain predictions ($\hat{y}_i$).
2. Find the residuals ($y_i - \hat{y}_i$).
3. Square the residuals.
4. Sum the squared residuals (this is the Residual Sum of Squares, RSS).
5. Divide the sum by $n$ (or multiply by $\frac{1}{n}$).

(Note: Steps 4 and 5 can be combined by finding the mean of the squared residuals.)
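The steps above can be sketched in R as follows. The observed values `y` and predictions `y_hat` are made-up numbers, and step 1 (obtaining predictions) is assumed to have been done already.

```r
# Hypothetical observed values and model predictions (step 1 assumed done).
y     <- c(3, 5, 7)
y_hat <- c(2, 5, 9)

res     <- y - y_hat        # step 2: residuals
squared <- res^2            # step 3: squared residuals
rss     <- sum(squared)     # step 4: residual sum of squares
mse     <- rss / length(y)  # step 5: divide by n
mse

# Steps 4 and 5 combined: the mean of the squared residuals.
mean(squared)
```

Here the residuals are 1, 0, and -2, so RSS is 5 and the MSE is 5/3 either way.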
Practice
We will use the simple linear regression (SLR) example for Camry vehicles.
R Exercise: Predicting Price and Calculating MSE
We will use the estimated regression equation for Camry's price based on mileage: $$ \hat{y} = 21.312 - 0.133x_{\text{miles}} $$
The goal of this exercise is to calculate the MSE, which is defined as: $$ MSE = \frac{1}{n} \sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2} $$
Your Task: Imagine you have a new set of 4 Camry vehicles. Use the equation above to find the predicted price ($\hat{y}$), the residual, and the squared residual for each car, then compute the MSE.

| Car | Miles ($x$) | Observed Price ($y$) | Predicted Price ($\hat{y}$) | Residual ($y - \hat{y}$) | Squared Residual |
|---|---|---|---|---|---|
| 1 | 50 | 14.5 | ? | ? | ? |
| 2 | 100 | 6.1 | ? | ? | ? |
| 3 | 75 | 11.0 | ? | ? | ? |
| 4 | 20 | 18.9 | ? | ? | ? |
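Step 1: Calculate the Predicted Price ($\hat{y}$), Residual, and Squared Residual for Car 1
Prediction:
Residual:
Squared residual:
Step 2: Calculate the Predicted Price ($\hat{y}$), Residual, and Squared Residual for Car 2
Prediction:
Residual:
Squared residual:
Step 3: Calculate the Predicted Price ($\hat{y}$), Residual, and Squared Residual for Car 3
Prediction:
Residual:
Squared residual:
Step 4: Calculate the Predicted Price ($\hat{y}$), Residual, and Squared Residual for Car 4
Prediction:
Residual:
Squared residual: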
Step 5: Calculate the MSE
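One way to check the hand calculations is to let R fill in the table. This is a sketch using the four cars and the estimated equation from this exercise; the column names `predicted`, `residual`, and `squared_residual` are illustrative.

```r
# The four Camrys from the exercise table.
camry <- data.frame(
  car   = 1:4,
  miles = c(50, 100, 75, 20),
  price = c(14.5, 6.1, 11.0, 18.9)
)

# Predicted price from the estimated equation y_hat = 21.312 - 0.133 * miles.
camry$predicted <- 21.312 - 0.133 * camry$miles

# Residual = observed price minus predicted price.
camry$residual <- camry$price - camry$predicted
camry$squared_residual <- camry$residual^2

camry  # the completed table

# MSE: the mean of the squared residuals.
mse <- mean(camry$squared_residual)
mse
```

For these four cars the MSE works out to about 0.964, and most of it comes from Car 2, whose observed price sits well below its predicted price.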