import pandas as pd

Exercise: Multi-variable Regression 1
The datasets get progressively more complex:
- The Advertising Dataset has only 3 numerical features, with 200 samples
- The Auto MPG Dataset has 5 numerical features and 3 categorical features, with 398 samples

Your task is to do the following:
- Load the data: `pd.read_csv()`
- Identify the features and their types: `df.info()`
- Identify the target and its type: `target_col = ...`
- Explore the data: stats (`df.describe()`) and visuals (`import seaborn as sns`)
- Initialize a regression model: `SGDRegressor`
- Train the model: `.fit()`
- Evaluate the model: `.score()`
- Inspect the model weights:
  - Hint: you must pre-process the numerical data to get calibrated weights (`StandardScaler`)
  - Example: how much does spending on TV ads contribute to Sales? (Hint: `model.coef_`)
  - Example: how much do the factors leave unexplained about what affects Sales? (Hint: `model.intercept_`)
- Make predictions on new cases of your choice: `.predict()`

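The steps above can be sketched end to end on made-up data. This is only an illustration, not a solution for either dataset: the columns `x1`, `x2`, `y` and the coefficients used to generate them are invented here, and the train/test split and the pipeline wrapper are one possible way to organise the work, not something the exercise prescribes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the Advertising or Auto MPG data):
# y = 3*x1 - 2*x2 + 5 + noise, with x1 on a much larger scale than x2.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.normal(50, 10, size=300),
    "x2": rng.normal(0, 1, size=300),
})
df["y"] = 3.0 * df["x1"] - 2.0 * df["x2"] + 5.0 + rng.normal(0, 1, size=300)

# Separate the features from the target.
target_col = "y"
X = df.drop(columns=[target_col])
y = df[target_col]

# Hold out part of the data so the score is measured on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Standardize the features, then fit a linear model with SGD.
model = make_pipeline(StandardScaler(), SGDRegressor(random_state=0))
model.fit(X_train, y_train)

# .score() returns R^2 for regressors.
r2 = model.score(X_test, y_test)
print("R^2 on held-out data:", r2)

# The weights live on the SGDRegressor step of the pipeline.
sgd = model.named_steps["sgdregressor"]
weights = dict(zip(X.columns, sgd.coef_))
print("weights (per standard deviation of each feature):", weights)
print("intercept:", sgd.intercept_)

# Predict for two new, hand-picked cases.
new_cases = pd.DataFrame({"x1": [40.0, 60.0], "x2": [0.0, 1.0]})
predictions = model.predict(new_cases)
print("predictions:", predictions)
```

Because the features are standardized before fitting, each weight reads as the change in the target per one standard deviation of that feature, which is what makes the weights comparable across features measured on different scales.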
1. Advertising Dataset
The Advertising Dataset is a widely used teaching dataset in statistical learning and regression analysis. It is best known for its use in the early chapters of the seminal textbook “An Introduction to Statistical Learning” (ISLR) by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.
The dataset is used to illustrate the relationship between advertising budgets across different media and the resulting product sales.
- Features: 3 numerical
- Target: sales of the product (in thousands of units).
- Size: 200 samples.
- Source: Advertising Dataset

pd.read_csv("../datasets/advertising.csv").head()

| | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 12.0 |
| 3 | 151.5 | 41.3 | 58.5 | 16.5 |
| 4 | 180.8 | 10.8 | 58.4 | 17.9 |
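As a warm-up, the first steps of the task list (inspecting types, summary statistics, and separating the target) can be tried on just the five rows previewed above. The rows are typed in by hand here so the snippet runs without the CSV file; statistics on five rows say nothing about the full dataset, so treat this as a sketch of the calls rather than an analysis.

```python
import pandas as pd

# The five preview rows from the table above, entered by hand.
df = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8],
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8],
    "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4],
    "Sales": [22.1, 10.4, 12.0, 16.5, 17.9],
})

# Column names, dtypes, and non-null counts.
df.info()

# Count, mean, std, min, quartiles, and max per numerical column.
stats = df.describe()
print(stats)

# Separate the features from the target column.
target_col = "Sales"
X = df.drop(columns=[target_col])
y = df[target_col]
print(X.dtypes)
```

On the real file, the same calls apply after replacing the hand-typed frame with the `pd.read_csv()` call shown above.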
# INSERT YOUR CODE

2. Auto MPG Dataset
The Auto MPG Dataset is a classic benchmark for regression analysis in machine learning. It originally appeared in the 1983 American Statistical Association (ASA) Exposition and was later donated to the UCI Machine Learning Repository by Ross Quinlan in 1993.
The data consists of technical specifications for car models from 1970 to 1982 and is primarily used to predict fuel efficiency (MPG).
- Features: 5 numerical, 3 categorical
- Target: mpg (miles per gallon)
- Size: 398 samples
- Source: Auto MPG Dataset

pd.read_csv("../datasets/auto-mpg.csv").head()

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model year | origin | car name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
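Unlike the Advertising data, this dataset mixes numerical and categorical columns, so the pre-processing has to treat them differently. Below is a minimal sketch on the five preview rows, typed in by hand. Which columns count as numerical and which as categorical is an assumption made here for illustration (the exercise does not list them), as is dropping `car name`. The `pd.to_numeric` call is a precaution: in the original UCI file some `horsepower` entries are marked `?`, and it is not known whether this CSV keeps them.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# The five preview rows from the table above, entered by hand.
df = pd.DataFrame({
    "mpg": [18.0, 15.0, 18.0, 16.0, 17.0],
    "cylinders": [8, 8, 8, 8, 8],
    "displacement": [307.0, 350.0, 318.0, 304.0, 302.0],
    "horsepower": [130, 165, 150, 150, 140],
    "weight": [3504, 3693, 3436, 3433, 3449],
    "acceleration": [12.0, 11.5, 11.0, 12.0, 10.5],
    "model year": [70, 70, 70, 70, 70],
    "origin": [1, 1, 1, 1, 1],
    "car name": [
        "chevrolet chevelle malibu",
        "buick skylark 320",
        "plymouth satellite",
        "amc rebel sst",
        "ford torino",
    ],
})

# Precaution: turn non-numeric markers into NaN (a no-op if already numeric).
df["horsepower"] = pd.to_numeric(df["horsepower"], errors="coerce")

target_col = "mpg"
X = df.drop(columns=[target_col])
y = df[target_col]

# ASSUMED split of column types, for illustration only.
numerical_cols = ["cylinders", "displacement", "horsepower", "weight", "acceleration"]
categorical_cols = ["model year", "origin"]

# Scale the numerical columns and one-hot encode the categorical ones.
# Columns not listed (here "car name") are dropped by default.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numerical_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

X_ready = preprocess.fit_transform(X)
print(X_ready.shape)
```

On the full dataset the one-hot step produces one column per distinct category, so the output is wider than in this five-row preview, where every categorical column holds a single value. Any rows where `horsepower` becomes NaN would need to be dropped or imputed before fitting a model.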
# INSERT YOUR CODE

Find more datasets on the UCI Machine Learning Repository.