Exercise: Linear Regression

import pandas as pd
Note: you can download the notebook from the sidebar, where it says “Jupyter”.
Your task is to do the following:
- Load the data
- Explore the data
- Identify the features and their types
- Identify the target and its type
- Initialize a regression model
- Train the model
- Evaluate the model
- Make predictions on new cases of your choice
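The steps above can be sketched end to end as follows. This is a minimal sketch, not a reference solution: it assumes scikit-learn is installed, and it uses synthetic data with hypothetical column names (`feature`, `target`) in place of the exercise datasets. The 80/20 train/test split and the choice of R² and mean absolute error as metrics are also assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Load the data. Synthetic stand-in (an assumption): target = 3 * feature + 5 + noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
df = pd.DataFrame({"feature": x, "target": 3.0 * x + 5.0 + rng.normal(0, 1, size=200)})

# Identify the features (2-D DataFrame) and the target (1-D Series).
X = df[["feature"]]
y = df["target"]

# Hold out 20% of the rows so the model is evaluated on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Initialize and train the regression model.
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate the model on the held-out rows.
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"R^2 = {r2:.3f}, MAE = {mae:.3f}")

# Make predictions on new cases of your choice.
new_cases = pd.DataFrame({"feature": [2.5, 7.0]})
predictions = model.predict(new_cases)
print(predictions)
```

For the exercises, the synthetic DataFrame would be replaced by the result of `pd.read_csv(...)`, and `feature` and `target` by the actual column names.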
1. Hours vs Marks Dataset
Download Hours vs Marks Dataset [Source].
Some important characteristics of the dataset, to ask yourself about:
- What is the number of samples?
- What is the distribution of the features (min, max, mean, median, std, quantiles)? Hint: use `DataFrame.describe()`
- What is the distribution of the target?
- Are there missing values?
- What is the relationship between the feature and the target? Is it increasing, decreasing, random, or non-linear (for example, increasing up to some value X, then decreasing)? Plot the feature vs the target to answer this.
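One way to answer these questions with pandas is sketched below. The data here is a synthetic stand-in (an assumption) that only borrows the column names `Hours_Studied` and `Marks`; the numbers it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in (an assumption) using the dataset's column names.
rng = np.random.default_rng(1)
hours = rng.uniform(0, 10, size=100).round(2)
marks = (20 + 5 * hours + rng.normal(0, 3, size=100)).round(2)
df = pd.DataFrame({"Hours_Studied": hours, "Marks": marks})

# Number of samples.
n_samples = len(df)

# Distribution of every numeric column: count, mean, std, min, quartiles, max.
# The "50%" row is the median.
summary = df.describe()

# Missing values per column.
missing = df.isna().sum()

# Pearson correlation as a rough numeric check of a linear relationship.
corr = df["Hours_Studied"].corr(df["Marks"])

print(n_samples)
print(summary)
print(missing)
print(f"Pearson correlation: {corr:.3f}")
```

With matplotlib installed, `df.plot.scatter(x="Hours_Studied", y="Marks")` draws the feature against the target. A correlation coefficient alone can hide a non-linear pattern, so the plot is still worth looking at.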
pd.read_csv("../datasets/Rounded_Student_Hours_Studied_vs_Marks_Dataset.csv").head()

| | Hours_Studied | Marks |
|---|---|---|
| 0 | 4.76 | 46.27 |
| 1 | 3.00 | 34.30 |
| 2 | 2.08 | 33.63 |
| 3 | 4.04 | 47.81 |
| 4 | 9.49 | 66.26 |
2. Experience vs Salary Data
Download Experience vs Salary Dataset.
Is the relationship between the feature (Experience Years) and the target (Salary) linear?
pd.read_csv("../datasets/Salary Data.csv").head()

| | Experience Years | Salary |
|---|---|---|
| 0 | 1.1 | 39343 |
| 1 | 1.2 | 42774 |
| 2 | 1.3 | 46205 |
| 3 | 1.5 | 37731 |
| 4 | 2.0 | 43525 |
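A rough numeric check of linearity, complementing a scatter plot, is to compare how well a straight line and a quadratic curve fit the data. The sketch below uses only the five rows from the preview above, so its output does not answer the question for the full dataset. The helper `r_squared` is a hypothetical name introduced for this example.

```python
import numpy as np

# The five preview rows only (not the full dataset).
experience = np.array([1.1, 1.2, 1.3, 1.5, 2.0])
salary = np.array([39343, 42774, 46205, 37731, 43525], dtype=float)


def r_squared(x, y, degree):
    """R^2 of a least-squares polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, deg=degree)
    residuals = y - np.polyval(coeffs, x)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


r2_linear = r_squared(experience, salary, 1)
r2_quadratic = r_squared(experience, salary, 2)
print(f"linear R^2 = {r2_linear:.3f}, quadratic R^2 = {r2_quadratic:.3f}")
```

The quadratic fit can never score lower than the linear one on the data it was fitted to, so only a large gap hints at curvature. On the full dataset, a held-out test set gives a fairer comparison.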
3. BMI and Life Expectancy Dataset
Download BMI and Life Expectancy Dataset
Optional: the Country column can be used to group some countries together to find out some underlying patterns that are not directly visible in the data.
pd.read_csv("../datasets/bmi_and_life_expectancy.csv").head()

| | Country | Life expectancy | BMI |
|---|---|---|---|
| 0 | Afghanistan | 52.8 | 20.62058 |
| 1 | Albania | 76.8 | 26.44657 |
| 2 | Algeria | 75.5 | 24.59620 |
| 3 | Andorra | 84.6 | 27.63048 |
| 4 | Angola | 56.7 | 22.25083 |
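The optional grouping idea could look like the sketch below. The dataset has no grouping column, so the `region_of` mapping and the `Region` column are hypothetical additions made for this example. Only the five preview rows are used.

```python
import pandas as pd

# The five preview rows only (not the full dataset).
df = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Life expectancy": [52.8, 76.8, 75.5, 84.6, 56.7],
    "BMI": [20.62058, 26.44657, 24.59620, 27.63048, 22.25083],
})

# Hypothetical mapping from country to region (not part of the dataset).
region_of = {
    "Afghanistan": "Asia",
    "Albania": "Europe",
    "Algeria": "Africa",
    "Andorra": "Europe",
    "Angola": "Africa",
}
df["Region"] = df["Country"].map(region_of)

# Average life expectancy and BMI within each region.
group_means = df.groupby("Region")[["Life expectancy", "BMI"]].mean()
print(group_means)
```

Any other grouping (income level, continent, population size) works the same way, as long as every country in the full dataset appears in the mapping. Countries missing from it get `NaN` as their region and are dropped by `groupby` by default.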