Case Study (Salary)

Scikit-learn (Simple Linear Regression)

 - Simple Linear Regression: 

      -> aims to find the best-fitting line through the data points

 - Finding the line with the minimum sum of squares

      -> Sum of Squares = Σ(y - ŷ)²

      -> y: actual values, ŷ: predicted values
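To make the sum-of-squares criterion concrete, here is a minimal sketch that scores two candidate lines on a toy dataset. The numbers and the helper name `sum_of_squares` are made up for illustration and are not the case-study data.

```python
# Hypothetical toy data (not the case-study dataset).
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [35000.0, 46000.0, 54000.0, 66000.0, 74000.0]


def sum_of_squares(b0, b1, xs, ys):
    """Return Σ(y - ŷ)² for the line ŷ = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Two candidate lines; the one with the smaller sum of squares fits better.
sse_a = sum_of_squares(25000.0, 10000.0, years, salary)
sse_b = sum_of_squares(30000.0, 5000.0, years, salary)
print(sse_a, sse_b)
```

Linear regression searches over all possible (b0, b1) pairs for the one that makes this quantity as small as possible.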

 - y = b0 + b1 x

     -> y is the dependent variable (variable on the y-axis)
     -> x is the independent variable (variable on the x-axis)
     -> b1 is the slope of the line (coefficient)

     -> b1 shows how much y changes for a one-unit increase in x
     -> (in a model with several features, this holds the other variables constant; here there is only one feature)
     -> for 1 more year of experience, the person receives b1 $ on top of their salary

 - b0 is the y-intercept (constant)

     -> the point where the best-fitting line crosses the y-axis
     -> when a person has no experience (x = 0), salary = b0
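For a single feature, the b0 and b1 that minimise the sum of squares have a well-known closed form: b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1 x̄. The sketch below applies those formulas to hypothetical toy data (not the case-study dataset), only to show where the slope and intercept come from.

```python
from statistics import fmean

# Hypothetical toy data (not the case-study dataset).
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [35000.0, 46000.0, 54000.0, 66000.0, 74000.0]

x_bar = fmean(years)
y_bar = fmean(salary)

# Slope: co-variation of x and y divided by the variation of x.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, salary)) / sum(
    (x - x_bar) ** 2 for x in years
)
# Intercept: forces the line through the point (x_bar, y_bar).
b0 = y_bar - b1 * x_bar

print(f"y = {b0:.2f} + {b1:.2f} x")
```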

Overview

- Importing the Relevant Libraries

- Loading the Data

- Declaring the Dependent and the Independent variables

- Splitting the dataset into the Training set and Test set

- Linear Regression Model

    - Creating a Linear Regression 
    - Fitting The Model
    - Predicting the Test Set Results

- Creating a Summary Table (Test Set Results)

- Making predictions 

    - Making a Single Observation Prediction
    - Making Multiple Observations Prediction

- R-Squared (R²), Intercept, Coefficient

    - Calculating the R-squared (R²)
    - Finding the intercept
    - Finding the coefficients
    - Final Regression Equation (y = b0 + b1 x)

- Data visualization

    - Visualising the Training Set Results
    - Visualising the Test Set Results
    - Visualising the Training & Test Set Results on the same plot

Importing the Relevant Libraries

Loading the data

Declaring the Dependent and the Independent variables

    - x: independent variable -> input (feature)
    - y: dependent variable -> output (target)
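The loading and declaring steps might look like the sketch below. The file name and contents of the original dataset are not shown here, so this uses a small in-memory CSV with made-up rows. The column name `YearsExperience` is taken from the final equation, while `Salary` is an assumed column name.

```python
import io

import pandas as pd

# Synthetic stand-in for the salary file (rows are made up for illustration).
csv_text = """YearsExperience,Salary
1.0,35000
2.0,46000
3.0,54000
4.0,66000
5.0,74000
"""
data = pd.read_csv(io.StringIO(csv_text))

# scikit-learn expects a 2-D feature array (n_samples, n_features),
# so the feature is selected with a list of column names.
x = data[["YearsExperience"]].values
y = data["Salary"].values

print(x.shape, y.shape)
```

With a real file, `io.StringIO(csv_text)` would be replaced by the path to the CSV.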

Splitting the dataset into the Training set and Test set
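
A minimal sketch of the split using scikit-learn's `train_test_split`. The data are synthetic, and `test_size=0.2` and `random_state=0` are illustrative choices, not values taken from the case study.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 30 observations, one feature.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=(30, 1))
y = 25000 + 9500 * x[:, 0] + rng.normal(0, 3000, size=30)

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```

The model is fitted only on the training rows, so the test rows give an honest check on unseen data.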

Linear Regression Model

Creating a Linear Regression

- LinearRegression Class from linear_model Module of sklearn Library

- model -> Object of LinearRegression Class

Fitting The Model

- fit method -> training the model

Predicting the Test Set Results

- y_pred -> the predicted salaries
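
The three steps above (create, fit, predict) can be sketched as follows. The data are synthetic and the split parameters are assumptions. Only `LinearRegression`, `fit`, `predict`, `model` and `y_pred` come from the notes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumption), standing in for years of experience and salary.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=(30, 1))
y = 25000 + 9500 * x[:, 0] + rng.normal(0, 3000, size=30)
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

model = LinearRegression()       # create the model object
model.fit(X_train, y_train)      # training: estimates b0 and b1
y_pred = model.predict(X_test)   # predicted salaries for the test set

print(y_pred.round(2))
```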

Creating a Summary Table (Test Set Results)

- Comparing predicted_salary & real_salary
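
One way to build such a table is a pandas DataFrame with one column per series. This is a sketch on synthetic data. The `difference` column is an extra added here for illustration and is not part of the original notes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumption), standing in for years of experience and salary.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=(30, 1))
y = 25000 + 9500 * x[:, 0] + rng.normal(0, 3000, size=30)
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# One row per test observation: prediction next to the real value.
summary = pd.DataFrame({"predicted_salary": y_pred, "real_salary": y_test})
summary["difference"] = summary["real_salary"] - summary["predicted_salary"]

print(summary.round(2))
```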

Making Predictions

Making a Single Observation Prediction

- Predicting the salary of an employee with 12 years of experience

Making Multiple Observations Prediction

- Predicting the salaries of employees with 0, 1, 5 & 10 years of experience
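
Both kinds of prediction use the same `predict` call. The point to watch is that the input must be 2-D, with one row per observation, so a single value is written as `[[12]]`. The sketch below fits a model on noise-free synthetic data (an assumption, chosen so the results are easy to check by hand) rather than on the case-study data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free synthetic data (assumption): salary = 25000 + 9500 * years.
x = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = 25000 + 9500 * x[:, 0]
model = LinearRegression().fit(x, y)

# Single observation: one row, one column.
single = model.predict([[12]])

# Multiple observations: one row per employee.
multiple = model.predict([[0], [1], [5], [10]])

print(single.round(2))
print(multiple.round(2))
```

Passing a bare `12` or a flat list such as `[0, 1, 5, 10]` raises an error, because scikit-learn cannot tell rows from columns in that case.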

R-Squared (R²), Intercept, Coefficient

Calculating the R-Squared (R²)

* What is R-squared?

- a statistical measure of how close the data are to the fitted regression line

- also known as the coefficient of determination

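- R-squared lies between 0 and 1 (0% and 100%) on the training data when the model includes an intercept; on a test set it can be negative if the model predicts worse than the mean of the target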

- the higher the R-squared, the better the model fits the data
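
R-squared can be computed by hand as 1 - SS_res / SS_tot, where SS_res = Σ(y - ŷ)² and SS_tot = Σ(y - ȳ)². The sketch below checks that against scikit-learn's `r2_score` on hypothetical values (not case-study results) and also shows that poor predictions on held-out data can push the score below zero.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical real and predicted salaries (made up for illustration).
y_test = np.array([40000.0, 60000.0, 80000.0, 100000.0])
y_pred = np.array([42000.0, 58000.0, 83000.0, 99000.0])

ss_res = np.sum((y_test - y_pred) ** 2)          # unexplained variation
ss_tot = np.sum((y_test - y_test.mean()) ** 2)   # total variation around the mean
r2_manual = 1 - ss_res / ss_tot
r2_sklearn = r2_score(y_test, y_pred)

# Predictions that are worse than always guessing the mean give a negative score.
bad_pred = y_test[::-1]
r2_bad = r2_score(y_test, bad_pred)

print(r2_manual, r2_sklearn, r2_bad)
```

For a fitted `LinearRegression`, `model.score(X_test, y_test)` returns the same R-squared value as `r2_score(y_test, model.predict(X_test))`.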

Finding the Intercept (b0)

 -> y = b0 + b1 x

 -> the point where the best-fitting line crosses the y-axis

 -> when a person has no experience (X = 0), salary = b0 = 26816.192244

Finding the coefficients (b1)

-> y = b0 + b1 x

-> for 1 more year of experience, the person receives b1 $ on top of their salary

-> b1 = 9345.94

-> Person with no experience -> Salary: 26816.192244 $

-> Person with 1 year of experience -> Salary: 36162.134687 $

-> 36162.134687 - 26816.192244 = 9345.94 $ more for 1 more year of experience

Final Regression Equation (y = b0 + b1 x)

    - b0 = 26816.19
    - b1 = 9345.94

    -> Salary = 26816.19 + 9345.94 × YearsExperience
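
In scikit-learn the intercept and slope of a fitted model are available as `model.intercept_` and `model.coef_`. The sketch below does not use the original dataset. As an assumption for illustration, it generates noise-free data from the rounded equation above, so the fit should recover roughly the same b0 and b1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data generated from the rounded equation in the notes
# (an assumption for illustration, not the original dataset).
years = np.arange(0, 11, dtype=float).reshape(-1, 1)
salary = 26816.19 + 9345.94 * years[:, 0]

model = LinearRegression().fit(years, salary)

b0 = model.intercept_   # a single float when the target is 1-D
b1 = model.coef_[0]     # coef_ holds one entry per feature

print(f"Salary = {b0:.2f} + {b1:.2f} x YearsExperience")

# The slope equals the gap between predictions one year apart.
step = model.predict([[1.0]])[0] - model.predict([[0.0]])[0]
print(round(step, 2))
```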

Data visualization

Visualising the Training Set Results

Visualising the Test Set Results

- Visualising Regression Line:

    -> plt.plot(X_train, model.predict(X_train), color = 'red')

- The regression line comes from a single equation:

    -> Salary = 26816.19 + 9345.94 × YearsExperience

- The predicted salaries for both the training set and the test set

    -> lie on the same regression line, so plotting it from X_train is also valid for the test-set plot

Visualising the Training & Test Set Results on the same plot

- Training set : Blue points
- Test set: Green points

- Visualising Regression Line:

        -> plt.plot(X_train, model.predict(X_train), color = 'red')

- The regression line comes from a single equation:

        -> Salary = 26816.19 + 9345.94 × YearsExperience

- The predicted salaries for both the training set and the test set

        -> lie on the same regression line
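
A combined plot along these lines could be drawn as in the sketch below. It uses synthetic data and assumed split parameters. The colours and the `plt.plot(X_train, model.predict(X_train), color='red')` pattern follow the notes, written here with the equivalent `ax.plot` call.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumption), standing in for years of experience and salary.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=(30, 1))
y = 25000 + 9500 * x[:, 0] + rng.normal(0, 3000, size=30)
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

fig, ax = plt.subplots()
ax.scatter(X_train, y_train, color="blue", label="Training set")
ax.scatter(X_test, y_test, color="green", label="Test set")
# One line serves both sets because every prediction comes from the same equation.
ax.plot(X_train, model.predict(X_train), color="red", label="Regression line")
ax.set_xlabel("Years of experience")
ax.set_ylabel("Salary")
ax.set_title("Salary vs. experience (training & test set)")
ax.legend()
```

In a notebook the figure is displayed automatically. In a plain script, add `plt.show()` at the end.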