CatBoost Regression¶

Combined Cycle Power Plant¶

Installing CatBoost¶

!pip install catboost

Requirement already satisfied: catboost in /home/bahar/anaconda3/lib/python3.7/site-packages (0.24.3)
Requirement already satisfied: scipy in /home/bahar/anaconda3/lib/python3.7/site-packages (from catboost) (1.5.0)
Requirement already satisfied: numpy>=1.16.0 in /home/bahar/anaconda3/lib/python3.7/site-packages (from catboost) (1.18.5)
Requirement already satisfied: plotly in /home/bahar/anaconda3/lib/python3.7/site-packages (from catboost) (4.9.0)
Requirement already satisfied: pandas>=0.24.0 in /home/bahar/anaconda3/lib/python3.7/site-packages (from catboost) (1.0.5)
Requirement already satisfied: matplotlib in /home/bahar/anaconda3/lib/python3.7/site-packages (from catboost) (3.2.2)
Requirement already satisfied: six in /home/bahar/anaconda3/lib/python3.7/site-packages (from catboost) (1.15.0)
Requirement already satisfied: graphviz in /home/bahar/anaconda3/lib/python3.7/site-packages (from catboost) (0.15)
Requirement already satisfied: retrying>=1.3.3 in /home/bahar/anaconda3/lib/python3.7/site-packages (from plotly->catboost) (1.3.3)
Requirement already satisfied: python-dateutil>=2.6.1 in /home/bahar/anaconda3/lib/python3.7/site-packages (from pandas>=0.24.0->catboost) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /home/bahar/anaconda3/lib/python3.7/site-packages (from pandas>=0.24.0->catboost) (2020.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/bahar/anaconda3/lib/python3.7/site-packages (from matplotlib->catboost) (1.2.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /home/bahar/anaconda3/lib/python3.7/site-packages (from matplotlib->catboost) (2.4.7)
Requirement already satisfied: cycler>=0.10 in /home/bahar/anaconda3/lib/python3.7/site-packages (from matplotlib->catboost) (0.10.0)

Attribute Information:¶

   Features consist of hourly average ambient variables

    - Temperature (T) in the range 1.81°C and 37.11°C,

    - Ambient Pressure (AP) in the range 992.89-1033.30 milibar,

    - Relative Humidity (RH) in the range 25.56% to 100.16%

    - Exhaust Vacuum (V) in teh range 25.36-81.56 cm Hg

    - Net hourly electrical energy output (EP) 420.26-495.76 MW

Importing the Relevant Libraries¶

import numpy as np
import pandas as pd

Importing the Dataset¶

url = "https://DataScienceSchools.github.io/Machine_Learning/Sklearn/Case_Study/Regression/PowerPlant/PowerPlant.csv"

df = pd.read_csv(url)

df.head()

Declaring the Dependent & the Independent Variables¶

X = df.iloc[:, :-1].values

y = df.iloc[:, -1].values

Splitting the Dataset into the Training Set and Test Set¶

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

Training the CatBoost Regression Model¶

from xgboost import XGBRegressor

model = XGBRegressor()

model.fit(X_train, y_train)

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
             importance_type='gain', interaction_constraints='',
             learning_rate=0.300000012, max_delta_step=0, max_depth=6,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=100, n_jobs=0, num_parallel_tree=1, random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
             tree_method='exact', validate_parameters=1, verbosity=None)

Predicting the Test Set results¶

y_pred = model.predict(X_test)

Comparing Predicted Y with Real Y (Test Set)¶

data = pd.DataFrame()

pd.set_option('precision', 2)

data['Predicted_Y'] = y_pred

data['Real_Y'] = y_test

data

Evaluating the Model Performance¶

from sklearn.metrics import r2_score

r2_score(y_test, y_pred)

0.9679174685442539

K-Fold Cross Validation¶

from sklearn.model_selection import cross_val_score

accuracies = cross_val_score(estimator = model, X = X_train, y = y_train, cv = 10)

print("Accuracy: {:.2f} %".format(accuracies.mean()*100))

print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))

Accuracy: 96.45 %
Standard Deviation: 0.58 %

	AT	V	AP	RH	PE
0	8.34	40.77	1010.84	90.01	480.48
1	23.64	58.49	1011.40	74.20	445.75
2	29.74	56.90	1007.15	41.91	438.76
3	19.07	49.69	1007.22	76.79	453.09
4	11.80	40.66	1017.13	97.20	464.43

	Predicted_Y	Real_Y
0	427.90	426.18
1	450.53	451.10
2	442.27	442.87
3	442.83	443.70
4	461.28	460.59
...	...	...
1909	464.84	468.19
1910	433.22	431.16
1911	454.89	454.20
1912	445.49	444.13
1913	435.97	436.58