Logistic Regression¶

Titanic Survival¶

The sinking of the Titanic on April 15, 1912, is one of the most infamous shipwrecks in history. The ship sank after colliding with an iceberg, killing 1502 of the 2224 passengers and crew. So few people survived partly because there were not enough lifeboats for everyone on board. Some groups were more likely to survive than others, such as women, children, and upper-class passengers. This case study analyzes which kinds of people were likely to survive. The dataset includes the following:

Pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
Sex: Sex
Age: Age in years
SibSp: # of siblings / spouses aboard the Titanic
Parch: # of parents / children aboard the Titanic
Ticket: Ticket number
Fare: Passenger fare
Cabin: Cabin number
Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)

Target class: Survived: Survival (0 = No, 1 = Yes)


Importing the Relevant Libraries¶

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

Importing the Dataset¶

url = "https://datascienceschools.github.io/Machine_Learning/Classification_Models_CaseStudies/Train_Titanic.csv"

df = pd.read_csv(url)

df.head()

Checking Missing Values¶

- Cabin & Embarked are not used as model features -> drop them after data visualisation
- Age has 177 missing values -> these are imputed in a later step

df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
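Raw counts are easier to judge as a share of all rows. The sketch below uses a tiny hypothetical frame (`toy`, with made-up values, not the Titanic data) to show the pattern: `isnull().mean()` gives the fraction of missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df; the values are made up for illustration
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.93, 53.10],
})

# isnull() yields booleans; their mean is the fraction of missing entries per column
missing_pct = toy.isnull().mean() * 100
print(missing_pct.round(1))
```

Applied to the counts shown above, 177 of 891 ages (about 19.9%) and 687 of 891 cabins (about 77.1%) are missing.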

Checking Missing Values (Heatmap)¶

sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='viridis')

plt.show()

Handling Missing Values (Age)¶

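def Fill_Age(data):
    # Access the row's values by column label rather than by position
    age = data['Age']
    sex = data['Sex']

    if pd.isnull(age):
        # Compare strings with ==; `is` tests object identity and is unreliable here
        if sex == 'male':
            return 29
        else:
            return 27
    else:
        return age


df['Age'] = df[['Age','Sex']].apply(Fill_Age,axis=1)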

df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age              0
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
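The fill values 29 and 27 above are hard-coded. As an alternative sketch (not the method used in this notebook), the per-group value can be computed from the data itself with `groupby(...).transform`. The frame `toy` and the column `Age_filled` are hypothetical names used only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; ages are made up for illustration
toy = pd.DataFrame({
    "Sex": ["male", "female", "male", "female", "male", "female"],
    "Age": [30.0, 20.0, np.nan, 28.0, 40.0, np.nan],
})

# Median age per sex, broadcast back to every row of the same group
group_median = toy.groupby("Sex")["Age"].transform("median")

# Replace only the missing ages; known ages are left untouched
toy["Age_filled"] = toy["Age"].fillna(group_median)
print(toy)
```

This keeps the imputation in step with the data if the dataset changes, at the cost of being less explicit than fixed constants.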

Explore the Dataset¶

Percentage of passengers Survived / Not Survived¶

survived = df[df['Survived'] == 1]

not_survived = df[df['Survived'] == 0]

print("Total =", len(df))

print("\nNumber of Survived passengers =", len(survived))
print("Percentage Survived = {:.2f}%".format(len(survived)*100/len(df)))
 
print("\nDid not Survive =", len(not_survived))
print("Percentage who did not survive = {:.2f}%".format(len(not_survived)*100/len(df)))

Total = 891

Number of Survived passengers = 342
Percentage Survived = 38.38%

Did not Survive = 549
Percentage who did not survive = 61.62%
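The same percentages can be obtained in one call with `value_counts(normalize=True)`. The sketch below rebuilds a stand-in `Survived` column from the counts printed above (342 survivors, 549 non-survivors) rather than loading the CSV; `survived_col` is a name introduced here for illustration.

```python
import pandas as pd

# Stand-in for df['Survived'], rebuilt from the counts printed above
survived_col = pd.Series([1] * 342 + [0] * 549, name="Survived")

# normalize=True returns class shares instead of raw counts
rates = survived_col.value_counts(normalize=True) * 100
print(rates.round(2))
```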

Cufflinks for Plots¶


Installing cufflinks¶

!pip install cufflinks


Importing cufflinks¶

import cufflinks as cf

cf.go_offline()

Number of People Survived / Not Survived¶

survived = df[df['Survived']==1]['Survived'].value_counts()

dead = df[df['Survived']==0]['Survived'].value_counts()

df1 = pd.DataFrame([survived ,dead])

df1.index = ['Survived','Dead']

df1.iplot(kind='bar',barmode='stack', title='Number of Survived & Dead')

Number of People Survived based on Sex¶

- If you are female

    - you have a higher chance of survival

survived_sex = df[df['Survived']==1]['Sex'].value_counts()

dead_sex = df[df['Survived']==0]['Sex'].value_counts()

df1 = pd.DataFrame([survived_sex,dead_sex])

df1.index = ['Survived','Dead']

df1.iplot(kind='bar',barmode='stack', title='Survival by Sex')
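Stacked counts show absolute numbers, but "a higher chance of survival" is a statement about rates. A minimal sketch of how to compute the rate per group follows; the frame `toy` is hypothetical and its values are made up, so the numbers are not the Titanic results.

```python
import pandas as pd

# Hypothetical toy data; not the Titanic values
toy = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "male", "male", "male", "male"],
    "Survived": [1, 1, 0, 0, 0, 1, 0, 0],
})

# Survived is coded 0/1, so the group mean is the survival rate
rate_by_sex = toy.groupby("Sex")["Survived"].mean()

# Cross-tabulation of the raw counts behind those rates
counts = pd.crosstab(toy["Sex"], toy["Survived"])

print(rate_by_sex)
print(counts)
```

The same pattern works for any of the grouping columns used in the plots that follow, for example `Pclass` or `Embarked`.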

Number of People Survived based on Class¶

- If you are a first-class passenger

     - you have a higher chance of survival

survived_pclass = df[df['Survived']==1]['Pclass'].value_counts()

dead_pclass = df[df['Survived']==0]['Pclass'].value_counts()

df1 = pd.DataFrame([survived_pclass, dead_pclass])

df1.index = ['Survived','Dead']

df1.iplot(kind='bar',barmode='stack', title='Survival by Pclass')

Number of People Survived based on Siblings Status¶

- If you have 1 sibling or spouse aboard (SibSp = 1)

    - you have a higher chance of survival compared to having none aboard (SibSp = 0)

survived_SibSp = df[df['Survived']==1]['SibSp'].value_counts()

dead_SibSp = df[df['Survived']==0]['SibSp'].value_counts()

df1 = pd.DataFrame([survived_SibSp, dead_SibSp])

df1.index = ['Survived','Dead']

df1.iplot(kind='bar',barmode='stack', title='Survival by Number of siblings / spouses aboard the Titanic')

Number of People Survived based on Parch Status (Parents/Children onboard)¶

- If you have 1 parent or child aboard (Parch = 1)

    - you have a higher chance of survival compared to being alone (Parch = 0)

survived_Parch = df[df['Survived']==1]['Parch'].value_counts()

dead_Parch = df[df['Survived']==0]['Parch'].value_counts()

df1 = pd.DataFrame([survived_Parch, dead_Parch])

df1.index = ['Survived','Dead']

df1.iplot(kind='bar',barmode='stack', title='Survival by Number of parents / children aboard the Titanic')

Number of People Survived based on the port they embarked from¶

- Port of Embarkation C = Cherbourg, Q = Queenstown, S = Southampton

- If you embarked from port "C"

    - you have a higher chance of survival compared to other ports!

survived_Embarked = df[df['Survived']==1]['Embarked'].value_counts()

dead_Embarked = df[df['Survived']==0]['Embarked'].value_counts()

df1 = pd.DataFrame([survived_Embarked, dead_Embarked])

df1.index = ['Survived','Dead']

df1.iplot(kind='bar',barmode='stack', title='Survival by Port of Embarkation C = Cherbourg, Q = Queenstown, S = Southampton')

Number of People Survived based on Age¶

- If you are a baby

    - you have a higher chance of survival

Grouping Age¶

df['Age_Group'] = pd.cut(df['Age'], bins=[0,5,10,20,30,40,50,60,70,80])

df.head()

Survival by Age Group¶

survived_Age_Group = df[df['Survived']==1]['Age_Group'].value_counts()

dead_Age_Group = df[df['Survived']==0]['Age_Group'].value_counts()

df1 = pd.DataFrame([survived_Age_Group, dead_Age_Group])

df1.index = ['Survived','Dead']

df['Age'].iplot(kind='hist',bins=30, xTitle='Age',color='skyblue')

df['Age'].iplot(kind='box', xTitle='Age',color='lightgreen')

df1.iplot(kind='bar',barmode='stack', title='Survival by Age Group')

Number of People Survived based on Fare¶

- If you pay a higher fare

    - you have a higher chance of survival

Grouping Fare¶

df['Fare_Group'] = pd.cut(df['Fare'], bins=[0, 50, 100, 200, 300, 600])

df.head()
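One detail of `pd.cut` is worth checking here: by default its intervals are open on the left and closed on the right, so with bins starting at 0 the first group is (0, 50] and a fare of exactly 0 falls outside every bin and becomes NaN. The sketch below illustrates this on a few sample fares chosen for the example; `include_lowest=True` is one possible remedy, not something the original cell uses.

```python
import pandas as pd

# Illustrative fares, including a 0 and a value exactly on a bin edge
fares = pd.Series([0.0, 7.25, 50.0, 71.2833, 512.3292])
bins = [0, 50, 100, 200, 300, 600]

# Default: intervals are (left, right], so 0.0 is not covered
default_groups = pd.cut(fares, bins=bins)

# include_lowest=True extends the first interval to cover the left edge
inclusive_groups = pd.cut(fares, bins=bins, include_lowest=True)

print(pd.DataFrame({"fare": fares,
                    "default": default_groups,
                    "include_lowest": inclusive_groups}))
```

Rows that end up as NaN are silently left out of `value_counts()`, so they would be missing from the grouped bar chart.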

Survival by Fare Group¶

survived_Fare_Group = df[df['Survived']==1]['Fare_Group'].value_counts()

dead_Fare_Group = df[df['Survived']==0]['Fare_Group'].value_counts()

df1 = pd.DataFrame([survived_Fare_Group, dead_Fare_Group])

df1.index = ['Survived','Dead']

df['Fare'].iplot(kind='hist',bins=30, xTitle='Fare', color='lightgreen')

df['Fare'].iplot(kind='box', xTitle='Fare',color='lightgreen')

df1.iplot(kind='bar',barmode='stack', title='Survival by Fare Group')

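Handling Categorical Data - Dummy Variable (Sex)¶

- male: 1
- female: 0

# drop_first=True drops the 'female' column; select 'male' and store it as 0/1 integers
df['Male'] = pd.get_dummies(df['Sex'], drop_first = True)['male'].astype(int)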

df.head()
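To see what `drop_first` does, compare the full and reduced encodings on a small example. The series `sex` below is a hypothetical stand-in for `df['Sex']`.

```python
import pandas as pd

# Hypothetical stand-in for df['Sex']
sex = pd.Series(["male", "female", "female", "male"], name="Sex")

# One column per category, in sorted order: female, male
full = pd.get_dummies(sex)

# drop_first=True removes the first category (female), leaving only 'male'
reduced = pd.get_dummies(sex, drop_first=True)

# Cast to int so the column holds 0/1 regardless of the pandas version's default dtype
male = reduced["male"].astype(int)

print(list(full.columns), list(reduced.columns))
print(male.tolist())
```

With two categories, one column carries all the information (female is simply `Male == 0`), which avoids feeding the model two perfectly correlated features.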

Drop Unnecessary Columns¶

df.drop(['PassengerId','Name', 'Sex','Ticket','Cabin', 'Embarked', 'Age_Group', 'Fare_Group' ], axis = 1 , inplace = True)
        
df.head()

Rearrange Columns¶

df = df[['Pclass', 'Age', 'SibSp', 'Parch', 'Fare', 'Male', 'Survived']]

df.head()

Declaring the Dependent & the Independent Variables¶

X = df.iloc[:,:-1].values

y = df.iloc[:,-1].values

Splitting the Dataset into the Training Set and Test Set¶

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 11)
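With 891 rows and `test_size = 0.2`, the split produces 712 training rows and 179 test rows. The sketch below checks this on placeholder arrays of the same shape (`X_demo`, `y_demo` are hypothetical names; the features are zeros and the labels are rebuilt from the class counts shown earlier). It also shows the optional `stratify` argument, which the original call does not use, to keep the class ratio similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as X and y (6 features, 891 rows)
X_demo = np.zeros((891, 6))
y_demo = np.array([1] * 342 + [0] * 549)

# Same call as above, on the placeholder arrays
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=11)
print(X_tr.shape, X_te.shape)

# Optional: stratify=y keeps the share of survivors close to the overall share
_, _, y_tr_s, y_te_s = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=11, stratify=y_demo)
print(round(y_demo.mean(), 3), round(y_te_s.mean(), 3))
```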

Feature Scaling¶

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

X_train = sc.fit_transform(X_train)

X_test = sc.transform(X_test)
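The scaler is fitted on the training set only, and the test set is transformed with the training mean and standard deviation. A small sketch with made-up numbers (`train`, `test` and `scaler` are hypothetical names) shows the effect:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: two features on very different scales
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # reuses the training statistics

print(scaler.mean_)                 # per-feature training mean
print(train_scaled.mean(axis=0))    # approximately 0 after scaling
print(test_scaled)                  # test values expressed in training units
```

Scaled test values need not have mean 0 or standard deviation 1; they are only expressed relative to the training distribution.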

Training the Logistic Regression Model¶

from sklearn.linear_model import LogisticRegression

model = LogisticRegression(random_state = 0)

model.fit(X_train, y_train)

LogisticRegression(random_state=0)
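A fitted binary logistic regression scores each row with a linear function z = w·x + b and turns it into a probability with the sigmoid 1 / (1 + e^(-z)); the predicted class is 1 when z > 0. The sketch below verifies this against scikit-learn on synthetic data from `make_classification` (an assumption for illustration; `X_demo`, `y_demo` and `clf` are not part of the notebook).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data with 6 features
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)
clf = LogisticRegression(random_state=0).fit(X_demo, y_demo)

# Linear score z = w.x + b from the learned coefficients
z = X_demo @ clf.coef_.ravel() + clf.intercept_[0]

# Sigmoid of the score gives the probability of class 1
p_manual = 1.0 / (1.0 + np.exp(-z))
p_sklearn = clf.predict_proba(X_demo)[:, 1]

# A positive score corresponds to a probability above 0.5, i.e. class 1
labels_manual = (z > 0).astype(int)

print(np.allclose(p_manual, p_sklearn))
print(np.array_equal(labels_manual, clf.predict(X_demo)))
```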

Predicting the Test Set Results¶

y_pred = model.predict(X_test)

Confusion Matrix¶

from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)

accuracy = accuracy_score(y_test, y_pred)

print("Accuracy is: {:.2f}%".format(accuracy*100))

sns.heatmap(cm, annot = True, fmt="d")

plt.show()

Accuracy is: 84.36%

Classification Report¶

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.89      0.87      0.88       118
           1       0.76      0.79      0.77        61

    accuracy                           0.84       179
   macro avg       0.82      0.83      0.83       179
weighted avg       0.84      0.84      0.84       179
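Every number in the report can be derived from the confusion matrix. The counts used below are an assumption: they are not printed in this notebook, but were inferred as the integer counts consistent with the reported accuracy, supports (118 and 61) and per-class recalls. `cm_demo` is a name introduced for this sketch.

```python
import numpy as np

# Assumed counts, inferred from the report above (rows: actual 0/1, columns: predicted 0/1)
cm_demo = np.array([[103, 15],
                    [13, 48]])

tn, fp, fn, tp = cm_demo.ravel()

accuracy = (tp + tn) / cm_demo.sum()   # share of all predictions that are correct
precision_1 = tp / (tp + fp)           # of predicted survivors, how many survived
recall_1 = tp / (tp + fn)              # of actual survivors, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print("accuracy  = {:.2f}%".format(accuracy * 100))
print("precision = {:.2f}, recall = {:.2f}, f1 = {:.2f}".format(
    precision_1, recall_1, f1_1))
```

The class-0 row of the report follows from the same formulas with the roles of the two classes swapped.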

k-Fold Cross Validation¶

from sklearn.model_selection import cross_val_score

accuracies = cross_val_score(estimator = model, X = X_train, y = y_train, cv = 10)

print("Accuracy: {:.2f} %".format(accuracies.mean()*100))

print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))

Accuracy: 77.67 %
Standard Deviation: 5.49 %
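The cross-validated mean (77.67%) is lower than the single-split test accuracy (84.36%), a reminder that one split can be optimistic. Note also that `X_train` was scaled before cross-validation, so each fold's validation part influenced the scaler. One way to avoid this, sketched here on synthetic data as an assumption rather than the notebook's method, is to wrap the scaler and the model in a pipeline so the scaler is refitted inside every fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the unscaled training data
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)

# The pipeline fits the scaler on each fold's training part only
pipe = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))

scores = cross_val_score(pipe, X_demo, y_demo, cv=10)
print("Accuracy: {:.2f} %".format(scores.mean() * 100))
print("Standard Deviation: {:.2f} %".format(scores.std() * 100))
```

For standardization the difference is usually small, but the pipeline pattern matters more once preprocessing steps learn more from the data.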

Making Prediction (New Dataset)¶

Importing New Dataset¶

url = "https://datascienceschools.github.io/Machine_Learning/Classification_Models_CaseStudies/Test_Titanic.csv"

new_data = pd.read_csv(url)

new_data.head()

Dropping unnecessary columns from New Dataset¶

new_data.drop(['PassengerId','Name', 'Ticket','Cabin', 'Embarked' ], axis = 1 , inplace = True)
        
new_data.head()

Checking Missing Values¶

new_data.isnull().sum()

Pclass     0
Sex        0
Age       86
SibSp      0
Parch      0
Fare       1
dtype: int64

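Handling Missing Values (Age & Fare)¶

def Fill_Age(data):
    # Access the row's values by column label rather than by position
    age = data['Age']
    sex = data['Sex']

    if pd.isnull(age):
        # Compare strings with ==; `is` tests object identity and is unreliable here
        if sex == 'male':
            return 29
        else:
            return 27
    else:
        return age


new_data['Age'] = new_data[['Age','Sex']].apply(Fill_Age,axis=1)

# Drop the single remaining row with a missing Fare
new_data = new_data.dropna(axis=0)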

new_data.isnull().sum()

Pclass    0
Sex       0
Age       0
SibSp     0
Parch     0
Fare      0
dtype: int64

Handling Categorical Data - Dummy Variable (Sex)¶

# Same encoding as for the training data: male = 1, female = 0
new_data['Male'] = pd.get_dummies(new_data['Sex'], drop_first = True)['male'].astype(int)

new_data.drop(['Sex'], axis = 1, inplace = True)
         
new_data.head()

Declaring Independent Variables¶

new_data_X = new_data.iloc[:,:].values

Feature Scaling (New Data)¶

new_data_X = sc.transform(new_data_X)
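Because the scaler and the model were fitted on plain arrays, they rely on column position, not column names. The new data must therefore have the same columns in the same order as the training features (Pclass, Age, SibSp, Parch, Fare, Male). A defensive sketch follows; `feature_order` and `new_rows` are hypothetical names, and the two rows reuse values from the first two passengers of the new dataset.

```python
import pandas as pd

# Column order used when the model was trained
feature_order = ["Pclass", "Age", "SibSp", "Parch", "Fare", "Male"]

# Two example rows with the columns deliberately shuffled
new_rows = pd.DataFrame({
    "Male": [1, 0],
    "Fare": [7.8292, 7.0],
    "Pclass": [3, 3],
    "Age": [34.5, 47.0],
    "SibSp": [0, 1],
    "Parch": [0, 0],
})

# Fail early if a training feature is absent
missing = [c for c in feature_order if c not in new_rows.columns]
if missing:
    raise ValueError("missing feature columns: {}".format(missing))

# Reorder to the training layout before converting to an array
aligned = new_rows[feature_order]
new_X = aligned.values
print(aligned)
```

In the notebook the order already matches, since `Sex` is dropped and `Male` is appended last, but an explicit reorder guards against later changes to the preprocessing.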

Predicting Dependent Variable (Survived)¶

new_data_y_pred = model.predict(new_data_X)

new_data['predicted_Survive'] = new_data_y_pred

new_data.head()

Number of Predicted Survive / Not Survive¶

survive = new_data[new_data['predicted_Survive']==1]['predicted_Survive'].value_counts()

not_survive = new_data[new_data['predicted_Survive']==0]['predicted_Survive'].value_counts()

df1 = pd.DataFrame([survive , not_survive ])

df1.index = ['Survive','Not Survive']

df1.iplot(kind='bar',barmode='stack', title='Number of Predicted Survive & Not Survive')

Selected outputs of the head() calls above, in order of appearance:

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Ticket	Fare	Cabin	Embarked
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	A/5 21171	7.2500	NaN	S
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	PC 17599	71.2833	C85	C
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	STON/O2. 3101282	7.9250	NaN	S
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	113803	53.1000	C123	S
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	373450	8.0500	NaN	S

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Ticket	Fare	Cabin	Embarked	Age_Group
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	A/5 21171	7.2500	NaN	S	(20, 30]
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	PC 17599	71.2833	C85	C	(30, 40]
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	STON/O2. 3101282	7.9250	NaN	S	(20, 30]
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	113803	53.1000	C123	S	(30, 40]
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	373450	8.0500	NaN	S	(30, 40]

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Ticket	Fare	Cabin	Embarked	Age_Group	Fare_Group
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	A/5 21171	7.2500	NaN	S	(20, 30]	(0, 50]
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	PC 17599	71.2833	C85	C	(30, 40]	(50, 100]
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	STON/O2. 3101282	7.9250	NaN	S	(20, 30]	(0, 50]
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	113803	53.1000	C123	S	(30, 40]	(50, 100]
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	373450	8.0500	NaN	S	(30, 40]	(0, 50]

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Ticket	Fare	Cabin	Embarked	Age_Group	Fare_Group	Male
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	A/5 21171	7.2500	NaN	S	(20, 30]	(0, 50]	1
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	PC 17599	71.2833	C85	C	(30, 40]	(50, 100]	0
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	STON/O2. 3101282	7.9250	NaN	S	(20, 30]	(0, 50]	0
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	113803	53.1000	C123	S	(30, 40]	(50, 100]	0
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	373450	8.0500	NaN	S	(30, 40]	(0, 50]	1

	Survived	Pclass	Age	SibSp	Fare	Male
0	0	3	22.0	1	7.2500	1
1	1	1	38.0	1	71.2833	0
2	1	3	26.0	0	7.9250	0
3	1	1	35.0	1	53.1000	0
4	0	3	35.0	0	8.0500	1

	PassengerId	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
0	892	3	Kelly, Mr. James	male	34.5	0	0	330911	7.8292	NaN	Q
1	893	3	Wilkes, Mrs. James (Ellen Needs)	female	47.0	1	0	363272	7.0000	NaN	S
2	894	2	Myles, Mr. Thomas Francis	male	62.0	0	0	240276	9.6875	NaN	Q
3	895	3	Wirz, Mr. Albert	male	27.0	0	0	315154	8.6625	NaN	S
4	896	3	Hirvonen, Mrs. Alexander (Helga E Lindqvist)	female	22.0	1	1	3101298	12.2875	NaN	S