Pandas - DataFrames

*  2-Dimensional pandas object is a dataframe

  • Importing Pandas Library
  • Converting a list to a dataframe
  • Converting a dictionary to a dataframe
  • Loading Data
  • Saving DataFrame to an Excel file & a CSV file
  • Summary Statistics
  • Summary Statistics (Categorical Columns)
  • Statistical Functions
  • Finding Unique values
  • DataFrame Information
  • Number of Rows & Columns
  • Columns Name
  • Columns Datatypes
  • Number of Missing Values
  • Slicing Data

    1. by loc: using row_index & column_label

    2. by iloc: using row_index & column_index

    3. Alternative Way

  • Query or Filtering using Boolean Operations

Importing Pandas Library

In [81]:
import pandas as pd

Converting a list to a dataframe

In [82]:
my_list = ['a', 'b', 'c']

my_dataframe = pd.DataFrame(my_list)

my_dataframe
Out[82]:
0
0 a
1 b
2 c

Converting a dictionary to a dataframe

In [83]:
my_dictionary = pd.DataFrame({
    'name': ['John' , 'Sara', 'Steve', 'Rose', 'Bob'],
    'salary': [200,200,400,500,300]})

my_dataframe = pd.DataFrame(my_dictionary)

my_dataframe
Out[83]:
name salary
0 John 200
1 Sara 200
2 Steve 400
3 Rose 500
4 Bob 300

Loading Data

- Imporing Data from a github source

 url = "https://..." 
 df = pd.read_csv(url)    -> df: name of our DataFrame

- Imporing Data from a csv file or excel file

 df = pd.read_csv('Titanic.csv')

 df = pd.read_csv('Titanic.xlsx')


- Reading Data

- df -> getting first few rows and last few rows


- df.head()   -> Displaying first 5 rows
- df.head(n)  -> Displaying first n rows
- df.head(10) -> Displaying first 10 rows

- df.tail()   -> Displaying last 5 rows
- df.tail(n)  -> Displaying last n rows
- df.tail(10) -> Displaying last 10 rows
In [84]:
url = "https://datascienceschools.github.io/Titanic.csv"

df = pd.read_csv(url)

df.head()
Out[84]:
pclass name sex age sibsp parch ticket fare cabin embarked survived
0 1 Allen, Miss. Elisabeth Walton female 29.0000 0 0 24160 211.3375 B5 S 1
1 1 Allison, Master. Hudson Trevor male 0.9167 1 2 113781 151.5500 C22 C26 S 1
2 1 Allison, Miss. Helen Loraine female 2.0000 1 2 113781 151.5500 C22 C26 S 0
3 1 Allison, Mr. Hudson Joshua Creighton male 30.0000 1 2 113781 151.5500 C22 C26 S 0
4 1 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25.0000 1 2 113781 151.5500 C22 C26 S 0

Description of the columns

* pclass:    Class of travel
* name:      Full name of the passenger
* sex:       Gender
* age:       Numerical age
* sibsp:     Number of siblings/spouse aboard
* parch:     Number of parents/child aboard
* ticket:    Ticket number
* fare:      Cost of the ticket
* cabin:     Location of room
* emarked:   Port that the passenger embarked at 
                 (C - Cherbourg, S - Southampton, Q = Queenstown)
* survived:  Survial metric (0 - died, 1 - survived)

Saving DataFrame to an Excel file & a CSV file

  - Save to a Commma Separated File (CSV)

         df.to_csv("file_name.csv")

  - Save to an Excel File

        df.to_excel("file_name.xlsx")
In [85]:
df.to_csv("Titanic.csv", index=False)

df.to_excel("Titanic.xlsx", index=False)

Summary Statistics

   - Summary Statistics of all columns with numerical data

     df.describe()

 - Summary Statistics of specified columns with numerical data 

     df['column_name'].describe()

     df[['column_name_1', ..., 'column_name_n']].describe()
In [86]:
df.describe()
Out[86]:
pclass age sibsp parch fare survived
count 1309.000000 1046.000000 1309.000000 1309.000000 1308.000000 1309.000000
mean 2.294882 29.881135 0.498854 0.385027 33.295479 0.381971
std 0.837836 14.413500 1.041658 0.865560 51.758668 0.486055
min 1.000000 0.166700 0.000000 0.000000 0.000000 0.000000
25% 2.000000 21.000000 0.000000 0.000000 7.895800 0.000000
50% 3.000000 28.000000 0.000000 0.000000 14.454200 0.000000
75% 3.000000 39.000000 1.000000 0.000000 31.275000 1.000000
max 3.000000 80.000000 8.000000 9.000000 512.329200 1.000000

Summary Statistics (Categorical Columns)

- Add -> include='all'
In [87]:
df.describe(include='all')
Out[87]:
pclass name sex age sibsp parch ticket fare cabin embarked survived
count 1309.000000 1309 1309 1046.000000 1309.000000 1309.000000 1309 1308.000000 295 1307 1309.000000
unique NaN 1307 2 NaN NaN NaN 929 NaN 186 3 NaN
top NaN Kelly, Mr. James male NaN NaN NaN CA. 2343 NaN C23 C25 C27 S NaN
freq NaN 2 843 NaN NaN NaN 11 NaN 6 914 NaN
mean 2.294882 NaN NaN 29.881135 0.498854 0.385027 NaN 33.295479 NaN NaN 0.381971
std 0.837836 NaN NaN 14.413500 1.041658 0.865560 NaN 51.758668 NaN NaN 0.486055
min 1.000000 NaN NaN 0.166700 0.000000 0.000000 NaN 0.000000 NaN NaN 0.000000
25% 2.000000 NaN NaN 21.000000 0.000000 0.000000 NaN 7.895800 NaN NaN 0.000000
50% 3.000000 NaN NaN 28.000000 0.000000 0.000000 NaN 14.454200 NaN NaN 0.000000
75% 3.000000 NaN NaN 39.000000 1.000000 0.000000 NaN 31.275000 NaN NaN 1.000000
max 3.000000 NaN NaN 80.000000 8.000000 9.000000 NaN 512.329200 NaN NaN 1.000000

Statistical Functions

1.  count(): number of observations excluding missing values

2.  sum(): sum of values

3.  mean(): mean of Values

4.  median(): median of Values

5.  mode(): mode of values

6.  std(): standard Deviation of the Values

7.  min(): minimum Value

8.  max(): maximum Value

  Example:

      - For all columns 

         df.count() -> Showing count for all columns

         df.count(numeric_only=True) -> Showing count for numeric columns


     - For specified columns with numerical data 

         df['column_name'].count()

         df[['column_name_1', ..., 'column_name_n']].count()

Finding Maximum

In [88]:
df.max(numeric_only=True)
Out[88]:
pclass        3.0000
age          80.0000
sibsp         8.0000
parch         9.0000
fare        512.3292
survived      1.0000
dtype: float64
In [89]:
df[['age', 'fare']].max()
Out[89]:
age      80.0000
fare    512.3292
dtype: float64

Finding Minimum

In [90]:
df.min(numeric_only=True)
Out[90]:
pclass      1.0000
age         0.1667
sibsp       0.0000
parch       0.0000
fare        0.0000
survived    0.0000
dtype: float64
In [91]:
df[['age', 'fare']].min()
Out[91]:
age     0.1667
fare    0.0000
dtype: float64

Finding Mean

In [92]:
df.mean(numeric_only=True)
Out[92]:
pclass       2.294882
age         29.881135
sibsp        0.498854
parch        0.385027
fare        33.295479
survived     0.381971
dtype: float64
In [93]:
df[['age', 'fare']].mean()
Out[93]:
age     29.881135
fare    33.295479
dtype: float64

Finding Median

In [94]:
df.median(numeric_only=True)
Out[94]:
pclass       3.0000
age         28.0000
sibsp        0.0000
parch        0.0000
fare        14.4542
survived     0.0000
dtype: float64
In [95]:
df[['age', 'fare']].median()
Out[95]:
age     28.0000
fare    14.4542
dtype: float64

Finding Mode

In [96]:
df[['age', 'fare']].mode()
Out[96]:
age fare
0 24.0 8.05

Finding Standard Deviation

In [97]:
df.std(numeric_only=True)
Out[97]:
pclass       0.837836
age         14.413500
sibsp        1.041658
parch        0.865560
fare        51.758668
survived     0.486055
dtype: float64
In [98]:
df[['age', 'fare']].std()
Out[98]:
age     14.413500
fare    51.758668
dtype: float64

Finding Sum of values

In [99]:
df.sum(numeric_only=True)
Out[99]:
pclass       3004.0000
age         31255.6667
sibsp         653.0000
parch         504.0000
fare        43550.4869
survived      500.0000
dtype: float64
In [100]:
df['fare'].sum()
Out[100]:
43550.4869

Finding total number of values

In [101]:
df.count()
Out[101]:
pclass      1309
name        1309
sex         1309
age         1046
sibsp       1309
parch       1309
ticket      1309
fare        1308
cabin        295
embarked    1307
survived    1309
dtype: int64
In [102]:
df[['age', 'fare']].count()
Out[102]:
age     1046
fare    1308
dtype: int64

Finding Unique values

In [103]:
df['embarked'].unique()
Out[103]:
array(['S', 'C', nan, 'Q'], dtype=object)

DataFrame Information

     - RangeIndex: 1309 -> total 1309 rows in our dataframe

     - Data columns (total 11 columns) -> total 11 columns in our dataframe

     - Other function to get number of rows and columns: 

                 - df.shape

                     (1309, 11)

     - Other function to get the name of columns:

                 - df.columns

                     Index(['pclass', 'name', 'sex', 'age', 'sibsp', 'parch', 
                     'ticket', 'fare','cabin', 'embarked', 'survived']

     - Datatype integer -> pclass, sibsp, parch, survived
     - Datatype float   -> age, fare
     - Datatype object  -> name, sex,ticket, cabin, embarked

     - Other function to get the type of data in each column:

                - df.dtypes


     - Number of rows excluding missing values:

         pclass,name,sex, sinsp, parch, ticket, survived : 1309 rows
         age: 1046 rows
         fare: 1308 rows
         cabin: 295 rows
         embarked: 1307 rows

        => as we have total 1309 rows, some columns have missing values: 

            - age, fare, cabin, embarked

        - Function to get the total numbar of missing values in each column:

            - df.isnull().sum()        
In [104]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1309 entries, 0 to 1308
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   pclass    1309 non-null   int64  
 1   name      1309 non-null   object 
 2   sex       1309 non-null   object 
 3   age       1046 non-null   float64
 4   sibsp     1309 non-null   int64  
 5   parch     1309 non-null   int64  
 6   ticket    1309 non-null   object 
 7   fare      1308 non-null   float64
 8   cabin     295 non-null    object 
 9   embarked  1307 non-null   object 
 10  survived  1309 non-null   int64  
dtypes: float64(2), int64(4), object(5)
memory usage: 112.6+ KB

Number of Rows & Columns

In [105]:
df.shape
Out[105]:
(1309, 11)

Columns Name

In [106]:
df.columns
Out[106]:
Index(['pclass', 'name', 'sex', 'age', 'sibsp', 'parch', 'ticket', 'fare',
       'cabin', 'embarked', 'survived'],
      dtype='object')

Columns Datatypes

- Data type of all columns

     df.dtypes

 - Data type of specified columns

     df[['column_name_1', ..., 'column_name_n']].dtypes
In [107]:
df.dtypes
Out[107]:
pclass        int64
name         object
sex          object
age         float64
sibsp         int64
parch         int64
ticket       object
fare        float64
cabin        object
embarked     object
survived      int64
dtype: object

Number of Missing Values

- Number of Missing Values for all columns

       df.isnull().sum()

- Number of Missing Values for a single column

       df['column_name'].isnull().sum()

- Number of Missing Values for multiple columns

       df[['column_name_1',...,'column_name_n']].isnull().sum()
In [108]:
df.isnull().sum()
Out[108]:
pclass         0
name           0
sex            0
age          263
sibsp          0
parch          0
ticket         0
fare           1
cabin       1014
embarked       2
survived       0
dtype: int64

Slicing Data

- 1. by loc: using row_index & column_label

- 2. by iloc: using row_index & column_index

- 3. Alternative Way

1. Slicing data using loc

- Slicing a Single Cell

    df.loc[row_index, 'column_label']


- Slicing Multiple Cells

    df.loc[row_index, ['column_label_1',...,'column_label_n']]


- Slicing a Single Column

    df.loc[:, 'column_label']

    -> : means showing all rows


- Slicing Multiple Colmuns

     df.loc[:, ['column_label_1', 'column_label_2', ... 'column_label_n']

     -> : means showing all rows


- Slicing a Single Row

    df.loc[row_index, :]

    -> : means showing all columns


- Slicing Multiple Rows

    df.loc[row_index_range, :]

    -> row_index_range means range of indexex like [:5] , [4:] , [3:8]

    -> : means showing all columns


- Slicing Multiple Rows and Multiple Columns

    df.loc[row_index_range, column_index_range]

    -> row_index_range means range of indexex like [:5] , [4:] , [3:8]

    -> column_index_range means range of indexex like [:4] , [7:] , [1:6]

Slicing a Single Cell

- row_index = 1 

- column_label = 'name'
In [109]:
df.loc[1,'name']
Out[109]:
'Allison, Master. Hudson Trevor'

Slicing multiple Cells

- row_index = 1 

- column_label = 'name', 'age', 'fare'
In [110]:
df.loc[1,['name','age','fare']]
Out[110]:
name    Allison, Master. Hudson Trevor
age                             0.9167
fare                            151.55
Name: 1, dtype: object

Slicing a single column

- row_index = : -> meaning all rows

- Column_label = 'name'
In [111]:
df.loc[:,'name']
Out[111]:
0                         Allen, Miss. Elisabeth Walton
1                        Allison, Master. Hudson Trevor
2                          Allison, Miss. Helen Loraine
3                  Allison, Mr. Hudson Joshua Creighton
4       Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
                             ...                       
1304                               Zabour, Miss. Hileni
1305                              Zabour, Miss. Thamine
1306                          Zakarian, Mr. Mapriededer
1307                                Zakarian, Mr. Ortin
1308                                 Zimmerman, Mr. Leo
Name: name, Length: 1309, dtype: object

Slicing multiple columns

- row_index = : -> meaning all rows

- Column_label = 'name', 'fare', 'cabin'
In [112]:
df.loc[:,['age','fare','cabin']]
Out[112]:
age fare cabin
0 29.0000 211.3375 B5
1 0.9167 151.5500 C22 C26
2 2.0000 151.5500 C22 C26
3 30.0000 151.5500 C22 C26
4 25.0000 151.5500 C22 C26
... ... ... ...
1304 14.5000 14.4542 NaN
1305 NaN 14.4542 NaN
1306 26.5000 7.2250 NaN
1307 27.0000 7.2250 NaN
1308 29.0000 7.8750 NaN

1309 rows × 3 columns

Slicing a Single Row

- row_index = 1 

- Column_label = : -> meaning all columns
In [113]:
df.loc[1,:]
Out[113]:
pclass                                   1
name        Allison, Master. Hudson Trevor
sex                                   male
age                                 0.9167
sibsp                                    1
parch                                    2
ticket                              113781
fare                                151.55
cabin                              C22 C26
embarked                                 S
survived                                 1
Name: 1, dtype: object

Slicing Multiple Rows

- row_index = 2:5 

- Column_label = : -> meaning all columns
In [114]:
df.loc[2:5,:]
Out[114]:
pclass name sex age sibsp parch ticket fare cabin embarked survived
2 1 Allison, Miss. Helen Loraine female 2.0 1 2 113781 151.55 C22 C26 S 0
3 1 Allison, Mr. Hudson Joshua Creighton male 30.0 1 2 113781 151.55 C22 C26 S 0
4 1 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25.0 1 2 113781 151.55 C22 C26 S 0
5 1 Anderson, Mr. Harry male 48.0 0 0 19952 26.55 E12 S 1

Slicing Multiple Rows & Multiple Columns

- row_index = 1:3 

- Column_label = : -> 'name','fare','age'
In [115]:
df.loc[1:3,['name','fare','age']]
Out[115]:
name fare age
1 Allison, Master. Hudson Trevor 151.55 0.9167
2 Allison, Miss. Helen Loraine 151.55 2.0000
3 Allison, Mr. Hudson Joshua Creighton 151.55 30.0000

Slicing data using iloc

- Slicing a Single Cell

    df.iloc[row_index, column_index]


- Slicing Multiple Cells

    df.iloc[row_index, column_index_range]

    -> column_index_range means range of indexex like [:5] , [4:] , [3:8]


- Slicing a Single Column

    df.iloc[:, column_index]

    -> : means showing all rows


- Slicing Multiple Colmuns

     df.iloc[:, column_index_range]

     -> : means showing all rows
     -> column_index_range means range of indexex like [:5] , [4:] , [3:8]


- Slicing a Single Row

    df.iloc[row_index, :]

    -> : means showing all columns


- Slicing Multiple Rows

    df.iloc[row_index_range, :]

    -> row_index_range means range of indexex like [:5] , [4:] , [3:8]

    -> : means showing all columns


- Slicing Multiple Rows and Multiple Columns

    df.iloc[row_index_range, column_index_range]

    -> row_index_range means range of indexex like [:5] , [4:] , [3:8]

    -> column_index_range means range of indexex like [:4] , [7:] , [1:6]

Slicing a Single Cell

In [116]:
df.iloc[1,0]
Out[116]:
1

Slicing Multiple Cells

In [117]:
df.iloc[0,2:5]
Out[117]:
sex      female
age          29
sibsp         0
Name: 0, dtype: object

Slicing a Single Column

In [118]:
df.iloc[:,8]
Out[118]:
0            B5
1       C22 C26
2       C22 C26
3       C22 C26
4       C22 C26
         ...   
1304        NaN
1305        NaN
1306        NaN
1307        NaN
1308        NaN
Name: cabin, Length: 1309, dtype: object

Slicing Multiple Columns

In [119]:
df.iloc[:,2:6]
Out[119]:
sex age sibsp parch
0 female 29.0000 0 0
1 male 0.9167 1 2
2 female 2.0000 1 2
3 male 30.0000 1 2
4 female 25.0000 1 2
... ... ... ... ...
1304 female 14.5000 1 0
1305 female NaN 1 0
1306 male 26.5000 0 0
1307 male 27.0000 0 0
1308 male 29.0000 0 0

1309 rows × 4 columns

Slicing a Single Row

In [120]:
df.iloc[5,:]
Out[120]:
pclass                        1
name        Anderson, Mr. Harry
sex                        male
age                          48
sibsp                         0
parch                         0
ticket                    19952
fare                      26.55
cabin                       E12
embarked                      S
survived                      1
Name: 5, dtype: object

Slicing Multiple Rows

In [121]:
df.iloc[3:8,:]
Out[121]:
pclass name sex age sibsp parch ticket fare cabin embarked survived
3 1 Allison, Mr. Hudson Joshua Creighton male 30.0 1 2 113781 151.5500 C22 C26 S 0
4 1 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25.0 1 2 113781 151.5500 C22 C26 S 0
5 1 Anderson, Mr. Harry male 48.0 0 0 19952 26.5500 E12 S 1
6 1 Andrews, Miss. Kornelia Theodosia female 63.0 1 0 13502 77.9583 D7 S 1
7 1 Andrews, Mr. Thomas Jr male 39.0 0 0 112050 0.0000 A36 S 0

Slicing Multiple Rows & Multiple Columns

In [122]:
df.iloc[3:8,2:7]
Out[122]:
sex age sibsp parch ticket
3 male 30.0 1 2 113781
4 female 25.0 1 2 113781
5 male 48.0 0 0 19952
6 female 63.0 1 0 13502
7 male 39.0 0 0 112050

Slicing Data using Alternative Way

- Slicing Columns using column_label

- Slicing Rows using index_range

Slicing a Single Column

In [123]:
df[['name']]
Out[123]:
name
0 Allen, Miss. Elisabeth Walton
1 Allison, Master. Hudson Trevor
2 Allison, Miss. Helen Loraine
3 Allison, Mr. Hudson Joshua Creighton
4 Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
... ...
1304 Zabour, Miss. Hileni
1305 Zabour, Miss. Thamine
1306 Zakarian, Mr. Mapriededer
1307 Zakarian, Mr. Ortin
1308 Zimmerman, Mr. Leo

1309 rows × 1 columns

Slicing Multiple Columns

In [124]:
df[['name','age','ticket']]
Out[124]:
name age ticket
0 Allen, Miss. Elisabeth Walton 29.0000 24160
1 Allison, Master. Hudson Trevor 0.9167 113781
2 Allison, Miss. Helen Loraine 2.0000 113781
3 Allison, Mr. Hudson Joshua Creighton 30.0000 113781
4 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) 25.0000 113781
... ... ... ...
1304 Zabour, Miss. Hileni 14.5000 2665
1305 Zabour, Miss. Thamine NaN 2665
1306 Zakarian, Mr. Mapriededer 26.5000 2656
1307 Zakarian, Mr. Ortin 27.0000 2670
1308 Zimmerman, Mr. Leo 29.0000 315082

1309 rows × 3 columns

Slicing Rows using index_range

Slicing a Single Row

In [125]:
df[3:4]
Out[125]:
pclass name sex age sibsp parch ticket fare cabin embarked survived
3 1 Allison, Mr. Hudson Joshua Creighton male 30.0 1 2 113781 151.55 C22 C26 S 0

Slicing Multiple Rows

In [126]:
df[6:9]
Out[126]:
pclass name sex age sibsp parch ticket fare cabin embarked survived
6 1 Andrews, Miss. Kornelia Theodosia female 63.0 1 0 13502 77.9583 D7 S 1
7 1 Andrews, Mr. Thomas Jr male 39.0 0 0 112050 0.0000 A36 S 0
8 1 Appleton, Mrs. Edward Dale (Charlotte Lamson) female 53.0 2 0 11769 51.4792 C101 S 1

Query or Filtering using Boolean Operations

Filtering (df['pclass'] == 1)

In [127]:
df[df['pclass'] == 1]
Out[127]:
pclass name sex age sibsp parch ticket fare cabin embarked survived
0 1 Allen, Miss. Elisabeth Walton female 29.0000 0 0 24160 211.3375 B5 S 1
1 1 Allison, Master. Hudson Trevor male 0.9167 1 2 113781 151.5500 C22 C26 S 1
2 1 Allison, Miss. Helen Loraine female 2.0000 1 2 113781 151.5500 C22 C26 S 0
3 1 Allison, Mr. Hudson Joshua Creighton male 30.0000 1 2 113781 151.5500 C22 C26 S 0
4 1 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25.0000 1 2 113781 151.5500 C22 C26 S 0
... ... ... ... ... ... ... ... ... ... ... ...
318 1 Williams-Lambert, Mr. Fletcher Fellows male NaN 0 0 113510 35.0000 C128 S 0
319 1 Wilson, Miss. Helen Alice female 31.0000 0 0 16966 134.5000 E39 E41 C 1
320 1 Woolner, Mr. Hugh male NaN 0 0 19947 35.5000 C52 S 1
321 1 Wright, Mr. George male 62.0000 0 0 113807 26.5500 NaN S 0
322 1 Young, Miss. Marie Grice female 36.0000 0 0 PC 17760 135.6333 C32 C 1

323 rows × 11 columns

Query ('pclass == 1')

In [128]:
df.query('pclass == 1')
Out[128]:
pclass name sex age sibsp parch ticket fare cabin embarked survived
0 1 Allen, Miss. Elisabeth Walton female 29.0000 0 0 24160 211.3375 B5 S 1
1 1 Allison, Master. Hudson Trevor male 0.9167 1 2 113781 151.5500 C22 C26 S 1
2 1 Allison, Miss. Helen Loraine female 2.0000 1 2 113781 151.5500 C22 C26 S 0
3 1 Allison, Mr. Hudson Joshua Creighton male 30.0000 1 2 113781 151.5500 C22 C26 S 0
4 1 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25.0000 1 2 113781 151.5500 C22 C26 S 0
... ... ... ... ... ... ... ... ... ... ... ...
318 1 Williams-Lambert, Mr. Fletcher Fellows male NaN 0 0 113510 35.0000 C128 S 0
319 1 Wilson, Miss. Helen Alice female 31.0000 0 0 16966 134.5000 E39 E41 C 1
320 1 Woolner, Mr. Hugh male NaN 0 0 19947 35.5000 C52 S 1
321 1 Wright, Mr. George male 62.0000 0 0 113807 26.5500 NaN S 0
322 1 Young, Miss. Marie Grice female 36.0000 0 0 PC 17760 135.6333 C32 C 1

323 rows × 11 columns

Filtering (df['age'] > 70)

In [129]:
df[df['age'] > 70]
Out[129]:
pclass name sex age sibsp parch ticket fare cabin embarked survived
9 1 Artagaveytia, Mr. Ramon male 71.0 0 0 PC 17609 49.5042 NaN C 0
14 1 Barkworth, Mr. Algernon Henry Wilson male 80.0 0 0 27042 30.0000 A23 S 1
61 1 Cavendish, Mrs. Tyrell William (Julia Florence... female 76.0 1 0 19877 78.8500 C46 S 1
135 1 Goldschmidt, Mr. George B male 71.0 0 0 PC 17754 34.6542 A5 C 0
727 3 Connors, Mr. Patrick male 70.5 0 0 370369 7.7500 NaN Q 0
1235 3 Svensson, Mr. Johan male 74.0 0 0 347060 7.7750 NaN S 0

Query ('age>70')

In [130]:
df.query('age>70')
Out[130]:
pclass name sex age sibsp parch ticket fare cabin embarked survived
9 1 Artagaveytia, Mr. Ramon male 71.0 0 0 PC 17609 49.5042 NaN C 0
14 1 Barkworth, Mr. Algernon Henry Wilson male 80.0 0 0 27042 30.0000 A23 S 1
61 1 Cavendish, Mrs. Tyrell William (Julia Florence... female 76.0 1 0 19877 78.8500 C46 S 1
135 1 Goldschmidt, Mr. George B male 71.0 0 0 PC 17754 34.6542 A5 C 0
727 3 Connors, Mr. Patrick male 70.5 0 0 370369 7.7500 NaN Q 0
1235 3 Svensson, Mr. Johan male 74.0 0 0 347060 7.7750 NaN S 0

Filtering(df['name'] == 'Barkworth, Mr. Algernon Henry Wilson'])

In [131]:
df[df['name'] == 'Barkworth, Mr. Algernon Henry Wilson']
Out[131]:
pclass name sex age sibsp parch ticket fare cabin embarked survived
14 1 Barkworth, Mr. Algernon Henry Wilson male 80.0 0 0 27042 30.0 A23 S 1

Query ("name == 'Barkworth, Mr. Algernon Henry Wilson'")

In [132]:
df.query("name == 'Barkworth, Mr. Algernon Henry Wilson'")
Out[132]:
pclass name sex age sibsp parch ticket fare cabin embarked survived
14 1 Barkworth, Mr. Algernon Henry Wilson male 80.0 0 0 27042 30.0 A23 S 1

Filtering( (df['age'] > 60) | (df['age'] < 5) )

In [133]:
df[(df['age'] > 60) | (df['age'] < 5)]
Out[133]:
pclass name sex age sibsp parch ticket fare cabin embarked survived
1 1 Allison, Master. Hudson Trevor male 0.9167 1 2 113781 151.5500 C22 C26 S 1
2 1 Allison, Miss. Helen Loraine female 2.0000 1 2 113781 151.5500 C22 C26 S 0
6 1 Andrews, Miss. Kornelia Theodosia female 63.0000 1 0 13502 77.9583 D7 S 1
9 1 Artagaveytia, Mr. Ramon male 71.0000 0 0 PC 17609 49.5042 NaN C 0
14 1 Barkworth, Mr. Algernon Henry Wilson male 80.0000 0 0 27042 30.0000 A23 S 1
... ... ... ... ... ... ... ... ... ... ... ...
1225 3 Storey, Mr. Thomas male 60.5000 0 0 3701 NaN NaN S 0
1230 3 Strom, Miss. Telma Matilda female 2.0000 0 1 347054 10.4625 G6 S 0
1235 3 Svensson, Mr. Johan male 74.0000 0 0 347060 7.7750 NaN S 0
1240 3 Thomas, Master. Assad Alexander male 0.4167 0 1 2625 8.5167 NaN C 1
1261 3 Turkula, Mrs. (Hedwig) female 63.0000 0 0 4134 9.5875 NaN S 1

84 rows × 11 columns

Query('age > 60 | age < 5')

In [134]:
df.query('age > 60 | age < 5')
Out[134]:
pclass name sex age sibsp parch ticket fare cabin embarked survived
1 1 Allison, Master. Hudson Trevor male 0.9167 1 2 113781 151.5500 C22 C26 S 1
2 1 Allison, Miss. Helen Loraine female 2.0000 1 2 113781 151.5500 C22 C26 S 0
6 1 Andrews, Miss. Kornelia Theodosia female 63.0000 1 0 13502 77.9583 D7 S 1
9 1 Artagaveytia, Mr. Ramon male 71.0000 0 0 PC 17609 49.5042 NaN C 0
14 1 Barkworth, Mr. Algernon Henry Wilson male 80.0000 0 0 27042 30.0000 A23 S 1
... ... ... ... ... ... ... ... ... ... ... ...
1225 3 Storey, Mr. Thomas male 60.5000 0 0 3701 NaN NaN S 0
1230 3 Strom, Miss. Telma Matilda female 2.0000 0 1 347054 10.4625 G6 S 0
1235 3 Svensson, Mr. Johan male 74.0000 0 0 347060 7.7750 NaN S 0
1240 3 Thomas, Master. Assad Alexander male 0.4167 0 1 2625 8.5167 NaN C 1
1261 3 Turkula, Mrs. (Hedwig) female 63.0000 0 0 4134 9.5875 NaN S 1

84 rows × 11 columns

NOTE:

 a & b (pandas) = a and b (python)
 a | b (pandas) = a or b (python)