Web Scraping (Using Pandas read_html)

Source

One URL (Year 2019)

Combining URL & String - One URL

In [1]:
year = '2019'

url_link = 'https://www.basketball-reference.com/leagues/NBA_{}_per_game.html'

url = url_link.format(year)

url
Out[1]:
'https://www.basketball-reference.com/leagues/NBA_2019_per_game.html'

Read HTML webpage into pandas

In [2]:
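import pandas as pd

# read_html returns a list with one DataFrame per table found on the page;
# header=0 uses the first table row as the column names
df = pd.read_html(url, header=0)

df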
Out[2]:
[      Rk        Player Pos Age   Tm   G  GS    MP   FG   FGA  ...   FT%  ORB  \
 0      1  Álex Abrines  SG  25  OKC  31   2  19.0  1.8   5.1  ...  .923  0.2   
 1      2    Quincy Acy  PF  28  PHO  10   0  12.3  0.4   1.8  ...  .700  0.3   
 2      3  Jaylen Adams  PG  22  ATL  34   1  12.6  1.1   3.2  ...  .778  0.3   
 3      4  Steven Adams   C  25  OKC  80  80  33.4  6.0  10.1  ...  .500  4.9   
 4      5   Bam Adebayo   C  21  MIA  82  28  23.3  3.4   5.9  ...  .735  2.0   
 ..   ...           ...  ..  ..  ...  ..  ..   ...  ...   ...  ...   ...  ...   
 729  528  Tyler Zeller   C  29  MEM   4   1  20.5  4.0   7.0  ...  .778  2.3   
 730  529    Ante Žižić   C  22  CLE  59  25  18.3  3.1   5.6  ...  .705  1.8   
 731  530   Ivica Zubac   C  21  TOT  59  37  17.6  3.6   6.4  ...  .802  1.9   
 732  530   Ivica Zubac   C  21  LAL  33  12  15.6  3.4   5.8  ...  .864  1.6   
 733  530   Ivica Zubac   C  21  LAC  26  25  20.2  3.8   7.2  ...  .733  2.3   
 
      DRB  TRB  AST  STL  BLK  TOV   PF   PTS  
 0    1.4  1.5  0.6  0.5  0.2  0.5  1.7   5.3  
 1    2.2  2.5  0.8  0.1  0.4  0.4  2.4   1.7  
 2    1.4  1.8  1.9  0.4  0.1  0.8  1.3   3.2  
 3    4.6  9.5  1.6  1.5  1.0  1.7  2.6  13.9  
 4    5.3  7.3  2.2  0.9  0.8  1.5  2.5   8.9  
 ..   ...  ...  ...  ...  ...  ...  ...   ...  
 729  2.3  4.5  0.8  0.3  0.8  1.0  4.0  11.5  
 730  3.6  5.4  0.9  0.2  0.4  1.0  1.9   7.8  
 731  4.2  6.1  1.1  0.2  0.9  1.2  2.3   8.9  
 732  3.3  4.9  0.8  0.1  0.8  1.0  2.2   8.5  
 733  5.3  7.7  1.5  0.4  0.9  1.4  2.5   9.4  
 
 [734 rows x 30 columns]]

How many tables are there on the webpage?

In [3]:
len(df)
Out[3]:
1

Select the First Table

In [4]:
df_2019 = df[0]

df_2019
Out[4]:
      Rk        Player Pos Age   Tm   G  GS    MP   FG   FGA  ...   FT%  ORB  DRB  TRB  AST  STL  BLK  TOV   PF   PTS
0      1  Álex Abrines  SG  25  OKC  31   2  19.0  1.8   5.1  ...  .923  0.2  1.4  1.5  0.6  0.5  0.2  0.5  1.7   5.3
1      2    Quincy Acy  PF  28  PHO  10   0  12.3  0.4   1.8  ...  .700  0.3  2.2  2.5  0.8  0.1  0.4  0.4  2.4   1.7
2      3  Jaylen Adams  PG  22  ATL  34   1  12.6  1.1   3.2  ...  .778  0.3  1.4  1.8  1.9  0.4  0.1  0.8  1.3   3.2
3      4  Steven Adams   C  25  OKC  80  80  33.4  6.0  10.1  ...  .500  4.9  4.6  9.5  1.6  1.5  1.0  1.7  2.6  13.9
4      5   Bam Adebayo   C  21  MIA  82  28  23.3  3.4   5.9  ...  .735  2.0  5.3  7.3  2.2  0.9  0.8  1.5  2.5   8.9
..   ...           ...  ..  ..  ...  ..  ..   ...  ...   ...  ...   ...  ...  ...  ...  ...  ...  ...  ...  ...   ...
729  528  Tyler Zeller   C  29  MEM   4   1  20.5  4.0   7.0  ...  .778  2.3  2.3  4.5  0.8  0.3  0.8  1.0  4.0  11.5
730  529    Ante Žižić   C  22  CLE  59  25  18.3  3.1   5.6  ...  .705  1.8  3.6  5.4  0.9  0.2  0.4  1.0  1.9   7.8
731  530   Ivica Zubac   C  21  TOT  59  37  17.6  3.6   6.4  ...  .802  1.9  4.2  6.1  1.1  0.2  0.9  1.2  2.3   8.9
732  530   Ivica Zubac   C  21  LAL  33  12  15.6  3.4   5.8  ...  .864  1.6  3.3  4.9  0.8  0.1  0.8  1.0  2.2   8.5
733  530   Ivica Zubac   C  21  LAC  26  25  20.2  3.8   7.2  ...  .733  2.3  5.3  7.7  1.5  0.4  0.9  1.4  2.5   9.4

734 rows × 30 columns

Multiple URLs (Years 2015 to 2019)

Combining URL & String - a List of URLs

In [5]:
years = [2015, 2016, 2017, 2018, 2019]
urls = []

url_link = 'https://www.basketball-reference.com/leagues/NBA_{}_per_game.html'

for year in years:
    url = url_link.format(year)
    urls.append(url)

urls
Out[5]:
['https://www.basketball-reference.com/leagues/NBA_2015_per_game.html',
 'https://www.basketball-reference.com/leagues/NBA_2016_per_game.html',
 'https://www.basketball-reference.com/leagues/NBA_2017_per_game.html',
 'https://www.basketball-reference.com/leagues/NBA_2018_per_game.html',
 'https://www.basketball-reference.com/leagues/NBA_2019_per_game.html']
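
The same list can also be built in one step with a list comprehension. This is a minimal sketch that reuses the url_link template from above; using range(2015, 2020) is an assumption that the seasons of interest are consecutive.

```python
# Template with a placeholder for the season's end year
url_link = 'https://www.basketball-reference.com/leagues/NBA_{}_per_game.html'

# range() excludes its stop value, so 2020 yields 2015 through 2019
years = range(2015, 2020)

# Build one URL per season
urls = [url_link.format(year) for year in years]

print(urls[0])
print(urls[-1])
```

For a non-consecutive set of seasons, an explicit list such as [2015, 2017, 2019] works in place of range().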

Read HTML webpage into pandas

In [6]:
df = []

# Each call returns a list of tables, so df becomes a list of lists
for url in urls:
    data = pd.read_html(url, header=0)
    df.append(data)

df
Out[6]:
[[      Rk          Player Pos Age   Tm   G  GS    MP   FG   FGA  ...   FT%  \
  0      1      Quincy Acy  PF  24  NYK  68  22  18.9  2.2   4.9  ...  .784   
  1      2    Jordan Adams  SG  20  MEM  30   0   8.3  1.2   2.9  ...  .609   
  2      3    Steven Adams   C  21  OKC  70  67  25.3  3.1   5.7  ...  .502   
  3      4     Jeff Adrien  PF  28  MIN  17   0  12.6  1.1   2.6  ...  .579   
  4      5   Arron Afflalo  SG  29  TOT  78  72  32.1  4.8  11.3  ...  .843   
  ..   ...             ...  ..  ..  ...  ..  ..   ...  ...   ...  ...   ...   
  670  490  Thaddeus Young  PF  26  TOT  76  68  32.0  5.9  12.7  ...  .655   
  671  490  Thaddeus Young  PF  26  MIN  48  48  33.4  6.0  13.4  ...  .682   
  672  490  Thaddeus Young  PF  26  BRK  28  20  29.6  5.8  11.7  ...  .606   
  673  491     Cody Zeller   C  22  CHO  62  45  24.0  2.8   6.0  ...  .774   
  674  492    Tyler Zeller   C  25  BOS  82  59  21.1  4.1   7.5  ...  .823   
  
       ORB  DRB  TRB  AST  STL  BLK  TOV   PF   PTS  
  0    1.2  3.3  4.4  1.0  0.4  0.3  0.9  2.2   5.9  
  1    0.3  0.6  0.9  0.5  0.5  0.2  0.5  0.8   3.1  
  2    2.8  4.6  7.5  0.9  0.5  1.2  1.4  3.2   7.7  
  3    1.4  3.2  4.5  0.9  0.2  0.5  0.5  1.8   3.5  
  4    0.3  2.8  3.2  1.7  0.5  0.1  1.5  2.1  13.3  
  ..   ...  ...  ...  ...  ...  ...  ...  ...   ...  
  670  1.7  3.7  5.4  2.3  1.6  0.3  1.5  2.3  14.1  
  671  1.6  3.5  5.1  2.8  1.8  0.4  1.6  2.4  14.3  
  672  1.9  4.1  5.9  1.4  1.4  0.3  1.5  2.0  13.8  
  673  1.6  4.3  5.8  1.6  0.5  0.8  1.0  2.5   7.6  
  674  1.8  3.9  5.7  1.4  0.2  0.6  0.9  2.5  10.2  
  
  [675 rows x 30 columns]],
 [      Rk          Player Pos Age   Tm   G  GS    MP   FG   FGA  ...   FT%  \
  0      1      Quincy Acy  PF  25  SAC  59  29  14.8  2.0   3.6  ...  .735   
  1      2    Jordan Adams  SG  21  MEM   2   0   7.5  1.0   3.0  ...  .600   
  2      3    Steven Adams   C  22  OKC  80  80  25.2  3.3   5.3  ...  .582   
  3      4   Arron Afflalo  SG  30  NYK  71  57  33.4  5.0  11.3  ...  .840   
  4      5   Alexis Ajinça   C  27  NOP  59  17  14.6  2.5   5.3  ...  .839   
  ..   ...             ...  ..  ..  ...  ..  ..   ...  ...   ...  ...   ...   
  596  472       Joe Young  PG  23  IND  41   0   9.4  1.5   4.1  ...  .800   
  597  473      Nick Young  SG  30  LAL  54   2  19.1  2.3   6.9  ...  .829   
  598  474  Thaddeus Young  PF  27  BRK  73  73  33.0  6.8  13.2  ...  .644   
  599  475     Cody Zeller   C  23  CHO  73  60  24.3  3.2   6.0  ...  .754   
  600  476    Tyler Zeller   C  26  BOS  60   3  11.8  2.3   4.8  ...  .815   
  
       ORB  DRB  TRB  AST  STL  BLK  TOV   PF   PTS  
  0    1.1  2.1  3.2  0.5  0.5  0.4  0.5  1.7   5.2  
  1    0.0  1.0  1.0  1.5  1.5  0.0  1.0  1.0   3.5  
  2    2.7  3.9  6.7  0.8  0.5  1.1  1.1  2.8   8.0  
  3    0.3  3.4  3.7  2.0  0.4  0.1  1.2  2.0  12.8  
  4    1.3  3.3  4.6  0.5  0.3  0.6  0.9  2.3   6.0  
  ..   ...  ...  ...  ...  ...  ...  ...  ...   ...  
  596  0.1  1.1  1.2  1.6  0.4  0.0  0.8  0.7   3.8  
  597  0.3  1.5  1.8  0.6  0.4  0.1  0.6  0.9   7.3  
  598  2.4  6.6  9.0  1.9  1.5  0.5  1.9  2.5  15.1  
  599  1.9  4.3  6.2  1.0  0.8  0.9  0.9  2.8   8.7  
  600  1.0  1.9  3.0  0.5  0.2  0.4  0.8  1.6   6.1  
  
  [601 rows x 30 columns]],
 [      Rk             Player Pos Age   Tm   G  GS    MP   FG  FGA  ...   FT%  \
  0      1       Álex Abrines  SG  23  OKC  68   6  15.5  2.0  5.0  ...  .898   
  1      2         Quincy Acy  PF  26  TOT  38   1  14.7  1.8  4.5  ...  .750   
  2      2         Quincy Acy  PF  26  DAL   6   0   8.0  0.8  2.8  ...  .667   
  3      2         Quincy Acy  PF  26  BRK  32   1  15.9  2.0  4.8  ...  .754   
  4      3       Steven Adams   C  23  OKC  80  80  29.9  4.7  8.2  ...  .611   
  ..   ...                ...  ..  ..  ...  ..  ..   ...  ...  ...  ...   ...   
  614  482        Cody Zeller   C  24  CHO  62  58  27.8  4.1  7.1  ...  .679   
  615  483       Tyler Zeller   C  27  BOS  51   5  10.3  1.5  3.1  ...  .564   
  616  484  Stephen Zimmerman   C  20  ORL  19   0   5.7  0.5  1.6  ...  .600   
  617  485        Paul Zipser  SF  22  CHI  44  18  19.2  2.0  5.0  ...  .775   
  618  486        Ivica Zubac   C  19  LAL  38  11  16.0  3.3  6.3  ...  .653   
  
       ORB  DRB  TRB  AST  STL  BLK  TOV   PF   PTS  
  0    0.3  1.0  1.3  0.6  0.5  0.1  0.5  1.7   6.0  
  1    0.5  2.5  3.0  0.5  0.4  0.4  0.6  1.8   5.8  
  2    0.3  1.0  1.3  0.0  0.0  0.0  0.3  1.5   2.2  
  3    0.6  2.8  3.3  0.6  0.4  0.5  0.6  1.8   6.5  
  4    3.5  4.2  7.7  1.1  1.1  1.0  1.8  2.4  11.3  
  ..   ...  ...  ...  ...  ...  ...  ...  ...   ...  
  614  2.2  4.4  6.5  1.6  1.0  0.9  1.0  3.0  10.3  
  615  0.8  1.6  2.4  0.8  0.1  0.4  0.4  1.2   3.5  
  616  0.6  1.3  1.8  0.2  0.1  0.3  0.2  0.9   1.2  
  617  0.3  2.5  2.8  0.8  0.3  0.4  0.9  1.8   5.5  
  618  1.1  3.1  4.2  0.8  0.4  0.9  0.8  1.7   7.5  
  
  [619 rows x 30 columns]],
 [      Rk         Player Pos Age   Tm   G  GS    MP   FG  FGA  ...   FT%  ORB  \
  0      1   Álex Abrines  SG  24  OKC  75   8  15.1  1.5  3.9  ...  .848  0.3   
  1      2     Quincy Acy  PF  27  BRK  70   8  19.4  1.9  5.2  ...  .817  0.6   
  2      3   Steven Adams   C  24  OKC  76  76  32.7  5.9  9.4  ...  .559  5.1   
  3      4    Bam Adebayo   C  20  MIA  69  19  19.8  2.5  4.9  ...  .721  1.7   
  4      5  Arron Afflalo  SG  32  ORL  53   3  12.9  1.2  3.1  ...  .846  0.1   
  ..   ...            ...  ..  ..  ...  ..  ..   ...  ...  ...  ...   ...  ...   
  685  537   Tyler Zeller   C  28  BRK  42  33  16.7  3.0  5.5  ...  .667  1.5   
  686  537   Tyler Zeller   C  28  MIL  24   1  16.9  2.6  4.4  ...  .895  2.0   
  687  538    Paul Zipser  SF  23  CHI  54  12  15.3  1.5  4.3  ...  .760  0.2   
  688  539     Ante Žižić   C  21  CLE  32   2   6.7  1.5  2.1  ...  .724  0.8   
  689  540    Ivica Zubac   C  20  LAL  43   0   9.5  1.4  2.8  ...  .765  1.0   
  
       DRB  TRB  AST  STL  BLK  TOV   PF   PTS  
  0    1.2  1.5  0.4  0.5  0.1  0.3  1.7   4.7  
  1    3.1  3.7  0.8  0.5  0.4  0.9  2.1   5.9  
  2    4.0  9.0  1.2  1.2  1.0  1.7  2.8  13.9  
  3    3.8  5.5  1.5  0.5  0.6  1.0  2.0   6.9  
  4    1.2  1.2  0.6  0.1  0.2  0.4  1.1   3.4  
  ..   ...  ...  ...  ...  ...  ...  ...   ...  
  685  3.1  4.6  0.7  0.2  0.5  0.8  1.9   7.1  
  686  2.7  4.6  0.8  0.3  0.6  0.5  2.0   5.9  
  687  2.2  2.4  0.9  0.4  0.3  0.8  1.6   4.0  
  688  1.1  1.9  0.2  0.1  0.4  0.3  0.9   3.7  
  689  1.8  2.9  0.6  0.2  0.3  0.6  1.1   3.7  
  
  [690 rows x 30 columns]],
 [      Rk        Player Pos Age   Tm   G  GS    MP   FG   FGA  ...   FT%  ORB  \
  0      1  Álex Abrines  SG  25  OKC  31   2  19.0  1.8   5.1  ...  .923  0.2   
  1      2    Quincy Acy  PF  28  PHO  10   0  12.3  0.4   1.8  ...  .700  0.3   
  2      3  Jaylen Adams  PG  22  ATL  34   1  12.6  1.1   3.2  ...  .778  0.3   
  3      4  Steven Adams   C  25  OKC  80  80  33.4  6.0  10.1  ...  .500  4.9   
  4      5   Bam Adebayo   C  21  MIA  82  28  23.3  3.4   5.9  ...  .735  2.0   
  ..   ...           ...  ..  ..  ...  ..  ..   ...  ...   ...  ...   ...  ...   
  729  528  Tyler Zeller   C  29  MEM   4   1  20.5  4.0   7.0  ...  .778  2.3   
  730  529    Ante Žižić   C  22  CLE  59  25  18.3  3.1   5.6  ...  .705  1.8   
  731  530   Ivica Zubac   C  21  TOT  59  37  17.6  3.6   6.4  ...  .802  1.9   
  732  530   Ivica Zubac   C  21  LAL  33  12  15.6  3.4   5.8  ...  .864  1.6   
  733  530   Ivica Zubac   C  21  LAC  26  25  20.2  3.8   7.2  ...  .733  2.3   
  
       DRB  TRB  AST  STL  BLK  TOV   PF   PTS  
  0    1.4  1.5  0.6  0.5  0.2  0.5  1.7   5.3  
  1    2.2  2.5  0.8  0.1  0.4  0.4  2.4   1.7  
  2    1.4  1.8  1.9  0.4  0.1  0.8  1.3   3.2  
  3    4.6  9.5  1.6  1.5  1.0  1.7  2.6  13.9  
  4    5.3  7.3  2.2  0.9  0.8  1.5  2.5   8.9  
  ..   ...  ...  ...  ...  ...  ...  ...   ...  
  729  2.3  4.5  0.8  0.3  0.8  1.0  4.0  11.5  
  730  3.6  5.4  0.9  0.2  0.4  1.0  1.9   7.8  
  731  4.2  6.1  1.1  0.2  0.9  1.2  2.3   8.9  
  732  3.3  4.9  0.8  0.1  0.8  1.0  2.2   8.5  
  733  5.3  7.7  1.5  0.4  0.9  1.4  2.5   9.4  
  
  [734 rows x 30 columns]]]
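
The loop above requests five pages back to back. Sites often throttle rapid automated requests, so a pause between reads can help. The sketch below is one way to do that, not part of the original notebook: read_all, reader and delay_seconds are hypothetical names, and the three-second default is a guess rather than a documented limit.

```python
import time

import pandas as pd


def read_all(urls, reader=pd.read_html, delay_seconds=3.0):
    """Read every URL in turn, sleeping between requests.

    `reader` and `delay_seconds` are hypothetical parameters added for
    this sketch; the delay value is a guess, not a documented limit.
    """
    tables_per_url = []
    for position, url in enumerate(urls):
        if position > 0:
            # Wait before every request after the first one
            time.sleep(delay_seconds)
        tables_per_url.append(reader(url, header=0))
    return tables_per_url
```

Calling read_all(urls) should give the same list of lists as the loop, only more slowly. Passing a different reader makes it possible to try the function without touching the network.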

How many elements are there in df (one list of tables per URL)?

In [7]:
len(df)
Out[7]:
5

Select the First Table (2015)

In [8]:
df_2015 = df[0][0]

df_2015
Out[8]:
      Rk          Player Pos Age   Tm   G  GS    MP   FG   FGA  ...   FT%  ORB  DRB  TRB  AST  STL  BLK  TOV   PF   PTS
0      1      Quincy Acy  PF  24  NYK  68  22  18.9  2.2   4.9  ...  .784  1.2  3.3  4.4  1.0  0.4  0.3  0.9  2.2   5.9
1      2    Jordan Adams  SG  20  MEM  30   0   8.3  1.2   2.9  ...  .609  0.3  0.6  0.9  0.5  0.5  0.2  0.5  0.8   3.1
2      3    Steven Adams   C  21  OKC  70  67  25.3  3.1   5.7  ...  .502  2.8  4.6  7.5  0.9  0.5  1.2  1.4  3.2   7.7
3      4     Jeff Adrien  PF  28  MIN  17   0  12.6  1.1   2.6  ...  .579  1.4  3.2  4.5  0.9  0.2  0.5  0.5  1.8   3.5
4      5   Arron Afflalo  SG  29  TOT  78  72  32.1  4.8  11.3  ...  .843  0.3  2.8  3.2  1.7  0.5  0.1  1.5  2.1  13.3
..   ...             ...  ..  ..  ...  ..  ..   ...  ...   ...  ...   ...  ...  ...  ...  ...  ...  ...  ...  ...   ...
670  490  Thaddeus Young  PF  26  TOT  76  68  32.0  5.9  12.7  ...  .655  1.7  3.7  5.4  2.3  1.6  0.3  1.5  2.3  14.1
671  490  Thaddeus Young  PF  26  MIN  48  48  33.4  6.0  13.4  ...  .682  1.6  3.5  5.1  2.8  1.8  0.4  1.6  2.4  14.3
672  490  Thaddeus Young  PF  26  BRK  28  20  29.6  5.8  11.7  ...  .606  1.9  4.1  5.9  1.4  1.4  0.3  1.5  2.0  13.8
673  491     Cody Zeller   C  22  CHO  62  45  24.0  2.8   6.0  ...  .774  1.6  4.3  5.8  1.6  0.5  0.8  1.0  2.5   7.6
674  492    Tyler Zeller   C  25  BOS  82  59  21.1  4.1   7.5  ...  .823  1.8  3.9  5.7  1.4  0.2  0.6  0.9  2.5  10.2

675 rows × 30 columns

Select the Second Table (2016)

In [9]:
df_2016 = df[1][0]

df_2016
Out[9]:
      Rk          Player Pos Age   Tm   G  GS    MP   FG   FGA  ...   FT%  ORB  DRB  TRB  AST  STL  BLK  TOV   PF   PTS
0      1      Quincy Acy  PF  25  SAC  59  29  14.8  2.0   3.6  ...  .735  1.1  2.1  3.2  0.5  0.5  0.4  0.5  1.7   5.2
1      2    Jordan Adams  SG  21  MEM   2   0   7.5  1.0   3.0  ...  .600  0.0  1.0  1.0  1.5  1.5  0.0  1.0  1.0   3.5
2      3    Steven Adams   C  22  OKC  80  80  25.2  3.3   5.3  ...  .582  2.7  3.9  6.7  0.8  0.5  1.1  1.1  2.8   8.0
3      4   Arron Afflalo  SG  30  NYK  71  57  33.4  5.0  11.3  ...  .840  0.3  3.4  3.7  2.0  0.4  0.1  1.2  2.0  12.8
4      5   Alexis Ajinça   C  27  NOP  59  17  14.6  2.5   5.3  ...  .839  1.3  3.3  4.6  0.5  0.3  0.6  0.9  2.3   6.0
..   ...             ...  ..  ..  ...  ..  ..   ...  ...   ...  ...   ...  ...  ...  ...  ...  ...  ...  ...  ...   ...
596  472       Joe Young  PG  23  IND  41   0   9.4  1.5   4.1  ...  .800  0.1  1.1  1.2  1.6  0.4  0.0  0.8  0.7   3.8
597  473      Nick Young  SG  30  LAL  54   2  19.1  2.3   6.9  ...  .829  0.3  1.5  1.8  0.6  0.4  0.1  0.6  0.9   7.3
598  474  Thaddeus Young  PF  27  BRK  73  73  33.0  6.8  13.2  ...  .644  2.4  6.6  9.0  1.9  1.5  0.5  1.9  2.5  15.1
599  475     Cody Zeller   C  23  CHO  73  60  24.3  3.2   6.0  ...  .754  1.9  4.3  6.2  1.0  0.8  0.9  0.9  2.8   8.7
600  476    Tyler Zeller   C  26  BOS  60   3  11.8  2.3   4.8  ...  .815  1.0  1.9  3.0  0.5  0.2  0.4  0.8  1.6   6.1

601 rows × 30 columns

Select the Last Table (2019)

In [10]:
df_2019 = df[4][0]

df_2019
Out[10]:
      Rk        Player Pos Age   Tm   G  GS    MP   FG   FGA  ...   FT%  ORB  DRB  TRB  AST  STL  BLK  TOV   PF   PTS
0      1  Álex Abrines  SG  25  OKC  31   2  19.0  1.8   5.1  ...  .923  0.2  1.4  1.5  0.6  0.5  0.2  0.5  1.7   5.3
1      2    Quincy Acy  PF  28  PHO  10   0  12.3  0.4   1.8  ...  .700  0.3  2.2  2.5  0.8  0.1  0.4  0.4  2.4   1.7
2      3  Jaylen Adams  PG  22  ATL  34   1  12.6  1.1   3.2  ...  .778  0.3  1.4  1.8  1.9  0.4  0.1  0.8  1.3   3.2
3      4  Steven Adams   C  25  OKC  80  80  33.4  6.0  10.1  ...  .500  4.9  4.6  9.5  1.6  1.5  1.0  1.7  2.6  13.9
4      5   Bam Adebayo   C  21  MIA  82  28  23.3  3.4   5.9  ...  .735  2.0  5.3  7.3  2.2  0.9  0.8  1.5  2.5   8.9
..   ...           ...  ..  ..  ...  ..  ..   ...  ...   ...  ...   ...  ...  ...  ...  ...  ...  ...  ...  ...   ...
729  528  Tyler Zeller   C  29  MEM   4   1  20.5  4.0   7.0  ...  .778  2.3  2.3  4.5  0.8  0.3  0.8  1.0  4.0  11.5
730  529    Ante Žižić   C  22  CLE  59  25  18.3  3.1   5.6  ...  .705  1.8  3.6  5.4  0.9  0.2  0.4  1.0  1.9   7.8
731  530   Ivica Zubac   C  21  TOT  59  37  17.6  3.6   6.4  ...  .802  1.9  4.2  6.1  1.1  0.2  0.9  1.2  2.3   8.9
732  530   Ivica Zubac   C  21  LAL  33  12  15.6  3.4   5.8  ...  .864  1.6  3.3  4.9  0.8  0.1  0.8  1.0  2.2   8.5
733  530   Ivica Zubac   C  21  LAC  26  25  20.2  3.8   7.2  ...  .733  2.3  5.3  7.7  1.5  0.4  0.9  1.4  2.5   9.4

734 rows × 30 columns
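
Picking a season by position, as in df[4][0], relies on remembering which index belongs to which year. One alternative is to key the tables by year. The sketch below uses tiny placeholder frames instead of the scraped data; scraped stands in for df, and by_year, combined and the Year column are illustrative names that do not appear in the original notebook.

```python
import pandas as pd

# Stand-in for the scraped result: one list of tables per season.
# The tiny frames below are placeholders, not real statistics.
years = [2015, 2016]
scraped = [
    [pd.DataFrame({'Player': ['A', 'B'], 'PTS': ['5.9', '3.1']})],
    [pd.DataFrame({'Player': ['A'], 'PTS': ['5.2']})],
]

# Pair each season with the first (and only) table on its page
by_year = {year: tables[0] for year, tables in zip(years, scraped)}

# Look up a season by year instead of by position
df_2016 = by_year[2016]

# Optionally stack all seasons into one frame with a Year column
combined = pd.concat(
    [table.assign(Year=year) for year, table in by_year.items()],
    ignore_index=True,
)
print(combined)
```

With the real data, zip(years, df) would pair the five seasons with the five scraped lists in the same way.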

Data cleaning (Removing Extra Headers)

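The page repeats the table header at intervals, and read_html returns those repeats as ordinary data rows. They can be found by selecting the rows where the Age column contains the text 'Age'.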

In [11]:
df_2019[df_2019.Age == 'Age']
Out[11]:
Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
22 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
49 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
70 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
97 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
132 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
161 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
186 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
217 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
244 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
269 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
297 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
324 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
349 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
382 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
411 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
438 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
468 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
498 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
527 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
554 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
579 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
604 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
642 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
671 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
694 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
715 Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS

26 rows × 30 columns

Number of Extra Headers

In [12]:
len(df_2019[df_2019.Age == 'Age'])
Out[12]:
26

Dropping Extra Headers

In [13]:
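# Drop the repeated header rows by index label; note that this rebinds df
# from the list of scraped tables to the cleaned 2019 DataFrame
df = df_2019.drop(df_2019[df_2019.Age == 'Age'].index)

df.shape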
Out[13]:
(708, 30)
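
Dropping the header rows leaves two loose ends: the index keeps gaps where rows were removed, and the numbers are still stored as text (the FT% values print as .923 rather than 0.923). A minimal sketch of both fixes on a small stand-in table follows; raw, clean and numeric_columns are illustrative names, and the values are placeholders rather than scraped data.

```python
import pandas as pd

# Small stand-in for a scraped table with one repeated header row
raw = pd.DataFrame({
    'Player': ['A', 'Player', 'B'],
    'Age': ['25', 'Age', '28'],
    'PTS': ['5.3', 'PTS', '1.7'],
})

# Keep only the rows that are not header copies, then renumber from 0
clean = raw[raw['Age'] != 'Age'].reset_index(drop=True)

# The numbers in this stand-in are stored as text, as in the scraped
# table, so convert the numeric columns explicitly
numeric_columns = ['Age', 'PTS']
clean[numeric_columns] = clean[numeric_columns].apply(pd.to_numeric)

print(clean.dtypes)
```

On the full table, numeric_columns would be every column except Player, Pos and Tm.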

Quick Exploratory Data Analysis

Histogram

In [18]:
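import seaborn as sns

# Distribution of points per game. distplot is deprecated since seaborn 0.11
# (histplot is its replacement) but is kept here to match the recorded output.
sns.distplot(df.PTS,
             kde=False,
             hist_kws=dict(edgecolor="black", linewidth=2),
             color='#00BFC4')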
Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fdecca4fdd0>
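
On current seaborn releases the same plot can be drawn with histplot. The sketch below is an assumption about how the cell might be updated, not the code that produced the output above; points_as_numbers and plot_points are hypothetical helpers. It converts PTS to numbers first, because the scraped column holds text.

```python
import pandas as pd


def points_as_numbers(frame):
    """Return the PTS column as floats, dropping anything unparseable."""
    return pd.to_numeric(frame['PTS'], errors='coerce').dropna()


def plot_points(frame):
    """Draw the histogram with histplot (seaborn 0.11 or later)."""
    # Imported here so the helper above also works without seaborn installed
    import seaborn as sns

    return sns.histplot(points_as_numbers(frame),
                        edgecolor='black',
                        linewidth=2,
                        color='#00BFC4')


# Stand-in values; with the scraped data the call would be plot_points(df)
sample = pd.DataFrame({'PTS': ['5.3', '1.7', 'PTS', '13.9']})
print(points_as_numbers(sample).tolist())
```

Using errors='coerce' turns any leftover header text into NaN, which dropna() then removes, so the helper also tolerates a table that has not been cleaned yet.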