Automated EDA Libraries¶

- Pandas-Profiling
- Sweetviz
- Autoviz
- D-Tale

Importing the Relevant Libraries¶

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

Importing the Data¶

url = "https://datascienceschools.github.io/Exploratory_Data_Analysis/HR_Analytics.csv"

df = pd.read_csv(url)

df.head()

Pandas-Profiling¶

The pandas-profiling library generates a report that includes:

- An overview of the dataset
- Variable properties
- Interactions between variables
- Correlations between variables
- Sample data
- Missing values

Installing Pandas-Profiling¶

!pip install pandas-profiling

EDA with Pandas-Profiling¶

from pandas_profiling import ProfileReport

profile = ProfileReport(df, explorative=True)

Saving Results to HTML file¶

profile.to_file("output.html")
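
The package has since been republished on PyPI as ydata-profiling, so the import name used above may not match what is installed in a given environment. The following is a minimal sketch of an import that works with either name. The helper names `find_profiling_module` and `build_profile` and the `title` value are illustrative assumptions, not part of the original notebook.

```python
import importlib
import importlib.util


def find_profiling_module():
    """Return the import name of the installed profiling package, or None."""
    # Prefer the newer package name; fall back to the one used in this notebook.
    for name in ("ydata_profiling", "pandas_profiling"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def build_profile(df, title="HR Analytics"):
    """Build a ProfileReport with whichever package is available."""
    module_name = find_profiling_module()
    if module_name is None:
        raise ImportError("Install ydata-profiling or pandas-profiling first.")
    module = importlib.import_module(module_name)
    # Same call as in the cell above, plus an (assumed) report title.
    return module.ProfileReport(df, title=title, explorative=True)
```

With either package installed, `build_profile(df).to_file("output.html")` should produce the same kind of report as the cells above.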

Sweetviz¶

The Sweetviz library generates a report that includes:

- An overview of the dataset
- Variable properties
- Categorical associations
- Numerical associations
- Most frequent, smallest, and largest values of numerical features

Installing sweetviz¶

!pip install sweetviz

EDA with Sweetviz¶

import sweetviz as sv

sweet_report = sv.analyze(df)

Saving Results to HTML file¶

sweet_report.show_html("output_sweetViz.html")

Report output_sweetViz.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
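
As the message above notes, the browser may not open in a notebook or Colab session. A small sketch of writing the report without trying to open a browser is shown below, with an optional target column to analyze against. The helper names `html_path` and `save_sweetviz_report` are hypothetical; `target_feat` and `open_browser` are Sweetviz parameters as I understand them, so check them against your installed version.

```python
from pathlib import Path


def html_path(name, folder="."):
    """Return a path inside `folder` whose file name ends in .html."""
    return Path(folder) / Path(name).with_suffix(".html").name


def save_sweetviz_report(df, name="output_sweetViz", target=None):
    """Analyze `df` with Sweetviz and write the HTML report to disk."""
    import sweetviz as sv  # imported lazily so the helper above works without it

    # target_feat=None analyzes the features without a target column.
    report = sv.analyze(df, target_feat=target)
    out = html_path(name)
    # open_browser=False only writes the file; open it manually afterwards.
    report.show_html(str(out), open_browser=False)
    return out
```

For this dataset, `save_sweetviz_report(df, target="target")` would relate each feature to the `target` column shown in the data preview.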

Autoviz¶

The Autoviz library generates a report that includes:

- An overview of the dataset
- Pairwise scatter plots of continuous variables
- Distributions of categorical variables
- Heatmaps of continuous variables
- The average of each numerical variable per categorical variable

Installing Autoviz and xlrd¶

!pip install autoviz

!pip install xlrd

EDA with Autoviz¶

from autoviz.AutoViz_Class import AutoViz_Class

autoviz = AutoViz_Class().AutoViz(url)

Imported AutoViz_Class version: 0.0.81. Call using:
    from autoviz.AutoViz_Class import AutoViz_Class
    AV = AutoViz_Class()
    AV.AutoViz(filename, sep=',', depVar='', dfte=None, header=0, verbose=0,
                            lowess=False,chart_format='svg',max_rows_analyzed=150000,max_cols_analyzed=30)
Note: verbose=0 or 1 generates charts and displays them in your local Jupyter notebook.
      verbose=2 saves plots in your local machine under AutoViz_Plots directory and does not display charts.
Shape of your Data Set: (19158, 14)
############## C L A S S I F Y I N G  V A R I A B L E S  ####################
Classifying variables in data set...
    Number of Numeric Columns =  1
    Number of Integer-Categorical Columns =  1
    Number of String-Categorical Columns =  7
    Number of Factor-Categorical Columns =  0
    Number of String-Boolean Columns =  1
    Number of Numeric-Boolean Columns =  1
    Number of Discrete String Columns =  2
    Number of NLP String Columns =  0
    Number of Date Time Columns =  0
    Number of ID Columns =  1
    Number of Columns to Delete =  0
    14 Predictors classified...
        This does not include the Target column(s)
        3 variables removed since they were ID or low-information variables

Time to run AutoViz (in seconds) = 10.334

 ###################### VISUALIZATION Completed ########################
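
The signature printed in the log above includes a `dfte` argument, which lets AutoViz work on a DataFrame that is already loaded instead of re-reading the file from `url`. The sketch below spells out the arguments from that printed signature (version 0.0.81). The helper names `autoviz_kwargs` and `run_autoviz` are hypothetical, and newer AutoViz releases may accept different parameters.

```python
def autoviz_kwargs(df, target=""):
    """Collect the AutoViz arguments listed in the log above."""
    return dict(
        filename="",              # empty because the data comes from `dfte`
        sep=",",
        depVar=target,            # optional target column, "" for none
        dfte=df,                  # the already loaded DataFrame
        header=0,
        verbose=2,                # per the log: save plots under AutoViz_Plots
        lowess=False,
        chart_format="svg",
        max_rows_analyzed=150000,
        max_cols_analyzed=30,
    )


def run_autoviz(df, target=""):
    """Run AutoViz on a DataFrame rather than on a file path or URL."""
    from autoviz.AutoViz_Class import AutoViz_Class  # lazy import

    return AutoViz_Class().AutoViz(**autoviz_kwargs(df, target))
```

Calling `run_autoviz(df, target="target")` would then chart the predictors against the `target` column instead of treating every column alike.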

D-Tale¶

Installing D-Tale¶

!pip install dtale

EDA with D-Tale¶

import dtale

dtale.show(df)

	enrollee_id	city	city_development_index	gender	relevent_experience	enrolled_university	education_level	major_discipline	experience	company_size	company_type	last_new_job	training_hours	target
0	8949	city_103	0.920	Male	Has relevent experience	no_enrollment	Graduate	STEM	>20	NaN	NaN	1	36	1.0
1	29725	city_40	0.776	Male	No relevent experience	no_enrollment	Graduate	STEM	15	50-99	Pvt Ltd	>4	47	0.0
2	11561	city_21	0.624	NaN	No relevent experience	Full time course	Graduate	STEM	5	NaN	NaN	never	83	0.0
3	33241	city_115	0.789	NaN	No relevent experience	NaN	Graduate	Business Degree	<1	NaN	Pvt Ltd	never	52	1.0
4	666	city_162	0.767	Male	Has relevent experience	no_enrollment	Masters	STEM	>20	50-99	Funded Startup	4	8	0.0
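
`dtale.show(df)` serves the DataFrame through a local web application and returns an instance object for that session. The sketch below keeps a handle on that instance so the grid can be opened in a browser tab and shut down later. The helper names `dtale_available` and `explore_with_dtale` are hypothetical; `open_browser()` is taken from the D-Tale documentation as I recall it, so treat it as an assumption to verify.

```python
import importlib.util


def dtale_available():
    """Return True if the dtale package can be imported."""
    return importlib.util.find_spec("dtale") is not None


def explore_with_dtale(df, open_in_browser=False):
    """Start a D-Tale session for `df` and return the instance handle."""
    if not dtale_available():
        raise ImportError("Install D-Tale first: pip install dtale")
    import dtale

    instance = dtale.show(df)
    if open_in_browser:
        # Opens the running session in a new browser tab.
        instance.open_browser()
    return instance
```

Keeping the returned instance around also makes it possible to stop the session with `instance.kill()` once the exploration is finished.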