Case Study (Wine Quality) : Query

In [58]:
import pandas as pd

df= pd.read_csv('wine.csv')

Query : First Way

Example 1 :

In [59]:
df_new = df.query('fixed_acidity > 7')

df_new.head()
Out[59]:
fixed_acidity volatile_acidity citric_acid residual_sugar chlorides free_sulfur_dioxide total_sulfur_dioxide density ph sulphates alcohol quality color
0 7.4 0.70 0.00 1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5 red
1 7.8 0.88 0.00 2.6 0.098 25.0 67.0 0.9968 3.20 0.68 9.8 5 red
2 7.8 0.76 0.04 2.3 0.092 15.0 54.0 0.9970 3.26 0.65 9.8 5 red
3 11.2 0.28 0.56 1.9 0.075 17.0 60.0 0.9980 3.16 0.58 9.8 6 red
4 7.4 0.70 0.00 1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5 red

Example 2 :

In [60]:
df_new = df.query('color == "red"')

df_new.head()
Out[60]:
fixed_acidity volatile_acidity citric_acid residual_sugar chlorides free_sulfur_dioxide total_sulfur_dioxide density ph sulphates alcohol quality color
0 7.4 0.70 0.00 1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5 red
1 7.8 0.88 0.00 2.6 0.098 25.0 67.0 0.9968 3.20 0.68 9.8 5 red
2 7.8 0.76 0.04 2.3 0.092 15.0 54.0 0.9970 3.26 0.65 9.8 5 red
3 11.2 0.28 0.56 1.9 0.075 17.0 60.0 0.9980 3.16 0.58 9.8 6 red
4 7.4 0.70 0.00 1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5 red

Query : Second Way

Example 1 :

In [61]:
df_new = df[df['fixed_acidity'] > 7]

df_new.head()
Out[61]:
fixed_acidity volatile_acidity citric_acid residual_sugar chlorides free_sulfur_dioxide total_sulfur_dioxide density ph sulphates alcohol quality color
0 7.4 0.70 0.00 1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5 red
1 7.8 0.88 0.00 2.6 0.098 25.0 67.0 0.9968 3.20 0.68 9.8 5 red
2 7.8 0.76 0.04 2.3 0.092 15.0 54.0 0.9970 3.26 0.65 9.8 5 red
3 11.2 0.28 0.56 1.9 0.075 17.0 60.0 0.9980 3.16 0.58 9.8 6 red
4 7.4 0.70 0.00 1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5 red

Example 2 :

In [62]:
df_new = df[df['color'] == "red"]
df_new.head()
Out[62]:
fixed_acidity volatile_acidity citric_acid residual_sugar chlorides free_sulfur_dioxide total_sulfur_dioxide density ph sulphates alcohol quality color
0 7.4 0.70 0.00 1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5 red
1 7.8 0.88 0.00 2.6 0.098 25.0 67.0 0.9968 3.20 0.68 9.8 5 red
2 7.8 0.76 0.04 2.3 0.092 15.0 54.0 0.9970 3.26 0.65 9.8 5 red
3 11.2 0.28 0.56 1.9 0.075 17.0 60.0 0.9980 3.16 0.58 9.8 6 red
4 7.4 0.70 0.00 1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5 red

Question:

Do wines with higher alcoholic content generally has higher average quality?

Answer: Yes

- You should use the median to split the samples into two groups -> 

         1.Low_alcohol
         2.High_alcohol


- You should calculte averge quality for both groups

Splitting data into two groups

- Calculate median for 'alcohol'

- Low_alcohol -> data with 'alcohol' content less than the median

- High_alcohol -> data with 'alcohol' content greater than or equal to the median
In [63]:
Median = df['alcohol'].median()

Low_alcohol = df[df['alcohol'] < Median]

High_alcohol = df[df['alcohol'] >= Median]

Finding average quality for Low_alcohol group

In [70]:
print("Low_alcohol Result:")

Low_alcohol[['alcohol','quality']].mean()
Low_alcohol Result:
Out[70]:
alcohol    9.485463
quality    5.475921
dtype: float64

Finding average quality for High_alcohol group

In [69]:
print("High_alcohol Result:")

High_alcohol[['alcohol','quality']].mean()
High_alcohol Result:
Out[69]:
alcohol    11.454793
quality     6.146084
dtype: float64

Question:

Do sweeter wines (more residual sugar) has higher average quality?

Answer : Yes

  • You should use the median to split the samples into two groups ->

       1.less residual
       2.more residual
  • You should calculte averge quality for both groups

Splitting data into two groups

 - Calculate median for 'residual_sugar'

 - less_residual: 

     data with 'residual_sugar' content less than the median

 - more_residual:

     data with 'residual_sugar' content greater than or equal to the median
In [66]:
Median = df['residual_sugar'].median()

less_residual = df[df['residual_sugar'] < Median]

more_residual = df[df['residual_sugar'] >= Median]

Finding average quality for Less Residual group

In [67]:
print("Less Residual Result:")

less_residual[['residual_sugar','quality']].mean()
Less Residual Result:
Out[67]:
residual_sugar    1.826418
quality           5.808801
dtype: float64

Finding average quality for more Residual group

In [68]:
print("More Residual Result:")

more_residual[['residual_sugar','quality']].mean()
More Residual Result:
Out[68]:
residual_sugar    9.012492
quality           5.827829
dtype: float64