Case Study (Wine Quality) : Query¶

import pandas as pd

df= pd.read_csv('wine.csv')

Query : First Way¶

Example 1 :¶

df_new = df.query('fixed_acidity > 7')

df_new.head()

Example 2 :¶

df_new = df.query('color == "red"')

df_new.head()

Query : Second Way¶

Example 1 :¶

df_new = df[df['fixed_acidity'] > 7]

df_new.head()

Example 2 :¶

df_new = df[df['color'] == "red"]
df_new.head()

Question:¶

Do wines with higher alcoholic content generally has higher average quality?¶

Answer: Yes¶

- You should use the median to split the samples into two groups -> 

         1.Low_alcohol
         2.High_alcohol


- You should calculte averge quality for both groups

Splitting data into two groups¶

- Calculate median for 'alcohol'

- Low_alcohol -> data with 'alcohol' content less than the median

- High_alcohol -> data with 'alcohol' content greater than or equal to the median

Median = df['alcohol'].median()

Low_alcohol = df[df['alcohol'] < Median]

High_alcohol = df[df['alcohol'] >= Median]

Finding average quality for Low_alcohol group¶

print("Low_alcohol Result:")

Low_alcohol[['alcohol','quality']].mean()

Low_alcohol Result:

alcohol    9.485463
quality    5.475921
dtype: float64

Finding average quality for High_alcohol group¶

print("High_alcohol Result:")

High_alcohol[['alcohol','quality']].mean()

High_alcohol Result:

alcohol    11.454793
quality     6.146084
dtype: float64

Question:¶

Do sweeter wines (more residual sugar) has higher average quality?¶

Answer : Yes¶

You should use the median to split the samples into two groups ->
```
   1.less residual
   2.more residual
```

You should calculte averge quality for both groups

Splitting data into two groups¶

 - Calculate median for 'residual_sugar'

 - less_residual: 

     data with 'residual_sugar' content less than the median

 - more_residual:

     data with 'residual_sugar' content greater than or equal to the median

Median = df['residual_sugar'].median()

less_residual = df[df['residual_sugar'] < Median]

more_residual = df[df['residual_sugar'] >= Median]

Finding average quality for Less Residual group¶

print("Less Residual Result:")

less_residual[['residual_sugar','quality']].mean()

Less Residual Result:

residual_sugar    1.826418
quality           5.808801
dtype: float64

Finding average quality for more Residual group¶

print("More Residual Result:")

more_residual[['residual_sugar','quality']].mean()

More Residual Result:

residual_sugar    9.012492
quality           5.827829
dtype: float64

	fixed_acidity	volatile_acidity	citric_acid	residual_sugar	chlorides	free_sulfur_dioxide	total_sulfur_dioxide	density	ph	sulphates	alcohol	quality	color
0	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	5	red
1	7.8	0.88	0.00	2.6	0.098	25.0	67.0	0.9968	3.20	0.68	9.8	5	red
2	7.8	0.76	0.04	2.3	0.092	15.0	54.0	0.9970	3.26	0.65	9.8	5	red
3	11.2	0.28	0.56	1.9	0.075	17.0	60.0	0.9980	3.16	0.58	9.8	6	red
4	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	5	red

	fixed_acidity	volatile_acidity	citric_acid	residual_sugar	chlorides	free_sulfur_dioxide	total_sulfur_dioxide	density	ph	sulphates	alcohol	quality	color
0	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	5	red
1	7.8	0.88	0.00	2.6	0.098	25.0	67.0	0.9968	3.20	0.68	9.8	5	red
2	7.8	0.76	0.04	2.3	0.092	15.0	54.0	0.9970	3.26	0.65	9.8	5	red
3	11.2	0.28	0.56	1.9	0.075	17.0	60.0	0.9980	3.16	0.58	9.8	6	red
4	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	5	red

	fixed_acidity	volatile_acidity	citric_acid	residual_sugar	chlorides	free_sulfur_dioxide	total_sulfur_dioxide	density	ph	sulphates	alcohol	quality	color
0	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	5	red
1	7.8	0.88	0.00	2.6	0.098	25.0	67.0	0.9968	3.20	0.68	9.8	5	red
2	7.8	0.76	0.04	2.3	0.092	15.0	54.0	0.9970	3.26	0.65	9.8	5	red
3	11.2	0.28	0.56	1.9	0.075	17.0	60.0	0.9980	3.16	0.58	9.8	6	red
4	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	5	red

	fixed_acidity	volatile_acidity	citric_acid	residual_sugar	chlorides	free_sulfur_dioxide	total_sulfur_dioxide	density	ph	sulphates	alcohol	quality	color
0	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	5	red
1	7.8	0.88	0.00	2.6	0.098	25.0	67.0	0.9968	3.20	0.68	9.8	5	red
2	7.8	0.76	0.04	2.3	0.092	15.0	54.0	0.9970	3.26	0.65	9.8	5	red
3	11.2	0.28	0.56	1.9	0.075	17.0	60.0	0.9980	3.16	0.58	9.8	6	red
4	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	5	red