from IPython.display import Image
Image(url="bellcurve.jpg", width=600)
- The normal (or Gaussian) distribution is a continuous probability distribution characterized by a symmetric, bell-shaped curve.
- A normal distribution is defined by its
  - center (mean, μ)
  - spread (standard deviation, σ)
- Most observations drawn from a normal distribution lie near the mean, which sits at the exact center of the distribution. As a rule of thumb, about 68% of the data lie within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations.
- The normal distribution is perhaps the most important distribution in statistics. Many real-world phenomena, such as IQ test scores and human heights, roughly follow a normal distribution, so it is often used to model random variables. Many common statistical tests also assume that the data are normally distributed.
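The 68 / 95 / 99.7 rule of thumb can be checked numerically. The sketch below is illustrative only: `prob_within` is a hypothetical helper name, and it relies on the standard identity that, for any normal distribution, the probability of falling within k standard deviations of the mean equals erf(k / √2).

```python
import math

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

The result does not depend on μ or σ, which is why the same percentages apply to both height distributions in this notebook.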
from IPython.display import Image
Image(url="normal_distribution_formula.jpg", width=400)
1/(sigma * np.sqrt(2 * np.pi)) *
np.exp( - (bins - mu)**2 / (2 * sigma**2) )
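The expression above is the normal density evaluated at a set of points. As a minimal sketch, it can be wrapped in a function and compared against the standard library's `statistics.NormalDist`. The function name `normal_pdf` and the choice of μ = 0, σ = 1 are assumptions made for illustration; they are not taken from the notebook.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density at x, written term by term from the formula."""
    return 1 / (sigma * math.sqrt(2 * math.pi)) * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Illustrative parameters (standard normal), not the height data.
mu, sigma = 0.0, 1.0

peak = normal_pdf(mu, mu, sigma)        # density at the mean, the top of the bell
reference = NormalDist(mu, sigma).pdf(mu)  # same quantity from the standard library

print(f"{peak:.4f} {reference:.4f}")
# 0.3989 0.3989
```

The density is highest at the mean and, for σ = 1, equals 1/√(2π) there.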
- A normal distribution of 1000 observations for heights of men in the US
- Identify the minimum height of the tallest 2.2% of men
- Check the bell curve:
  - 2.1% + 0.1% = 2.2% of observations lie above mean + 2 * standard_deviation
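The cutoff read off the bell curve (mean + 2 standard deviations) is an approximation. As a hedged sketch, it can be compared with the exact quantile from the standard library's `statistics.NormalDist`, using the mean and standard deviation that this notebook uses for men. The variable names `rule_of_thumb`, `exact`, and `tail_at_rule` are illustrative.

```python
from statistics import NormalDist

men_mean, men_sd = 175.5, 7.4   # values used for men in this notebook
men = NormalDist(men_mean, men_sd)

rule_of_thumb = men_mean + 2 * men_sd        # cutoff read off the bell curve
exact = men.inv_cdf(1 - 0.022)               # height exceeded by exactly 2.2% of the distribution
tail_at_rule = 1 - men.cdf(rule_of_thumb)    # share of the distribution above mean + 2 SD

print(f"rule of thumb: {rule_of_thumb:.1f}")
print(f"exact 97.8th percentile: {exact:.1f}")
print(f"share above mean + 2 SD: {tail_at_rule:.4f}")
```

The two cutoffs differ only slightly, because the share above mean + 2 SD is close to 2.3% rather than exactly 2.2%.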
import numpy as np
import matplotlib.pyplot as plt

men_mean = 175.5
men_sd = 7.4
# Rule-of-thumb cutoff for the tallest ~2.2%: mean + 2 standard deviations
min_tallest_men = men_mean + 2 * men_sd
print(min_tallest_men)
mu, sigma = 175.5, 7.4  # mean and standard deviation
s = np.random.normal(mu, sigma, 1000)  # 1000 simulated heights
# Normalized histogram of the sample (50 bins)
count, bins, ignored = plt.hist(s, 50, density=True)
# Overlay the theoretical normal density at the bin edges
plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
         np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
         linewidth=2, color='r')
plt.title('Men')
plt.show()
- A normal distribution of 1000 observations for heights of women in the US
- Identify the minimum height of the tallest 2.2% of women
- Check the bell curve:
  - 2.1% + 0.1% = 2.2% of observations lie above mean + 2 * standard_deviation
women_mean = 161.8
women_sd = 6.9
# Rule-of-thumb cutoff for the tallest ~2.2%: mean + 2 standard deviations
min_tallest_women = women_mean + 2 * women_sd
print(min_tallest_women)
mu, sigma = 161.8, 6.9  # mean and standard deviation
s = np.random.normal(mu, sigma, 1000)  # 1000 simulated heights
# Normalized histogram of the sample (50 bins)
count, bins, ignored = plt.hist(s, 50, density=True)
# Overlay the theoretical normal density at the bin edges
plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
         np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
         linewidth=2, color='r')
plt.title('Women')
plt.show()
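The 2.2% figure can also be checked by simulation: draw heights from the women's distribution and count the share above mean + 2 standard deviations. This is a rough sketch under stated assumptions. It uses the standard library's `random` module rather than `np.random.normal` so that it runs without extra dependencies, and the seed and the sample size of 100,000 (larger than the 1000 observations plotted above, to reduce sampling noise) are arbitrary choices.

```python
import random

women_mean, women_sd = 161.8, 6.9      # values used for women in this notebook
cutoff = women_mean + 2 * women_sd     # rule-of-thumb cutoff for the tallest ~2.2%

rng = random.Random(0)                 # fixed seed (assumption) so the run is repeatable
n = 100_000                            # sample size (assumption)
heights = [rng.gauss(women_mean, women_sd) for _ in range(n)]

# Fraction of simulated heights above the cutoff
share_above = sum(h > cutoff for h in heights) / n

print(f"cutoff: {cutoff:.1f}")
print(f"simulated share above cutoff: {share_above:.4f}")
```

With only 1000 observations the simulated share fluctuates noticeably from run to run, so a histogram of that size will match the 2.2% figure only approximately.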