The Normal or Gaussian Distribution

In [1]:
from IPython.display import Image
Image(url="bellcurve.jpg", width=600)
Out[1]:
- The normal or Gaussian distribution is a continuous probability distribution characterized by a symmetric bell-shaped curve

- A normal distribution is defined by its

        - center (mean, μ) 
        - spread (standard deviation, σ)

- The bulk of the observations generated from a normal distribution lie near the mean, which sits at the exact center of the distribution. As a rule of thumb, about 68% of the data lie within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations.

- The normal distribution is perhaps the most important distribution in all of statistics. Many real-world phenomena, such as IQ test scores and human heights, roughly follow a normal distribution, so it is often used to model random variables. Many common statistical tests also assume that the data are normally distributed.
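The 68-95-99.7 rule of thumb can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `prob_within` is hypothetical and not part of the original notebook. It relies on the identity P(|X - μ| < kσ) = erf(k / √2) for a normally distributed X.

```python
import math

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3
    print(f"within {k} sd: {prob_within(k):.4f}")
```

The result does not depend on μ or σ, which is why the rule applies to every normal distribution.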


In [2]:
from IPython.display import Image
Image(url="normal_distribution_formula.jpg", width=400)
Out[2]:

Python Code

In [ ]:
# Normal density at the points in `bins` (assumes np, mu, sigma and bins are defined)
(1 / (sigma * np.sqrt(2 * np.pi))
 * np.exp(-(bins - mu)**2 / (2 * sigma**2)))
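The density expression can be wrapped in a small function so it is easy to reuse and sanity-check. This is only a sketch: the name `normal_pdf`, the grid, and the values of `mu` and `sigma` are illustrative assumptions, not part of the original notebook. A valid density should integrate to approximately 1, which a simple Riemann sum can confirm.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

mu, sigma = 0.0, 1.0  # illustrative values (standard normal)

# Evaluate the density on a fine grid spanning 8 standard deviations each side
x = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 10001)
dx = x[1] - x[0]

area = np.sum(normal_pdf(x, mu, sigma)) * dx  # Riemann-sum approximation of the integral
peak = normal_pdf(mu, mu, sigma)              # height at the mean: 1 / (sigma * sqrt(2 * pi))

print(area)
print(peak)
```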

Heights of Men

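- Simulate 1000 observations of the heights of men in the US, drawn from a normal distribution

- Identify the minimum height of the tallest 2.2% of men

    - Check the bell curve:

        - 2.1% + 0.1% = 2.2% of the area lies above mean + 2 * standard_deviation (the chart's segment labels are rounded; the exact tail area is about 2.28%)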
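The bell-curve reading can be cross-checked against the exact normal distribution. This is a sketch under stated assumptions: it uses `statistics.NormalDist` from the standard library (Python 3.8+) with the mean and standard deviation used for men in this notebook, and the variable names are illustrative.

```python
from statistics import NormalDist

# Mean and standard deviation for men's heights, as used in this notebook
men = NormalDist(mu=175.5, sigma=7.4)

cutoff = men.mean + 2 * men.stdev      # mean + 2 standard deviations
tail = 1 - men.cdf(cutoff)             # exact fraction of men taller than the cutoff
exact_cutoff = men.inv_cdf(1 - 0.022)  # height exceeded by exactly 2.2% of men

print(f"mean + 2 sd: {cutoff:.1f}")
print(f"fraction above mean + 2 sd: {tail:.4f}")
print(f"height exceeded by 2.2%: {exact_cutoff:.1f}")
```

The two cutoffs differ only slightly, so mean + 2 * standard_deviation is a reasonable approximation for the tallest 2.2%.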

Importing the Relevant Libraries

In [3]:
import numpy as np

from numpy import random

import matplotlib.pyplot as plt

Identify the Minimum Height of the Tallest 2.2% of Men

In [4]:
men_mean = 175.5

men_sd = 7.4

min_tallest_men = men_mean + 2 * men_sd  # mean + 2 standard deviations

print(min_tallest_men)
190.3

Random 1000 Observations (Men)

In [5]:
mu, sigma = 175.5, 7.4 # mean and standard deviation
s = np.random.normal(mu, sigma, 1000)
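Because the sample is random, each run produces slightly different numbers. The sketch below shows one way to make the draw reproducible and to check it against the parameters. It assumes NumPy's `default_rng` generator; the seed value 42 is an arbitrary illustrative choice, not something the original notebook uses.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed (arbitrary) so the sample is reproducible

mu, sigma = 175.5, 7.4           # mean and standard deviation
s = rng.normal(mu, sigma, 1000)

# Sample statistics should be close to, but not exactly equal to, mu and sigma
sample_mean = s.mean()
sample_sd = s.std(ddof=1)

# Share of simulated heights above mean + 2 standard deviations (roughly 2%)
frac_tall = np.mean(s > mu + 2 * sigma)

print(sample_mean, sample_sd, frac_tall)
```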

Normal Distribution (Men)

In [6]:
count, bins, ignored = plt.hist(s, 50, density=True)

plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
               np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
         linewidth=2, color='r')

plt.title('Men')
plt.show()
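The red curve lines up with the histogram because `density=True` rescales the bars so that their total area is 1, the same scale as a probability density. The sketch below illustrates this with `np.histogram`, which computes the same bin edges and densities without drawing anything. The seeded generator and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
mu, sigma = 175.5, 7.4
s = rng.normal(mu, sigma, 1000)

# Same binning as a 50-bin histogram with density=True, but without plotting
density, edges = np.histogram(s, bins=50, density=True)

widths = np.diff(edges)                # width of each bin
total_area = np.sum(density * widths)  # bar heights times widths

print(len(edges))   # 51 edges delimit 50 bins
print(total_area)   # approximately 1.0
```

This is also why the curve is evaluated at `bins`: that array holds the 51 bin edges, which span the full range of the sample.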

Heights of Women

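- Simulate 1000 observations of the heights of women in the US, drawn from a normal distribution

- Identify the minimum height of the tallest 2.2% of women

    - Check the bell curve:

        - 2.1% + 0.1% = 2.2% of the area lies above mean + 2 * standard_deviation (the chart's segment labels are rounded; the exact tail area is about 2.28%)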

Identify the Minimum Height of the Tallest 2.2% of Women

In [7]:
women_mean = 161.8

women_sd = 6.9

min_tallest_women = women_mean + 2 * women_sd  # mean + 2 standard deviations

print(min_tallest_women)
175.60000000000002
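The trailing digits in 175.60000000000002 are a binary floating-point artifact, not a meaningful part of the result: 161.8 and 6.9 cannot be represented exactly as floats. A minimal sketch of two common ways to present the value to one decimal place:

```python
women_mean = 161.8
women_sd = 6.9

min_tallest_women = women_mean + 2 * women_sd

print(min_tallest_women)            # 175.60000000000002 (raw float, as in the output above)
print(f"{min_tallest_women:.1f}")   # 175.6 (formatted for display)
print(round(min_tallest_women, 1))  # 175.6 (rounded value)
```

Formatting changes only how the number is displayed, while `round` returns a new float, so formatting is usually the safer choice when the value is reused in later calculations.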

Random 1000 Observations (Women)

In [8]:
mu, sigma = 161.8, 6.9 # mean and standard deviation
s = np.random.normal(mu, sigma, 1000)

Normal Distribution (Women)

In [9]:
count, bins, ignored = plt.hist(s, 50, density=True)

plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
               np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
         linewidth=2, color='r')

plt.title('Women')
plt.show()