Normal Probability Plot – Explanation & Examples

The definition of the normal probability plot is:

“The normal probability plot is a graph used to assess whether numerical data follow a normal distribution.”

In this topic, we will discuss the following aspects of the normal probability plot:

  1. What is a normal probability plot?
  2. How to make a normal probability plot?
  3. How to read a normal probability plot?
  4. Practice questions.
  5. Answer key.

1. What is a normal probability plot?

The normal probability plot is a graph used to assess whether a set of numerical data is approximately normally distributed.

A histogram of your data can help you judge whether the data are approximately normal, but a more specialized tool for this purpose is the normal probability plot.

If the data follow a normal distribution, then a plot of the theoretical percentiles of the normal distribution (x-axis) against the observed sample percentiles (y-axis) should be approximately linear.

The theoretical pth percentile of a normal distribution is the value below which p% of the distribution's values fall.

The sample pth percentile of a numerical data set is the value below which p% of the measurements fall.

For example, the 50th percentile, or median, is the value below which 50% (half) of your measurements fall.

As another example, the 27th percentile is the value below which 27% of the data points fall.
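If you want to compute theoretical percentiles yourself, here is a minimal sketch in Python (an assumption on our part, since this lesson otherwise uses R; it needs Python 3.8 or later). The standard-library class `statistics.NormalDist` has an `inv_cdf` method that returns the value below which a given proportion of a normal distribution lies.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)   # standard normal distribution

median = std_normal.inv_cdf(0.50)   # theoretical 50th percentile
p27 = std_normal.inv_cdf(0.27)      # theoretical 27th percentile

print(f"50th percentile: {median:.2f}")
print(f"27th percentile: {p27:.2f}")

# Check the definition: 27% of the distribution lies below the 27th percentile
print(f"P(Z < {p27:.2f}) = {std_normal.cdf(p27):.3f}")
```

For a normal distribution with a different mean and standard deviation, pass them as `mu` and `sigma` when creating the `NormalDist` object.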

2. How to make a normal probability plot?

We will work through two examples.

– Example 1

The following are the weights (in kg) of 100 people from a certain survey.

52.44 52.77 54.56 53.07 53.13 54.72 53.46 51.73 52.31 52.55 54.22 53.36 53.40 53.11 52.44 54.79 53.50 51.03 53.70 52.53 51.93 52.78 51.97 52.27 52.37 51.31 53.84 53.15 51.86 54.25 53.43 52.70 53.90 53.88 53.82 53.69 53.55 52.94 52.69 52.62 52.31 52.79 51.73 55.17 54.21 51.88 52.60 52.53 53.78 52.92 53.25 52.97 52.96 54.37 52.77 54.52 51.45 53.58 53.12 53.22 53.38 52.50 52.67 51.98 51.93 53.30 53.45 53.05 53.92 55.05 52.51 50.69 54.01 52.29 52.31 54.03 52.72 51.78 53.18 52.86 53.01 53.39 52.63 53.64 52.78 53.33 54.10 53.44 52.67 54.15 53.99 53.55 53.24 52.37 54.36 52.40 55.19 54.53 52.76 51.97.

Draw a normal probability plot of this data.

1. Sort the data from smallest to largest.

50.69 51.03 51.31 51.45 51.73 51.73 51.78 51.86 51.88 51.93 51.93 51.97 51.97 51.98 52.27 52.29 52.31 52.31 52.31 52.37 52.37 52.40 52.44 52.44 52.50 52.51 52.53 52.53 52.55 52.60 52.62 52.63 52.67 52.67 52.69 52.70 52.72 52.76 52.77 52.77 52.78 52.78 52.79 52.86 52.92 52.94 52.96 52.97 53.01 53.05 53.07 53.11 53.12 53.13 53.15 53.18 53.22 53.24 53.25 53.30 53.33 53.36 53.38 53.39 53.40 53.43 53.44 53.45 53.46 53.50 53.55 53.55 53.58 53.64 53.69 53.70 53.78 53.82 53.84 53.88 53.90 53.92 53.99 54.01 54.03 54.10 54.15 54.21 54.22 54.25 54.36 54.37 54.52 54.53 54.56 54.72 54.79 55.05 55.17 55.19.

2. Assign a rank to each value of your data.

Weight (kg) | Rank
50.69 | 1
51.03 | 2
51.31 | 3
51.45 | 4
51.73 | 5
51.73 | 6
51.78 | 7
51.86 | 8
51.88 | 9
51.93 | 10
51.93 | 11
51.97 | 12
51.97 | 13
51.98 | 14
52.27 | 15
52.29 | 16
52.31 | 17
52.31 | 18
52.31 | 19
52.37 | 20
52.37 | 21
52.40 | 22
52.44 | 23
52.44 | 24
52.50 | 25
52.51 | 26
52.53 | 27
52.53 | 28
52.55 | 29
52.60 | 30
52.62 | 31
52.63 | 32
52.67 | 33
52.67 | 34
52.69 | 35
52.70 | 36
52.72 | 37
52.76 | 38
52.77 | 39
52.77 | 40
52.78 | 41
52.78 | 42
52.79 | 43
52.86 | 44
52.92 | 45
52.94 | 46
52.96 | 47
52.97 | 48
53.01 | 49
53.05 | 50
53.07 | 51
53.11 | 52
53.12 | 53
53.13 | 54
53.15 | 55
53.18 | 56
53.22 | 57
53.24 | 58
53.25 | 59
53.30 | 60
53.33 | 61
53.36 | 62
53.38 | 63
53.39 | 64
53.40 | 65
53.43 | 66
53.44 | 67
53.45 | 68
53.46 | 69
53.50 | 70
53.55 | 71
53.55 | 72
53.58 | 73
53.64 | 74
53.69 | 75
53.70 | 76
53.78 | 77
53.82 | 78
53.84 | 79
53.88 | 80
53.90 | 81
53.92 | 82
53.99 | 83
54.01 | 84
54.03 | 85
54.10 | 86
54.15 | 87
54.21 | 88
54.22 | 89
54.25 | 90
54.36 | 91
54.37 | 92
54.52 | 93
54.53 | 94
54.56 | 95
54.72 | 96
54.79 | 97
55.05 | 98
55.17 | 99
55.19 | 100

Note that repeated values (ties) are simply given consecutive ranks.

The first (smallest) value, 50.69, gets rank 1, and the next value, 51.03, gets rank 2.

The last (largest) value, 55.19, gets rank 100.
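Sorting and ranking can be sketched in a few lines of Python. To keep the sketch short, it uses only the first 20 weights listed above (an arbitrary subset chosen for illustration). Note how the tied value 52.44 simply receives two consecutive ranks.

```python
# First 20 weights (kg) from the Example 1 data, used here only to keep the sketch short
weights = [52.44, 52.77, 54.56, 53.07, 53.13, 54.72, 53.46, 51.73, 52.31, 52.55,
           54.22, 53.36, 53.40, 53.11, 52.44, 54.79, 53.50, 51.03, 53.70, 52.53]

# Sort from smallest to largest, then number the sorted values 1, 2, ..., n.
# Ties are not averaged: each copy just takes the next rank in sequence.
ranked = list(enumerate(sorted(weights), start=1))

for rank, weight in ranked[:6]:
    print(f"rank {rank:2d}: {weight:.2f} kg")
```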

3. Calculate the cumulative probability p_i associated with each rank i using the following formula:

$$p_i = \frac{i - a}{n + 1 - 2a}$$

where:

i = 1, 2, …, n is the rank, and n is the number of data points.

a = 3/8 for n ≤ 10, and a = 0.5 for n > 10.

Since the number of data points is n = 100, which is larger than 10, a = 0.5 and the formula reduces to:

$$p_i = \frac{i - 0.5}{n}$$

The following table will be produced:

Weight (kg) | Rank | p_i
50.69 | 1 | 0.005
51.03 | 2 | 0.015
51.31 | 3 | 0.025
51.45 | 4 | 0.035
51.73 | 5 | 0.045
51.73 | 6 | 0.055
51.78 | 7 | 0.065
51.86 | 8 | 0.075
51.88 | 9 | 0.085
51.93 | 10 | 0.095
51.93 | 11 | 0.105
51.97 | 12 | 0.115
51.97 | 13 | 0.125
51.98 | 14 | 0.135
52.27 | 15 | 0.145
52.29 | 16 | 0.155
52.31 | 17 | 0.165
52.31 | 18 | 0.175
52.31 | 19 | 0.185
52.37 | 20 | 0.195
52.37 | 21 | 0.205
52.40 | 22 | 0.215
52.44 | 23 | 0.225
52.44 | 24 | 0.235
52.50 | 25 | 0.245
52.51 | 26 | 0.255
52.53 | 27 | 0.265
52.53 | 28 | 0.275
52.55 | 29 | 0.285
52.60 | 30 | 0.295
52.62 | 31 | 0.305
52.63 | 32 | 0.315
52.67 | 33 | 0.325
52.67 | 34 | 0.335
52.69 | 35 | 0.345
52.70 | 36 | 0.355
52.72 | 37 | 0.365
52.76 | 38 | 0.375
52.77 | 39 | 0.385
52.77 | 40 | 0.395
52.78 | 41 | 0.405
52.78 | 42 | 0.415
52.79 | 43 | 0.425
52.86 | 44 | 0.435
52.92 | 45 | 0.445
52.94 | 46 | 0.455
52.96 | 47 | 0.465
52.97 | 48 | 0.475
53.01 | 49 | 0.485
53.05 | 50 | 0.495
53.07 | 51 | 0.505
53.11 | 52 | 0.515
53.12 | 53 | 0.525
53.13 | 54 | 0.535
53.15 | 55 | 0.545
53.18 | 56 | 0.555
53.22 | 57 | 0.565
53.24 | 58 | 0.575
53.25 | 59 | 0.585
53.30 | 60 | 0.595
53.33 | 61 | 0.605
53.36 | 62 | 0.615
53.38 | 63 | 0.625
53.39 | 64 | 0.635
53.40 | 65 | 0.645
53.43 | 66 | 0.655
53.44 | 67 | 0.665
53.45 | 68 | 0.675
53.46 | 69 | 0.685
53.50 | 70 | 0.695
53.55 | 71 | 0.705
53.55 | 72 | 0.715
53.58 | 73 | 0.725
53.64 | 74 | 0.735
53.69 | 75 | 0.745
53.70 | 76 | 0.755
53.78 | 77 | 0.765
53.82 | 78 | 0.775
53.84 | 79 | 0.785
53.88 | 80 | 0.795
53.90 | 81 | 0.805
53.92 | 82 | 0.815
53.99 | 83 | 0.825
54.01 | 84 | 0.835
54.03 | 85 | 0.845
54.10 | 86 | 0.855
54.15 | 87 | 0.865
54.21 | 88 | 0.875
54.22 | 89 | 0.885
54.25 | 90 | 0.895
54.36 | 91 | 0.905
54.37 | 92 | 0.915
54.52 | 93 | 0.925
54.53 | 94 | 0.935
54.56 | 95 | 0.945
54.72 | 96 | 0.955
54.79 | 97 | 0.965
55.05 | 98 | 0.975
55.17 | 99 | 0.985
55.19 | 100 | 0.995
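The general rule for p_i, including the choice of a, can be written as a small Python function. This is a sketch of the formula above, not a library routine; for comparison, R's built-in `ppoints()` uses the same default choice of a (3/8 for n ≤ 10, otherwise 1/2).

```python
def plotting_positions(n):
    """Cumulative probabilities p_i = (i - a) / (n + 1 - 2a) for ranks i = 1..n.

    Following the rule above: a = 3/8 when n <= 10, otherwise a = 0.5,
    in which case the formula reduces to (i - 0.5) / n."""
    a = 3 / 8 if n <= 10 else 0.5
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


p_100 = plotting_positions(100)     # Example 1: n = 100, so a = 0.5
print([round(p, 3) for p in p_100[:3]], "...", round(p_100[-1], 3))

p_5 = plotting_positions(5)         # a small sample, where a = 3/8
print([round(p, 3) for p in p_5])
```

For n = 100 this reproduces the p_i column of the table above (0.005, 0.015, … , 0.995).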

4. Calculate the Z-score z_i for each p_i. In the R programming language, the function qnorm returns the Z-score associated with a given cumulative probability.

For example, when p_i = 0.5, the Z-score is 0:

qnorm(0.5)

## [1] 0

This is because, by default, qnorm uses the standard normal distribution, which has mean 0 and standard deviation 1.


Because the standard normal distribution is symmetric about its mean of 0, the probability of a value below 0 equals the probability of a value above 0, and both equal 0.5.

As a result, the Z-score is negative for every data point whose p_i is less than 0.5 and positive for every data point whose p_i is greater than 0.5.
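Outside R, Python's standard library offers an equivalent of `qnorm` for the standard normal distribution: `statistics.NormalDist().inv_cdf`. The sketch below (assuming Python 3.8+) converts a few cumulative probabilities to Z-scores and shows the sign change at p = 0.5; the name `qnorm` is just a local alias chosen here to mirror R.

```python
from statistics import NormalDist

# Alias chosen to mirror R: inverse CDF of the standard normal (mean 0, sd 1)
qnorm = NormalDist().inv_cdf

for p in (0.005, 0.25, 0.5, 0.75, 0.995):
    z = qnorm(p)
    sign = "negative" if z < 0 else "positive" if z > 0 else "zero"
    print(f"p = {p:.3f}  ->  z = {z:+.2f} ({sign})")
```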

The following table will be produced.

Weight (kg) | Rank | p_i | z_i
50.69 | 1 | 0.005 | -2.58
51.03 | 2 | 0.015 | -2.17
51.31 | 3 | 0.025 | -1.96
51.45 | 4 | 0.035 | -1.81
51.73 | 5 | 0.045 | -1.70
51.73 | 6 | 0.055 | -1.60
51.78 | 7 | 0.065 | -1.51
51.86 | 8 | 0.075 | -1.44
51.88 | 9 | 0.085 | -1.37
51.93 | 10 | 0.095 | -1.31
51.93 | 11 | 0.105 | -1.25
51.97 | 12 | 0.115 | -1.20
51.97 | 13 | 0.125 | -1.15
51.98 | 14 | 0.135 | -1.10
52.27 | 15 | 0.145 | -1.06
52.29 | 16 | 0.155 | -1.02
52.31 | 17 | 0.165 | -0.97
52.31 | 18 | 0.175 | -0.93
52.31 | 19 | 0.185 | -0.90
52.37 | 20 | 0.195 | -0.86
52.37 | 21 | 0.205 | -0.82
52.40 | 22 | 0.215 | -0.79
52.44 | 23 | 0.225 | -0.76
52.44 | 24 | 0.235 | -0.72
52.50 | 25 | 0.245 | -0.69
52.51 | 26 | 0.255 | -0.66
52.53 | 27 | 0.265 | -0.63
52.53 | 28 | 0.275 | -0.60
52.55 | 29 | 0.285 | -0.57
52.60 | 30 | 0.295 | -0.54
52.62 | 31 | 0.305 | -0.51
52.63 | 32 | 0.315 | -0.48
52.67 | 33 | 0.325 | -0.45
52.67 | 34 | 0.335 | -0.43
52.69 | 35 | 0.345 | -0.40
52.70 | 36 | 0.355 | -0.37
52.72 | 37 | 0.365 | -0.35
52.76 | 38 | 0.375 | -0.32
52.77 | 39 | 0.385 | -0.29
52.77 | 40 | 0.395 | -0.27
52.78 | 41 | 0.405 | -0.24
52.78 | 42 | 0.415 | -0.21
52.79 | 43 | 0.425 | -0.19
52.86 | 44 | 0.435 | -0.16
52.92 | 45 | 0.445 | -0.14
52.94 | 46 | 0.455 | -0.11
52.96 | 47 | 0.465 | -0.09
52.97 | 48 | 0.475 | -0.06
53.01 | 49 | 0.485 | -0.04
53.05 | 50 | 0.495 | -0.01
53.07 | 51 | 0.505 | 0.01
53.11 | 52 | 0.515 | 0.04
53.12 | 53 | 0.525 | 0.06
53.13 | 54 | 0.535 | 0.09
53.15 | 55 | 0.545 | 0.11
53.18 | 56 | 0.555 | 0.14
53.22 | 57 | 0.565 | 0.16
53.24 | 58 | 0.575 | 0.19
53.25 | 59 | 0.585 | 0.21
53.30 | 60 | 0.595 | 0.24
53.33 | 61 | 0.605 | 0.27
53.36 | 62 | 0.615 | 0.29
53.38 | 63 | 0.625 | 0.32
53.39 | 64 | 0.635 | 0.35
53.40 | 65 | 0.645 | 0.37
53.43 | 66 | 0.655 | 0.40
53.44 | 67 | 0.665 | 0.43
53.45 | 68 | 0.675 | 0.45
53.46 | 69 | 0.685 | 0.48
53.50 | 70 | 0.695 | 0.51
53.55 | 71 | 0.705 | 0.54
53.55 | 72 | 0.715 | 0.57
53.58 | 73 | 0.725 | 0.60
53.64 | 74 | 0.735 | 0.63
53.69 | 75 | 0.745 | 0.66
53.70 | 76 | 0.755 | 0.69
53.78 | 77 | 0.765 | 0.72
53.82 | 78 | 0.775 | 0.76
53.84 | 79 | 0.785 | 0.79
53.88 | 80 | 0.795 | 0.82
53.90 | 81 | 0.805 | 0.86
53.92 | 82 | 0.815 | 0.90
53.99 | 83 | 0.825 | 0.93
54.01 | 84 | 0.835 | 0.97
54.03 | 85 | 0.845 | 1.02
54.10 | 86 | 0.855 | 1.06
54.15 | 87 | 0.865 | 1.10
54.21 | 88 | 0.875 | 1.15
54.22 | 89 | 0.885 | 1.20
54.25 | 90 | 0.895 | 1.25
54.36 | 91 | 0.905 | 1.31
54.37 | 92 | 0.915 | 1.37
54.52 | 93 | 0.925 | 1.44
54.53 | 94 | 0.935 | 1.51
54.56 | 95 | 0.945 | 1.60
54.72 | 96 | 0.955 | 1.70
54.79 | 97 | 0.965 | 1.81
55.05 | 98 | 0.975 | 1.96
55.17 | 99 | 0.985 | 2.17
55.19 | 100 | 0.995 | 2.58

5. Create an x-y scatter plot with the z-scores on the x-axis and the corresponding data values on the y-axis.

[Figure: x-y scatter plot of the z-scores (x-axis) versus the weights in kg (y-axis)]
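Steps 1 to 5 can be combined into one short Python sketch that returns the (z_i, weight) pairs to plot. It assumes n > 10, so it uses p_i = (i - 0.5)/n directly; `normal_plot_points` is a hypothetical helper name, not a library function, and `statistics.NormalDist().inv_cdf` stands in for R's `qnorm`.

```python
from statistics import NormalDist

# The 100 weights (kg) from Example 1, in the order they were listed
weights = [
    52.44, 52.77, 54.56, 53.07, 53.13, 54.72, 53.46, 51.73, 52.31, 52.55,
    54.22, 53.36, 53.40, 53.11, 52.44, 54.79, 53.50, 51.03, 53.70, 52.53,
    51.93, 52.78, 51.97, 52.27, 52.37, 51.31, 53.84, 53.15, 51.86, 54.25,
    53.43, 52.70, 53.90, 53.88, 53.82, 53.69, 53.55, 52.94, 52.69, 52.62,
    52.31, 52.79, 51.73, 55.17, 54.21, 51.88, 52.60, 52.53, 53.78, 52.92,
    53.25, 52.97, 52.96, 54.37, 52.77, 54.52, 51.45, 53.58, 53.12, 53.22,
    53.38, 52.50, 52.67, 51.98, 51.93, 53.30, 53.45, 53.05, 53.92, 55.05,
    52.51, 50.69, 54.01, 52.29, 52.31, 54.03, 52.72, 51.78, 53.18, 52.86,
    53.01, 53.39, 52.63, 53.64, 52.78, 53.33, 54.10, 53.44, 52.67, 54.15,
    53.99, 53.55, 53.24, 52.37, 54.36, 52.40, 55.19, 54.53, 52.76, 51.97,
]


def normal_plot_points(data):
    """Return (z_i, x_(i)) pairs for a normal probability plot.

    Uses p_i = (i - 0.5) / n, i.e. the lesson's rule for n > 10."""
    n = len(data)
    std_normal = NormalDist()                               # mean 0, sd 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x)          # steps 3-4: p_i, then z_i
            for i, x in enumerate(sorted(data), start=1)]   # steps 1-2: sort, rank


points = normal_plot_points(weights)
for z, w in points[:3] + points[-3:]:
    print(f"z = {z:+.2f}   weight = {w:.2f} kg")
```

To draw the plot itself, the pairs can be passed to a plotting library, for example as the x and y arguments of matplotlib's `pyplot.scatter`.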
6. If the weights are consistent with a normal distribution, the points should lie close to a straight line.

As a reference, you can add to the plot a straight line that passes through the first and third quartiles.

From the table, the first quartile (p_i ≈ 0.25, rank 25) is about 52.50 kg with z_i = -0.69, and the third quartile (p_i ≈ 0.75, rank 75) is about 53.69 kg with z_i = 0.66.
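The reference line through these two quartile points follows from the two-point form of a straight line. Below is a minimal Python sketch using the rounded values quoted above; in R, `qqline()` draws a line through the first and third quartiles in the same spirit, although it computes the quartiles itself.

```python
# Quartile points read from the table: (z_i, weight in kg) at ranks 25 and 75
z1, q1 = -0.69, 52.50
z3, q3 = 0.66, 53.69

slope = (q3 - q1) / (z3 - z1)   # change in weight per unit of z
intercept = q1 - slope * z1     # weight on the line at z = 0


def reference_line(z):
    """Weight on the straight line through the two quartile points."""
    return intercept + slope * z


print(f"reference line: weight = {intercept:.2f} + {slope:.2f} * z")
print(f"line at z = -2.58: {reference_line(-2.58):.2f} kg")
print(f"line at z = +2.58: {reference_line(2.58):.2f} kg")
```

For normally distributed data the points satisfy roughly x = μ + σz, so the intercept gives a rough estimate of the mean and the slope a rough estimate of the standard deviation.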

The further the points vary from this line, the greater the indication of departure from normality.

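[Figure: normal probability plot of the weights with the reference line through the first and third quartiles]
Nearly all the points lie close to the straight line, so the weights are consistent with a normal distribution.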

– Example 2

The following are ankle diameters in centimeters (measured as the sum of both ankles) for 60 physically active individuals from a certain survey.

14.1 15.1 14.1 15.0 14.9 13.9 15.6 14.6 13.2 15.0 14.5 16.0 15.4 13.2 14.0 14.0 16.0 14.7 14.8 15.5 13.9 14.4 13.8 14.1 14.7 14.9 15.3 14.5 13.2 13.2 15.8 14.0 15.1 15.0 12.9 14.0 13.0 14.0 15.4 16.4 15.2 13.8 14.9 16.0 16.0 16.3 15.3 16.5 14.4 13.4 14.4 14.2 15.4 15.0 13.0 13.0 14.8 16.2 15.4 14.4.

Reference:

Heinz G, Peterson LJ, Johnson RW, Kerk CJ. 2003. Exploring Relationships in Body Dimensions. Journal of Statistics Education 11(2).

Draw a normal probability plot of this data.

1. Sort the data from smallest to largest.

12.9 13.0 13.0 13.0 13.2 13.2 13.2 13.2 13.4 13.8 13.8 13.9 13.9 14.0 14.0 14.0 14.0 14.0 14.1 14.1 14.1 14.2 14.4 14.4 14.4 14.4 14.5 14.5 14.6 14.7 14.7 14.8 14.8 14.9 14.9 14.9 15.0 15.0 15.0 15.0 15.1 15.1 15.2 15.3 15.3 15.4 15.4 15.4 15.4 15.5 15.6 15.8 16.0 16.0 16.0 16.0 16.2 16.3 16.4 16.5.

2. Assign a rank to each value of your data.

Diameter (cm) | Rank
12.9 | 1
13.0 | 2
13.0 | 3
13.0 | 4
13.2 | 5
13.2 | 6
13.2 | 7
13.2 | 8
13.4 | 9
13.8 | 10
13.8 | 11
13.9 | 12
13.9 | 13
14.0 | 14
14.0 | 15
14.0 | 16
14.0 | 17
14.0 | 18
14.1 | 19
14.1 | 20
14.1 | 21
14.2 | 22
14.4 | 23
14.4 | 24
14.4 | 25
14.4 | 26
14.5 | 27
14.5 | 28
14.6 | 29
14.7 | 30
14.7 | 31
14.8 | 32
14.8 | 33
14.9 | 34
14.9 | 35
14.9 | 36
15.0 | 37
15.0 | 38
15.0 | 39
15.0 | 40
15.1 | 41
15.1 | 42
15.2 | 43
15.3 | 44
15.3 | 45
15.4 | 46
15.4 | 47
15.4 | 48
15.4 | 49
15.5 | 50
15.6 | 51
15.8 | 52
16.0 | 53
16.0 | 54
16.0 | 55
16.0 | 56
16.2 | 57
16.3 | 58
16.4 | 59
16.5 | 60

Note that repeated values (ties) are simply given consecutive ranks.

The first (smallest) value, 12.9 cm, gets rank 1, and the next value, 13.0 cm, gets rank 2.

The last (largest) value, 16.5 cm, gets rank 60.

3. Calculate the cumulative probability p_i associated with each rank i.

Since n = 60 is larger than 10, a = 0.5 and the formula reduces to:

$$p_i = \frac{i - 0.5}{n}$$

The following table will be produced:

Diameter (cm) | Rank | p_i
12.9 | 1 | 0.008
13.0 | 2 | 0.025
13.0 | 3 | 0.042
13.0 | 4 | 0.058
13.2 | 5 | 0.075
13.2 | 6 | 0.092
13.2 | 7 | 0.108
13.2 | 8 | 0.125
13.4 | 9 | 0.142
13.8 | 10 | 0.158
13.8 | 11 | 0.175
13.9 | 12 | 0.192
13.9 | 13 | 0.208
14.0 | 14 | 0.225
14.0 | 15 | 0.242
14.0 | 16 | 0.258
14.0 | 17 | 0.275
14.0 | 18 | 0.292
14.1 | 19 | 0.308
14.1 | 20 | 0.325
14.1 | 21 | 0.342
14.2 | 22 | 0.358
14.4 | 23 | 0.375
14.4 | 24 | 0.392
14.4 | 25 | 0.408
14.4 | 26 | 0.425
14.5 | 27 | 0.442
14.5 | 28 | 0.458
14.6 | 29 | 0.475
14.7 | 30 | 0.492
14.7 | 31 | 0.508
14.8 | 32 | 0.525
14.8 | 33 | 0.542
14.9 | 34 | 0.558
14.9 | 35 | 0.575
14.9 | 36 | 0.592
15.0 | 37 | 0.608
15.0 | 38 | 0.625
15.0 | 39 | 0.642
15.0 | 40 | 0.658
15.1 | 41 | 0.675
15.1 | 42 | 0.692
15.2 | 43 | 0.708
15.3 | 44 | 0.725
15.3 | 45 | 0.742
15.4 | 46 | 0.758
15.4 | 47 | 0.775
15.4 | 48 | 0.792
15.4 | 49 | 0.808
15.5 | 50 | 0.825
15.6 | 51 | 0.842
15.8 | 52 | 0.858
16.0 | 53 | 0.875
16.0 | 54 | 0.892
16.0 | 55 | 0.908
16.0 | 56 | 0.925
16.2 | 57 | 0.942
16.3 | 58 | 0.958
16.4 | 59 | 0.975
16.5 | 60 | 0.992

4. Calculate the Z-score z_i for each p_i using the function qnorm of the R programming language.

Diameter (cm) | Rank | p_i | z_i
12.9 | 1 | 0.008 | -2.39
13.0 | 2 | 0.025 | -1.96
13.0 | 3 | 0.042 | -1.73
13.0 | 4 | 0.058 | -1.57
13.2 | 5 | 0.075 | -1.44
13.2 | 6 | 0.092 | -1.33
13.2 | 7 | 0.108 | -1.24
13.2 | 8 | 0.125 | -1.15
13.4 | 9 | 0.142 | -1.07
13.8 | 10 | 0.158 | -1.00
13.8 | 11 | 0.175 | -0.93
13.9 | 12 | 0.192 | -0.87
13.9 | 13 | 0.208 | -0.81
14.0 | 14 | 0.225 | -0.76
14.0 | 15 | 0.242 | -0.70
14.0 | 16 | 0.258 | -0.65
14.0 | 17 | 0.275 | -0.60
14.0 | 18 | 0.292 | -0.55
14.1 | 19 | 0.308 | -0.50
14.1 | 20 | 0.325 | -0.45
14.1 | 21 | 0.342 | -0.41
14.2 | 22 | 0.358 | -0.36
14.4 | 23 | 0.375 | -0.32
14.4 | 24 | 0.392 | -0.27
14.4 | 25 | 0.408 | -0.23
14.4 | 26 | 0.425 | -0.19
14.5 | 27 | 0.442 | -0.15
14.5 | 28 | 0.458 | -0.10
14.6 | 29 | 0.475 | -0.06
14.7 | 30 | 0.492 | -0.02
14.7 | 31 | 0.508 | 0.02
14.8 | 32 | 0.525 | 0.06
14.8 | 33 | 0.542 | 0.10
14.9 | 34 | 0.558 | 0.15
14.9 | 35 | 0.575 | 0.19
14.9 | 36 | 0.592 | 0.23
15.0 | 37 | 0.608 | 0.27
15.0 | 38 | 0.625 | 0.32
15.0 | 39 | 0.642 | 0.36
15.0 | 40 | 0.658 | 0.41
15.1 | 41 | 0.675 | 0.45
15.1 | 42 | 0.692 | 0.50
15.2 | 43 | 0.708 | 0.55
15.3 | 44 | 0.725 | 0.60
15.3 | 45 | 0.742 | 0.65
15.4 | 46 | 0.758 | 0.70
15.4 | 47 | 0.775 | 0.76
15.4 | 48 | 0.792 | 0.81
15.4 | 49 | 0.808 | 0.87
15.5 | 50 | 0.825 | 0.93
15.6 | 51 | 0.842 | 1.00
15.8 | 52 | 0.858 | 1.07
16.0 | 53 | 0.875 | 1.15
16.0 | 54 | 0.892 | 1.24
16.0 | 55 | 0.908 | 1.33
16.0 | 56 | 0.925 | 1.44
16.2 | 57 | 0.942 | 1.57
16.3 | 58 | 0.958 | 1.73
16.4 | 59 | 0.975 | 1.96
16.5 | 60 | 0.992 | 2.39
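Because the p_i column is rounded to three decimals, each z_i should be computed from the unrounded p_i. For n = 60 the two approaches differ in the second decimal for a few ranks. The Python sketch below (assuming Python 3.8+, with `qnorm` as a local alias for `statistics.NormalDist().inv_cdf`) compares both for ranks 1 and 28.

```python
from statistics import NormalDist

qnorm = NormalDist().inv_cdf   # Python stand-in for R's qnorm()
n = 60

for i in (1, 28):
    p_exact = (i - 0.5) / n            # e.g. 1/120 = 0.008333... for rank 1
    p_rounded = round(p_exact, 3)      # value as printed to three decimals
    print(f"rank {i}: z from exact p = {qnorm(p_exact):+.2f}, "
          f"z from rounded p = {qnorm(p_rounded):+.2f}")
```

For rank 1 the exact p gives about -2.39, while the rounded value 0.008 gives about -2.41.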

5. Create an x-y scatter plot with the z-scores on the x-axis and the corresponding data values on the y-axis.

[Figure: x-y scatter plot of the z-scores (x-axis) versus the ankle diameters in cm (y-axis)]

6. If the diameters are consistent with a normal distribution, the points should lie close to a straight line.

As a reference, a straight line passing through the first and third quartiles is added to the plot.

From the table, the first quartile (p_i ≈ 0.25) is about 14.0 cm with z_i ≈ -0.65, and the third quartile (p_i ≈ 0.75) is about 15.4 cm with z_i ≈ 0.70.

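[Figure: normal probability plot of the ankle diameters with the reference line through the first and third quartiles]
Nearly all the points lie close to the straight line, so the diameters are consistent with a normal distribution.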

3. How to read a normal probability plot?

The shape of a normal probability plot tells you how the distribution of your data compares with a normal distribution.

– Example 1: normally-distributed variable

The following figure shows the histogram and normal probability plot of heights (in cm) for 100 individuals.

[Figure: histogram and normal probability plot of heights (cm) for 100 individuals]

When the data are normally distributed, the histogram is nearly symmetric, unimodal, and bell-shaped.

The normal probability plot of normally distributed data shows nearly all the points on or near the reference line, apart from possibly a few of the smallest and largest values.

– Example 2: normally-distributed variable with one outlier

The following figure shows the histogram and normal probability plot of heights (in cm) for 100 individuals, including one outlier.

[Figure: histogram and normal probability plot of heights (cm) for 100 individuals, with one outlier]

The histogram looks the same as for normally distributed data, except for an isolated bin far from the others that contains the outlier.

In the normal probability plot, nearly all the points lie near the straight line; only the outlier lies far away from it.

– Example 3: Right-skewed variable

The following figure shows the histogram and normal probability plot of annual income for 100 individuals.

[Figure: histogram and normal probability plot of annual income for 100 individuals]
The histogram of right-skewed data is unimodal, with a long right tail of less frequent large values.

The normal probability plot of right-skewed data has an inverted C shape.

– Example 4: Left-skewed variable

The following figure shows the histogram and normal probability plot of lawyers’ ratings of the physical ability of state judges in the US Superior Court.

[Figure: histogram and normal probability plot of lawyers’ physical ability ratings of state judges in the US Superior Court]
The histogram of left-skewed data is unimodal, with a long left tail of less frequent small values.

The normal probability plot of left-skewed data has roughly a C shape.
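The two skewed shapes can also be checked numerically. The Python sketch below uses a hypothetical, idealized sample instead of the income or ratings data: the exact quantiles of an exponential distribution with mean 1, which is right-skewed, and their mirror image, which is left-skewed. It measures how far the smallest and largest points sit from the line through the points at ranks 25 and 75 (roughly the quartiles).

```python
from math import log
from statistics import NormalDist

n = 100
p = [(i - 0.5) / n for i in range(1, n + 1)]   # p_i for n > 10
z = [NormalDist().inv_cdf(pi) for pi in p]      # theoretical z_i

# Hypothetical, idealized samples (not real data):
# exponential quantiles -ln(1 - p) are right-skewed, and their mirror image,
# re-sorted from smallest to largest, is left-skewed.
right_skewed = [-log(1 - pi) for pi in p]
left_skewed = [-x for x in reversed(right_skewed)]


def departures_at_ends(values):
    """Vertical distance of the smallest and largest points above (+) or
    below (-) the line through the points at ranks 25 and 75."""
    za, qa = z[24], values[24]
    zb, qb = z[74], values[74]
    slope = (qb - qa) / (zb - za)

    def line(t):
        return qa + slope * (t - za)

    return values[0] - line(z[0]), values[-1] - line(z[-1])


for name, sample in (("right-skewed", right_skewed), ("left-skewed", left_skewed)):
    low, high = departures_at_ends(sample)
    print(f"{name}: smallest point {low:+.2f}, largest point {high:+.2f}")
```

For the right-skewed sample both departures come out positive (the points bend above the line at both ends), and for the left-skewed sample both come out negative; this bending at both ends is what produces the curved shapes described above.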

4. Practice questions

1. The following are the ages (in years) of 20 participants from a certain survey.

26 48 67 39 25 25 36 44 44 47 53 52 52 51 52 40 77 44 40 45.

Draw a normal probability plot of this data.

2. The following are normal probability plots of the weights (in kg) of males and females from a certain survey.

[Figure: normal probability plots of weights (kg) for males and females]

Which sex has normally distributed weights?

3. The following are normal probability plots of total cholesterol (in mg/dl) for different smoking statuses from a certain survey.

[Figure: normal probability plots of total cholesterol (mg/dl) by smoking status]

Which smoking status has normally distributed total cholesterol levels?

4. The following are normal probability plots of annual income (in USD) for different employment statuses from a certain survey.

[Figure: normal probability plots of annual income (USD) by employment status]

Which employment status has normally distributed annual income?

5. The following are normal probability plots of air pressure (in millibars) for different storm classes (status).

[Figure: normal probability plots of air pressure (millibars) by storm class]

Which storm class has a normally distributed pressure?

5. Answer key

1. Sort the data from smallest to largest.

25 25 26 36 39 40 40 44 44 44 45 47 48 51 52 52 52 53 67 77.

  • Assign a rank to each value of your data.

Age (years) | Rank
25 | 1
25 | 2
26 | 3
36 | 4
39 | 5
40 | 6
40 | 7
44 | 8
44 | 9
44 | 10
45 | 11
47 | 12
48 | 13
51 | 14
52 | 15
52 | 16
52 | 17
53 | 18
67 | 19
77 | 20

  • Calculate the cumulative probability p_i associated with each rank i.

Since n = 20 is larger than 10, a = 0.5 and the formula reduces to:

$$p_i = \frac{i - 0.5}{n}$$

The following table will be produced:

Age (years) | Rank | p_i
25 | 1 | 0.025
25 | 2 | 0.075
26 | 3 | 0.125
36 | 4 | 0.175
39 | 5 | 0.225
40 | 6 | 0.275
40 | 7 | 0.325
44 | 8 | 0.375
44 | 9 | 0.425
44 | 10 | 0.475
45 | 11 | 0.525
47 | 12 | 0.575
48 | 13 | 0.625
51 | 14 | 0.675
52 | 15 | 0.725
52 | 16 | 0.775
52 | 17 | 0.825
53 | 18 | 0.875
67 | 19 | 0.925
77 | 20 | 0.975

  • Calculate the Z-score for each pi value.

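Age (years) | Rank | p_i | z_i
25 | 1 | 0.025 | -1.96
25 | 2 | 0.075 | -1.44
26 | 3 | 0.125 | -1.15
36 | 4 | 0.175 | -0.93
39 | 5 | 0.225 | -0.76
40 | 6 | 0.275 | -0.60
40 | 7 | 0.325 | -0.45
44 | 8 | 0.375 | -0.32
44 | 9 | 0.425 | -0.19
44 | 10 | 0.475 | -0.06
45 | 11 | 0.525 | 0.06
47 | 12 | 0.575 | 0.19
48 | 13 | 0.625 | 0.32
51 | 14 | 0.675 | 0.45
52 | 15 | 0.725 | 0.60
52 | 16 | 0.775 | 0.76
52 | 17 | 0.825 | 0.93
53 | 18 | 0.875 | 1.15
67 | 19 | 0.925 | 1.44
77 | 20 | 0.975 | 1.96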

  • Create an x-y scatter plot of your z-score values on the x-axis versus their corresponding data points on the y-axis.

[Figure: x-y scatter plot of the z-scores (x-axis) versus age in years (y-axis)]

  • As a reference, add to the plot a straight line that passes through the first and third quartiles.

[Figure: the same scatter plot with the reference line through the first and third quartiles]

Nearly all the points lie on or near the straight line except the smallest and largest values, so the ages are approximately normally distributed.
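As a check on the tables above, the same calculation for the 20 ages can be sketched in Python (assuming Python 3.8+; `qnorm` is a local alias for `statistics.NormalDist().inv_cdf`, standing in for R's `qnorm`). It prints each age with its rank, p_i and z_i.

```python
from statistics import NormalDist

# Ages (years) of the 20 survey participants from practice question 1
ages = [26, 48, 67, 39, 25, 25, 36, 44, 44, 47,
        53, 52, 52, 51, 52, 40, 77, 44, 40, 45]

n = len(ages)                   # n = 20 > 10, so a = 0.5
qnorm = NormalDist().inv_cdf    # stand-in for R's qnorm()

rows = []
for i, age in enumerate(sorted(ages), start=1):
    p = (i - 0.5) / n           # cumulative probability p_i
    rows.append((age, i, p, qnorm(p)))

print("age  rank  p_i    z_i")
for age, i, p, z in rows:
    print(f"{age:3d}  {i:4d}  {p:.3f}  {z:+.2f}")
```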

2. Males have approximately normally distributed weights, as nearly all the points lie along the straight line.

For females, the normal probability plot shows an inverted C shape, which means that female weights are right-skewed.

3. All the smoking statuses have approximately normally distributed total cholesterol levels, as nearly all the points lie along the straight line except for the smallest and largest values.

4. The “not in labor force” and “unemployed” statuses have approximately normally distributed annual income, as nearly all the points lie along the straight line except for the largest values.

The “employed” status has right-skewed annual income, as its normal probability plot takes an inverted C shape.

5. Tropical depressions have approximately normally distributed pressure, as nearly all the points lie along the straight line except for the largest and smallest values.

Hurricanes and tropical storms have left-skewed pressure values, as their normal probability plots take a C shape.
