Logistic Equation – Explanation & Examples

The definition of the logistic equation is:

“The logistic equation is a sigmoid function, which takes any real number and outputs a value between zero and a certain positive number.”

In this topic, we will discuss the logistic equation from the following aspects:

  1. What is the logistic equation?
  2. Logistic equation formula.
  3. How to solve the logistic equation?
  4. Application of logistic function in ecology.
  5. Application of logistic function in statistics.
  6. Practice questions.
  7. Answer key.

1. What is the logistic equation?

The logistic equation is a sigmoid function, which takes any real number from negative infinity -∞ to positive infinity +∞ and outputs a value between zero and a certain positive number.

Logistic equation formula

The equation is:

f(x)=L/(1+e^(-k(x-x_0)) )

where:

f(x) is the logistic equation or function.

L is the logistic function or curve maximum value.

e is a mathematical constant approximately equal to 2.71828.

k is the logistic growth rate or steepness of the curve.

x_0 is the value of x at the sigmoid curve midpoint.

For values of x between -∞ to +∞, the logistic equation draws an S-curve with the curve f(x) approaching L as x approaches +∞ and approaching zero as x approaches -∞.
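As a minimal sketch of the formula above, the logistic function can be written in a few lines of Python. The function name `logistic` and its default arguments are choices made for this illustration, not part of the original text.

```python
import math

def logistic(x, L=1.0, k=1.0, x0=0.0):
    """Logistic function f(x) = L / (1 + e^(-k (x - x0)))."""
    return L / (1.0 + math.exp(-k * (x - x0)))

# Far to the left the value is close to 0, at the midpoint it is L/2,
# and far to the right it is close to L.
for x in (-6, 0, 6):
    print(x, round(logistic(x), 9))
```

With the defaults L = 1, k = 1 and x_0 = 0, this is the curve tabulated in Example 1 below.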

– Example 1

For the x values:

-6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6.

Draw the logistic curve when L = 1, k = 1, and x_0=0.

We follow these steps:

1. Make a table of the x values.

x: -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6

2. Using the above formula, calculate the logistic function for each value.

x     f(x)
-6    0.002472623
-5    0.006692851
-4    0.017986210
-3    0.047425873
-2    0.119202922
-1    0.268941421
0     0.500000000
1     0.731058579
2     0.880797078
3     0.952574127
4     0.982013790
5     0.993307149
6     0.997527377

3. Plot the x values on the x-axis and the logistic function value on the y-axis.

Connect the intersecting points with a line to draw the sigmoid curve.

Plot with sigmoid curve

For comparison, we can add two other equations with the same parameters except that L = 5 and L =10 respectively.

We update the table.

x     f(x)_1        f(x)_5       f(x)_10
-6    0.002472623   0.01236312   0.02472623
-5    0.006692851   0.03346425   0.06692851
-4    0.017986210   0.08993105   0.17986210
-3    0.047425873   0.23712937   0.47425873
-2    0.119202922   0.59601461   1.19202922
-1    0.268941421   1.34470711   2.68941421
0     0.500000000   2.50000000   5.00000000
1     0.731058579   3.65529289   7.31058579
2     0.880797078   4.40398539   8.80797078
3     0.952574127   4.76287063   9.52574127
4     0.982013790   4.91006895   9.82013790
5     0.993307149   4.96653575   9.93307149
6     0.997527377   4.98763688   9.97527377

Where f(x)_1 is the logistic function with L = 1, f(x)_5 is the logistic function with L = 5, and f(x)_10 is the logistic function with L = 10.

and plot the 3 different sigmoid curves.

Plot with 3 different sigmoid curves

We see that the maximum of each logistic function is its L value.

– Example 2

For the x values:

-6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6.

Draw the logistic curve when L = 1, k = 1, 2, or 3, and x_0=0.

We follow these steps:

1. Make a table of the x values.

x: -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6

2. Using the above formula, calculate the logistic function for each value and each k value.

We add 3 other columns, one for each logistic function.

x     f(x)_1        f(x)_2         f(x)_3
-6    0.002472623   6.144175e-06   1.522998e-08
-5    0.006692851   4.539787e-05   3.059022e-07
-4    0.017986210   3.353501e-04   6.144175e-06
-3    0.047425873   2.472623e-03   1.233946e-04
-2    0.119202922   1.798621e-02   2.472623e-03
-1    0.268941421   1.192029e-01   4.742587e-02
0     0.500000000   5.000000e-01   5.000000e-01
1     0.731058579   8.807971e-01   9.525741e-01
2     0.880797078   9.820138e-01   9.975274e-01
3     0.952574127   9.975274e-01   9.998766e-01
4     0.982013790   9.996646e-01   9.999939e-01
5     0.993307149   9.999546e-01   9.999997e-01
6     0.997527377   9.999939e-01   1.000000e+00

Where f(x)_1 is the logistic function with k = 1, f(x)_2 is the logistic function with k = 2, and f(x)_3 is the logistic function with k = 3.

and plot the 3 different sigmoid curves.

Plot with 3 different sigmoid curves

As the k value increases, the sigmoid curve becomes steeper in its growth.
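The effect of k on steepness can also be checked numerically. The sketch below is an illustration only (the helper names `logistic` and `midpoint_slope` are assumptions); it estimates the slope at the midpoint x_0 = 0 with a central difference. Differentiating the formula gives a midpoint slope of L·k/4, so doubling k doubles the slope there.

```python
import math

def logistic(x, L=1.0, k=1.0, x0=0.0):
    """Logistic function f(x) = L / (1 + e^(-k (x - x0)))."""
    return L / (1.0 + math.exp(-k * (x - x0)))

def midpoint_slope(L, k, h=1e-6):
    """Central-difference estimate of f'(x0) for x0 = 0."""
    return (logistic(h, L, k) - logistic(-h, L, k)) / (2 * h)

for k in (1, 2, 3):
    print(f"k={k}: f(1)={logistic(1, k=k):.7f}, "
          f"midpoint slope={midpoint_slope(1.0, k):.4f}")
```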

2. How to solve the logistic equation?

The logistic function finds applications in many fields, including ecology, chemistry, economics, sociology, political science, linguistics, and statistics.

We will focus on the application and the solving of logistic function in ecology and statistics.

– Application of logistic function in ecology

A typical application of the logistic equation is to model population growth, where the rate of reproduction is proportional to both the existing population and the amount of available resources.

The logistic differential equation for the population growth is:

dP/dt=rP(1-P/K)

Where:

P is the population size.

t is the time. The units of time can be hours, days, weeks, months, or years.

dP/dt is the instantaneous rate of change of the population as a function of time.
r is the growth rate.

K is the carrying capacity. The carrying capacity of an organism in a given environment is defined to be the maximum population of that organism that the environment can sustain indefinitely. It has the same unit as the population size.

The population growth rate changes over time. Biologists have found that in many biological systems, the population grows until a certain steady-state population is reached.

The concept of carrying capacity allows for the possibility that in a given area, only a certain number of a given organism or animal can thrive without running into resource issues.

Suppose that the initial population is small relative to the carrying capacity. Then P/K is small, possibly close to zero. Thus, the quantity in parentheses on the right-hand side of the logistic equation is close to 1, and the right-hand side of this equation is close to rP. The value of the rate r represents the proportional increase of the population P in one unit of time. So, the population grows rapidly.

However, as the population grows, some members of the population interfere with each other by competing for some critical resource, such as food or living space. The ratio P/K also grows, because K is constant. If the population remains below the carrying capacity, then P/K is less than 1, so 1-(P/K)>0 but less than 1. Therefore the growth rate decreases as a result.

If P=K then the right-hand side is equal to zero, and the population does not change (this is called maturity of the population).

The solution to the equation, with P_0 being the initial population is:

P(t)=K/(1+((K-P_0)/P_0 )e^(-rt) )

Note that K is the limiting value of P:

If P_0 < K, then the population grows until it approaches K.

If P_0 > K, then the population decreases until it approaches K.
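One way to gain confidence in the closed-form solution is to check numerically that it satisfies the differential equation. The sketch below does this with a central difference; the values K = 100 and r = 0.5 and the function names are illustrative assumptions, not taken from the text.

```python
import math

def population(t, K, P0, r):
    """Closed-form solution P(t) = K / (1 + ((K - P0) / P0) e^(-r t))."""
    return K / (1.0 + ((K - P0) / P0) * math.exp(-r * t))

def growth_rate(P, K, r):
    """Right-hand side of the differential equation: r P (1 - P/K)."""
    return r * P * (1.0 - P / K)

K, r = 100.0, 0.5  # illustrative values only
for P0 in (10.0, 150.0):  # one start below K, one above K
    t, h = 3.0, 1e-5
    slope = (population(t + h, K, P0, r) - population(t - h, K, P0, r)) / (2 * h)
    rhs = growth_rate(population(t, K, P0, r), K, r)
    # slope and rhs should agree; P(50) should be very close to K
    print(P0, round(slope, 6), round(rhs, 6), round(population(50.0, K, P0, r), 6))
```

In both cases the numerical slope matches rP(1-P/K), and the population ends up near K whether it starts below or above the carrying capacity.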

– Example 1

A population of rabbits in a meadow is observed to be 200 rabbits at time t=0. Assume an initial population of 200, a growth rate of 0.04 per month, and a carrying capacity of 750 rabbits.

Draw the logistic curve of growth for this population.

We follow these steps:

1. The growth rate is per month so the x-axis will be in months. The 0 value will represent the current month, and 1 is the next month, and so on.

We can also plot negative values on the x-axis to represent the previous month and so on.

In a table, we write the next 12 values (next year) and the previous 12 values (previous year).

So the x values will range from -12 to 12.

In a table:

t_months: -12, -11, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

2. We know that the population at the current month or t = 0 is 200. We use the above equation to know the population size for all these months:

P(t)=K/(1+((K-P_0)/P_0 )e^(-rt) )=750/(1+((750-200)/200)e^(-0.04t) )

For example, at time = 0:

The population size at time = 0 is P(0) = 750/(1+((750-200)/200) X e^(-0.04 X 0)) = 750/(1+((750-200)/200)) = 200.

Using the above formula, calculate the population size for each time value and update the table.

t_months   population
-12        137.7612
-11        142.3165
-10        146.9863
-9         151.7710
-8         156.6710
-7         161.6865
-6         166.8174
-5         172.0634
-4         177.4243
-3         182.8993
-2         188.4877
-1         194.1884
0          200.0000
1          205.9211
2          211.9500
3          218.0847
4          224.3229
5          230.6621
6          237.0997
7          243.6326
8          250.2578
9          256.9716
10         263.7706
11         270.6506
12         277.6077

3. Plot the t values on the x-axis and the population on the y-axis.

Connect the intersecting points with a line to draw the sigmoid curve.

t months plot with sigmoid curve

We see that:

  • The expected number of rabbits after 12 months (1 year) is P(12) = 278 rabbits approximately.
  • The plotted range covers only part of the sigmoid curve, so it does not look S-shaped.

We can see the full sigmoid curve if we extend the time boundaries to between -100 and 100.

Plot of the full sigmoid curve with the time boundaries extended to between -100 and 100
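The table in this example can be reproduced with a short script. This is a sketch using the values given in the example (K = 750, P_0 = 200, r = 0.04 per month); the function name `rabbit_population` is an assumption made for illustration.

```python
import math

K, P0, r = 750, 200, 0.04  # carrying capacity, initial population, monthly growth rate

def rabbit_population(t_months):
    """P(t) = K / (1 + ((K - P0) / P0) e^(-r t)) for the rabbit example."""
    return K / (1.0 + ((K - P0) / P0) * math.exp(-r * t_months))

# Previous year to next year, one row per month
for t in range(-12, 13):
    print(f"{t:4d}  {rabbit_population(t):9.4f}")
```

Changing the range to `range(-100, 101)` gives the values behind the full S-shaped curve.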

– Application of logistic function in statistics

The logistic function can be used in logistic regression.

Logistic regression is used to model the probability of a dependent variable with two possible values, such as pass/fail given a set of one or more independent variables (predictors).

Assume that the first level value of the dependent variable is “fail” and the second level value is “pass”.

The probability of the second level value (“pass”), p(Y = “pass”), can vary between 0 (we are certain it is a “fail” event) and 1 (certainly a “pass” event).

The formula of the logistic function that models the probability is:

p(Y)=1/(1+e^(-(β_0+β_1 x)) )

Where:

p(Y) is the probability of the second level value.

x is the independent variable value.

The quantity β_0+β_1 x is the log(odds).

We estimate the values of β_0 and β_1 such that plugging these estimates into the model for p(Y) yields a probability close to 1 for all data that have the second level value (“pass”) and a probability close to 0 for all other data that have the first level value (“fail”).

The odds of an event is the probability of an event occurring divided by the probability of not occurring.

odds = p/(1-p).

While the probability of an event can range between 0 and 1, the odds of an event can range between 0 and +∞.

When p = 0, odds = 0.

when p = 0.25, odds = 0.25/0.75 = 0.33.

when p = 0.5, odds = 0.5/0.5 = 1.

when p = 0.75, odds = 0.75/0.25 = 3.

when p = 0.95, odds = 0.95/0.05 = 19.

when p = 0.99, odds = 0.99/0.01 = 99.

when p = 1, odds = 1/0 = +∞.

The following plot plots the different probabilities on the x-axis and the resulting odds on the y-axis.

Plot of different probabilities on the x axis and the resulting odds on the y

If we plot the log(odds) on the y-axis, the log(odds) have a range from -∞ to +∞.

Plot of the log(odds) on the y-axis; the log(odds) have a range from -∞ to +∞.
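The conversion from probability to odds and log(odds) is simple to script. The sketch below uses the natural logarithm and hypothetical helper names `odds` and `log_odds`; the endpoints p = 0 and p = 1 are left out because the odds are 0 and +∞ there, so the logarithm (and, at p = 1, the division) is undefined.

```python
import math

def odds(p):
    """Odds of an event with probability p, for 0 <= p < 1."""
    return p / (1.0 - p)

def log_odds(p):
    """Natural log of the odds, for 0 < p < 1."""
    return math.log(odds(p))

for p in (0.25, 0.5, 0.75, 0.95, 0.99):
    print(f"p={p:.2f}  odds={odds(p):8.3f}  log(odds)={log_odds(p):7.3f}")
```

Probabilities below 0.5 give negative log(odds), p = 0.5 gives exactly 0, and probabilities above 0.5 give positive log(odds).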

Because the log(odds) range from -∞ to +∞, they can now be put in the linear equation for a single predictor (x):

log(odds) = β_0+β_1 x

where:

β_0 is the baseline log(odds) when the predictor(x) is zero.

β_1 is the amount of log(odds) increase for each one-unit increase in x.

If β_1 is positive then increasing x will be associated with increasing the log(odds) or p(Y), and if β_1 is negative then increasing x will be associated with decreasing the log(odds) or p(Y).

For n different predictors x_1, x_2, …, x_n:

log(odds) = β_0+β_1 x_1+β_2 x_2+…+β_n x_n

and the logistic equation:

p(Y)=1/(1+e^(-(β_0+β_1 x_1+β_2 x_2+…+β_n x_n)) )

p(Y)=1/(1+e^(-(log(odds))) )

p(Y)=odds/(1+odds)
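The chain from predictors to log(odds) to probability can be sketched as follows. The coefficients and predictor values here are hypothetical numbers chosen only to show that the two forms of the logistic equation, 1/(1+e^(-log(odds))) and odds/(1+odds), give the same result.

```python
import math

def predict_probability(intercept, coefficients, values):
    """p(Y) = 1 / (1 + e^(-(b0 + b1 x1 + ... + bn xn)))."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients and predictor values, for illustration only
b0 = -1.0
betas = [0.5, -0.25]
x = [2.0, 1.0]

p = predict_probability(b0, betas, x)
odds = math.exp(b0 + sum(b * v for b, v in zip(betas, x)))
print(p, odds / (1.0 + odds))  # both forms give the same probability
```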

– Example 1 for one predictor

The following table shows the data of 25 students illustrating the number of hours per week each student spent watching TV (tv column) and whether they passed or failed a certain exam (pass column).

The pass column has 2 values: 1 for passing and 0 for failure.

student   tv   pass
1         0    1
2         0    1
3         0    0
4         5    1
5         7    1
6         10   1
7         11   1
8         12   1
9         14   1
10        15   0
11        15   0
12        16   1
13        17   0
14        19   0
15        20   1
16        20   0
17        20   0
18        23   0
19        24   0
20        25   0
21        27   0
22        28   0
23        30   0
24        30   0
25        32   1

We can see the relation between TV hours and pass in the following plot:

Plot of relation between TV hours and pass

We see that increasing the number of hours watching TV is associated with more failures or less passing.

We estimate a logistic regression model for this data, assuming that pass (1) is the second level value, and find that:

β_0 = 1.90717

β_1 for TV hours (predictor) = -0.13078.
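The text does not say how these estimates were obtained. One common approach is maximum likelihood, solved with Newton-Raphson iterations; the sketch below applies it to the 25 (tv, pass) pairs from the table above. The function name `fit_logistic` and the fixed iteration count are assumptions for this illustration, and statistical software would normally be used instead.

```python
import math

# TV hours and pass/fail (1/0) for the 25 students in the table
tv = [0, 0, 0, 5, 7, 10, 11, 12, 14, 15, 15, 16, 17,
      19, 20, 20, 20, 23, 24, 25, 27, 28, 30, 30, 32]
passed = [1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0,
          0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

def fit_logistic(xs, ys, iterations=25):
    """Maximum-likelihood fit of log(odds) = b0 + b1*x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood w.r.t. b0
            g1 += (y - p) * x      # gradient w.r.t. b1
            h00 += w               # entries of the information matrix X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) * step = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_logistic(tv, passed)
print(round(b0, 5), round(b1, 5))
```

The fitted values should agree closely with the estimates quoted above.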

Calculate the probability of passing for all students in this data.

We will follow these steps:

1. Estimate the log(odds) of passing for each student.

log(odds) = β_0+β_1 x = 1.90717 + (-0.13078 X TV hours).

The β_1 for TV hours is negative, so increasing the number of hours watching TV will be associated with decreasing the log(odds) or probability of passing.

Update the table with a column for log(odds) of passing.

student   tv   pass   log(odds)
1         0    1      1.90717
2         0    1      1.90717
3         0    0      1.90717
4         5    1      1.25327
5         7    1      0.99171
6         10   1      0.59937
7         11   1      0.46859
8         12   1      0.33781
9         14   1      0.07625
10        15   0      -0.05453
11        15   0      -0.05453
12        16   1      -0.18531
13        17   0      -0.31609
14        19   0      -0.57765
15        20   1      -0.70843
16        20   0      -0.70843
17        20   0      -0.70843
18        23   0      -1.10077
19        24   0      -1.23155
20        25   0      -1.36233
21        27   0      -1.62389
22        28   0      -1.75467
23        30   0      -2.01623
24        30   0      -2.01623
25        32   1      -2.27779

2. Calculate the probability of passing for each student using the logistic equation:

p(Y)=1/(1+e^(-(β_0+β_1 x)) )=1/(1+e^(-(1.90717-0.13078x)) )=1/(1+e^(-(log(odds))) )

Update the table with a column for probability of passing for each student.

student   tv   pass   log(odds)   p(pass)
1         0    1      1.90717     0.87070088
2         0    1      1.90717     0.87070088
3         0    0      1.90717     0.87070088
4         5    1      1.25327     0.77786540
5         7    1      0.99171     0.72942555
6         10   1      0.59937     0.64551216
7         11   1      0.46859     0.61504997
8         12   1      0.33781     0.58365845
9         14   1      0.07625     0.51905327
10        15   0      -0.05453    0.48637088
11        15   0      -0.05453    0.48637088
12        16   1      -0.18531    0.45380462
13        17   0      -0.31609    0.42162894
14        19   0      -0.57765    0.35947351
15        20   1      -0.70843    0.32994585
16        20   0      -0.70843    0.32994585
17        20   0      -0.70843    0.32994585
18        23   0      -1.10077    0.24959565
19        24   0      -1.23155    0.22591025
20        25   0      -1.36233    0.20386188
21        27   0      -1.62389    0.16466909
22        28   0      -1.75467    0.14745914
23        30   0      -2.01623    0.11750938
24        30   0      -2.01623    0.11750938
25        32   1      -2.27779    0.09297916

We see that increasing the number of TV hours is associated with decreased probability of passing this exam.

We can plot the tv hours on the x-axis and the probability on the y-axis to see the sigmoid curve of the logistic equation.

The passing and failing students are plotted as black points.

Plot of passing and failing students as black dots

We see that the probability of passing the exam decreases with increasing the number of TV hours.
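The two steps of this example (log(odds), then probability) can be sketched in Python using the estimated coefficients from the text. The names `B0`, `B1` and `p_pass` are assumptions made for this illustration.

```python
import math

B0, B1 = 1.90717, -0.13078  # estimated coefficients quoted in the example

def p_pass(tv_hours):
    """Probability of passing given weekly TV hours."""
    log_odds = B0 + B1 * tv_hours
    return 1.0 / (1.0 + math.exp(-log_odds))

for hours in (0, 5, 10, 15, 20, 32):
    print(f"{hours:2d} h  log(odds)={B0 + B1 * hours:9.5f}  p(pass)={p_pass(hours):.8f}")

# The probability crosses 0.5 where the log(odds) is zero, i.e. at -B0/B1 hours
print("p = 0.5 at about", round(-B0 / B1, 2), "TV hours per week")
```

The crossover point, where the log(odds) changes sign, falls between 14 and 15 hours, which matches the table: student 9 (14 hours) has a probability just above 0.5 and students 10 and 11 (15 hours) just below.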

– Example 2 for two predictors

The following table shows the data of 100 persons from a certain survey.

The table shows the age of each person (age, in years), body mass index (bmi), and whether they had hypertension (hypertension, No or Yes).

age   bmi     hypertension
67    29.02   Yes
63    33.86   Yes
75    34.03   Yes
59    27.73   Yes
68    28.60   Yes
70    29.27   Yes
76    30.08   Yes
74    27.46   No
69    34.31   Yes
70    36.10   Yes
68    24.13   No
67    31.44   Yes
62    34.25   Yes
72    25.22   Yes
77    30.08   Yes
63    25.38   Yes
78    29.90   Yes
58    35.78   Yes
68    39.51   Yes
67    29.90   Yes
65    29.93   Yes
65    34.47   Yes
70    27.26   No
65    34.41   Yes
77    28.89   No
60    33.62   Yes
62    33.24   Yes
62    28.17   Yes
63    31.67   Yes
68    23.57   Yes
55    29.49   Yes
74    31.16   Yes
56    30.80   Yes
67    29.41   Yes
65    30.13   Yes
60    27.77   Yes
55    32.10   Yes
68    29.03   Yes
77    25.47   No
67    29.48   Yes
66    25.39   Yes
72    23.23   Yes
61    33.99   No
58    29.83   Yes
75    25.19   Yes
57    31.64   Yes
65    27.81   Yes
70    24.07   Yes
68    30.32   Yes
55    26.40   Yes
62    34.45   Yes
77    34.62   Yes
56    30.07   Yes
69    27.29   Yes
59    27.30   No
68    30.00   Yes
61    29.08   No
61    29.86   Yes
70    33.87   Yes
75    31.74   Yes
71    32.38   Yes
77    24.03   Yes
65    28.83   Yes
76    37.04   Yes
75    24.18   Yes
67    28.73   Yes
59    33.76   Yes
65    37.04   Yes
64    32.59   Yes
71    27.01   Yes
57    27.73   No
62    32.79   No
68    27.06   Yes
76    39.95   Yes
61    28.35   Yes
67    29.47   Yes
55    23.60   No
64    39.97   Yes
72    30.36   Yes
60    27.79   Yes
65    27.94   Yes
66    27.81   Yes
69    25.61   Yes
66    30.67   Yes
68    26.40   Yes
65    30.52   No
60    33.51   Yes
76    27.20   Yes
57    30.85   Yes
69    27.12   Yes
67    34.65   No
62    25.72   Yes
72    28.15   Yes
70    26.90   Yes
58    25.78   Yes
68    31.48   No
61    42.53   Yes
76    31.45   Yes
64    25.27   Yes
59    34.19   Yes

We can see the relation between age and hypertension in the following box plot:

Box plot of the relation between age and hypertension

  • We see that:
    Persons with hypertension are plotted as blue dots while the normotensive persons are plotted as red dots.
  • The median age for hypertensive persons (central line of the blue box) is higher than the median age for the normotensive persons (central line of the red box).
  • This may indicate that increasing age is associated with an increased probability of hypertension.

We can also see the relation between bmi and hypertension in the following box plot:

Box plot of relation between bmi and hypertension

  • We see that:
    The median bmi for hypertensive persons (central line of the blue box) is higher than the median bmi for the normotensive persons (central line of the red box).
  • This may indicate that increasing body mass index is associated with increased hypertension probability.

We estimate a logistic regression model for this data, assuming that “Yes” is the second level value, and find that:

β_0 = -2.74974.

β_1 for age (predictor1) = 0.02155.

β_2 for bmi (predictor2) = 0.10631.

We will calculate the probability of developing hypertension for each person in this data by following these steps:

1. Estimate the log(odds) of hypertension for each person.

log(odds) = β_0+β_1 x_1+β_2 x_2 = -2.74974 + (0.02155 X age) + (0.10631 X bmi).

The β_1 for age is positive, so increasing age will be associated with increasing the log(odds) or the probability of hypertension.

The β_2 for bmi is positive also, so increasing the body mass index will be associated with increasing the log(odds) or the probability of hypertension.

Update the table with a column for log(odds) of hypertension.

age   bmi     hypertension   log(odds)
67    29.02   Yes            1.779226
63    33.86   Yes            2.207567
75    34.03   Yes            2.484239
59    27.73   Yes            1.469686
68    28.60   Yes            1.756126
70    29.27   Yes            1.870454
76    30.08   Yes            2.085865
74    27.46   No             1.764233
69    34.31   Yes            2.384706
70    36.10   Yes            2.596551
68    24.13   No             1.280920
67    31.44   Yes            2.036496
62    34.25   Yes            2.227477
72    25.22   Yes            1.482998
77    30.08   Yes            2.107415
63    25.38   Yes            1.306058
78    29.90   Yes            2.109829
58    35.78   Yes            2.303932
68    39.51   Yes            2.915968
67    29.90   Yes            1.872779
65    29.93   Yes            1.832868
65    34.47   Yes            2.315516
70    27.26   No             1.656771
65    34.41   Yes            2.309137
77    28.89   No             1.980906
60    33.62   Yes            2.117402
62    33.24   Yes            2.120104
62    28.17   Yes            1.581113
63    31.67   Yes            1.974748
68    23.57   Yes            1.221387
55    29.49   Yes            1.570592
74    31.16   Yes            2.157580
56    30.80   Yes            1.731408
67    29.41   Yes            1.820687
65    30.13   Yes            1.854130
60    27.77   Yes            1.495489
55    32.10   Yes            1.848061
68    29.03   Yes            1.801839
77    25.47   No             1.617326
67    29.48   Yes            1.828129
66    25.39   Yes            1.371771
72    23.23   Yes            1.271441
61    33.99   No             2.178287
58    29.83   Yes            1.671387
75    25.19   Yes            1.544459
57    31.64   Yes            1.842258
65    27.81   Yes            1.607491
70    24.07   Yes            1.317642
68    30.32   Yes            1.938979
55    26.40   Yes            1.242094
62    34.45   Yes            2.248740
77    34.62   Yes            2.590062
56    30.07   Yes            1.653802
69    27.29   Yes            1.638410
59    27.30   No             1.423973
68    30.00   Yes            1.904960
61    29.08   No             1.656305
61    29.86   Yes            1.739227
70    33.87   Yes            2.359480
75    31.74   Yes            2.240789
71    32.38   Yes            2.222628
77    24.03   Yes            1.464239
65    28.83   Yes            1.715927
76    37.04   Yes            2.825782
75    24.18   Yes            1.437086
67    28.73   Yes            1.748396
59    33.76   Yes            2.110736
65    37.04   Yes            2.588732
64    32.59   Yes            2.094103
71    27.01   Yes            1.651743
57    27.73   No             1.426586
62    32.79   No             2.072265
68    27.06   Yes            1.592409
76    39.95   Yes            3.135145
61    28.35   Yes            1.578699
67    29.47   Yes            1.827066
55    23.60   No             0.944426
64    39.97   Yes            2.878671
72    30.36   Yes            2.029432
60    27.79   Yes            1.497615
65    27.94   Yes            1.621311
66    27.81   Yes            1.629041
69    25.61   Yes            1.459809
66    30.67   Yes            1.933088
68    26.40   Yes            1.522244
65    30.52   No             1.895591
60    33.51   Yes            2.105708
76    27.20   Yes            1.779692
57    30.85   Yes            1.758274
69    27.12   Yes            1.620337
67    34.65   No             2.377751
62    25.72   Yes            1.320653
72    28.15   Yes            1.794487
70    26.90   Yes            1.618499
58    25.78   Yes            1.240832
68    31.48   No             2.062299
61    42.53   Yes            3.086174
76    31.45   Yes            2.231510
64    25.27   Yes            1.315914
59    34.19   Yes            2.156449

For example, the first person has an age of 67 years and 29.02 bmi so:

log(odds) = -2.74974 + (0.02155 X67) + (0.10631X 29.02) = 1.779226.

2. Calculate the probability of hypertension for each person using the logistic equation:

p(Y)=1/(1+e^(-(β_0+β_1 x_1+β_2 x_2)) )=1/(1+e^(-(-2.74974+(0.02155Xage)+(0.10631Xbmi))) )=1/(1+e^(-(log(odds))) )

Update the table with a column for the probability of hypertension for each person.

age   bmi     hypertension   log(odds)   p(hypertension)
67    29.02   Yes            1.779226    0.8556013
63    33.86   Yes            2.207567    0.9009269
75    34.03   Yes            2.484239    0.9230295
59    27.73   Yes            1.469686    0.8130097
68    28.60   Yes            1.756126    0.8527238
70    29.27   Yes            1.870454    0.8665108
76    30.08   Yes            2.085865    0.8895217
74    27.46   No             1.764233    0.8537390
69    34.31   Yes            2.384706    0.9156536
70    36.10   Yes            2.596551    0.9306393
68    24.13   No             1.280920    0.7826064
67    31.44   Yes            2.036496    0.8845760
62    34.25   Yes            2.227477    0.9026900
72    25.22   Yes            1.482998    0.8150250
77    30.08   Yes            2.107415    0.8916218
63    25.38   Yes            1.306058    0.7868527
78    29.90   Yes            2.109829    0.8918548
58    35.78   Yes            2.303932    0.9092021
68    39.51   Yes            2.915968    0.9486302
67    29.90   Yes            1.872779    0.8667795
65    29.93   Yes            1.832868    0.8621031
65    34.47   Yes            2.315516    0.9101539
70    27.26   No             1.656771    0.8398040
65    34.41   Yes            2.309137    0.9096309
77    28.89   No             1.980906    0.8787777
60    33.62   Yes            2.117402    0.8925831
62    33.24   Yes            2.120104    0.8928419
62    28.17   Yes            1.581113    0.8293620
63    31.67   Yes            1.974748    0.8781201
68    23.57   Yes            1.221387    0.7723075
55    29.49   Yes            1.570592    0.8278680
74    31.16   Yes            2.157580    0.8963749
56    30.80   Yes            1.731408    0.8495924
67    29.41   Yes            1.820687    0.8606486
65    30.13   Yes            1.854130    0.8646113
60    27.77   Yes            1.495489    0.8169007
55    32.10   Yes            1.848061    0.8638993
68    29.03   Yes            1.801839    0.8583727
77    25.47   No             1.617326    0.8344260
67    29.48   Yes            1.828129    0.8615387
66    25.39   Yes            1.371771    0.7976661
72    23.23   Yes            1.271441    0.7809894
61    33.99   No             2.178287    0.8982827
58    29.83   Yes            1.671387    0.8417607
75    25.19   Yes            1.544459    0.8241120
57    31.64   Yes            1.842258    0.8632156
65    27.81   Yes            1.607491    0.8330628
70    24.07   Yes            1.317642    0.7887891
68    30.32   Yes            1.938979    0.8742400
55    26.40   Yes            1.242094    0.7759283
62    34.45   Yes            2.248740    0.9045418
77    34.62   Yes            2.590062    0.9302193
56    30.07   Yes            1.653802    0.8394042
69    27.29   Yes            1.638410    0.8373185
59    27.30   No             1.423973    0.8059605
68    30.00   Yes            1.904960    0.8704519
61    29.08   No             1.656305    0.8397413
61    29.86   Yes            1.739227    0.8505888
70    33.87   Yes            2.359480    0.9136848
75    31.74   Yes            2.240789    0.9038531
71    32.38   Yes            2.222628    0.9022632
77    24.03   Yes            1.464239    0.8121802
65    28.83   Yes            1.715927    0.8476035
76    37.04   Yes            2.825782    0.9440533
75    24.18   Yes            1.437086    0.8080030
67    28.73   Yes            1.748396    0.8517504
59    33.76   Yes            2.110736    0.8919423
65    37.04   Yes            2.588732    0.9301329
64    32.59   Yes            2.094103    0.8903287
71    27.01   Yes            1.651743    0.8391265
57    27.73   No             1.426586    0.8063689
62    32.79   No             2.072265    0.8881781
68    27.06   Yes            1.592409    0.8309547
76    39.95   Yes            3.135145    0.9583194
61    28.35   Yes            1.578699    0.8290201
67    29.47   Yes            1.827066    0.8614118
55    23.60   No             0.944426    0.7199928
64    39.97   Yes            2.878671    0.9467819
72    30.36   Yes            2.029432    0.8838527
60    27.79   Yes            1.497615    0.8172185
65    27.94   Yes            1.621311    0.8349759
66    27.81   Yes            1.629041    0.8360382
69    25.61   Yes            1.459809    0.8115035
66    30.67   Yes            1.933088    0.8735908
68    26.40   Yes            1.522244    0.8208687
65    30.52   No             1.895591    0.8693917
60    33.51   Yes            2.105708    0.8914567
76    27.20   Yes            1.779692    0.8556588
57    30.85   Yes            1.758274    0.8529933
69    27.12   Yes            1.620337    0.8348416
67    34.65   No             2.377751    0.9151149
62    25.72   Yes            1.320653    0.7892904
72    28.15   Yes            1.794487    0.8574765
70    26.90   Yes            1.618499    0.8345880
58    25.78   Yes            1.240832    0.7757088
68    31.48   No             2.062299    0.8871845
61    42.53   Yes            3.086174    0.9563188
76    31.45   Yes            2.231510    0.9030436
64    25.27   Yes            1.315914    0.7885010
59    34.19   Yes            2.156449    0.8962699

If we plot the age on the x-axis and the hypertension probability on the y-axis, we will not see the smooth sigmoid curve of the logistic equation.

Instead, we see a jagged (zigzag) curve, because the logistic equation also takes the bmi into account and persons of similar age have different bmi values.

Plot with a zigzag curve

We can plot the bmi on the x-axis and the probability on the y-axis with a separate line for each age to see the sigmoid curve of the logistic equation.

plot the bmi on the x axis and the probability on the y axis with a separate line for each age to see the sigmoid curve of the logistic equation

We see that increasing age (light blue lines) or bmi is associated with an increased probability of hypertension.
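The two steps of this example can be sketched in Python with the estimated coefficients quoted above. The constant and function names (`B0`, `B_AGE`, `B_BMI`, `log_odds`, `p_hypertension`) are assumptions made for this illustration; the three (age, bmi) pairs are rows taken from the table.

```python
import math

B0, B_AGE, B_BMI = -2.74974, 0.02155, 0.10631  # estimates quoted in the example

def log_odds(age, bmi):
    """log(odds) = b0 + b1*age + b2*bmi."""
    return B0 + B_AGE * age + B_BMI * bmi

def p_hypertension(age, bmi):
    """Probability of hypertension from the logistic equation."""
    return 1.0 / (1.0 + math.exp(-log_odds(age, bmi)))

for age, bmi in ((67, 29.02), (55, 23.60), (76, 39.95)):
    print(age, bmi, round(log_odds(age, bmi), 6), round(p_hypertension(age, bmi), 7))
```

Holding age fixed and varying bmi (or the reverse) traces one smooth sigmoid per fixed value, which is why the per-age lines in the last plot are smooth while the plot against age alone is not.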

4. Practice questions

1. A population of tigers in a certain forest has a growth rate of 0.06 or 6% per year, with a carrying capacity of 136 tigers.

If the initial population was 30 tigers, draw the logistic curve of growth for this population.

2. In the above example, if the initial population was 200 tigers, draw the logistic curve of growth for this population.

3. From certain data, we estimate a logistic regression model for the effect of age on developing cardiovascular (cv) events, assuming that the presence of cv is the second level value, and found that:

β_0 = -4.426704.

β_1 for age (predictor) = 0.023421.

Calculate the probability of developing cv events for the age range 20-80 years.

4. From certain data, we estimate a logistic regression model for the effect of age on developing Type-2 diabetes (diab), assuming that the presence of Type-2 diabetes is the second level value, and found that:

β_0 = -1.67193.

β_1 for age (predictor) = 0.02343.

Calculate the probability of developing Type-2 diabetes for the age range 20-80 years.

5. The following plot shows the logistic regression curves that determine the effect of the number of characters in the email, in thousands, and the number of times “password” appeared in the email on the probability of spam email.

plot shows the logistic regression curves

How do these 2 predictors affect the probability of an email being spam?

5. Answer key

1. The growth rate is per year so the x-axis will be in years.

We know that the population at the current year or t = 0 is 30. We use the logistic equation to know the population size for any year:

P(t)=K/(1+((K-P_0)/P_0 )e^(-rt) )=136/(1+((136-30)/30)e^(-0.06t) )

For example, at time = 0:

The population size at time = 0 is P(0) = 136/(1+((136-30)/30) X e^(-0.06 X 0)) = 136/(1+((136-30)/30)) = 30.

Using the above formula, we can calculate the population size for the past and next 20 years and produce the following table.

year   population
-20    10.68252
-19    11.28826
-18    11.92508
-17    12.59420
-16    13.29684
-15    14.03422
-14    14.80756
-13    15.61806
-12    16.46689
-11    17.35520
-10    18.28411
-9     19.25466
-8     20.26787
-7     21.32464
-6     22.42585
-5     23.57223
-4     24.76443
-3     26.00299
-2     27.28829
-1     28.62060
0      30.00000
1      31.42643
2      32.89963
3      34.41916
4      35.98437
5      37.59442
6      39.24825
7      40.94455
8      42.68182
9      44.45834
10     46.27213
11     48.12102
12     50.00261
13     51.91432
14     53.85334
15     55.81671
16     57.80130
17     59.80382
18     61.82087
19     63.84895
20     65.88446

Plot the year values on the x-axis and the population size on the y-axis.

Connect the intersecting points with a line to draw the sigmoid curve.

Plot the year values on the x axis and the population size on the y

For example, the expected number of tigers after 20 years= P(20) = 66 tigers approximately.

2. The initial population is larger than the carrying capacity so the population will decrease with years.

We know that the population at the current year or t = 0 is 200. We use the logistic equation to know the population size for any year:

P(t)=K/(1+((K-P_0)/P_0 )e^(-rt) )=136/(1+((136-200)/200)e^(-0.06t) )

For example, at time = 0:

The population size at time = 0 is P(0) = 136/(1+((136-200)/200) X e^(-0.06 X 0)) = 136/(1+((136-200)/200)) = 200.

Using the above formula, we can calculate the population size for the past and next 10 years and produce the following table.

year   population
-10    326.2001
-9     301.6338
-8     281.6574
-7     265.1215
-6     251.2309
-5     239.4176
-4     229.2649
-3     220.4605
-2     212.7656
-1     205.9943
0      200.0000
1      194.6652
2      189.8949
3      185.6114
4      181.7504
5      178.2582
6      175.0900
7      172.2075
8      169.5783
9      167.1746
10     164.9724

Plot the year values on the x-axis and the population size on the y-axis.

Connect the intersecting points with a line to draw the sigmoid curve.

Plot of year values on the x axis and the population size on the y

For example, the expected number of tigers after 10 years= P(10) = 165 tigers approximately.
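Both tiger tables can be reproduced with one short script. This is a sketch using the values from questions 1 and 2 (K = 136, r = 0.06 per year, P_0 = 30 or 200); the function name `tiger_population` is an assumption made for illustration.

```python
import math

K, r = 136, 0.06  # carrying capacity and yearly growth rate

def tiger_population(t_years, P0):
    """P(t) = K / (1 + ((K - P0) / P0) e^(-r t))."""
    return K / (1.0 + ((K - P0) / P0) * math.exp(-r * t_years))

# Question 1: start below K, past and next 20 years
for t in range(-20, 21, 5):
    print("P0=30 ", t, round(tiger_population(t, 30), 5))

# Question 2: start above K, past and next 10 years
for t in range(-10, 11, 5):
    print("P0=200", t, round(tiger_population(t, 200), 4))
```

For P_0 = 200 the denominator 1 - 0.32e^(-0.06t) reaches zero about 19 years before t = 0, so the closed form cannot be extended further back than that; the sketch stays within the ±10 years used in the answer.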

3. Estimate the log(odds) of developing cv event for each age value.

log(odds) = β_0+β_1 x = -4.426704 + 0.023421 X age.

The β_1 for age is positive, so increasing age will be associated with increasing the log(odds) or probability of cv event.

The following table will be produced.

age   log(odds)
20    -3.958284
21    -3.934863
22    -3.911442
23    -3.888021
24    -3.864600
25    -3.841179
26    -3.817758
27    -3.794337
28    -3.770916
29    -3.747495
30    -3.724074
31    -3.700653
32    -3.677232
33    -3.653811
34    -3.630390
35    -3.606969
36    -3.583548
37    -3.560127
38    -3.536706
39    -3.513285
40    -3.489864
41    -3.466443
42    -3.443022
43    -3.419601
44    -3.396180
45    -3.372759
46    -3.349338
47    -3.325917
48    -3.302496
49    -3.279075
50    -3.255654
51    -3.232233
52    -3.208812
53    -3.185391
54    -3.161970
55    -3.138549
56    -3.115128
57    -3.091707
58    -3.068286
59    -3.044865
60    -3.021444
61    -2.998023
62    -2.974602
63    -2.951181
64    -2.927760
65    -2.904339
66    -2.880918
67    -2.857497
68    -2.834076
69    -2.810655
70    -2.787234
71    -2.763813
72    -2.740392
73    -2.716971
74    -2.693550
75    -2.670129
76    -2.646708
77    -2.623287
78    -2.599866
79    -2.576445
80    -2.553024

Calculate the probability of developing a cv event for each age using the logistic equation:

p(Y)=1/(1+e^(-(β_0+β_1 x)) )=1/(1+e^(-(-4.426704+0.023421x)) )=1/(1+e^(-(log(odds))) )

Update the table with a column for the probability of developing a cv event for each age value.

age   log(odds)   p(cv)
20    -3.958284   0.01873804
21    -3.934863   0.01917357
22    -3.911442   0.01961902
23    -3.888021   0.02007460
24    -3.864600   0.02054055
25    -3.841179   0.02101707
26    -3.817758   0.02150442
27    -3.794337   0.02200280
28    -3.770916   0.02251247
29    -3.747495   0.02303367
30    -3.724074   0.02356665
31    -3.700653   0.02411165
32    -3.677232   0.02466894
33    -3.653811   0.02523878
34    -3.630390   0.02582143
35    -3.606969   0.02641716
36    -3.583548   0.02702626
37    -3.560127   0.02764901
38    -3.536706   0.02828569
39    -3.513285   0.02893659
40    -3.489864   0.02960201
41    -3.466443   0.03028226
42    -3.443022   0.03097764
43    -3.419601   0.03168847
44    -3.396180   0.03241506
45    -3.372759   0.03315775
46    -3.349338   0.03391685
47    -3.325917   0.03469271
48    -3.302496   0.03548566
49    -3.279075   0.03629606
50    -3.255654   0.03712425
51    -3.232233   0.03797059
52    -3.208812   0.03883545
53    -3.185391   0.03971920
54    -3.161970   0.04062221
55    -3.138549   0.04154486
56    -3.115128   0.04248754
57    -3.091707   0.04345063
58    -3.068286   0.04443455
59    -3.044865   0.04543968
60    -3.021444   0.04646645
61    -2.998023   0.04751527
62    -2.974602   0.04858655
63    -2.951181   0.04968072
64    -2.927760   0.05079822
65    -2.904339   0.05193949
66    -2.880918   0.05310496
67    -2.857497   0.05429508
68    -2.834076   0.05551031
69    -2.810655   0.05675111
70    -2.787234   0.05801794
71    -2.763813   0.05931127
72    -2.740392   0.06063157
73    -2.716971   0.06197933
74    -2.693550   0.06335503
75    -2.670129   0.06475916
76    -2.646708   0.06619220
77    -2.623287   0.06765466
78    -2.599866   0.06914704
79    -2.576445   0.07066985
80    -2.553024   0.07222359

We see that increasing age is associated with an increased probability of developing cv.
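As a rough check, the table can be regenerated with a few lines of Python. This is only a sketch: the two coefficients are taken from the equation above, while the names BETA_0, BETA_1, log_odds, probability and cv_table are made up for illustration.

```python
import math

# Coefficients taken from the cv model above.
BETA_0 = -4.426704  # intercept
BETA_1 = 0.023421   # change in log(odds) per year of age


def log_odds(age):
    """Linear predictor: beta_0 + beta_1 * age."""
    return BETA_0 + BETA_1 * age


def probability(age):
    """Logistic equation applied to the log(odds)."""
    return 1.0 / (1.0 + math.exp(-log_odds(age)))


# One (age, log(odds), p(cv)) row per age from 20 to 80.
cv_table = [(age, log_odds(age), probability(age)) for age in range(20, 81)]

for age, lo, p in cv_table:
    print(f"{age:3d}  {lo:10.6f}  {p:.8f}")
```

Each printed row should agree with the table to the number of digits shown, and the probability column rises with every extra year of age.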

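We can plot the age on the x-axis and the probability on the y-axis. For ages 20 to 80, the probabilities stay below 0.08, so the plot shows only the lower, slowly rising part of the sigmoid curve of the logistic equation.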

4. Estimate the log(odds) of developing Type-2 diabetes for each age value.

log(odds) = β_0+β_1 x = -1.67193 + 0.02343 × age.

The β_1 for age is positive, so the log(odds), and therefore the probability, of Type-2 diabetes increases with age.
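A minimal Python sketch of this step is given here, assuming only the two coefficients from the equation above. The variable names are illustrative. It also shows what a positive β_1 means on the odds scale: each extra year multiplies the odds by e^(β_1), which is about 1.024.

```python
import math

# Coefficients taken from the Type-2 diabetes model above.
BETA_0 = -1.67193  # intercept
BETA_1 = 0.02343   # change in log(odds) per year of age

ages = list(range(20, 81))
log_odds = [BETA_0 + BETA_1 * age for age in ages]

# A positive slope means every extra year raises the log(odds) by BETA_1,
# which multiplies the odds by exp(BETA_1).
odds_multiplier_per_year = math.exp(BETA_1)

for age, lo in zip(ages, log_odds):
    print(f"{age:3d}  {lo:9.5f}")
print(f"odds multiplier per year: {odds_multiplier_per_year:.4f}")
```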

The following table will be produced.

age   log(odds)
20    -1.20333
21    -1.17990
22    -1.15647
23    -1.13304
24    -1.10961
25    -1.08618
26    -1.06275
27    -1.03932
28    -1.01589
29    -0.99246
30    -0.96903
31    -0.94560
32    -0.92217
33    -0.89874
34    -0.87531
35    -0.85188
36    -0.82845
37    -0.80502
38    -0.78159
39    -0.75816
40    -0.73473
41    -0.71130
42    -0.68787
43    -0.66444
44    -0.64101
45    -0.61758
46    -0.59415
47    -0.57072
48    -0.54729
49    -0.52386
50    -0.50043
51    -0.47700
52    -0.45357
53    -0.43014
54    -0.40671
55    -0.38328
56    -0.35985
57    -0.33642
58    -0.31299
59    -0.28956
60    -0.26613
61    -0.24270
62    -0.21927
63    -0.19584
64    -0.17241
65    -0.14898
66    -0.12555
67    -0.10212
68    -0.07869
69    -0.05526
70    -0.03183
71    -0.00840
72     0.01503
73     0.03846
74     0.06189
75     0.08532
76     0.10875
77     0.13218
78     0.15561
79     0.17904
80     0.20247

Calculate the probability of developing Type-2 diabetes for each age value using the logistic equation:

p(Y)=1/(1+e^(-(β_0+β_1 x)) )=1/(1+e^(-(-1.67193+0.02343x)) )=1/(1+e^(-(log(odds))) )

Update the table with a column for probability of developing Type-2 diabetes for each age value.

age   log(odds)   p(diab)
20    -1.20333    0.2308834
21    -1.17990    0.2350702
22    -1.15647    0.2393093
23    -1.13304    0.2436005
24    -1.10961    0.2479436
25    -1.08618    0.2523383
26    -1.06275    0.2567843
27    -1.03932    0.2612812
28    -1.01589    0.2658288
29    -0.99246    0.2704265
30    -0.96903    0.2750739
31    -0.94560    0.2797706
32    -0.92217    0.2845159
33    -0.89874    0.2893095
34    -0.87531    0.2941506
35    -0.85188    0.2990386
36    -0.82845    0.3039729
37    -0.80502    0.3089527
38    -0.78159    0.3139773
39    -0.75816    0.3190459
40    -0.73473    0.3241576
41    -0.71130    0.3293117
42    -0.68787    0.3345071
43    -0.66444    0.3397429
44    -0.64101    0.3450183
45    -0.61758    0.3503320
46    -0.59415    0.3556832
47    -0.57072    0.3610707
48    -0.54729    0.3664934
49    -0.52386    0.3719501
50    -0.50043    0.3774396
51    -0.47700    0.3829608
52    -0.45357    0.3885123
53    -0.43014    0.3940929
54    -0.40671    0.3997013
55    -0.38328    0.4053360
56    -0.35985    0.4109959
57    -0.33642    0.4166794
58    -0.31299    0.4223851
59    -0.28956    0.4281116
60    -0.26613    0.4338574
61    -0.24270    0.4396211
62    -0.21927    0.4454011
63    -0.19584    0.4511959
64    -0.17241    0.4570040
65    -0.14898    0.4628237
66    -0.12555    0.4686537
67    -0.10212    0.4744922
68    -0.07869    0.4803376
69    -0.05526    0.4861885
70    -0.03183    0.4920432
71    -0.00840    0.4979000
72     0.01503    0.5037574
73     0.03846    0.5096138
74     0.06189    0.5154676
75     0.08532    0.5213171
76     0.10875    0.5271607
77     0.13218    0.5329970
78     0.15561    0.5388242
79     0.17904    0.5446408
80     0.20247    0.5504453

We see that increasing age is associated with an increased probability of developing Type-2 diabetes.
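The table also shows the probability crossing 0.5 between ages 71 and 72. The sketch below locates that crossing, assuming only the two coefficients from the equation above (the names p_diabetes and midpoint_age are illustrative). It uses the fact that this model is the general logistic formula with L = 1, k = β_1 and x_0 = -β_0/β_1, so the curve midpoint is the age at which the log(odds) equals zero.

```python
import math

# Coefficients taken from the Type-2 diabetes model above.
BETA_0 = -1.67193  # intercept
BETA_1 = 0.02343   # change in log(odds) per year of age


def p_diabetes(age):
    """Logistic equation applied to log(odds) = beta_0 + beta_1 * age."""
    return 1.0 / (1.0 + math.exp(-(BETA_0 + BETA_1 * age)))


# The probability is 0.5 where the log(odds) is zero,
# i.e. at the sigmoid midpoint x_0 = -beta_0 / beta_1.
midpoint_age = -BETA_0 / BETA_1

print(f"p(diab) at age 20: {p_diabetes(20):.7f}")
print(f"p(diab) at age 80: {p_diabetes(80):.7f}")
print(f"age where p(diab) = 0.5: {midpoint_age:.2f}")
```

The midpoint comes out at roughly 71.4 years, consistent with the sign change of the log(odds) between the rows for ages 71 and 72.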

We can plot the age on the x-axis and the probability on the y-axis. For ages 20 to 80, the probabilities run from about 0.23 to 0.55, so the plot shows the middle part of the sigmoid curve of the logistic equation.

5. Increasing the number of characters in the email is associated with a decreased probability of spam email.

Also, for a constant small number of characters, increasing the number of times “password” appears in the email (from 0 to 4) is associated with a decreased probability of spam email.
