Generalised Linear Models 2

Multinomial

If we have more than two categories or groups that we want to model relative to covariates (e.g., we have observations $i = 1, \dots, n$ and groups $j = 1, \dots, J$), the multinomial is our candidate model.

Let

  • $p_{ij}$ be the probability that the $i$-th observation belongs to the $j$-th group
  • $Y_{ij}$ be the number of observations for individual $i$ in group $j$; an individual will have observations $Y_{i1}, Y_{i2}, \dots, Y_{iJ}$
  • assume the probability of observing this response is given by a multinomial distribution in terms of the probabilities $p_{ij}$, where $\sum_{j=1}^{J} p_{ij} = 1$. For interpretation, we have a baseline category $p_{i1} = 1 - \sum_{j=2}^{J} p_{ij}$

The link between the mean response (probability) $p_{ij}$ and a linear function of the covariates is

$$\eta_{ij} = x_i^\top \beta_j$$

which equals

$$\eta_{ij} = \log\left(\frac{p_{ij}}{p_{i1}}\right), \quad j = 2, \dots, J$$

We compare $p_{ij}$ to the baseline $p_{i1}$, which gives

$$p_{ij} = \frac{\exp(\eta_{ij})}{1 + \sum_{l=2}^{J} \exp(\eta_{il})}$$

which is known as the multinomial logistic model.

Note:

  • Softmax coding for multinomial logistic regression: rather than selecting a baseline class, we treat all $K$ classes symmetrically (all equally important, with no baseline).

$$P(Y=k \mid X=x) = \frac{\exp(\beta_{k0} + \beta_{k1}x_1 + \cdots + \beta_{kp}x_p)}{\sum_{l=1}^{K}\exp(\beta_{l0} + \beta_{l1}x_1 + \cdots + \beta_{lp}x_p)}$$

Then the log odds ratio between the $k$th and $k'$th classes is

$$\log\left(\frac{P(Y=k \mid X=x)}{P(Y=k' \mid X=x)}\right) = (\beta_{k0} - \beta_{k'0}) + (\beta_{k1} - \beta_{k'1})x_1 + \cdots + (\beta_{kp} - \beta_{k'p})x_p$$
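To make the symmetry concrete, here is a quick numeric sketch (in Python rather than R, purely for illustration; the linear predictors below are made up): the softmax probabilities sum to one, the pairwise log odds depend only on differences of linear predictors, and shifting every class's predictor by a constant leaves the probabilities unchanged.

```python
import math

def softmax(etas):
    """Softmax over per-class linear predictors eta_k = beta_k0 + sum_j beta_kj * x_j."""
    exps = [math.exp(e) for e in etas]
    total = sum(exps)
    return [v / total for v in exps]

# hypothetical linear predictors for K = 3 classes
etas = [0.5, -0.2, 1.3]
p = softmax(etas)

# probabilities sum to 1, and log(p_k / p_k') equals eta_k - eta_k'
print(sum(p), math.log(p[0] / p[1]), etas[0] - etas[1])

# adding a constant to every eta does not change the probabilities,
# which is why no single class needs to be singled out as a baseline
p_shift = softmax([e + 10.0 for e in etas])
```

This also shows why the baseline parameterisation and softmax coding describe the same model: fixing one class's coefficients at zero just pins down the otherwise unidentified constant shift.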

Poisson GLMs

The Poisson distribution

The Poisson distribution specifies the probability of a discrete random variable $Y$ and is given by:

$$f(y; \mu) = \Pr(Y = y) = \frac{\mu^{y} \, e^{-\mu}}{y!}$$

$$E(Y) = \operatorname{Var}(Y) = \mu$$

where $\mu$ is the parameter of the Poisson distribution.

The Poisson distribution is particularly relevant to model count data because it:

  • specifies the probability only for non-negative integer values
  • $P(Y < 0) = 0$, hence the probability of any negative value is zero
  • $\operatorname{Var}(Y) = \mu$ (the mean-variance relationship) allows for heterogeneity (e.g., when the variance increases with the mean)
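These three properties can be verified numerically from the pmf alone. The sketch below (Python, for illustration) implements the formula above and checks that the probabilities sum to one and that the mean and variance both equal $\mu$:

```python
import math

def poisson_pmf(y, mu):
    """Pr(Y = y) = mu^y * exp(-mu) / y! for non-negative integers y."""
    if y < 0 or y != int(y):
        return 0.0  # no probability mass on negative or non-integer values
    return mu ** y * math.exp(-mu) / math.factorial(y)

mu = 4.0
ys = range(0, 61)  # truncate the support; tail mass beyond 60 is negligible for mu = 4
probs = [poisson_pmf(y, mu) for y in ys]

total = sum(probs)                                         # ~1
mean = sum(y * p for y, p in zip(ys, probs))               # ~mu
var = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))  # ~mu
```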

A Poisson GLM will model the value of μ as a function of different explanatory variables:

Step 1.

We assume $Y_i$ follows a Poisson distribution with mean and variance $\mu_i$.

$$Y_i \sim \text{Poisson}(\mu_i)$$

$$E(Y_i) = \operatorname{Var}(Y_i) = \mu_i$$

$$f(y_i; \mu_i) = \frac{\mu_i^{y_i} \, e^{-\mu_i}}{y_i!}$$

$\mu_i$ corresponds to the expected number of individuals.

Step 2.

We specify the linear predictor of the model just as in a linear model.

$$\eta_i = \underbrace{\alpha}_{\text{intercept}} + \underbrace{\beta_1}_{\text{slope of 'variable'}} \times \text{variable}_i$$

Step 3.

The link between the mean of $Y_i$ and the linear predictor is a logarithmic function, which can be written as:

$$\log(\mu_i) = \alpha + \beta_1 \times \text{variable}_i$$

It can also be written as:

$$\mu_i = e^{\alpha + \beta_1 \times \text{variable}_i}$$

This shows that the impact of each explanatory variable is multiplicative: increasing the variable by one unit multiplies $\mu_i$ by a factor of $\exp(\beta_1)$.

We can also write it as:

$$\mu_i = e^{\alpha} \times e^{\beta_1 \times \text{variable}_i}$$

If $\beta_j = 0$ then $\exp(\beta_j) = 1$ and $\mu$ is not related to $x_j$. If $\beta_j > 0$ then $\mu$ increases as $x_j$ increases; if $\beta_j < 0$ then $\mu$ decreases as $x_j$ increases.
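The multiplicative interpretation is easy to check directly. A small sketch (Python, with a hypothetical intercept and slope): the ratio of means for a one-unit change in the variable equals $\exp(\beta_1)$ regardless of where the change happens.

```python
import math

alpha, beta1 = 0.3, 0.25  # hypothetical intercept and slope

def mu(x):
    # inverse of the log link: mu = exp(alpha + beta1 * x)
    return math.exp(alpha + beta1 * x)

# a one-unit increase in x multiplies mu by exp(beta1), whatever x is
ratio = mu(3.0 + 1.0) / mu(3.0)
print(ratio, math.exp(beta1))
```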

setup

library(tidyverse)  # for separate(), mutate_if(), rename(), write_csv(), ggplot()

df<-readxl::read_xlsx("multinomial.xlsx")
df<-df |> 
  separate(`y,party,race,gender`,into=c("y","party","race","gender"),sep=",") |> 
  mutate_if(is.character,as.factor) |> 
  mutate(y=as.numeric(y)) |> 
  rename(income="y")
df
#> # A tibble: 12 x 4
#>    income party       race  gender
#>     <dbl> <fct>       <fct> <fct> 
#>  1      5 Democrat    White Male  
#>  2      8 Republican  White Male  
#>  3      2 Independent White Male  
#>  4     10 Democrat    Black Male  
#>  5     12 Republican  Black Male  
#>  6      1 Independent Black Male  
#>  7      7 Democrat    White Female
#>  8      3 Republican  White Female
#>  9      4 Independent White Female
#> 10     11 Democrat    Black Female
#> 11      9 Republican  Black Female
#> 12      6 Independent Black Female

write_csv(df,"multinom.csv")

summary(df)
#>      income              party      race      gender 
#>  Min.   : 1.00   Democrat   :4   Black:6   Female:6  
#>  1st Qu.: 3.75   Independent:4   White:6   Male  :6  
#>  Median : 6.50   Republican :4                       
#>  Mean   : 6.50                                       
#>  3rd Qu.: 9.25                                       
#>  Max.   :12.00

table(df$party,df$race)
#>              
#>               Black White
#>   Democrat        2     2
#>   Independent     2     2
#>   Republican      2     2

model with all variables

model<-nnet::multinom(party~.,data=df)
#> # weights:  15 (8 variable)
#> initial  value 13.183347 
#> iter  10 value 8.524672
#> iter  20 value 8.227196
#> iter  30 value 8.224114
#> iter  40 value 8.223844
#> iter  50 value 8.223812
#> final  value 8.223811 
#> converged
model
#> Call:
#> nnet::multinom(formula = party ~ ., data = df)
#> 
#> Coefficients:
#>             (Intercept)     income  raceWhite  genderMale
#> Independent    12.33053 -1.5500393 -6.0737795 -0.07011659
#> Republican      1.12294 -0.1137472 -0.5209413  0.12415647
#> 
#> Residual Deviance: 16.44762 
#> AIC: 32.44762

summary(model)
#> Call:
#> nnet::multinom(formula = party ~ ., data = df)
#> 
#> Coefficients:
#>             (Intercept)     income  raceWhite  genderMale
#> Independent    12.33053 -1.5500393 -6.0737795 -0.07011659
#> Republican      1.12294 -0.1137472 -0.5209413  0.12415647
#> 
#> Std. Errors:
#>             (Intercept)    income raceWhite genderMale
#> Independent    8.023601 0.9713783  4.955594   2.783173
#> Republican     4.891305 0.4793859  2.613579   1.510782
#> 
#> Residual Deviance: 16.44762 
#> AIC: 32.44762

$$\log\left(\frac{\pi_I}{\pi_D}\right) = 12.33053 - 1.55\,\text{income} - 6.07377\,\text{raceWhite} - 0.0701\,\text{genderMale}$$

$$\log\left(\frac{\pi_R}{\pi_D}\right) = 1.12294 - 0.1137\,\text{income} - 0.52094\,\text{raceWhite} + 0.124\,\text{genderMale}$$
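Using the fitted coefficients above, we can recover the predicted category probabilities for any covariate profile; the baseline (Democrat) probability is one over the denominator. A Python sketch, using the illustrative profile income = 5, White, Male:

```python
import math

# fitted log-odds vs. the Democrat baseline (coefficients from the model output)
def eta_I(income, white, male):
    return 12.33053 - 1.5500393 * income - 6.0737795 * white - 0.07011659 * male

def eta_R(income, white, male):
    return 1.12294 - 0.1137472 * income - 0.5209413 * white + 0.12415647 * male

x = (5, 1, 1)  # income = 5, raceWhite = 1, genderMale = 1 (hypothetical profile)
denom = 1 + math.exp(eta_I(*x)) + math.exp(eta_R(*x))
pi_D = 1 / denom
pi_I = math.exp(eta_I(*x)) / denom
pi_R = math.exp(eta_R(*x)) / denom
print(pi_D, pi_I, pi_R)  # the three probabilities sum to 1
```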

p1<-exp(2.181+11.67-0.096*5000)/(1+exp(2.181+11.67-0.096*5000)+exp(1.58+6.55-0.053*5000))
p1
#> [1] 3.581472e-203
p2<-exp(1.58+6.55-0.053*5000)/(1+exp(2.181+11.67-0.096*5000)+exp(1.58+6.55-0.053*5000))
p2
#> [1] 2.771893e-112
1-p1-p2
#> [1] 1

# odds of Republican vs. Democrat at income = 400 (other covariates at baseline)
odds<-exp(1.12294-0.1137472*400)
odds
#> [1] 5.342864e-20

NULL MODEL

mod_null<-nnet::multinom(party~1,data=df)
#> # weights:  6 (2 variable)
#> initial  value 13.183347 
#> final  value 13.183347 
#> converged
mod_null
#> Call:
#> nnet::multinom(formula = party ~ 1, data = df)
#> 
#> Coefficients:
#>              (Intercept)
#> Independent 4.440892e-16
#> Republican  4.440892e-16
#> 
#> Residual Deviance: 26.36669 
#> AIC: 30.36669

model with gender

(model_gend<-nnet::multinom(party~gender,data=df))
#> # weights:  9 (4 variable)
#> initial  value 13.183347 
#> final  value 13.183347 
#> converged
#> Call:
#> nnet::multinom(formula = party ~ gender, data = df)
#> 
#> Coefficients:
#>              (Intercept)   genderMale
#> Independent 4.440892e-16 2.220446e-16
#> Republican  4.440892e-16 2.220446e-16
#> 
#> Residual Deviance: 26.36669 
#> AIC: 34.36669

compare the models

anova(mod_null,model_gend,test="Chisq")
#>    Model Resid. df Resid. Dev   Test    Df LR stat. Pr(Chi)
#> 1      1        22   26.36669           NA       NA      NA
#> 2 gender        20   26.36669 1 vs 2     2        0       1

model with race

(model_race<-nnet::multinom(party~race,data=df))
#> # weights:  9 (4 variable)
#> initial  value 13.183347 
#> final  value 13.183347 
#> converged
#> Call:
#> nnet::multinom(formula = party ~ race, data = df)
#> 
#> Coefficients:
#>              (Intercept)    raceWhite
#> Independent 4.440892e-16 2.220446e-16
#> Republican  4.440892e-16 2.220446e-16
#> 
#> Residual Deviance: 26.36669 
#> AIC: 34.36669

compare race model to null model

anova(mod_null,model_race,test="Chisq")
#>   Model Resid. df Resid. Dev   Test    Df LR stat. Pr(Chi)
#> 1     1        22   26.36669           NA       NA      NA
#> 2  race        20   26.36669 1 vs 2     2        0       1

model with income

(model_inc<-nnet::multinom(party~income,data=df))
#> # weights:  9 (4 variable)
#> initial  value 13.183347 
#> iter  10 value 9.866125
#> final  value 9.865368 
#> converged
#> Call:
#> nnet::multinom(formula = party ~ income, data = df)
#> 
#> Coefficients:
#>             (Intercept)      income
#> Independent   3.8280816 -0.69719923
#> Republican    0.2589209 -0.03185745
#> 
#> Residual Deviance: 19.73074 
#> AIC: 27.73074

compare income model to null model

anova(mod_null,model_inc,test="Chisq")
#>    Model Resid. df Resid. Dev   Test    Df LR stat.    Pr(Chi)
#> 1      1        22   26.36669           NA       NA         NA
#> 2 income        20   19.73074 1 vs 2     2 6.635959 0.03622595
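The reported p-value can be verified by hand: for 2 degrees of freedom, the chi-square tail probability has the closed form $P(X > x) = e^{-x/2}$. A quick Python check of the LR statistic above:

```python
import math

lr_stat = 6.635959  # LR stat. reported by anova(), on Df = 2
# for 2 degrees of freedom, the chi-square survival function is exp(-x/2)
p_value = math.exp(-lr_stat / 2)
print(p_value)  # ~0.0362, matching Pr(Chi) in the anova output
```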

model with race and income

(model_inc_race<-nnet::multinom(party~income+race,data=df))
#> # weights:  12 (6 variable)
#> initial  value 13.183347 
#> iter  10 value 8.499512
#> iter  20 value 8.244097
#> iter  30 value 8.229818
#> iter  40 value 8.228592
#> iter  50 value 8.228569
#> final  value 8.228567 
#> converged
#> Call:
#> nnet::multinom(formula = party ~ income + race, data = df)
#> 
#> Coefficients:
#>             (Intercept)     income  raceWhite
#> Independent   12.312994 -1.5477287 -6.0862942
#> Republican     1.050129 -0.1008768 -0.4603361
#> 
#> Residual Deviance: 16.45713 
#> AIC: 28.45713

compare models

anova(model_inc_race,model_inc,test="Chisq")
#>           Model Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
#> 1        income        20   19.73074           NA       NA        NA
#> 2 income + race        18   16.45713 1 vs 2     2 3.273602 0.1946016

check using the chi-square distribution

chisq.test(table(df$party,df$race))
#> 
#> 	Pearson's Chi-squared test
#> 
#> data:  table(df$party, df$race)
#> X-squared = 0, df = 2, p-value = 1
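The zero statistic follows directly from the balanced design: every observed cell count equals its expected count under independence, so each term of $\sum (O - E)^2 / E$ vanishes. The same computation, sketched in Python:

```python
# observed 3x2 table of party by race (every cell is 2 by design)
observed = [[2, 2], [2, 2], [2, 2]]

n = sum(sum(row) for row in observed)
row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]

# Pearson chi-square: expected count under independence is row total * column total / n
x2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_tot[i] * col_tot[j] / n
        x2 += (o - e) ** 2 / e
print(x2)  # 0.0
```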

Poisson dataset

df<-read.table("canada.txt",sep="",header=TRUE) |> 
  mutate_if(is.character,as.factor) |> 
  as_tibble()
df
#> # A tibble: 36 x 5
#>       id age   smoke           pop  dead
#>    <int> <fct> <fct>         <int> <int>
#>  1     1 40-44 no              656    18
#>  2     2 45-59 no              359    22
#>  3     3 50-54 no              249    19
#>  4     4 55-59 no              632    55
#>  5     5 60-64 no             1067   117
#>  6     6 65-69 no              897   170
#>  7     7 70-74 no              668   179
#>  8     8 75-79 no              361   120
#>  9     9 80+   no              274   120
#> 10    10 40-44 cigarPipeOnly   145     2
#> # i 26 more rows

ggplot(df,aes(x=dead))+
  geom_density()

model with all variables

model<-glm(dead~age+smoke+pop,data=df,family = poisson)
model
#> 
#> Call:  glm(formula = dead ~ age + smoke + pop, family = poisson, data = df)
#> 
#> Coefficients:
#>         (Intercept)             age45-59             age50-54  
#>           2.7717494            0.6242419            1.0038164  
#>            age55-59             age60-64             age65-69  
#>           1.3818228            1.5011996            2.1365714  
#>            age70-74             age75-79               age80+  
#>           2.3556629            2.2041331            1.9506334  
#> smokecigarretteOnly  smokecigarrettePlus              smokeno  
#>           0.5704051            0.2915664           -0.2113845  
#>                 pop  
#>           0.0004172  
#> 
#> Degrees of Freedom: 35 Total (i.e. Null);  23 Residual
#> Null Deviance:	    8434 
#> Residual Deviance: 584.9 	AIC: 850.9

equatiomatic::extract_eq(model,terms_per_line = 2,wrap=TRUE,use_coefs = TRUE, greek_colors = "blue")

$$\begin{aligned} \log(\widehat{E(\text{dead})}) &= 2.77 + 0.62(\text{age}_{\text{45-59}}) + 1(\text{age}_{\text{50-54}}) + 1.38(\text{age}_{\text{55-59}}) \\ &\quad + 1.5(\text{age}_{\text{60-64}}) + 2.14(\text{age}_{\text{65-69}}) + 2.36(\text{age}_{\text{70-74}}) + 2.2(\text{age}_{\text{75-79}}) \\ &\quad + 1.95(\text{age}_{\text{80+}}) + 0.57(\text{smoke}_{\text{cigarretteOnly}}) + 0.29(\text{smoke}_{\text{cigarrettePlus}}) \\ &\quad - 0.21(\text{smoke}_{\text{no}}) + 0(\text{pop}) \end{aligned}$$
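As a sanity check on the fitted equation, we can compute the predicted count for one row of the data, the 70-74 non-smoker group (pop = 668, observed dead = 179), using the full-precision coefficients from the glm() output. A Python sketch:

```python
import math

# coefficients from the glm() output above
intercept = 2.7717494
b_age_70_74 = 2.3556629
b_smoke_no = -0.2113845
b_pop = 0.0004172

# linear predictor for age = 70-74, smoke = no, pop = 668
eta = intercept + b_age_70_74 + b_smoke_no + b_pop * 668
mu_hat = math.exp(eta)
print(mu_hat)  # close to the observed count of 179 for this row
```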

model_null

mod1<-glm(dead~1,data=df,family = poisson)

model with age

mod2<-glm(dead~age,data=df,family = poisson)

COMPARE THE MODELS

anova(mod1,mod2,test="LRT")
#> Analysis of Deviance Table
#> 
#> Model 1: dead ~ 1
#> Model 2: dead ~ age
#>   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
#> 1        35     8434.4                          
#> 2        27     4845.0  8   3589.4 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model with population

mod3<-glm(dead~pop,data=df,family = poisson)

COMPARE NULL TO POPULATION MODEL

anova(mod1,mod3,test="LRT")
#> Analysis of Deviance Table
#> 
#> Model 1: dead ~ 1
#> Model 2: dead ~ pop
#>   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
#> 1        35     8434.4                          
#> 2        34     4270.2  1   4164.2 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model with smoking

mod4<-glm(dead~smoke,data=df,family = poisson)

COMPARE SMOKING MODEL TO NULL MODEL

anova(mod1,mod4,test="LRT")
#> Analysis of Deviance Table
#> 
#> Model 1: dead ~ 1
#> Model 2: dead ~ smoke
#>   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
#> 1        35     8434.4                          
#> 2        32     4850.4  3   3584.1 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

REPORT BEST MODEL

report::report(model)
#> We fitted a poisson model (estimated using ML) to predict dead with age, smoke
#> and pop (formula: dead ~ age + smoke + pop). The model's explanatory power is
#> substantial (Nagelkerke's R2 = 1.00). The model's intercept, corresponding to
#> age = 40-44, smoke = cigarPipeOnly and pop = 0, is at 2.77 (95% CI [2.62,
#> 2.92], p < .001). Within this model:
#> 
#>   - The effect of age [45-59] is statistically significant and positive (beta =
#> 0.62, 95% CI [0.46, 0.79], p < .001; Std. beta = 0.62, 95% CI [0.46, 0.79])
#>   - The effect of age [50-54] is statistically significant and positive (beta =
#> 1.00, 95% CI [0.84, 1.16], p < .001; Std. beta = 1.00, 95% CI [0.84, 1.16])
#>   - The effect of age [55-59] is statistically significant and positive (beta =
#> 1.38, 95% CI [1.26, 1.51], p < .001; Std. beta = 1.38, 95% CI [1.26, 1.51])
#>   - The effect of age [60-64] is statistically significant and positive (beta =
#> 1.50, 95% CI [1.38, 1.63], p < .001; Std. beta = 1.50, 95% CI [1.38, 1.63])
#>   - The effect of age [65-69] is statistically significant and positive (beta =
#> 2.14, 95% CI [2.01, 2.26], p < .001; Std. beta = 2.14, 95% CI [2.01, 2.26])
#>   - The effect of age [70-74] is statistically significant and positive (beta =
#> 2.36, 95% CI [2.22, 2.50], p < .001; Std. beta = 2.36, 95% CI [2.22, 2.50])
#>   - The effect of age [75-79] is statistically significant and positive (beta =
#> 2.20, 95% CI [2.05, 2.36], p < .001; Std. beta = 2.20, 95% CI [2.05, 2.36])
#>   - The effect of age [80+] is statistically significant and positive (beta =
#> 1.95, 95% CI [1.79, 2.12], p < .001; Std. beta = 1.95, 95% CI [1.79, 2.12])
#>   - The effect of smoke [cigarretteOnly] is statistically significant and
#> positive (beta = 0.57, 95% CI [0.49, 0.65], p < .001; Std. beta = 0.57, 95% CI
#> [0.49, 0.65])
#>   - The effect of smoke [cigarrettePlus] is statistically significant and
#> positive (beta = 0.29, 95% CI [0.19, 0.40], p < .001; Std. beta = 0.29, 95% CI
#> [0.19, 0.40])
#>   - The effect of smoke [no] is statistically significant and negative (beta =
#> -0.21, 95% CI [-0.30, -0.12], p < .001; Std. beta = -0.21, 95% CI [-0.30,
#> -0.12])
#>   - The effect of pop is statistically significant and positive (beta = 4.17e-04,
#> 95% CI [3.85e-04, 4.50e-04], p < .001; Std. beta = 0.65, 95% CI [0.60, 0.70])
#> 
#> Standardized parameters were obtained by fitting the model on a standardized
#> version of the dataset. 95% Confidence Intervals (CIs) and p-values were
#> computed using a Wald z-distribution approximation.
Bongani Ncube
Data Scientist/Statistics/Public Health

My research interests include Causal Inference, Public Health, Survival Analysis, Bayesian Statistics, Machine Learning and Longitudinal Data Analysis.