Generalised Linear Models 2
Multinomial
If we have more than two categories or groups that we want to model relative to covariates (e.g., we have observations \(i = 1,\cdots,n\) and groups \(j = 1,\cdots,J\)), the multinomial is our candidate model.
Let
- \(p_{ij}\) be the probability that the \(i\)-th observation belongs to the \(j\)-th group
- \(Y_{ij}\) be the number of observations for individual \(i\) in group \(j\); an individual will have observations \(Y_{i1}, Y_{i2}, \dots, Y_{iJ}\)

We assume the probability of observing this response is given by a multinomial distribution in terms of the probabilities \(p_{ij}\), where \(\sum_{j = 1}^J p_{ij} = 1\). For interpretation, we choose a baseline category, so that \(p_{i1} = 1 - \sum_{j = 2}^J p_{ij}\).
The link between the mean response (probability) \(p_{ij}\) and a linear function of the covariates is
$$ \eta_{ij} = \mathbf{x}_i' \boldsymbol{\beta}_j = \log \frac{p_{ij}}{p_{i1}}, \quad j = 2,\dots,J $$
Each \(p_{ij}\) is compared to the baseline \(p_{i1}\); solving for the probabilities gives
$$ p_{ij} = \frac{\exp(\eta_{ij})}{1 + \sum_{l=2}^J \exp(\eta_{il})} $$
which is known as the multinomial logistic model.
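A quick standalone check of this formula (a Python sketch with made-up linear predictors, not part of the R workflow below): the baseline-category probabilities are always non-negative and sum to one.

```python
import math

def baseline_logit_probs(eta):
    """Convert linear predictors eta_j (j = 2..J, relative to baseline
    category 1) into the full set of category probabilities p_1..p_J."""
    denom = 1.0 + sum(math.exp(e) for e in eta)
    p_baseline = 1.0 / denom                     # p_{i1}
    p_rest = [math.exp(e) / denom for e in eta]  # p_{i2}..p_{iJ}
    return [p_baseline] + p_rest

# hypothetical linear predictors for J = 3 categories
probs = baseline_logit_probs([0.5, -1.2])
print(probs)
print(sum(probs))  # 1.0 up to rounding
```

Note also that \(\log(p_{i2}/p_{i1})\) recovers the first linear predictor, as the link function requires.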
Note:
- Softmax coding for multinomial logistic regression: rather than selecting a baseline class, we treat all \(K\) classes symmetrically, as equally important (no baseline).
$$
P(Y = k \mid X = x) = \frac{\exp(\beta_{k0} + \beta_{k1} x_1 + \dots + \beta_{kp} x_p)}{\sum_{l = 1}^K \exp(\beta_{l0} + \beta_{l1} x_1 + \dots + \beta_{lp} x_p)}
$$
Then the log odds ratio between the \(k\)-th and \(k'\)-th classes is
$$ \log \left(\frac{P(Y=k \mid X=x)}{P(Y = k' \mid X=x)}\right) = (\beta_{k0} - \beta_{k'0}) + (\beta_{k1} - \beta_{k'1}) x_1 + \dots + (\beta_{kp} - \beta_{k'p}) x_p $$
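A minimal sketch of the softmax coding (Python, with hypothetical class scores): the log odds between two classes depend only on the difference of their scores, so adding a constant to every score leaves the probabilities unchanged, which is why the baseline and softmax codings describe the same model.

```python
import math

def softmax(scores):
    """Symmetric (softmax) coding: one linear score per class."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical class scores for K = 3 classes at a fixed x
scores = [1.0, 0.2, -0.5]
p = softmax(scores)

# shifting every score by the same constant changes nothing
shifted = softmax([s + 7.0 for s in scores])
print(math.log(p[0] / p[1]))  # equals scores[0] - scores[1] = 0.8
```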
Poisson GLMs
The Poisson distribution
The Poisson distribution specifies the probability of a
discrete random variable \(Y\)
and is given by:
$$f(y, \,\mu)\, =\, Pr(Y = y)\, =\, \frac{\mu^y \times e^{-\mu}}{y!}$$
$$E(Y)\, =\, Var(Y)\, =\, \mu$$
where \(\mu\) is the parameter of the Poisson distribution.
The Poisson distribution is particularly relevant for modelling count data because it:
- specifies the probability only for non-negative integer values, with \(P(Y < 0) = 0\), so any negative value has zero probability
- has \(Var(Y) = \mu\) (the mean-variance relationship), which allows for heterogeneity (e.g. when the variance generally increases with the mean)
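These properties are easy to verify numerically. A standalone Python sketch (with an arbitrary \(\mu = 3.5\) and the infinite sum truncated at 100, where the remaining tail mass is negligible) checks that the pmf sums to one and that mean and variance both equal \(\mu\):

```python
import math

def poisson_pmf(y, mu):
    """Poisson pmf: f(y, mu) = mu^y * exp(-mu) / y!."""
    return mu**y * math.exp(-mu) / math.factorial(y)

mu = 3.5                 # arbitrary illustrative mean
support = range(100)     # truncated support; tail mass beyond 99 is negligible
total = sum(poisson_pmf(y, mu) for y in support)
mean = sum(y * poisson_pmf(y, mu) for y in support)
var = sum((y - mean) ** 2 * poisson_pmf(y, mu) for y in support)
print(total, mean, var)  # ~1.0, ~3.5, ~3.5
```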
A Poisson GLM will model the value of \(\mu\)
as a function of different explanatory variables:
Step 1.
We assume \(Y_i\)
follows a Poisson distribution with mean and variance \(\mu_i\)
.
$$Y_i \sim Poisson(\mu_i)$$
$$E(Y_i) = Var(Y_i) = \mu_i$$
$$f(y_i, \, \mu_i) = \frac{\mu^{y_i}_i \times e^{-\mu_i}}{y_i!}$$
\(\mu_i\)
corresponds to the expected number of individuals.
Step 2.
We specify the linear predictor of the model just as in a linear model.
$$\underbrace{\alpha}_\text{Intercept} + \underbrace{\beta_1}_\text{slope of 'variable'} \times \text{variable}_i$$
Step 3.
The link between the mean of \(Y_i\) and the linear predictor is a logarithmic function and can be written as:
$$log(\mu_i) = \alpha + \beta_1 \times \text{variable}_i $$
It can also be written as:
$$\mu_i = e^{ \alpha + \beta \times \text{variable}_i}$$
This shows that the impact of each explanatory variable is multiplicative: increasing the variable by one unit multiplies \(\mu\) by a factor of \(\exp(\beta_\text{variable})\).
We can also write it as:
$$\mu_i = e^{\alpha} \times e^{\beta_1 \times \text{variable}_i}$$
If \(\beta_j = 0\)
then \(exp(\beta_j) = 1\)
and \(\mu\)
is not related to \(x_j\)
. If \(\beta_j > 0\)
then \(\mu\)
increases if \(x_j\)
increases; if \(\beta_j < 0\)
then \(\mu\)
decreases if \(x_j\)
increases.
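A one-line numerical check of this multiplicative interpretation (a Python sketch; the values of \(\alpha\) and \(\beta\) are hypothetical):

```python
import math

# hypothetical Poisson GLM coefficients (illustrative only)
alpha, beta = 0.3, 0.12

def mu(x):
    """Mean of Y on the response scale: mu(x) = exp(alpha + beta * x)."""
    return math.exp(alpha + beta * x)

# increasing the covariate by one unit multiplies the mean by exp(beta)
print(mu(6) / mu(5), math.exp(beta))  # the two numbers agree
```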
setup
library(tidyverse)

df <- readxl::read_xlsx("multinomial.xlsx")
df <- df |>
  separate(`y,party,race,gender`, into = c("y", "party", "race", "gender"), sep = ",") |>
  mutate_if(is.character, as.factor) |>
  mutate(y = as.numeric(y)) |>
  rename(income = "y")
df
#> # A tibble: 12 x 4
#> income party race gender
#> <dbl> <fct> <fct> <fct>
#> 1 5 Democrat White Male
#> 2 8 Republican White Male
#> 3 2 Independent White Male
#> 4 10 Democrat Black Male
#> 5 12 Republican Black Male
#> 6 1 Independent Black Male
#> 7 7 Democrat White Female
#> 8 3 Republican White Female
#> 9 4 Independent White Female
#> 10 11 Democrat Black Female
#> 11 9 Republican Black Female
#> 12 6 Independent Black Female
write_csv(df,"multinom.csv")
summary(df)
#> income party race gender
#> Min. : 1.00 Democrat :4 Black:6 Female:6
#> 1st Qu.: 3.75 Independent:4 White:6 Male :6
#> Median : 6.50 Republican :4
#> Mean : 6.50
#> 3rd Qu.: 9.25
#> Max. :12.00
table(df$party,df$race)
#>
#> Black White
#> Democrat 2 2
#> Independent 2 2
#> Republican 2 2
model with all variables
model<-nnet::multinom(party~.,data=df)
#> # weights: 15 (8 variable)
#> initial value 13.183347
#> iter 10 value 8.524672
#> iter 20 value 8.227196
#> iter 30 value 8.224114
#> iter 40 value 8.223844
#> iter 50 value 8.223812
#> final value 8.223811
#> converged
model
#> Call:
#> nnet::multinom(formula = party ~ ., data = df)
#>
#> Coefficients:
#> (Intercept) income raceWhite genderMale
#> Independent 12.33053 -1.5500393 -6.0737795 -0.07011659
#> Republican 1.12294 -0.1137472 -0.5209413 0.12415647
#>
#> Residual Deviance: 16.44762
#> AIC: 32.44762
summary(model)
#> Call:
#> nnet::multinom(formula = party ~ ., data = df)
#>
#> Coefficients:
#> (Intercept) income raceWhite genderMale
#> Independent 12.33053 -1.5500393 -6.0737795 -0.07011659
#> Republican 1.12294 -0.1137472 -0.5209413 0.12415647
#>
#> Std. Errors:
#> (Intercept) income raceWhite genderMale
#> Independent 8.023601 0.9713783 4.955594 2.783173
#> Republican 4.891305 0.4793859 2.613579 1.510782
#>
#> Residual Deviance: 16.44762
#> AIC: 32.44762
$$\log\left(\frac{\pi_I}{\pi_D}\right) = 12.331 - 1.550\,\text{income} - 6.074\,\text{raceWhite} - 0.070\,\text{genderMale}$$
$$\log\left(\frac{\pi_R}{\pi_D}\right) = 1.123 - 0.114\,\text{income} - 0.521\,\text{raceWhite} + 0.124\,\text{genderMale}$$
p1<-exp(2.181+11.67-0.096*5000)/(1+exp(2.181+11.67-0.096*5000)+exp(1.58+6.55-0.053*5000))
p1
#> [1] 3.581472e-203
p2<-exp(1.58+6.55-0.053*5000)/(1+exp(2.181+11.67-0.096*5000)+exp(1.58+6.55-0.053*5000))
p2
#> [1] 2.771893e-112
1-p1-p2
#> [1] 1
odds<-exp(1.12294-0.1137472*400)
odds
#> [1] 5.342864e-20
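As a sanity check, the fitted coefficients reported by summary(model) above can be plugged in by hand. This standalone Python sketch (the chosen profile, a White male with income 5, is hypothetical) computes the predicted party probabilities:

```python
import math

# coefficients from summary(model); baseline category is Democrat
# hypothetical profile: income = 5, raceWhite = 1, genderMale = 1
eta_ind = 12.33053 - 1.5500393 * 5 - 6.0737795 * 1 - 0.07011659 * 1
eta_rep = 1.12294 - 0.1137472 * 5 - 0.5209413 * 1 + 0.12415647 * 1

denom = 1 + math.exp(eta_ind) + math.exp(eta_rep)
p_dem = 1 / denom                  # baseline probability
p_ind = math.exp(eta_ind) / denom
p_rep = math.exp(eta_rep) / denom
print(round(p_dem, 3), round(p_ind, 3), round(p_rep, 3))
```

The three probabilities necessarily sum to one, matching the formula for \(p_{ij}\) given earlier.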
NULL MODEL
mod_null<-nnet::multinom(party~1,data=df)
#> # weights: 6 (2 variable)
#> initial value 13.183347
#> final value 13.183347
#> converged
mod_null
#> Call:
#> nnet::multinom(formula = party ~ 1, data = df)
#>
#> Coefficients:
#> (Intercept)
#> Independent 4.440892e-16
#> Republican 4.440892e-16
#>
#> Residual Deviance: 26.36669
#> AIC: 30.36669
model with gender
(model_gend<-nnet::multinom(party~gender,data=df))
#> # weights: 9 (4 variable)
#> initial value 13.183347
#> final value 13.183347
#> converged
#> Call:
#> nnet::multinom(formula = party ~ gender, data = df)
#>
#> Coefficients:
#> (Intercept) genderMale
#> Independent 4.440892e-16 2.220446e-16
#> Republican 4.440892e-16 2.220446e-16
#>
#> Residual Deviance: 26.36669
#> AIC: 34.36669
compare the models
anova(mod_null,model_gend,test="Chisq")
#> Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
#> 1 1 22 26.36669 NA NA NA
#> 2 gender 20 26.36669 1 vs 2 2 0 1
model with race
(model_race<-nnet::multinom(party~race,data=df))
#> # weights: 9 (4 variable)
#> initial value 13.183347
#> final value 13.183347
#> converged
#> Call:
#> nnet::multinom(formula = party ~ race, data = df)
#>
#> Coefficients:
#> (Intercept) raceWhite
#> Independent 4.440892e-16 2.220446e-16
#> Republican 4.440892e-16 2.220446e-16
#>
#> Residual Deviance: 26.36669
#> AIC: 34.36669
compare race model to null model
anova(mod_null,model_race,test="Chisq")
#> Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
#> 1 1 22 26.36669 NA NA NA
#> 2 race 20 26.36669 1 vs 2 2 0 1
model with income
(model_inc<-nnet::multinom(party~income,data=df))
#> # weights: 9 (4 variable)
#> initial value 13.183347
#> iter 10 value 9.866125
#> final value 9.865368
#> converged
#> Call:
#> nnet::multinom(formula = party ~ income, data = df)
#>
#> Coefficients:
#> (Intercept) income
#> Independent 3.8280816 -0.69719923
#> Republican 0.2589209 -0.03185745
#>
#> Residual Deviance: 19.73074
#> AIC: 27.73074
compare income model to null model
anova(mod_null,model_inc,test="Chisq")
#> Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
#> 1 1 22 26.36669 NA NA NA
#> 2 income 20 19.73074 1 vs 2 2 6.635959 0.03622595
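The LR statistic is just the difference in residual deviances, \(26.36669 - 19.73074 \approx 6.636\) on 2 df. For 2 degrees of freedom the chi-square upper-tail probability has the closed form \(e^{-x/2}\), so the reported p-value can be checked by hand:

```python
import math

lr_stat = 26.36669 - 19.73074      # deviance(null model) - deviance(income model)
p_value = math.exp(-lr_stat / 2)   # chi-square upper tail, closed form for df = 2
print(lr_stat, p_value)            # ~6.636 and ~0.0362, matching the anova() output
```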
model with race and income
(model_inc_race<-nnet::multinom(party~income+race,data=df))
#> # weights: 12 (6 variable)
#> initial value 13.183347
#> iter 10 value 8.499512
#> iter 20 value 8.244097
#> iter 30 value 8.229818
#> iter 40 value 8.228592
#> iter 50 value 8.228569
#> final value 8.228567
#> converged
#> Call:
#> nnet::multinom(formula = party ~ income + race, data = df)
#>
#> Coefficients:
#> (Intercept) income raceWhite
#> Independent 12.312994 -1.5477287 -6.0862942
#> Republican 1.050129 -0.1008768 -0.4603361
#>
#> Residual Deviance: 16.45713
#> AIC: 28.45713
compare models
anova(model_inc_race,model_inc,test="Chisq")
#> Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
#> 1 income 20 19.73074 NA NA NA
#> 2 income + race 18 16.45713 1 vs 2 2 3.273602 0.1946016
check using the chi-square distribution
chisq.test(table(df$party,df$race))
#>
#> Pearson's Chi-squared test
#>
#> data: table(df$party, df$race)
#> X-squared = 0, df = 2, p-value = 1
poisson dataset
df<-read.table("canada.txt",sep="",header=TRUE) |>
mutate_if(is.character,as.factor) |>
as_tibble()
df
#> # A tibble: 36 x 5
#> id age smoke pop dead
#> <int> <fct> <fct> <int> <int>
#> 1 1 40-44 no 656 18
#> 2 2 45-59 no 359 22
#> 3 3 50-54 no 249 19
#> 4 4 55-59 no 632 55
#> 5 5 60-64 no 1067 117
#> 6 6 65-69 no 897 170
#> 7 7 70-74 no 668 179
#> 8 8 75-79 no 361 120
#> 9 9 80+ no 274 120
#> 10 10 40-44 cigarPipeOnly 145 2
#> # i 26 more rows
ggplot(df,aes(x=dead))+
geom_density()
model with all variables
model<-glm(dead~age+smoke+pop,data=df,family = poisson)
model
#>
#> Call: glm(formula = dead ~ age + smoke + pop, family = poisson, data = df)
#>
#> Coefficients:
#> (Intercept) age45-59 age50-54
#> 2.7717494 0.6242419 1.0038164
#> age55-59 age60-64 age65-69
#> 1.3818228 1.5011996 2.1365714
#> age70-74 age75-79 age80+
#> 2.3556629 2.2041331 1.9506334
#> smokecigarretteOnly smokecigarrettePlus smokeno
#> 0.5704051 0.2915664 -0.2113845
#> pop
#> 0.0004172
#>
#> Degrees of Freedom: 35 Total (i.e. Null); 23 Residual
#> Null Deviance: 8434
#> Residual Deviance: 584.9 AIC: 850.9
equatiomatic::extract_eq(model, terms_per_line = 2, wrap = TRUE, use_coefs = TRUE, greek_colors = "blue")
$$
\begin{aligned} \log ({ \widehat{E( \operatorname{dead} )} }) &= 2.77 + 0.62(\operatorname{age}_{\operatorname{45-59}})\ + \\ &\quad 1(\operatorname{age}_{\operatorname{50-54}}) + 1.38(\operatorname{age}_{\operatorname{55-59}})\ + \\ &\quad 1.5(\operatorname{age}_{\operatorname{60-64}}) + 2.14(\operatorname{age}_{\operatorname{65-69}})\ + \\ &\quad 2.36(\operatorname{age}_{\operatorname{70-74}}) + 2.2(\operatorname{age}_{\operatorname{75-79}})\ + \\ &\quad 1.95(\operatorname{age}_{\operatorname{80+}}) + 0.57(\operatorname{smoke}_{\operatorname{cigarretteOnly}})\ + \\ &\quad 0.29(\operatorname{smoke}_{\operatorname{cigarrettePlus}}) - 0.21(\operatorname{smoke}_{\operatorname{no}})\ + \\ &\quad 0(\operatorname{pop}) \end{aligned}
$$
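Note that pop enters this model linearly on the log scale. A common alternative for counts with varying exposure (not fitted here, just a sketch of the idea) is to include \(\log(\text{pop})\) as an offset with a fixed coefficient of 1, so that the model describes the death rate:

$$\log(\mu_i) = \log(\text{pop}_i) + \alpha + \beta_1 \times \text{age}_i + \beta_2 \times \text{smoke}_i
\quad\Longleftrightarrow\quad
\log\left(\frac{\mu_i}{\text{pop}_i}\right) = \alpha + \beta_1 \times \text{age}_i + \beta_2 \times \text{smoke}_i$$

In R this would be fitted as glm(dead ~ age + smoke + offset(log(pop)), family = poisson, data = df).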
null model
mod1<-glm(dead~1,data=df,family = poisson)
model with age
mod2<-glm(dead~age,data=df,family = poisson)
COMPARE THE MODELS
anova(mod1,mod2,test="LRT")
#> Analysis of Deviance Table
#>
#> Model 1: dead ~ 1
#> Model 2: dead ~ age
#> Resid. Df Resid. Dev Df Deviance Pr(>Chi)
#> 1 35 8434.4
#> 2 27 4845.0 8 3589.4 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model with population
mod3<-glm(dead~pop,data=df,family = poisson)
COMPARE NULL TO POPULATION MODEL
anova(mod1,mod3,test="LRT")
#> Analysis of Deviance Table
#>
#> Model 1: dead ~ 1
#> Model 2: dead ~ pop
#> Resid. Df Resid. Dev Df Deviance Pr(>Chi)
#> 1 35 8434.4
#> 2 34 4270.2 1 4164.2 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model with smoking
mod4<-glm(dead~smoke,data=df,family = poisson)
COMPARE SMOKING MODEL TO NULL MODEL
anova(mod1,mod4,test="LRT")
#> Analysis of Deviance Table
#>
#> Model 1: dead ~ 1
#> Model 2: dead ~ smoke
#> Resid. Df Resid. Dev Df Deviance Pr(>Chi)
#> 1 35 8434.4
#> 2 32 4850.4 3 3584.1 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
REPORT BEST MODEL
report::report(model)
#> We fitted a poisson model (estimated using ML) to predict dead with age, smoke
#> and pop (formula: dead ~ age + smoke + pop). The model's explanatory power is
#> substantial (Nagelkerke's R2 = 1.00). The model's intercept, corresponding to
#> age = 40-44, smoke = cigarPipeOnly and pop = 0, is at 2.77 (95% CI [2.62,
#> 2.92], p < .001). Within this model:
#>
#> - The effect of age [45-59] is statistically significant and positive (beta =
#> 0.62, 95% CI [0.46, 0.79], p < .001; Std. beta = 0.62, 95% CI [0.46, 0.79])
#> - The effect of age [50-54] is statistically significant and positive (beta =
#> 1.00, 95% CI [0.84, 1.16], p < .001; Std. beta = 1.00, 95% CI [0.84, 1.16])
#> - The effect of age [55-59] is statistically significant and positive (beta =
#> 1.38, 95% CI [1.26, 1.51], p < .001; Std. beta = 1.38, 95% CI [1.26, 1.51])
#> - The effect of age [60-64] is statistically significant and positive (beta =
#> 1.50, 95% CI [1.38, 1.63], p < .001; Std. beta = 1.50, 95% CI [1.38, 1.63])
#> - The effect of age [65-69] is statistically significant and positive (beta =
#> 2.14, 95% CI [2.01, 2.26], p < .001; Std. beta = 2.14, 95% CI [2.01, 2.26])
#> - The effect of age [70-74] is statistically significant and positive (beta =
#> 2.36, 95% CI [2.22, 2.50], p < .001; Std. beta = 2.36, 95% CI [2.22, 2.50])
#> - The effect of age [75-79] is statistically significant and positive (beta =
#> 2.20, 95% CI [2.05, 2.36], p < .001; Std. beta = 2.20, 95% CI [2.05, 2.36])
#> - The effect of age [80+] is statistically significant and positive (beta =
#> 1.95, 95% CI [1.79, 2.12], p < .001; Std. beta = 1.95, 95% CI [1.79, 2.12])
#> - The effect of smoke [cigarretteOnly] is statistically significant and
#> positive (beta = 0.57, 95% CI [0.49, 0.65], p < .001; Std. beta = 0.57, 95% CI
#> [0.49, 0.65])
#> - The effect of smoke [cigarrettePlus] is statistically significant and
#> positive (beta = 0.29, 95% CI [0.19, 0.40], p < .001; Std. beta = 0.29, 95% CI
#> [0.19, 0.40])
#> - The effect of smoke [no] is statistically significant and negative (beta =
#> -0.21, 95% CI [-0.30, -0.12], p < .001; Std. beta = -0.21, 95% CI [-0.30,
#> -0.12])
#> - The effect of pop is statistically significant and positive (beta = 4.17e-04,
#> 95% CI [3.85e-04, 4.50e-04], p < .001; Std. beta = 0.65, 95% CI [0.60, 0.70])
#>
#> Standardized parameters were obtained by fitting the model on a standardized
#> version of the dataset. 95% Confidence Intervals (CIs) and p-values were
#> computed using a Wald z-distribution approximation.