Hypothesis 2

Research questions

  1. Does forum attendance mediate the effect of item difficulty on the correctness of the first attempt (CFA)?
  2. Does forum attendance mediate the effect of item type on CFA?

Database description

  • assignments - a dataset with data at the level of individual assignments

Variables description

Important:

  • forum_attendance_tw - whether the student visited the forum during the time window (from the start of the assignment to the first attempt). Binary.
  • forum_visits_tw - the number of forum visits during the time window. Count.
  • CFA - correctness of the first attempt. Binary.
  • difficulty - the number of students who failed / the number of students who participated. From 0 to 1.
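
As a minimal sketch of how the difficulty ratio is defined, the snippet below computes it on a hypothetical toy table (the data frame `toy` and its values are invented for illustration and are not taken from `assignments`). It assumes that "failed" means an incorrect first attempt, which the source does not state explicitly.

```r
# Hypothetical toy data: one row per student-item attempt (not the real `assignments` table)
toy <- data.frame(
  item = c("q1", "q1", "q1", "q1", "q2", "q2"),
  CFA  = c(1, 0, 0, 1, 1, 1)
)

# difficulty = number of students who failed / number of students who participated
# (assumption: a failure is an incorrect first attempt, CFA == 0)
difficulty <- tapply(toy$CFA, toy$item, function(x) sum(x == 0) / length(x))
print(difficulty)
```

An item that nobody fails gets difficulty 0, and an item that everybody fails gets difficulty 1.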

Less important:

  • hse_user_id - a student token.
  • course_item_name and course_item_type_desc - item token/name and item type.

The analysis follows the approach described in https://towardsdatascience.com/doing-and-reporting-your-first-mediation-analysis-in-r-2fe423b92171

Mediation analyses

Show the code
# p-value stars transformer
transform_pvalue <- function(p_value) {
  if (p_value < 0.001) {
    transformed <- "***"
  } else if (p_value < 0.01) {
    transformed <- "**"
  } else if (p_value < 0.05) {
    transformed <- "*"
  } else {
    transformed <- ""
  }
  
  return(transformed)
}
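
The helper above expects a single p-value. A vectorized variant, sketched here under the assumption that the same thresholds are wanted, can label a whole column of p-values at once. The name `p_to_stars` is hypothetical and not part of the original analysis.

```r
# Hypothetical vectorized variant of the star helper, using the same cut-offs
p_to_stars <- function(p) {
  as.character(cut(
    p,
    breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
    labels = c("***", "**", "*", ""),
    right  = FALSE  # intervals are [lower, upper), i.e. p < 0.001 gets "***"
  ))
}

stars <- p_to_stars(c(0.0004, 0.005, 0.02, 0.3))
print(stars)
```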

Difficulty as IV

Show the code
# grouping by user
tw1 <- assignments %>% group_by(hse_user_id) %>% 
  summarise(difficulty = median(difficulty), 
            forum_attendance_tw = median(as.numeric(forum_attendance_tw) - 1), 
            CFA = median(CFA)) 

# grouping by course item
tw2 <- assignments %>% group_by(course_item_name) %>% 
  summarise(difficulty = median(difficulty), 
            forum_attendance_tw = median(as.numeric(forum_attendance_tw) - 1),  
            CFA = median(CFA)) 
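
The aggregation step can be illustrated on a hypothetical toy table (invented values, base R instead of dplyr). The sketch shows two details of the code above: `as.numeric(factor) - 1` maps a two-level factor to 0/1, and the median of a binary variable can be 0.5 when a group has an even number of rows, in which case a binomial `glm` typically warns about non-integer successes.

```r
# Hypothetical toy data (not the real `assignments` table)
toy <- data.frame(
  hse_user_id = c("u1", "u1", "u1", "u2", "u2"),
  CFA         = c(1, 1, 0, 1, 0)
)

# as.numeric() on a factor returns the level codes 1, 2, so subtracting 1 gives 0/1
attendance <- factor(c("0", "1", "1"))
attendance_num <- as.numeric(attendance) - 1

# Per-user median of a binary outcome
per_user <- aggregate(CFA ~ hse_user_id, data = toy, FUN = median)
print(attendance_num)
print(per_user)
```

Here user `u2` has one correct and one incorrect first attempt, so the median is neither 0 nor 1.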

Grouping by users

Show the code
# Step 1: Y ~ X
fit.totaleffect <- glm(data = data.table(tw1), CFA ~ 1 + difficulty , family = "binomial") 
#summary(fit.totaleffect)

# Step 2: M ~ X
fit.mediator = glm(data = data.table(tw1), forum_attendance_tw ~ 1 + difficulty , family = "binomial") 
#summary(fit.mediator)
# there is an effect

# Step 3: Y ~ X + M 
fit.dv = glm(data = data.table(tw1), CFA ~ 1 + forum_attendance_tw + difficulty, family = "binomial")
#summary(fit.dv)
# there is an effect, but mediation is incomplete

# Step 4: Run the mediation analysis
results_m1 = mediate(fit.mediator, fit.dv, treat='difficulty', mediator='forum_attendance_tw')

# Step 5: View the mediation results
summary(results_m1)

Causal Mediation Analysis 

Quasi-Bayesian Confidence Intervals

                          Estimate 95% CI Lower 95% CI Upper p-value    
ACME (control)           -0.002785    -0.004658         0.00  <2e-16 ***
ACME (treated)           -0.001798    -0.002913         0.00  <2e-16 ***
ADE (control)            -0.974248    -0.978050        -0.97  <2e-16 ***
ADE (treated)            -0.973262    -0.977262        -0.97  <2e-16 ***
Total Effect             -0.976047    -0.979453        -0.97  <2e-16 ***
Prop. Mediated (control)  0.002811     0.001359         0.00  <2e-16 ***
Prop. Mediated (treated)  0.001831     0.000895         0.00  <2e-16 ***
ACME (average)           -0.002292    -0.003722         0.00  <2e-16 ***
ADE (average)            -0.973755    -0.977644        -0.97  <2e-16 ***
Prop. Mediated (average)  0.002321     0.001119         0.00  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sample Size Used: 40188 


Simulations: 1000 
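
In the potential-outcomes framework behind this summary, the total effect should equal ACME (control) + ADE (treated) and, equivalently, ACME (treated) + ADE (control). The sketch below checks this arithmetic on the estimates copied from the table above; the variable names are illustrative only.

```r
# Estimates copied from the summary table above
acme_control <- -0.002785
acme_treated <- -0.001798
ade_control  <- -0.974248
ade_treated  <- -0.973262
total_effect <- -0.976047

sum_a <- acme_control + ade_treated
sum_b <- acme_treated + ade_control

cat(sprintf("ACME (control) + ADE (treated) = %.6f\n", sum_a))
cat(sprintf("ACME (treated) + ADE (control) = %.6f\n", sum_b))
cat(sprintf("Reported total effect          = %.6f\n", total_effect))
```

Both sums agree with the reported total effect up to rounding of the printed estimates.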
Show the code
# total
cat(sprintf("1. Y ~ X\nTotal effect is %.4f.\n\n", summary(results_m1)$tau.coef))
1. Y ~ X
Total effect is -0.9760.
Show the code
cat(sprintf("Check 1: %.4f.\n",   fit.totaleffect$coefficients[2] ))
Check 1: -8.7531.
Show the code
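# ADE (control) + ACME (control); the exact decomposition of the total effect is
# ACME (control) + ADE (treated), so this sum only approximates the total effect
cat(sprintf("Check 2: %.4f.\n\n", summary(results_m1)$z0 + round(summary(results_m1)$d0, 5)))
Check 2: -0.9770.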
Show the code
#ACME
cat(sprintf("2: Y ~ x * M\nAverage causal mediation effect (ACME) is %.5f.\nThis is the indirect effect of the IV (item difficulty) \non the DV (CFA) that goes through the mediator (forum attendance).\n\n", summary(results_m1)$d0))
2: Y ~ x * M
Average causal mediation effect (ACME) is -0.00278.
This is the indirect effect of the IV (item difficulty) 
on the DV (CFA) that goes through the mediator (forum attendance).
Show the code
cat(sprintf("Check: %.4f * %.4f = %.4f.\n\n",  fit.mediator$coefficients[2], fit.dv$coefficients[2], fit.mediator$coefficients[2] * fit.dv$coefficients[2] ))
Check: 6.8079 * -0.2144 = -1.4594.
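
The product of the two logistic coefficients is on the log-odds scale, while `mediate` reports effects on the probability scale, which is the likely reason this check does not reproduce the ACME. For ordinary linear models the product-of-coefficients check does hold exactly, as the following sketch on simulated data shows (all variables and effect sizes are invented for illustration).

```r
set.seed(1)

# Simulated continuous X, M, Y with a known mediation structure (hypothetical values)
n <- 500
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.3 * x + 0.4 * m + rnorm(n)

fit_total    <- lm(y ~ x)      # Step 1: Y ~ X
fit_mediator <- lm(m ~ x)      # Step 2: M ~ X
fit_outcome  <- lm(y ~ x + m)  # Step 3: Y ~ X + M

a        <- coef(fit_mediator)[["x"]]  # effect of X on M
b        <- coef(fit_outcome)[["m"]]   # effect of M on Y, controlling for X
c_total  <- coef(fit_total)[["x"]]     # total effect of X on Y
c_direct <- coef(fit_outcome)[["x"]]   # direct effect of X on Y

cat(sprintf("indirect (a * b) = %.4f, total - direct = %.4f\n",
            a * b, c_total - c_direct))
```

With OLS fits the indirect effect `a * b` equals `c_total - c_direct` by construction; no such identity holds for the logit coefficients used above.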
Show the code
#ADE
cat(sprintf("3: Y ~ X - M\nAverage direct effect (ADE) of the IV on the DV is %.4f.\n\n",  summary(results_m1)$z0))
3: Y ~ X - M
Average direct effect (ADE) of the IV on the DV is -0.9742.
Show the code
cat(sprintf("Check 1: %.4f",  fit.dv$coefficients[3]))
Check 1: -8.7046
Show the code
med_data <-
  data.frame(
    lab_x   = "Item\\nDifficulty",
    lab_m   = "Forum\\nattendance",
    lab_y   = "CFA",
    coef_xm = sprintf('%.2f %s', round(fit.mediator$coefficients[2], 2), transform_pvalue(summary(fit.mediator)$coefficients[2, 4])),
    # fit.dv has three coefficients, so the p-value of the mediator is element [2, 4] of the
    # coefficient table (linear index 8 would pick its z value instead)
    coef_my = sprintf('%.2f %s', round(fit.dv$coefficients[2], 2), 
transform_pvalue(summary(fit.dv)$coefficients[2, 4])),
    coef_xy = sprintf('%.2f %s(%.2f)', fit.totaleffect$coefficients[2], transform_pvalue(summary(fit.totaleffect)$coefficients[2, 4]),
fit.dv$coefficients[3])
  )

med_diagram(med_data)

Grouping by courses

Show the code
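# Note: the models in this block are fit on tw1 (user-level data), so the output below
# repeats the user-level analysis (Sample Size Used: 40188); the item-level table is tw2.
# Step 1: Y ~ X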
fit.totaleffect <- glm(data = data.table(tw1), CFA ~ 1 + difficulty , family = "binomial") 
#summary(fit.totaleffect)

# Step 2: M ~ X
fit.mediator = glm(data = data.table(tw1), forum_attendance_tw ~ 1 + difficulty , family = "binomial") 
#summary(fit.mediator)
# there is an effect

# Step 3: Y ~ X + M 
fit.dv = glm(data = data.table(tw1), CFA ~ 1 + forum_attendance_tw + difficulty, family = "binomial")
#summary(fit.dv)
# there is an effect, but mediation is incomplete

# Step 4: Run the mediation analysis
results_m2 = mediate(fit.mediator, fit.dv, treat='difficulty', mediator='forum_attendance_tw')

# Step 5: View the mediation results
summary(results_m2)

Causal Mediation Analysis 

Quasi-Bayesian Confidence Intervals

                          Estimate 95% CI Lower 95% CI Upper p-value    
ACME (control)           -0.002775    -0.004658         0.00  <2e-16 ***
ACME (treated)           -0.001776    -0.002837         0.00  <2e-16 ***
ADE (control)            -0.974483    -0.978013        -0.97  <2e-16 ***
ADE (treated)            -0.973484    -0.977103        -0.97  <2e-16 ***
Total Effect             -0.976259    -0.979504        -0.97  <2e-16 ***
Prop. Mediated (control)  0.002781     0.001230         0.00  <2e-16 ***
Prop. Mediated (treated)  0.001806     0.000884         0.00  <2e-16 ***
ACME (average)           -0.002275    -0.003728         0.00  <2e-16 ***
ADE (average)            -0.973984    -0.977518        -0.97  <2e-16 ***
Prop. Mediated (average)  0.002294     0.001067         0.00  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sample Size Used: 40188 


Simulations: 1000 
Show the code
cat(sprintf("Step 1: Y ~ X\nTotal effect is %.4f.\n\n", summary(results_m2)$tau.coef))
Step 1: Y ~ X
Total effect is -0.9763.
Show the code
cat(sprintf("Step 2: M ~ X\nAverage causal mediation effect (ACME) is %.5f.\nThis is the indirect effect of the IV (item difficulty) \non the DV (CFA) that goes through the mediator (forum attendance).\n\n", summary(results_m2)$d0))
Step 2: M ~ X
Average causal mediation effect (ACME) is -0.00278.
This is the indirect effect of the IV (item difficulty) 
on the DV (CFA) that goes through the mediator (forum attendance).
Show the code
cat(sprintf("Step 3: Y ~ X + M\nAverage direct effect of the IV on the DV is %.4f.\n\n",  summary(results_m2)$z0))
Step 3: Y ~ X + M
Average direct effect of the IV on the DV is -0.9745.

Type as IV

Show the code
# grouping by user
tw3 <- assignments %>% group_by(hse_user_id) %>% 
  summarise(course_item_type_id = median(as.numeric(course_item_type_id) - 1), 
            forum_attendance_tw = median(as.numeric(forum_attendance_tw) - 1), 
            CFA = median(CFA)) 

# grouping by course item
tw4 <- assignments %>% group_by(course_item_name) %>% 
  summarise(course_item_type_id = median(as.numeric(course_item_type_id) - 1), 
            forum_attendance_tw = median(as.numeric(forum_attendance_tw) - 1), 
            CFA = median(CFA)) 

Grouping by users

Show the code
fit.totaleffect <- glm(data = data.table(tw3), CFA ~ 1 + course_item_type_id , family = "binomial") 
#summary(fit.totaleffect)

fit.mediator = glm(data = data.table(tw3), forum_attendance_tw ~ 1 + course_item_type_id , family = "binomial") 
#summary(fit.mediator)
# there is an effect

fit.dv = glm(data = data.table(tw3), CFA ~ 1 + forum_attendance_tw + course_item_type_id, family = "binomial")
#summary(fit.dv)

results_m1 = mediate(fit.mediator, fit.dv, treat='course_item_type_id', mediator='forum_attendance_tw')
summary(results_m1)

Causal Mediation Analysis 

Quasi-Bayesian Confidence Intervals

                         Estimate 95% CI Lower 95% CI Upper p-value    
ACME (control)           -0.00995     -0.01211        -0.01  <2e-16 ***
ACME (treated)           -0.01028     -0.01250        -0.01  <2e-16 ***
ADE (control)            -0.01008     -0.01922         0.00   0.034 *  
ADE (treated)            -0.01042     -0.01975         0.00   0.034 *  
Total Effect             -0.02037     -0.02923        -0.01  <2e-16 ***
Prop. Mediated (control)  0.48565      0.30911         0.91  <2e-16 ***
Prop. Mediated (treated)  0.50301      0.32901         0.91  <2e-16 ***
ACME (average)           -0.01012     -0.01228        -0.01  <2e-16 ***
ADE (average)            -0.01025     -0.01949         0.00   0.034 *  
Prop. Mediated (average)  0.49433      0.31938         0.91  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sample Size Used: 40188 


Simulations: 1000 
Show the code
cat(sprintf("Step 1: Y ~ X\nTotal effect is %.4f.\n\n", summary(results_m1)$tau.coef))
Step 1: Y ~ X
Total effect is -0.0204.
Show the code
cat(sprintf("Step 2: M ~ X\nAverage causal mediation effect (ACME) is %.5f.\nThis is the indirect effect of the IV (item type) \non the DV (CFA) that goes through the mediator (forum attendance).\n\n", summary(results_m1)$d0))
Step 2: M ~ X
Average causal mediation effect (ACME) is -0.00995.
This is the indirect effect of the IV (item type) 
on the DV (CFA) that goes through the mediator (forum attendance).
Show the code
cat(sprintf("Step 3: Y ~ X + M\nAverage direct effect of the IV on the DV is %.4f.\n\n",  summary(results_m1)$z0))
Step 3: Y ~ X + M
Average direct effect of the IV on the DV is -0.0101.

Grouping by courses

Show the code
# Step 1: Y ~ X
fit.totaleffect <- glm(data = data.table(tw4), CFA ~ 1 + course_item_type_id , family = "binomial") 
#summary(fit.totaleffect)
# there is an effect

# Step 2: M ~ X
fit.mediator = glm(data = data.table(tw4), forum_attendance_tw ~ 1 + course_item_type_id , family = "binomial") 
#summary(fit.mediator)
# there is an effect

# Step 3: Y ~ X + M
fit.dv = glm(data = data.table(tw4), CFA ~ 1 + forum_attendance_tw + course_item_type_id, family = "binomial")
#summary(fit.dv)

# there is an effect, but mediation is incomplete

# Step 4: Run the mediation analysis
results_m2 = mediate(fit.mediator, fit.dv, treat='course_item_type_id', mediator='forum_attendance_tw')
summary(results_m2)

Causal Mediation Analysis 

Quasi-Bayesian Confidence Intervals

                         Estimate 95% CI Lower 95% CI Upper p-value    
ACME (control)             0.0955      -0.0458         0.38    0.99    
ACME (treated)             0.1470      -0.0601         0.47    0.99    
ADE (control)             -0.2433      -0.3910        -0.13  <2e-16 ***
ADE (treated)             -0.1918      -0.2575        -0.13  <2e-16 ***
Total Effect              -0.0963      -0.2709         0.22    0.57    
Prop. Mediated (control)   0.1215     -14.5613        15.24    0.44    
Prop. Mediated (treated)   0.1909     -25.0173        25.37    0.44    
ACME (average)             0.1212      -0.0527         0.42    0.99    
ADE (average)             -0.2175      -0.3195        -0.13  <2e-16 ***
Prop. Mediated (average)   0.1562     -19.3450        20.20    0.44    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sample Size Used: 669 


Simulations: 1000 
Show the code
cat(sprintf("Step 1: Y ~ X\nTotal effect is %.4f.\n\n", summary(results_m2)$tau.coef))
Step 1: Y ~ X
Total effect is -0.0963.
Show the code
cat(sprintf("Step 2: M ~ X\nAverage causal mediation effect (ACME) is %.5f.\nThis is the indirect effect of the IV (item type) \non the DV (CFA) that goes through the mediator (forum attendance).\n\n", summary(results_m2)$d0))
Step 2: M ~ X
Average causal mediation effect (ACME) is 0.09549.
This is the indirect effect of the IV (item type) 
on the DV (CFA) that goes through the mediator (forum attendance).
Show the code
cat(sprintf("Step 3: Y ~ X + M\nAverage direct effect of the IV on the DV is %.4f.\n\n",  summary(results_m2)$z0))
Step 3: Y ~ X + M
Average direct effect of the IV on the DV is -0.2433.

2 random and 2 fixed effects

Show the code
# Step 1: Mediator Model (X → M)
m_2re_mediator <- glmmTMB(
  forum_attendance_tw  ~ 1 + difficulty + (1 | hse_user_id) + (1 | course_item_type_id), 
  data = assignments, 
  family = gaussian()
)
summary(m_2re_mediator)
 Family: gaussian  ( identity )
Formula:          
forum_attendance_tw ~ 1 + difficulty + (1 | hse_user_id) + (1 |  
    course_item_type_id)
Data: assignments

      AIC       BIC    logLik  deviance  df.resid 
 713100.9  713161.2 -356545.5  713090.9   1258385 

Random effects:

Conditional model:
 Groups              Name        Variance Std.Dev.
 hse_user_id         (Intercept) 0.01712  0.1308  
 course_item_type_id (Intercept) 0.01092  0.1045  
 Residual                        0.09859  0.3140  
Number of obs: 1258390, groups:  hse_user_id, 40188; course_item_type_id, 2

Dispersion estimate for gaussian family (sigma^2): 0.0986 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.082189   0.073914    1.11    0.266    
difficulty  0.202799   0.001872  108.32   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show the code
# Step 2: Total Effect Model (X → Y)
m_2re_total <- glmmTMB(
  CFA  ~ 1 + difficulty + (1 | hse_user_id) + (1 | course_item_type_id), 
  data = assignments, 
  family = gaussian()
)
summary(m_2re_total)
 Family: gaussian  ( identity )
Formula:          
CFA ~ 1 + difficulty + (1 | hse_user_id) + (1 | course_item_type_id)
Data: assignments

      AIC       BIC    logLik  deviance  df.resid 
1427548.4 1427608.7 -713769.2 1427538.4   1258385 

Random effects:

Conditional model:
 Groups              Name        Variance  Std.Dev.
 hse_user_id         (Intercept) 1.734e-02 0.131673
 course_item_type_id (Intercept) 1.375e-06 0.001173
 Residual                        1.759e-01 0.419453
Number of obs: 1258390, groups:  hse_user_id, 40188; course_item_type_id, 2

Dispersion estimate for gaussian family (sigma^2): 0.176 

Conditional model:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.996992   0.001497   666.1   <2e-16 ***
difficulty  -0.997215   0.002496  -399.6   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show the code
# Step 3: Direct & Indirect Effects Model (X, M → Y)
m_2re_di <- glmmTMB(
  CFA ~ 1 + difficulty + forum_attendance_tw + (1 | hse_user_id) + (1 | course_item_type_id), 
  data = assignments, 
  family = binomial()
)
summary(m_2re_di)
 Family: binomial  ( logit )
Formula:          
CFA ~ 1 + difficulty + forum_attendance_tw + (1 | hse_user_id) +  
    (1 | course_item_type_id)
Data: assignments

      AIC       BIC    logLik  deviance  df.resid 
1358950.9 1359011.2 -679470.5 1358940.9   1258385 

Random effects:

Conditional model:
 Groups              Name        Variance  Std.Dev. 
 hse_user_id         (Intercept) 6.165e-01 7.852e-01
 course_item_type_id (Intercept) 8.519e-11 9.230e-06
Number of obs: 1258390, groups:  hse_user_id, 40188; course_item_type_id, 2

Conditional model:
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)           2.636563   0.007286   361.8   <2e-16 ***
difficulty           -5.278928   0.015031  -351.2   <2e-16 ***
forum_attendance_tw1 -0.070954   0.006329   -11.2   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
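
The fixed effects of this binomial model are on the log-odds scale. As a rough reading aid, the sketch below converts the estimates copied from the output above into odds ratios and a baseline probability; the variable names are illustrative, and the baseline assumes difficulty 0, no forum attendance, and random effects at zero.

```r
# Fixed-effect estimates copied from summary(m_2re_di) above (log-odds scale)
est <- c(
  intercept           = 2.636563,
  difficulty          = -5.278928,
  forum_attendance_tw = -0.070954
)

# Odds ratios for a one-unit change in each predictor
odds_ratios <- exp(est[c("difficulty", "forum_attendance_tw")])

# Baseline P(CFA = 1): difficulty 0, no forum attendance, random effects at 0
baseline_p <- plogis(est[["intercept"]])

print(round(odds_ratios, 4))
cat(sprintf("Baseline probability of a correct first attempt: %.3f\n", baseline_p))
```

Under these assumptions, forum attendance during the time window is associated with roughly 7% lower odds of a correct first attempt, conditional on difficulty and the random intercepts.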

Checking the difference

Show the code
summary(fit.dv)

Call:
glm(formula = CFA ~ 1 + forum_attendance_tw + course_item_type_id, 
    family = "binomial", data = data.table(tw4))

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)           2.3570     0.1794  13.135  < 2e-16 ***
forum_attendance_tw  -1.5843     0.4150  -3.818 0.000135 ***
course_item_type_id  -1.3350     0.2304  -5.794 6.89e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 620.55  on 668  degrees of freedom
Residual deviance: 552.58  on 666  degrees of freedom
AIC: 566.06

Number of Fisher Scoring iterations: 5
Show the code
summary(m_2re_di)
 Family: binomial  ( logit )
Formula:          
CFA ~ 1 + difficulty + forum_attendance_tw + (1 | hse_user_id) +  
    (1 | course_item_type_id)
Data: assignments

      AIC       BIC    logLik  deviance  df.resid 
1358950.9 1359011.2 -679470.5 1358940.9   1258385 

Random effects:

Conditional model:
 Groups              Name        Variance  Std.Dev. 
 hse_user_id         (Intercept) 6.165e-01 7.852e-01
 course_item_type_id (Intercept) 8.519e-11 9.230e-06
Number of obs: 1258390, groups:  hse_user_id, 40188; course_item_type_id, 2

Conditional model:
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)           2.636563   0.007286   361.8   <2e-16 ***
difficulty           -5.278928   0.015031  -351.2   <2e-16 ***
forum_attendance_tw1 -0.070954   0.006329   -11.2   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1