Hypothesis 1

Research questions

  1. What is the relationship between correctness of the first attempt (CFA) and forum attendance during task performance?

Database description

  • assignments - dataset with data at the level of individual assignments

Variables description

Important:

  • forum_attendance_tw - forum attendance during the time window (from the start of the assignment to the first attempt). Binary.
  • forum_visits_tw - the number of forum visits during the time window. Count.
  • CFA - Correctness of the first attempt. Binary.
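
The two forum variables appear to be related: the Model 2 results speak of a student who "visited a forum at least once", which suggests that the binary indicator is the visit count thresholded at zero. A minimal sketch under that assumption, on a made-up toy data frame (the values are illustrative and not taken from assignments; only the column names follow the report):

```r
# Toy data with hypothetical values; only the column names follow the report.
toy <- data.frame(
  forum_visits_tw = c(0, 3, 0, 1, 12),
  CFA             = c(1, 0, 1, 1, 0)
)

# Assumption: attendance is 1 when there was at least one forum visit
# in the time window, and 0 otherwise.
toy$forum_attendance_tw <- as.integer(toy$forum_visits_tw > 0)

toy
```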

Less important:

  • hse_user_id - a student token.
  • course_item_name and course_item_type_desc - item token/name and item type.

Hypothesis 1

Model 1

Checking the variance of the CFA (correctness of the first attempt) variable

# m10 <- glm(data = assignments, factor(CFA) ~ 1, family = "binomial")
# saveRDS(m10, "data/m10.rds")
[1] 255
summary(m10)

Call:
glm(formula = factor(CFA) ~ 1, family = "binomial", data = assignments)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.5064  -1.5064   0.8808   0.8808   0.8808  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.746768   0.001909   391.3   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1580574  on 1258389  degrees of freedom
Residual deviance: 1580574  on 1258389  degrees of freedom
AIC: 1580576

Number of Fisher Scoring iterations: 4
# m1 <- glmer(data = assignments, factor(CFA) ~ 1 + (1|hse_user_id) + (1|course_item_name), family = "binomial")
# saveRDS(m1, "data/m1.rds")
[1] 255
summary(m1)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: factor(CFA) ~ 1 + (1 | hse_user_id) + (1 | course_item_name)
   Data: assignments

      AIC       BIC    logLik  deviance  df.resid 
1367971.5 1368007.6 -683982.7 1367965.5   1258387 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-14.0776  -0.7882   0.3922   0.6316  13.7935 

Random effects:
 Groups           Name        Variance Std.Dev.
 hse_user_id      (Intercept) 0.6372   0.7982  
 course_item_name (Intercept) 1.1000   1.0488  
Number of obs: 1258390, groups:  hse_user_id, 40188; course_item_name, 669

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.85735    0.03983   21.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
sd_proficieny <- data.frame(VarCorr(m1))[5][1, 1]
sd_difficulty <- data.frame(VarCorr(m1))[5][2, 1]
coef <- summary(m1)$coef[1] # fixed intercept of m1 (logit scale)
Student standard deviation (proficiency) - 0.80.
Assignment standard deviation (difficulty) - 1.05.
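
One way to read these standard deviations is as shares of the total variance on the logit scale. The sketch below uses the latent-variable convention for logistic models, which fixes the residual variance at pi^2/3. That convention is an assumption added here, not something computed in the report; the variances are copied from the summary(m1) output.

```r
# Random-intercept variances copied from the summary(m1) output
var_student <- 0.6372   # hse_user_id
var_item    <- 1.1000   # course_item_name

# Latent-variable convention: variance of the standard logistic distribution
var_resid <- pi^2 / 3

var_total <- var_student + var_item + var_resid

# Share of latent-scale variance attributable to each grouping factor
icc_student <- var_student / var_total
icc_item    <- var_item / var_total

round(c(student = icc_student, item = icc_item), 2)
```

Under this convention roughly 13% of the latent variance lies between students and roughly 22% between assignments.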
c_proficieny <- c(round(invlogit(coef - sd_proficieny), digits = 2), round(invlogit(coef + sd_proficieny), digits = 2))
c_difficulty <- c(round(invlogit(coef - sd_proficieny - sd_difficulty), digits = 2), round(invlogit(coef - sd_proficieny + sd_difficulty), digits = 2))
Correctness of the first attempt (CFA), i.e. the probability that an average student passes an average assignment on the first attempt - 0.70 (beta - 0.86 on the logit scale).
-sd 0.51 and +sd 0.84 for student proficiency.
-sd 0.27 and +sd 0.75 for assignment difficulty (as coded in c_difficulty, for a student one sd below average proficiency).
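
The probabilities above can be reproduced from the printed estimates with base R's plogis(), which is the inverse logit. This is a sketch with the numbers copied from summary(m1); the variable names are chosen here for illustration, and the last computation (an average student with assignment difficulty varied) is a variant that the report itself does not compute.

```r
# Estimates copied from the summary(m1) output
intercept     <- 0.85735   # fixed intercept (logit scale)
sd_student    <- 0.7982    # SD of the hse_user_id random intercept
sd_assignment <- 1.0488    # SD of the course_item_name random intercept

# Average student, average assignment
p_avg <- round(plogis(intercept), 2)

# Student one SD below / above average, average assignment
p_student <- round(plogis(intercept + c(-1, 1) * sd_student), 2)

# Student one SD below average, assignment effect one SD below / above average
# (this mirrors c_difficulty as defined in the report)
p_item_low_student <- round(plogis(intercept - sd_student + c(-1, 1) * sd_assignment), 2)

# Variant not in the report: average student, assignment effect +/- one SD
p_item_avg_student <- round(plogis(intercept + c(-1, 1) * sd_assignment), 2)

p_avg
p_student
p_item_low_student
p_item_avg_student
```

The first three results match the reported 0.70, 0.51/0.84 and 0.27/0.75. For an average student the assignment effect alone moves the probability between about 0.45 and 0.87.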
ll1 <- logLik(m10)*-2
ll2 <- logLik(m1)*-2
chi <- ll1[1] - ll2[1]
df <- 3-1

chi
[1] 212608.4
df
[1] 2
# the result is significant if chi is greater than the critical value returned by qchisq
qchisq(p=.0001, df=df, lower.tail=FALSE)
[1] 18.42068
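
The comparison above is a likelihood-ratio test: the difference in deviances between the intercept-only model and the mixed model is compared with a chi-squared critical value, and the result is significant when the statistic exceeds that value. A sketch with the deviances copied from the two printed summaries (because the printed values are rounded, the statistic can differ from the one above in the last digit; the variable names are chosen here for illustration):

```r
dev_null  <- 1580574     # residual deviance of m10, from its summary
dev_mixed <- 1367965.5   # deviance of m1, from its summary

chi_stat <- dev_null - dev_mixed
df_diff  <- 3 - 1        # m1: intercept + two variances; m10: intercept only

# Critical value at the 0.0001 level
crit <- qchisq(p = 0.0001, df = df_diff, lower.tail = FALSE)

# TRUE means the mixed model fits significantly better at that level
chi_stat > crit

# p-value; it underflows toward zero for a statistic this large
pchisq(chi_stat, df = df_diff, lower.tail = FALSE)
```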
assignments %>%
  filter(forum_visits_tw < 1000) %>%
  ggplot(aes(x = forum_visits_tw, y = CFA, color = course_item_type_desc)) +
  geom_point() +
  geom_smooth(method = "glm", method.args = list(family = "binomial")) +
  ggtitle("Probability to pass the test from the first attempt\nand forum visits during first attempt") +
  labs(x = "Forum visits during first attempt", y = "Probability to pass test during first attempt", color = "Assignment type") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

anova(m10, m1)
Analysis of Deviance Table

Model: binomial, link: logit

Response: factor(CFA)

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev
NULL               1258389    1580574

Model 2

Checking the relationship between forum attendance and CFA with two random effects: student proficiency and assignment (course item) difficulty.

# m2 <- glmer(data = assignments, factor(CFA) ~ 1 + forum_attendance_tw  + (1|hse_user_id) + (1|course_item_name), family = "binomial")
# saveRDS(m2, "data/m2.rds")
system('7z x data/m2.zip -odata/.')
[1] 255
m2 <- readRDS("data/m2.rds")
summary(m2)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: factor(CFA) ~ 1 + forum_attendance_tw + (1 | hse_user_id) + (1 |  
    course_item_name)
   Data: assignments

      AIC       BIC    logLik  deviance  df.resid 
1367854.9 1367903.1 -683923.4 1367846.9   1258386 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-13.9826  -0.7876   0.3923   0.6316  13.6277 

Random effects:
 Groups           Name        Variance Std.Dev.
 hse_user_id      (Intercept) 0.6328   0.7955  
 course_item_name (Intercept) 1.0895   1.0438  
Number of obs: 1258390, groups:  hse_user_id, 40188; course_item_name, 669

Fixed effects:
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)           0.867672   0.039311   22.07   <2e-16 ***
forum_attendance_tw1 -0.071917   0.006578  -10.93   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr)
frm_ttndn_1 -0.021
m20 <- summary(m2)
Probability of a correct first attempt if a student did not visit the forum before the first attempt - 0.70.
Probability of a correct first attempt if a student visited the forum at least once - 0.69.
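
These two probabilities follow from the fixed effects of Model 2 for an average student and an average assignment (both random effects at zero). The sketch below recomputes them from the printed estimates and also expresses the forum effect as an odds ratio with a Wald 95% interval. The odds ratio and the interval are additions for illustration, not values reported above, and the variable names are chosen here.

```r
# Fixed effects copied from the summary(m2) output
b0    <- 0.867672    # intercept
b1    <- -0.071917   # forum_attendance_tw1
se_b1 <- 0.006578    # standard error of b1

# Predicted probabilities with both random effects at zero
p_no_forum <- plogis(b0)
p_forum    <- plogis(b0 + b1)
round(c(no_forum = p_no_forum, forum = p_forum), 2)

# Odds ratio for forum attendance with a Wald 95% confidence interval
odds_ratio <- exp(b1)
ci_95      <- exp(b1 + c(-1, 1) * qnorm(0.975) * se_b1)
round(c(OR = odds_ratio, lower = ci_95[1], upper = ci_95[2]), 3)
```

An odds ratio of about 0.93 means the odds of a correct first attempt are roughly 7% lower for students who visited the forum, which corresponds to the one-point difference between 0.70 and 0.69.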

Visualisation

assignments  %>%  ggplot(., aes(factor(forum_attendance_tw), fill = factor(CFA))) +
  geom_bar(position = "dodge2") +
  labs(x = "Forum visited or not", fill = "First attempt") + theme_classic() +  
  scale_fill_discrete(labels=c("Not passed", "Passed")) +
  theme(plot.title = element_text(hjust = 0.5))