Hypothesis 1
Research questions
- What is the relationship between Correctness of the first attempt and forum attendance during task performance?
Database description
- assignments - database with data at the level of individual assignments
Variables description
Important:
- forum_attendance_tw - Forum attendance during the time window (from the start of the assignment to the first attempt). Binary.
- forum_visits_tw - the number of forum visits during the time window. Count.
- CFA - Correctness of the first attempt. Binary.
Less important:
- hse_user_id - a student token;
- course_item_name and course_item_type_desc - item token/name and item type;
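To make the relationship between the two forum variables concrete, here is a minimal sketch on invented toy data. It assumes that `forum_attendance_tw` is simply an indicator of `forum_visits_tw` being greater than zero; the source does not state how the binary variable was actually constructed, and the data frame `toy` and its values are hypothetical.

```r
# Toy data (hypothetical values, not taken from the assignments database)
toy <- data.frame(
  hse_user_id = c("u1", "u1", "u2", "u3"),
  forum_visits_tw = c(0, 3, 1, 0),
  CFA = c(1, 0, 1, 1)
)

# Assumption: attendance is 1 when at least one forum visit falls in the time window
toy$forum_attendance_tw <- as.integer(toy$forum_visits_tw > 0)

toy
```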
Model 1
Checking the variance of the CFA (correctness of the first attempt) variable
# m10 <- glm(data = assignments, factor(CFA) ~ 1, family = "binomial")
# saveRDS(m10, "data/m10.rds")
[1] 255
summary(m10)
Call:
glm(formula = factor(CFA) ~ 1, family = "binomial", data = assignments)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.5064 -1.5064 0.8808 0.8808 0.8808
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.746768 0.001909 391.3 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1580574 on 1258389 degrees of freedom
Residual deviance: 1580574 on 1258389 degrees of freedom
AIC: 1580576
Number of Fisher Scoring iterations: 4
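The intercept of an intercept-only logistic model is the log-odds of the overall success rate, so back-transforming it gives the pooled share of correct first attempts. A minimal base-R sketch using the estimate printed above (`plogis` is the inverse logit from the stats package; the name `p_overall` is hypothetical):

```r
# Intercept of m10 copied from the summary output above (logit scale)
b0 <- 0.746768

# Inverse logit: pooled probability of a correct first attempt
p_overall <- plogis(b0)
round(p_overall, 2)
```

This comes to roughly 0.68 and ignores any differences between students and assignments, which is what the mixed model that follows adds.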
# m1 <- glmer(data = assignments, factor(CFA) ~ 1 + (1|hse_user_id) + (1|course_item_name), family = "binomial")
# saveRDS(m1, "data/m1.rds")
[1] 255
summary(m1)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: factor(CFA) ~ 1 + (1 | hse_user_id) + (1 | course_item_name)
Data: assignments
AIC BIC logLik deviance df.resid
1367971.5 1368007.6 -683982.7 1367965.5 1258387
Scaled residuals:
Min 1Q Median 3Q Max
-14.0776 -0.7882 0.3922 0.6316 13.7935
Random effects:
Groups Name Variance Std.Dev.
hse_user_id (Intercept) 0.6372 0.7982
course_item_name (Intercept) 1.1000 1.0488
Number of obs: 1258390, groups: hse_user_id, 40188; course_item_name, 669
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.85735 0.03983 21.53 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
sd_proficieny <- data.frame(VarCorr(m1))[5][1, 1] # SD of the student random intercept
sd_difficulty <- data.frame(VarCorr(m1))[5][2, 1] # SD of the assignment random intercept
coef <- summary(m1)$coef[1] # fixed intercept of m1 (logit scale)
Student standard deviation (proficiency): 0.80.
Assignment standard deviation (difficulty): 1.05.
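One common way to read these variances is the latent-variable intraclass correlation, which treats the level-1 residual variance of a logistic model as pi^2/3. This is an added illustration and not a step from the original analysis; the variances are copied from the `summary(m1)` output, and the names `var_student`, `var_item` and `icc` are hypothetical.

```r
# Random-intercept variances copied from summary(m1)
var_student <- 0.6372  # hse_user_id
var_item <- 1.1000     # course_item_name

# Assumption: latent-scale residual variance of the logistic distribution
var_resid <- pi^2 / 3

total <- var_student + var_item + var_resid

# Share of latent variance attributable to each grouping factor
icc <- c(student = var_student / total, item = var_item / total)
round(icc, 2)
```

Under this approximation about 13% of the latent variance lies between students and about 22% between assignments.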
c_proficieny <- c(round(invlogit(coef - sd_proficieny), digits = 2), round(invlogit(coef + sd_proficieny), digits = 2))
c_difficulty <- c(round(invlogit(coef - sd_proficieny - sd_difficulty), digits = 2), round(invlogit(coef - sd_proficieny + sd_difficulty), digits = 2))
Correctness of the first attempt (CFA), i.e. the probability that an average student passes an average assignment on the first attempt: 0.70 (intercept beta = 0.86 on the logit scale).
At -1 SD and +1 SD of student proficiency: 0.51 and 0.84.
At -1 SD and +1 SD of assignment difficulty: 0.27 and 0.75 (as coded, these values are for a student one SD below average proficiency).
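The probabilities above can be reproduced from the printed estimates alone. The sketch below uses `plogis` from base R in place of `invlogit` (both compute the inverse logit), with values copied from `summary(m1)`. It also shows the difficulty range for an average student, which is an added comparison and not a figure reported in the original; all variable names are hypothetical.

```r
# Estimates copied from summary(m1)
b0 <- 0.85735          # fixed intercept (logit scale)
sd_student <- 0.7982   # SD of the student random intercept
sd_item <- 1.0488      # SD of the assignment random intercept

# Average student, average assignment
p_average <- plogis(b0)

# Student proficiency at -1 SD and +1 SD, average assignment
p_student <- plogis(b0 + c(-1, 1) * sd_student)

# Assignment random effect at -1 SD and +1 SD for a student one SD below average
# (this mirrors the original computation)
p_item_low_student <- plogis(b0 - sd_student + c(-1, 1) * sd_item)

# Same range for an average student (added for comparison)
p_item_avg_student <- plogis(b0 + c(-1, 1) * sd_item)

round(c(p_average, p_student, p_item_low_student, p_item_avg_student), 2)
```

For an average student the difficulty range comes out at roughly 0.45 to 0.87, noticeably higher than the 0.27 to 0.75 range obtained for a below-average student.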
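ll1 <- logLik(m10)*-2 # deviance of the intercept-only glm
ll2 <- logLik(m1)*-2 # deviance of the model with two random intercepts
chi <- ll1[1] - ll2[1] # likelihood-ratio statistic
df <- 3-1 # 3 parameters in m1 vs 1 in m10
chi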
[1] 212608.4
df
[1] 2
# the result is significant if chi is greater than the critical value returned by qchisq
qchisq(p=.0001, df=df, lower.tail=FALSE)
[1] 18.42068
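Instead of comparing the statistic with a critical value, the p-value can be computed directly. The following is a minimal sketch using the printed statistic; because the tail probability is far too small for double precision, the log of the p-value is computed as well. As a general caveat that the original does not discuss, variance components are tested on the boundary of the parameter space, so the chi-square reference distribution tends to be conservative here.

```r
# Likelihood-ratio statistic and degrees of freedom from the output above
chi <- 212608.4
df <- 2

# Critical value at alpha = 0.0001, as in the original code
critical <- qchisq(p = .0001, df = df, lower.tail = FALSE)
chi > critical  # TRUE: the random effects improve the fit

# Upper-tail p-value; underflows to zero at this magnitude
p_value <- pchisq(chi, df = df, lower.tail = FALSE)

# Log of the p-value, which stays finite
log_p <- pchisq(chi, df = df, lower.tail = FALSE, log.p = TRUE)
```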
assignments %>% filter(forum_visits_tw < 1000) %>% ggplot(aes(x = forum_visits_tw, y = CFA, color = course_item_type_desc)) +
  geom_point() +
  geom_smooth(method = "glm", method.args=list(family="binomial")) +
  ggtitle("Probability to pass the test from the first attempt\nand forum visits during first attempt") +
  labs(x = "Forum visits during first attempt", y = "Probability to pass test during first attempt", color = "Assignment type") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))
anova(m10, m1)
Analysis of Deviance Table
Model: binomial, link: logit
Response: factor(CFA)
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev
NULL 1258389 1580574
Model 2
Checking the relationship between forum attendance and CFA with two random effects: student proficiency and assignment difficulty.
# m2 <- glmer(data = assignments, factor(CFA) ~ 1 + forum_attendance_tw + (1|hse_user_id) + (1|course_item_name), family = "binomial")
# saveRDS(m2, "data/m2.rds")
system('7z x data/m2.zip -odata/.')
[1] 255
m2 <- readRDS("data/m2.rds")
summary(m2)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: factor(CFA) ~ 1 + forum_attendance_tw + (1 | hse_user_id) + (1 |
course_item_name)
Data: assignments
AIC BIC logLik deviance df.resid
1367854.9 1367903.1 -683923.4 1367846.9 1258386
Scaled residuals:
Min 1Q Median 3Q Max
-13.9826 -0.7876 0.3923 0.6316 13.6277
Random effects:
Groups Name Variance Std.Dev.
hse_user_id (Intercept) 0.6328 0.7955
course_item_name (Intercept) 1.0895 1.0438
Number of obs: 1258390, groups: hse_user_id, 40188; course_item_name, 669
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.867672 0.039311 22.07 <2e-16 ***
forum_attendance_tw1 -0.071917 0.006578 -10.93 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
frm_ttndn_1 -0.021
m20 <- summary(m2)
Probability of CFA if a student did not visit the forum before the first attempt: 0.70.
Probability of CFA if a student visited the forum at least once: 0.69.
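On the odds scale the forum effect is easier to read. The sketch below converts the `forum_attendance_tw1` coefficient into an odds ratio with a Wald 95% interval and reproduces the two probabilities for an average student and assignment (random effects set to zero). The estimates are copied from the `summary(m2)` output; the variable names are hypothetical.

```r
# Estimates copied from summary(m2)
b0 <- 0.867672        # intercept (logit scale)
b_forum <- -0.071917  # forum_attendance_tw1
se_forum <- 0.006578  # its standard error

# Odds ratio and Wald 95% confidence interval
or <- exp(b_forum)
ci <- exp(b_forum + c(-1, 1) * qnorm(0.975) * se_forum)

# Predicted probabilities with both random effects at zero
p_no_forum <- plogis(b0)
p_forum <- plogis(b0 + b_forum)

round(c(or = or, lower = ci[1], upper = ci[2]), 3)
round(c(no_forum = p_no_forum, forum = p_forum), 2)
```

The odds ratio is about 0.93, so forum visitors have roughly 7% lower odds of a correct first attempt, which corresponds to a difference of about one percentage point in probability. With more than a million observations such a small effect is still highly significant.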
Visualisation
assignments %>% ggplot(., aes(factor(forum_attendance_tw), fill = factor(CFA))) +
  geom_bar(position = "dodge2") +
  labs(x = "Forum visited or not", fill = "First attempt") + theme_classic() +
  scale_fill_discrete(labels=c("Not passed", "Passed")) +
  theme(plot.title = element_text(hjust = 0.5))