Bayesian knowledge tracing (BKT)
Ingredients
Here are some key definitions, basically following [Corbett1995].
- knowledge component (KC)
A skill, a rule, or a principle that a student is supposed to learn through the prepared activity.
- lesson
A sequence of activities (steps), indexed by $n = 1, 2, \ldots$, whose goal is to teach a KC.
- prior probability $P(L_n)$
The probability that the student knows the KC after step $n$ and before step $n+1$. Defined as such, this is the prior probability, as opposed to the posterior probability defined below (Eq. (2)). The ideal outcome of the lesson is that the series $P(L_n)$ converges to 1. In such a case, the prior probability and the posterior probability become indistinguishable and define the knowledge state.
Clearly, one must make some estimate of the initial knowledge, $P(L_0)$. This and the other parameters of the model are summarized in the following table.
Symbol | Meaning | Definition
---|---|---
$P(L_0)$ | Initial knowing | The probability that the student already knows the KC prior to the lesson.
$P(T)$ | Transition | The probability of becoming knowledgeable at a step.
$P(G)$ | Guess | The probability of guessing correctly without knowledge.
$P(S)$ | Slip | The probability of making a mistaken choice with knowledge.
Here, all parameters could be assumed to be independent of the student, or some or all parameters could be assumed to depend on the student. For instance, it is reasonable that $P(L_0)$ varies from student to student.
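As a minimal sketch (the class name and the numeric values are my own illustrative choices, not from [Corbett1995]), the four parameters can be collected in one container, shared across students or instantiated per student:

```python
from dataclasses import dataclass

@dataclass
class BKTParams:
    """The four BKT parameters; the default values are illustrative, not fitted."""
    p_L0: float = 0.2   # initial knowing, P(L0)
    p_T: float = 0.1    # transition, P(T)
    p_G: float = 0.25   # guess, P(G)
    p_S: float = 0.1    # slip, P(S)

# Per-student P(L0), with the remaining parameters shared:
strong_student = BKTParams(p_L0=0.6)
```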
Inference chain
The following inference chain is what makes it possible to trace the student knowledge:

$$P(L_n) \;\longrightarrow\; P(L_n \mid \text{evidence at step } n+1) \;\longrightarrow\; P(L_{n+1}) \;\longrightarrow\; \cdots \tag{1}$$
The core mechanism of this inference chain is the posterior probability that follows from Bayes’ theorem:

$$P(L_n \mid \text{evidence}) = \frac{P(\text{evidence} \mid L_n)\, P(L_n)}{P(\text{evidence})} \tag{2}$$

Here, “evidence” refers to either getting the correct answer at step $n+1$ (denoted $\text{correct}_{n+1}$) or getting the wrong answer at step $n+1$ (denoted $\text{wrong}_{n+1}$).
Then, the following two equations follow directly from (2) and the definitions of $P(G)$ and $P(S)$:

$$P(L_n \mid \text{correct}_{n+1}) = \frac{P(L_n)\,\big(1 - P(S)\big)}{P(L_n)\,\big(1 - P(S)\big) + \big(1 - P(L_n)\big)\,P(G)} \tag{3}$$

$$P(L_n \mid \text{wrong}_{n+1}) = \frac{P(L_n)\,P(S)}{P(L_n)\,P(S) + \big(1 - P(L_n)\big)\,\big(1 - P(G)\big)} \tag{4}$$

The probability to make the correct choice is given by

$$P(\text{correct}_{n+1}) = P(L_n)\,\big(1 - P(S)\big) + \big(1 - P(L_n)\big)\,P(G),$$

which is the denominator of (3).
So the posterior probability can be calculated from the prior probability, assuming that we know the values of $P(G)$ and $P(S)$.
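The posterior update can be transcribed directly from Eqs. (3) and (4); the function and argument names below are mine:

```python
def posterior(p_L: float, correct: bool, p_G: float, p_S: float) -> float:
    """Posterior P(L_n | evidence) from the prior p_L = P(L_n), Eqs. (3)-(4)."""
    if correct:
        num = p_L * (1.0 - p_S)                   # knew it and did not slip
        den = num + (1.0 - p_L) * p_G             # ... or did not know but guessed
    else:
        num = p_L * p_S                           # knew it but slipped
        den = num + (1.0 - p_L) * (1.0 - p_G)     # ... or did not know and did not guess
    return num / den
```

A correct answer raises the estimate and a wrong answer lowers it, as long as $P(G) + P(S) < 1$.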
Now, in order to complete the problem, we must specify how to go from the posterior probability after the evidence at step $n+1$ to the next prior probability $P(L_{n+1})$:

$$P(L_{n+1}) = P(L_n \mid \text{evidence}) + \big(1 - P(L_n \mid \text{evidence})\big)\,P(T) \tag{5}$$
One can now see that Equations (2), (3), (4), and (5) completely specify the Bayesian inference chain for the student knowledge.
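Putting Eqs. (3)–(5) together, a full trace over a response sequence can be sketched as follows (the function name and the sample parameter values are my own):

```python
def bkt_trace(responses, p_L0=0.2, p_T=0.1, p_G=0.25, p_S=0.1):
    """Return the priors [P(L_0), P(L_1), ...] given True/False responses."""
    priors = [p_L0]
    p = p_L0
    for correct in responses:
        if correct:                                              # Eq. (3)
            q = p * (1 - p_S) / (p * (1 - p_S) + (1 - p) * p_G)
        else:                                                    # Eq. (4)
            q = p * p_S / (p * p_S + (1 - p) * (1 - p_G))
        p = q + (1 - q) * p_T                                    # Eq. (5)
        priors.append(p)
    return priors
```

With an all-correct record, the estimate climbs monotonically toward 1, illustrating the ideal outcome described above.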
Convergence
In the above, it is clear that the ideal outcome $P(L_n) = 1$ is a fixed point of the inference chain: substituting $P(L_n) = 1$ into Eqs. (3) and (4) gives

$$P(L_n \mid \text{evidence}) = 1 \quad\text{and hence, by (5),}\quad P(L_{n+1}) = 1,$$

where in the second expression, “evidence” can be either $\text{correct}_{n+1}$ or $\text{wrong}_{n+1}$.
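The fixed-point property is easy to confirm numerically; the one-step function and parameter values below are my own sketch:

```python
def step(p, correct, p_T=0.1, p_G=0.25, p_S=0.1):
    """One step of the chain: posterior (Eqs. (3)/(4)), then transition (Eq. (5))."""
    if correct:
        q = p * (1 - p_S) / (p * (1 - p_S) + (1 - p) * p_G)
    else:
        q = p * p_S / (p * p_S + (1 - p) * (1 - p_G))
    return q + (1 - q) * p_T

# P(L_n) = 1 is mapped to 1 regardless of the evidence:
assert step(1.0, True) == 1.0
assert step(1.0, False) == 1.0
```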
For the future
What about perturbation theory near convergence?
Is any other convergence possible? That is, is there a non-trivial fixed point for the mapping $P(L_n) \mapsto P(L_{n+1})$, regardless of the evidence (in some average sense)? In a simple-minded mathematical way, the answer is no. But it seems possible that the evidence can fluctuate up and down so that $P(L_n)$ stays at the same value on average. Such semi-convergence might occur if the student is not really trying, but is randomly choosing answers out of boredom or fatigue.
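One way to probe this question numerically is to simulate a student who clicks at random, answering correctly with some fixed probability independent of the modeled knowledge, and watch whether the traced $P(L_n)$ settles. The simulation below is a sketch under my own parameter choices and draws no conclusion by itself:

```python
import random

def simulate_random_clicker(n_steps=200, p_click=0.5,
                            p_L0=0.2, p_T=0.02, p_G=0.25, p_S=0.1, seed=0):
    """Trace P(L_n) when answers are correct with fixed probability p_click,
    independent of the modeled knowledge state."""
    rng = random.Random(seed)
    p = p_L0
    trace = [p]
    for _ in range(n_steps):
        correct = rng.random() < p_click
        if correct:                                              # Eq. (3)
            q = p * (1 - p_S) / (p * (1 - p_S) + (1 - p) * p_G)
        else:                                                    # Eq. (4)
            q = p * p_S / (p * p_S + (1 - p) * (1 - p_G))
        p = q + (1 - q) * p_T                                    # Eq. (5)
        trace.append(p)
    return trace
```

Plotting such traces for a range of `p_click` and $P(T)$ values would be a starting point for the semi-convergence question.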