
Theory of self-learning Q-matrix

Jingchen Liu (jcliu@stat.columbia.edu), Gongjun Xu (gongjun@stat.columbia.edu) and Zhiliang Ying (zying@stat.columbia.edu), Department of Statistics, Columbia University, 1255 Amsterdam Avenue, New York, NY 10027, USA.

(2013; received March 2011; revised November 2011)
Abstract

Cognitive assessment is a growing area in psychological and educational measurement, where tests are given to assess mastery/deficiency of attributes or skills. A key issue is the correct identification of attributes associated with items in a test. In this paper, we set up a mathematical framework under which theoretical properties may be discussed. We establish sufficient conditions to ensure that the attributes required by each item are learnable from the data.

Keywords: classification model; cognitive assessment; consistency; diagnostic; $Q$-matrix; self-learning

doi: 10.3150/12-BEJ430

Volume 19, Issue 5A

1 Introduction

Cognitive diagnosis has recently gained prominence in educational assessment, psychiatric evaluation, and many other disciplines. A key task is the correct specification of item-attribute relationships. A widely used mathematical formulation is the well-known $Q$-matrix [27]. Under the setting of the $Q$-matrix, a typical modeling approach assumes a latent variable structure in which each subject possesses a vector of $k$ attributes and responds to $m$ items. The so-called $Q$-matrix is an $m\times k$ binary matrix establishing the relationship between responses and attributes by indicating the required attributes for each item. The entry in the $i$th row and $j$th column indicates whether item $i$ requires attribute $j$ (see Example 2.3 for a demonstration of a $Q$-matrix). A short list of further developments of cognitive diagnosis models (CDMs) based on the $Q$-matrix includes the rule space method [28, 29], the reparameterized unified/fusion model (RUM) [5, 7, 30], the conjunctive (noncompensatory) DINA and NIDA models [12, 26, 4, 31, 3], the compensatory DINO and NIDO models [32, 31], the attribute hierarchy method [13], and clustering methods [1]; see also [11, 33, 23] for more approaches to cognitive diagnosis.

Statistical analysis with CDMs typically assumes a known $Q$-matrix provided by experts such as those who developed the questions [20, 10, 19, 25]. Such a priori knowledge, when correct, is certainly very helpful for both model estimation and, eventually, identification of subjects’ latent attributes. On the other hand, model fitting is usually sensitive to the choice of $Q$-matrix, and its misspecification could seriously affect the goodness of fit; this is one of the main sources of lack of fit. Various diagnostic tools and testing procedures have been developed [21, 2, 8, 14, 9]. A comprehensive review of diagnostic classification models can be found in [22].

Despite the importance of the $Q$-matrix in cognitive diagnosis, its estimation is largely an unexplored area. Unlike typical inference problems, inference for the $Q$-matrix is particularly challenging for the following reasons. First, in many cases, the $Q$-matrix is simply nonidentifiable. One typical situation is that multiple $Q$-matrices lead to an identical response distribution. Therefore, we can only expect to identify the $Q$-matrix up to some equivalence relation (Definition 2.2). In other words, two $Q$-matrices in the same equivalence class are not distinguishable based on data. Our first task is to define a meaningful and identifiable equivalence class. Second, the $Q$-matrix lives on a discrete space – the set of $m\times k$ matrices with binary entries. This discrete nature makes analysis particularly difficult because calculus tools are not applicable; in fact, most of the analysis is combinatorial. Third, the model makes explicit distributional assumptions on the (unobserved) attributes, which dictate the law of the observed responses. The dependence of responses on attributes via the $Q$-matrix is a highly nonlinear discrete function. This nonlinearity also adds to the difficulty of the analysis.

The primary purpose of this paper is to provide theoretical analyses of the learnability of the underlying $Q$-matrix. In particular, we obtain definitive answers to the identifiability of the $Q$-matrix for one of the most commonly used models – the DINA model – by specifying a set of sufficient conditions under which the $Q$-matrix is identifiable up to an explicitly defined equivalence class. We also present the corresponding consistent estimators. We believe that the results (especially the intermediate results) and analysis strategies can be extended to other conjunctive models [15, 12, 31, 32, 18].

The rest of this paper is organized as follows. In Section 2, we present the basic inference result for $Q$-matrices in a conjunctive model with no slipping or guessing. In addition, we introduce all the necessary terminology and technical conditions. In Section 3, we extend the results in Section 2 to the DINA model with known slipping and guessing parameters. In Section 4, we further generalize the results to the DINA model with unknown slipping parameters. Further discussion is provided in Section 5. Proofs are given in Section 6. Lastly, the proofs of two key propositions are given in the Appendix.

2 Model specifications and basic results

We start the discussion with a simplified situation, under which the responses depend on the attribute profile deterministically (with no uncertainty). We describe our estimation procedure under this simple scenario. The results for the general cases are given in Sections 3 and 4.

2.1 Basic model specifications

The model specifications consist of the following concepts.

Attributes: a subject’s (unobserved) mastery of certain skills. Suppose that there are $k$ attributes. Let $\mathbf{A}=(A^{1},\ldots,A^{k})^{\top}$ be the vector of attributes and $A^{j}\in\{0,1\}$ be the indicator of the presence or absence of the $j$th attribute.

Responses: a subject’s binary responses to items. Suppose that there are $m$ items. Let $\mathbf{R}=(R^{1},\ldots,R^{m})^{\top}$ be the vector of responses and $R^{i}\in\{0,1\}$ be the response to the $i$th item.

Both $\mathbf{A}$ and $\mathbf{R}$ are subject specific. We assume that the integers $m$ and $k$ are known.

$Q$-matrix: the link between item responses and attributes. We define an $m\times k$ matrix $Q=(Q_{ij})_{m\times k}$. For each $i$ and $j$, $Q_{ij}=1$ when item $i$ requires attribute $j$ and $Q_{ij}=0$ otherwise.

Furthermore, we define

\xi^{i}=\prod_{j=1}^{k}(A^{j})^{Q_{ij}}=\mathbf{1}(A^{j}\geq Q_{ij}\colon j=1,\ldots,k),   (1)

which indicates whether a subject with attribute profile $\mathbf{A}$ is capable of providing a positive response to item $i$. This model is conjunctive, meaning that mastery of all the required skills is necessary and sufficient for a subject to be capable of solving an item. Possessing additional attributes does not compensate for the absence of necessary attributes. In this section, we consider the simplest situation, in which there is no uncertainty in the response, that is,

R^{i}=\xi^{i}   (2)

for $i=1,\ldots,m$. Therefore, the responses are completely determined by the attributes. We assume that all items require at least one attribute; equivalently, the $Q$-matrix does not have zero row vectors. Subjects who do not possess any attribute are not capable of responding positively to any item.
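The deterministic model (1)–(2) is easy to compute directly. Below is a minimal numpy sketch (not part of the paper; the function name and the example are illustrative only) returning the ideal response vector $\xi$ for one subject:

```python
import numpy as np

def ideal_response(A, Q):
    """Conjunctive model (1)-(2): xi_i = 1 iff the attribute vector A
    covers every attribute required by item i (row i of Q)."""
    # A: (k,) binary attribute vector; Q: (m, k) binary Q-matrix.
    return (A >= Q).all(axis=1).astype(int)

# The 3 x 2 Q-matrix of the example in Section 2.3:
# items 2+3, 5x2, (2+3)x2; attributes addition, multiplication.
Q = np.array([[1, 0], [0, 1], [1, 1]])
print(ideal_response(np.array([1, 0]), Q))  # addition only -> [1 0 0]
```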

We use subscripts to indicate different subjects. For instance, $\mathbf{R}_{r}=(R^{1}_{r},\ldots,R^{m}_{r})^{\top}$ is the response vector of subject $r$. Similarly, $\mathbf{A}_{r}$ is the attribute vector of subject $r$. We observe $\mathbf{R}_{1},\ldots,\mathbf{R}_{N}$, where $N$ denotes the sample size. The attributes $\mathbf{A}_{r}$ are not observed. Our objective is to make inference on the $Q$-matrix based on the observed responses.

2.2 Estimation of $Q$-matrix

We first introduce a few quantities for the presentation of an estimator.

$T$-matrix

In order to provide an estimator of $Q$, we first introduce one central quantity, the $T$-matrix, which connects the $Q$-matrix with the response and attribute distributions. The matrix $T(Q)$ has $2^{k}-1$ columns, each of which corresponds to one nonzero attribute vector $\mathbf{A}\in\{0,1\}^{k}\setminus\{(0,\ldots,0)\}$. Instead of labeling the columns of $T(Q)$ by ordinal numbers, we label them by binary vectors of length $k$. For instance, the $\mathbf{A}$th column of $T(Q)$ is the column that corresponds to attribute profile $\mathbf{A}$, for each $\mathbf{A}\neq(0,\ldots,0)$.

Let $I_{i}$ be a generic notation for a positive response to item $i$. Let “$\wedge$” stand for the “and” combination. For instance, $I_{i_{1}}\wedge I_{i_{2}}$ denotes positive responses to both items $i_{1}$ and $i_{2}$. Each row of $T(Q)$ corresponds to one item or one “and” combination of items, for instance, $I_{i_{1}}$, $I_{i_{1}}\wedge I_{i_{2}}$ or $I_{i_{1}}\wedge I_{i_{2}}\wedge I_{i_{3}}$, and so on. If $T(Q)$ contains all the single items and all “and” combinations, then $T(Q)$ contains $2^{m}-1$ rows. We will later say that such a $T(Q)$ is saturated (Definition 2.1 in Section 2.4).

We now describe each row vector of $T(Q)$. We define $B_{Q}(I_{i})$ to be a $2^{k}-1$ dimensional row vector. Using the same labeling system as that of the columns of $T(Q)$, the $\mathbf{A}$th element of $B_{Q}(I_{i})$ is defined as $\prod_{j=1}^{k}(A^{j})^{Q_{ij}}$, which indicates whether a subject with attribute profile $\mathbf{A}$ is able to solve item $i$.

Using similar notation, we define

B_{Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})=\Upsilon_{h=1}^{l}B_{Q}(I_{i_{h}}),   (3)

where the operator “$\Upsilon_{h=1}^{l}$” denotes element-by-element multiplication from $B_{Q}(I_{i_{1}})$ to $B_{Q}(I_{i_{l}})$. For instance,

W=\Upsilon_{h=1}^{l}V_{h}

means that $W^{j}=\prod_{h=1}^{l}V_{h}^{j}$, where $W=(W^{1},\ldots,W^{2^{k}-1})$ and $V_{h}=(V_{h}^{1},\ldots,V_{h}^{2^{k}-1})$. Therefore, $B_{Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})$ is the vector indicating the attribute profiles that are capable of responding positively to items $i_{1},\ldots,i_{l}$. The row in $T(Q)$ corresponding to $I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}$ is $B_{Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})$.
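As a concrete aid, here is a hedged sketch (again illustrative, not the paper's code; the function name and the column ordering of attribute profiles are our own choices) building a saturated $T(Q)$ row by row following (3):

```python
from itertools import combinations, product
import numpy as np

def t_matrix(Q):
    """Saturated T(Q): one column per nonzero attribute profile, one row
    per nonempty 'and' combination of items; each entry indicates whether
    the profile can solve every item in the combination, as in (3)."""
    m, k = Q.shape
    profiles = [np.array(a) for a in product([0, 1], repeat=k) if any(a)]
    # Single-item rows B_Q(I_i).
    B = np.array([[int((A >= Q[i]).all()) for A in profiles] for i in range(m)])
    rows = []
    for l in range(1, m + 1):
        for items in combinations(range(m), l):
            rows.append(B[list(items)].prod(axis=0))  # element-wise product (3)
    return np.array(rows)
```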

$\alpha$-vector

We let $\alpha$ be a column vector whose length equals the number of rows of $T(Q)$. Each element of $\alpha$ corresponds to one row vector of $T(Q)$. The element of $\alpha$ corresponding to $I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}$ is defined as $N_{I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}}/N$, where $N_{I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}}$ denotes the number of subjects who have positive responses to items $i_{1},\ldots,i_{l}$, that is,

N_{I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}}=\sum_{r=1}^{N}I(R_{r}^{i_{j}}=1\colon j=1,\ldots,l).

For each $\mathbf{A}\in\{0,1\}^{k}$, we let

\hat{p}_{\mathbf{A}}=\frac{1}{N}\sum_{r=1}^{N}I(\mathbf{A}_{r}=\mathbf{A}).   (4)

If (2) is strictly respected, then

T(Q)\hat{\mathbf{p}}=\alpha,   (5)

where $\hat{\mathbf{p}}=(\hat{p}_{\mathbf{A}}\colon\mathbf{A}\in\{0,1\}^{k}\setminus\{(0,\ldots,0)\})$ is arranged in the same order as the columns of $T(Q)$. This is because each row of $T(Q)$ indicates the attribute profiles capable of responding positively to the corresponding set of item(s). The vector $\hat{\mathbf{p}}$ contains the proportions of subjects with each attribute profile. For each set of items, the matrix multiplication sums up the proportions of the attribute profiles capable of responding positively to that set of items, giving the total proportion of subjects who respond positively to the corresponding items.
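Continuing the sketch above (the row ordering of `alpha_vector` deliberately matches `t_matrix`; both are illustrative, not from the paper), the empirical $\alpha$ of (5) can be computed from the observed response matrix:

```python
def alpha_vector(R):
    """Empirical alpha in (5): for each nonempty item combination, the
    proportion of the N subjects answering all items in it positively."""
    N, m = R.shape
    rows = []
    for l in range(1, m + 1):
        for items in combinations(range(m), l):
            rows.append(R[:, list(items)].min(axis=1).mean())
    return np.array(rows)
```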

The estimator of the $Q$-matrix

For each $m\times k$ binary matrix $Q^{\prime}$, we define

S(Q^{\prime})=\inf_{\mathbf{p}\in[0,1]^{2^{k}-1}}|T(Q^{\prime})\mathbf{p}-\alpha|,   (6)

where $\mathbf{p}=(p_{\mathbf{A}}\colon\mathbf{A}\neq(0,\ldots,0))$. The above minimization is subject to the constraint that $\sum_{\mathbf{A}\neq(0,\ldots,0)}p_{\mathbf{A}}\in[0,1]$, and $|\cdot|$ denotes the Euclidean distance. An estimator of $Q$ can be obtained by minimizing $S(Q^{\prime})$,

\hat{Q}=\arg\inf_{Q^{\prime}}S(Q^{\prime}),   (7)

where “$\arg\inf$” denotes the minimizer of the minimization problem over all $m\times k$ binary matrices. Note that the minimizer is not unique; we will later prove that the minimizers all lie in the same meaningful equivalence class. Because of (5), the true $Q$-matrix is always among the minimizers, since $S(Q)=0$.
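The following sketch implements (6)–(7) under stated simplifications: it uses `scipy.optimize.lsq_linear` for the box constraint $\mathbf{p}\in[0,1]^{2^{k}-1}$, drops the extra constraint $\sum p_{\mathbf{A}}\in[0,1]$ for brevity, and searches all binary matrices by brute force, which is feasible only for small $m$ and $k$:

```python
from scipy.optimize import lsq_linear

def S(Q_cand, alpha):
    """Objective (6) with the box constraint only (the simplex constraint
    sum(p) <= 1 is omitted in this sketch)."""
    res = lsq_linear(t_matrix(Q_cand), alpha, bounds=(0.0, 1.0))
    return np.sqrt(2.0 * res.cost)  # res.cost = 0.5 * ||T p - alpha||^2

def estimate_Q(alpha, m, k):
    """Brute-force minimizer (7) over m x k binary matrices without zero rows."""
    best, best_val = None, np.inf
    for bits in product([0, 1], repeat=m * k):
        Q_cand = np.array(bits).reshape(m, k)
        if (Q_cand.sum(axis=1) == 0).any():
            continue  # every item must require at least one attribute
        val = S(Q_cand, alpha)
        if val < best_val:
            best, best_val = Q_cand, val
    return best
```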

2.3 Example

We illustrate the above construction with one simple example, whose sole purpose is to explain the estimation procedure in a concrete and simple setting. The proposed estimator is certainly able to handle much larger $Q$-matrices. We consider the following $3\times 2$ $Q$-matrix,

Q =

             Addition   Multiplication
  2+3            1             0
  5×2            0             1
  (2+3)×2        1             1

(8)

There are two attributes and three items. We consider the contingency table of attributes,

                         Multiplication
                         0                  1
  Addition   0     $\hat{p}_{00}$    $\hat{p}_{01}$
             1     $\hat{p}_{10}$    $\hat{p}_{11}$

In the above table, $\hat{p}_{00}$ is the proportion of subjects who have mastered neither addition nor multiplication. Similarly, we define $\hat{p}_{01}$, $\hat{p}_{10}$ and $\hat{p}_{11}$. The proportions $\{\hat{p}_{ij}\colon i,j=0,1\}$ are not observed.

Just for illustration, we construct a simple nonsaturated $T$-matrix. Suppose the relationship in (2) is strictly respected. Then, we should be able to establish the following identities:

N(\hat{p}_{10}+\hat{p}_{11})=N_{I_{1}},\qquad N(\hat{p}_{01}+\hat{p}_{11})=N_{I_{2}},\qquad N\hat{p}_{11}=N_{I_{3}}.   (9)

Therefore, if we let $\hat{\mathbf{p}}=(\hat{p}_{10},\hat{p}_{01},\hat{p}_{11})$, the above display imposes three linear constraints on the vector $\hat{\mathbf{p}}$. Together with the natural constraint that $\sum_{ij}\hat{p}_{ij}=1$, $\hat{\mathbf{p}}$ solves the linear equation

T(Q)\hat{\mathbf{p}}=\alpha,   (10)

subject to the constraints that $\hat{\mathbf{p}}\in[0,1]^{3}$ and $\hat{p}_{10}+\hat{p}_{01}+\hat{p}_{11}\in[0,1]$, where

T(Q)=\begin{pmatrix}1&0&1\\ 0&1&1\\ 0&0&1\end{pmatrix},\qquad \alpha=\begin{pmatrix}N_{I_{1}}/N\\ N_{I_{2}}/N\\ N_{I_{3}}/N\end{pmatrix}.   (11)

Each column of $T(Q)$ corresponds to one attribute profile: the first column corresponds to $\mathbf{A}=(1,0)$, the second column to $\mathbf{A}=(0,1)$, and the third column to $\mathbf{A}=(1,1)$. The first row corresponds to item $2+3$, the second row to $5\times 2$ and the last row to $(2+3)\times 2$. For this particular situation, $T(Q)$ has full rank and there exists one unique solution to (10). In fact, we would not expect the constrained solution to the linear equation in (10) to always exist unless (2) is strictly followed. This is the topic of the next section.

The identities in (9) only consider the marginal rate of each question. There are additional constraints if one considers “combinations” among items. For instance,

N\hat{p}_{11}=N_{I_{1}\wedge I_{2}}.

Subjects who are able to solve item 3 must have both attributes and therefore are able to solve both items 1 and 2. Again, if (2) is not strictly followed, this is not necessarily respected in the real data, though it is a logical conclusion; the DINA model in the next section handles such a case. Upon considering $I_{1}$, $I_{2}$, $I_{3}$ and $I_{1}\wedge I_{2}$, the new $T$-matrix is

T(Q)=\begin{pmatrix}1&0&1\\ 0&1&1\\ 0&0&1\\ 0&0&1\end{pmatrix},\qquad \alpha=\begin{pmatrix}N_{I_{1}}/N\\ N_{I_{2}}/N\\ N_{I_{3}}/N\\ N_{I_{1}\wedge I_{2}}/N\end{pmatrix}.   (12)

The last row, corresponding to $I_{1}\wedge I_{2}$, is the one added. With (2) in force, we have

S(Q)=\inf_{\mathbf{p}\in[0,1]^{3}}|T(Q)\mathbf{p}-\alpha|=|T(Q)\hat{\mathbf{p}}-\alpha|=0,   (13)

if $Q$ is the true matrix.
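A quick simulation check of this example, reusing the sketches above (all function names are ours; the uniform attribute distribution is an arbitrary choice satisfying condition C4 below):

```python
rng = np.random.default_rng(0)
Q = np.array([[1, 0], [0, 1], [1, 1]])           # the Q-matrix of (8)
A = rng.choice([0, 1], size=(5000, 2))           # i.i.d. attribute profiles
R = np.array([ideal_response(a, Q) for a in A])  # noiseless responses (2)
alpha = alpha_vector(R)
print(S(Q, alpha))                  # ~0, matching (13)
print(estimate_Q(alpha, m=3, k=2))  # recovers Q up to column permutation
```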

2.4 Basic results

Before stating the main result, we provide a list of notations, which will be used in the discussions.

  • Linear space spanned by vectors $V_{1},\ldots,V_{l}$: $\mathcal{L}(V_{1},\ldots,V_{l})=\{\sum_{j=1}^{l}a_{j}V_{j}\colon a_{j}\in\mathbb{R}\}$.

  • For a matrix $M$, $M_{1:l}$ denotes the submatrix containing the first $l$ rows and all columns of $M$.

  • Vector $e_{i}$ denotes a column vector whose $i$th element is one and whose other elements are zero. When there is no ambiguity, we omit the length index of $e_{i}$.

  • Matrix $\mathcal{I}_{l}$ denotes the $l\times l$ identity matrix.

  • For a matrix $M$, $C(M)$ is the linear space generated by the column vectors of $M$; it is usually called the column space of $M$.

  • $C_{M}$ denotes the set of column vectors of $M$.

  • $R_{M}$ denotes the set of row vectors of $M$.

  • Vector $\mathbf{0}$ denotes the zero vector $(0,\ldots,0)$. When there is no ambiguity, we omit the length index.

  • Scalar $p_{\mathbf{A}}$ denotes the probability that a subject has attribute profile $\mathbf{A}$. For instance, $p_{10}$ is the probability that a subject has attribute one but not attribute two.

  • Define the $2^{k}-1$ dimensional vector $\mathbf{p}=(p_{\mathbf{A}}\colon\mathbf{A}\in\{0,1\}^{k}\setminus\{\mathbf{0}\})$.

  • Let $c$ and $g$ be two $m$ dimensional vectors. We write $c\succ g$ if $c_{i}>g_{i}$ for all $1\leq i\leq m$.

  • We write $c\ncong g$ if $c_{i}\neq g_{i}$ for all $i=1,\ldots,m$.

  • Matrix $Q$ denotes the true matrix and $Q^{\prime}$ denotes a generic $m\times k$ binary matrix.

The following definitions will be used in subsequent discussions.

Definition 2.1.

We say that $T(Q)$ is saturated if all combinations of the form $I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}$, for $l=1,\ldots,m$, are included in $T(Q)$.

Definition 2.2.

We write $Q\sim Q^{\prime}$ if and only if $Q$ and $Q^{\prime}$ have identical column vectors, possibly arranged in different orders; otherwise, we write $Q\nsim Q^{\prime}$.

Remark 2.1.

It is not hard to show that “$\sim$” is an equivalence relation. $Q\sim Q^{\prime}$ if and only if they are identical after an appropriate permutation of the columns. Each column of $Q$ is interpreted as an attribute. Permuting the columns of $Q$ is equivalent to relabeling the attributes. For $Q\sim Q^{\prime}$, we are not able to distinguish $Q$ from $Q^{\prime}$ based on data.

Definition 2.3.

A $Q$-matrix is said to be complete if $\{e_{i}\colon i=1,\ldots,k\}\subset R_{Q}$ ($R_{Q}$ is the set of row vectors of $Q$); otherwise, we say that $Q$ is incomplete.

A $Q$-matrix is complete if and only if for each attribute there exists an item requiring only that attribute. Completeness implies that $m\geq k$. We will show that completeness is among the sufficient conditions to identify $Q$.

Remark 2.2.

One of the main objectives of cognitive assessment is to identify the subjects’ attributes; see [22] for other applications. It has been established in [1] that completeness of the $Q$-matrix is a sufficient and necessary condition for a set of items to consistently identify attributes if (2) is strictly followed. Thus, it is usually recommended to use a complete $Q$-matrix. For a precise formulation, see [1].

Listed below are assumptions which will be used in subsequent development.

  C1. $Q$ is complete.

  C2. $T(Q)$ is saturated.

  C3. $\mathbf{A}_{1},\ldots,\mathbf{A}_{N}$ are i.i.d. random vectors following the distribution

    P(\mathbf{A}_{r}=\mathbf{A})=p^{*}_{\mathbf{A}}.

We further let $\mathbf{p}^{*}=(p^{*}_{\mathbf{A}}\colon\mathbf{A}\in\{0,1\}^{k}\setminus\{\mathbf{0}\})$.

  C4. $(p_{\mathbf{0}}^{*},\mathbf{p}^{*})\succ\mathbf{0}$.

  C5. Each attribute is required by at least two items.

With these preparations, we are ready to introduce the first theorem, the proof of which is given in Section 6.

Theorem 2.4.

Assume that conditions C1–C5 are in force. Suppose that for subject $r$ the response to item $i$ follows

R^{i}_{r}=\xi^{i}_{r}=\prod_{j=1}^{k}(A^{j}_{r})^{Q_{ij}}.

Let $\hat{Q}$, defined in (7), be a minimizer of $S(Q^{\prime})$ among all $m\times k$ binary matrices, where $S(Q^{\prime})$ is defined in (6). Then,

\lim_{N\rightarrow\infty}P(\hat{Q}\sim Q)=1.   (14)

Further, let

\tilde{\mathbf{p}}=\arg\inf_{\mathbf{p}}|T(\hat{Q})\mathbf{p}-\alpha|^{2}.   (15)

With an appropriate rearrangement of the columns of $\hat{Q}$, for any $\varepsilon>0$,

\lim_{N\rightarrow\infty}P(|\tilde{\mathbf{p}}-\mathbf{p}^{*}|\leq\varepsilon)=1.
Remark 2.3.

If $Q_{1}\sim Q_{2}$, the two matrices only differ by a column permutation and will be considered to be the “same”. Therefore, we expect to identify the equivalence class to which $Q$ belongs. Also, note that $S(Q_{1})=S(Q_{2})$ if $Q_{1}\sim Q_{2}$.

Remark 2.4.

In order to obtain the consistency of $\hat{Q}$ (subject to a column permutation), it is necessary that $\mathbf{p}^{*}$ not live on certain sub-manifolds. To see a counterexample, suppose that $P(\mathbf{A}_{r}=(1,\ldots,1)^{\top})=p^{*}_{1\ldots 1}=1$. Then, for all $Q$, $P(\mathbf{R}_{r}=(1,\ldots,1)^{\top})=1$, that is, all subjects are able to solve all problems. Therefore, the distribution of $\mathbf{R}$ is independent of $Q$; in other words, the $Q$-matrix is not identifiable. More generally, if there exist $A_{r}^{i}$ and $A_{r}^{j}$ such that $P(A_{r}^{i}=A_{r}^{j})=1$, then the $Q$-matrix is not identifiable based on the data. This is because one cannot tell whether an item requires attribute $i$ alone, attribute $j$ alone, or both; see [16, 17] for similar cases for multidimensional IRT models.

Remark 2.5.

Note that the estimator of the attribute distribution, $\tilde{\mathbf{p}}$, in (15) depends on the order of the columns of $\hat{Q}$. In order to achieve consistency, we will need to arrange the columns of $\hat{Q}$ such that $\hat{Q}=Q$ whenever $\hat{Q}\sim Q$.

Remark 2.6.

One practical issue associated with the proposed procedure is the computation. For a specific $Q$, the computation of $S(Q)$ only involves a constrained minimization of a quadratic function. However, if $m$ or $k$ is large, the computational overhead of searching for the minimizer of $S(Q)$ over the space of $m\times k$ matrices could be substantial. One practical solution is to break the $Q$-matrix into smaller sub-matrices. For instance, one may divide the $m$ items into $l$ groups (possibly with nonempty overlap across different groups) and then apply the proposed estimator to each of the $l$ groups of items. This is equivalent to breaking a big $m\times k$ $Q$-matrix into several smaller matrices and estimating each of them separately. Lastly, one combines the $l$ estimated sub-matrices to form a single estimate. The consistency results can be applied to each of the $l$ sub-matrices and therefore the combined matrix is also a consistent estimator. A similar technique has been discussed in Chapter 8.6 of [29].

Remark 2.7.

Conditions C1 and C2 are imposed to guarantee consistency; they may not always be necessary. Furthermore, constructing a saturated $T$-matrix is sometimes computationally infeasible, especially when the number of items is large. In practice, one may include the combinations of one item, two items, and up to $j$ items, where the choice of $j$ depends on the sample size and the computational resources. Condition C5 is required for technical purposes. Nonetheless, one can in fact construct counterexamples, in which the $Q$-matrix is not identifiable up to the relation “$\sim$”, if C5 is violated.

3 DINA model with known slipping and guessing parameters

3.1 Model specification

In this section, we extend the inference results of the previous section to the situation in which the responses do not depend on the attributes deterministically. In particular, we consider the DINA (Deterministic Input, Noisy Output “AND” gate) model [12]. We introduce two parameters: the slipping parameter $s_{i}$ and the guessing parameter $g_{i}$. Here $1-s_{i}$ is the probability of a subject responding positively to item $i$ given that s/he is capable of solving that problem, and $g_{i}$ is that probability given that s/he is not. To simplify notation, we denote $1-s_{i}$ by $c_{i}$. An extension of (2) to include slipping and guessing specifies the response probabilities as

P(R^{i}=1|\xi^{i})=c_{i}^{\xi^{i}}g_{i}^{1-\xi^{i}},   (16)

where $\xi^{i}$ is the capability indicator defined in (1). In addition, conditional on $\{\xi^{1},\ldots,\xi^{m}\}$, the responses $\{R^{1},\ldots,R^{m}\}$ are jointly independent.

In this context, the $T$-matrix needs to be modified accordingly. Throughout this section, we assume that both the $c_{i}$’s and the $g_{i}$’s are known. We discuss the case where the $c_{i}$’s are unknown in the next section.

We first consider the case where $g_{i}=0$ for all $i=1,\ldots,m$. We introduce a diagonal matrix $D_{c}$: if the $h$th row of the matrix $T_{c}(Q)$ corresponds to $I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}$, then the $h$th diagonal element of $D_{c}$ is $c_{i_{1}}\times\cdots\times c_{i_{l}}$. Then, we let

T_{c}(Q)=D_{c}T(Q),   (17)

where $T(Q)$ is the binary matrix defined previously. In other words, we multiply each row of $T(Q)$ by a common factor to obtain $T_{c}(Q)$. Note that in the absence of slipping ($c_{i}=1$ for each $i$) we have $T_{c}(Q)=T(Q)$.
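In code, (17) amounts to scaling each row of the earlier `t_matrix` sketch by the product of the $c_{i}$'s over the items in its combination (again a sketch with our own naming):

```python
def tc_matrix(Q, c):
    """T_c(Q) of (17): the row for I_{i1} ^ ... ^ I_{il} is scaled by
    c_{i1} * ... * c_{il}."""
    m, _ = Q.shape
    scale = [np.prod(c[list(items)])
             for l in range(1, m + 1)
             for items in combinations(range(m), l)]
    return np.array(scale)[:, None] * t_matrix(Q)
```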

There is another, equivalent way of constructing $T_{c}(Q)$. We define

B_{c,Q}(I_{j})=c_{j}B_{Q}(I_{j})

and

B_{c,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})=\Upsilon_{h=1}^{l}B_{c,Q}(I_{i_{h}}),   (18)

where “$\Upsilon$” refers to element-by-element multiplication. Let the row vector in $T_{c}(Q)$ corresponding to $I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}$ be $B_{c,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})$.

For instance, with $c=(c_{1},c_{2},c_{3})$, the $T_{c}(Q)$ corresponding to the $T$-matrix in (12) would be

T_{c}(Q)=\begin{pmatrix}c_{1}&0&c_{1}\\ 0&c_{2}&c_{2}\\ 0&0&c_{3}\\ 0&0&c_{1}c_{2}\end{pmatrix}.   (19)

Lastly, we consider the situation where both the probability of making a mistake and the probability of guessing correctly could be strictly positive. By this, we mean that the probability that a subject responds positively to item $i$ is $c_{i}$ if s/he is capable of doing so; otherwise the probability is $g_{i}$. We create a corresponding $T_{c,g}(Q)$ by slightly modifying $T_{c}(Q)$. We define the row vector

\mathbb{E}=(1,\ldots,1).

When there is no ambiguity, we omit the length index of $\mathbb{E}$. Now, let

B_{c,g,Q}(I_{i})=g_{i}\mathbb{E}+(c_{i}-g_{i})B_{Q}(I_{i})

and

B_{c,g,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})=\Upsilon_{h=1}^{l}B_{c,g,Q}(I_{i_{h}}).   (20)

Let the row vector in $T_{c,g}(Q)$ corresponding to $I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}$ be $B_{c,g,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})$. For instance, the matrix $T_{c,g}$ corresponding to the $T_{c}(Q)$ in (19) is

T_{c,g}(Q)=\begin{pmatrix}c_{1}&g_{1}&c_{1}\\ g_{2}&c_{2}&c_{2}\\ g_{3}&g_{3}&c_{3}\\ c_{1}g_{2}&g_{1}c_{2}&c_{1}c_{2}\end{pmatrix}.   (21)
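A sketch of $T_{c,g}(Q)$ following (20), built from the single-item rows (names and column ordering are ours; columns may appear permuted relative to (21)):

```python
def tcg_matrix(Q, c, g):
    """T_{c,g}(Q) via (20): element-wise products of the single-item rows
    B_{c,g,Q}(I_i) = g_i * E + (c_i - g_i) * B_Q(I_i)."""
    m, k = Q.shape
    B1 = t_matrix(Q)[:m]                      # single-item rows B_Q(I_i)
    Bcg = g[:, None] + (c - g)[:, None] * B1  # g_i * E broadcasts to all columns
    rows = [Bcg[list(items)].prod(axis=0)
            for l in range(1, m + 1)
            for items in combinations(range(m), l)]
    return np.array(rows)
```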

3.2 Estimation of the $Q$-matrix and consistency results

Having concluded our preparations, we are now ready to introduce our estimators of $Q$. Given $c$ and $g$, we define

S_{c,g}(Q)=\inf_{\mathbf{p}^{\prime}\in[0,1]^{2^{k}-1}}|T_{c,g}(Q)\mathbf{p}^{\prime}+p_{\mathbf{0}}^{\prime}\mathbf{g}-\alpha|,   (22)

where $\mathbf{p}^{\prime}=(p^{\prime}_{\mathbf{A}}\colon\mathbf{A}\in\{0,1\}^{k}\setminus\{\mathbf{0}\})$, $p^{\prime}_{\mathbf{0}}=p^{\prime}_{0\ldots 0}$ and

\mathbf{g}=\begin{pmatrix}g_{1}\\ \vdots\\ g_{k}\\ g_{1}g_{2}\\ \vdots\\ g_{k-1}g_{k}\\ g_{1}g_{2}g_{3}\\ \vdots\end{pmatrix}\quad\begin{array}{l}I_{1}\\ \vdots\\ I_{k}\\ I_{1}\wedge I_{2}\\ \vdots\\ I_{k-1}\wedge I_{k}\\ I_{1}\wedge I_{2}\wedge I_{3}\\ \vdots\end{array}   (23)

The labels to the right of the vector indicate the corresponding row vectors in $T_{c,g}(Q)$. The minimization in (22) is subject to the constraints that

p^{\prime}_{\mathbf{A}}\in[0,1]\quad\mbox{and}\quad\sum_{\mathbf{A}}p_{\mathbf{A}}^{\prime}=1.

The vector $\mathbf{g}$ contains the probabilities of providing positive responses to items purely by guessing. We propose an estimator of the $Q$-matrix through a minimization problem, that is,

\hat{Q}(c,g)=\arg\inf_{Q^{\prime}}S_{c,g}(Q^{\prime}).   (24)

We write $c$ and $g$ in the argument to emphasize that the estimator depends on $c$ and $g$. The computation of the minimization in (22) consists of minimizing a quadratic function subject to finitely many linear constraints. Therefore, it can be done efficiently.
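A hedged sketch of (22) (our own naming; SLSQP is one convenient choice for this small quadratic program, with $p^{\prime}_{\mathbf{0}}$ substituted by $1-\sum_{\mathbf{A}\neq\mathbf{0}}p^{\prime}_{\mathbf{A}}$):

```python
from scipy.optimize import minimize

def g_vector(g, m):
    """The vector g of (23): the guessing probability for each combination."""
    return np.array([np.prod(g[list(items)])
                     for l in range(1, m + 1)
                     for items in combinations(range(m), l)])

def S_cg(Q_cand, c, g, alpha):
    """Objective (22), with the simplex constraint enforced through
    p0 = 1 - sum(p) and sum(p) <= 1."""
    m, k = Q_cand.shape
    T, gv = tcg_matrix(Q_cand, c, g), g_vector(g, m)
    obj = lambda p: np.sum((T @ p + (1.0 - p.sum()) * gv - alpha) ** 2)
    n = 2 ** k - 1
    res = minimize(obj, np.full(n, 1.0 / (n + 1)), method="SLSQP",
                   bounds=[(0.0, 1.0)] * n,
                   constraints=[{"type": "ineq", "fun": lambda p: 1.0 - p.sum()}])
    return np.sqrt(res.fun)
```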

Theorem 3.1.

Suppose that $c$ and $g$ are known and that conditions C1–C5 are in force. For subject $r$, the responses are generated independently such that

P(R_{r}^{i}=1|\xi^{i}_{r})=c_{i}^{\xi^{i}_{r}}g_{i}^{1-\xi^{i}_{r}},   (25)

where $\xi^{i}_{r}$ is defined as in Theorem 2.4. Let $\hat{Q}(c,g)$ be defined as in (24). If $c_{i}\neq g_{i}$ for all $i$ and $T_{c-g}(Q)\mathbf{p}^{*}$ does not have zero elements, then

\lim_{N\rightarrow\infty}P\bigl(\hat{Q}(c,g)\sim Q\bigr)=1.

Furthermore, let

\tilde{\mathbf{p}}(c,g)=\arg\inf_{\mathbf{p}}|T_{c,g}(\hat{Q}(c,g))\mathbf{p}+p_{\mathbf{0}}\mathbf{g}-\alpha|^{2},

subject to the constraint that $\sum_{\mathbf{A}}p_{\mathbf{A}}=1$. Then, with an appropriate rearrangement of the columns of $\hat{Q}$, for any $\varepsilon>0$,

\lim_{N\rightarrow\infty}P\bigl(|\tilde{\mathbf{p}}(c,g)-\mathbf{p}^{*}|\leq\varepsilon\bigr)=1.
Remark 3.1.

There are various metrics one can employ to measure the distance between the vectors $T_{c,g}(\hat{Q}(c,g))\mathbf{p}+p_{\mathbf{0}}\mathbf{g}$ and $\alpha$. In fact, any metric that generates the same topology as the Euclidean metric is sufficient to obtain the consistency results in the theorem. For instance, a principled choice of objective function would be the likelihood with $\mathbf{p}$ profiled out. The reason we prefer the Euclidean metric (over, for instance, the full likelihood) is that the evaluation of $S(Q)$ is easier than the evaluation based on other metrics. More specifically, the computation of the current $S(Q)$ reduces to quadratic programming, for which well-developed optimization techniques exist.

4 Extension to the situation with unknown slipping probabilities

In this section, we further extend our results to the situation where the slipping probabilities are unknown and the guessing probabilities are known. In the context of standard exams, the guessing probabilities can typically be set to zero for open problems. For instance, the chance of guessing the correct answer to “$(3+2)\times 2=\,$?” is very small. On the other hand, for multiple choice problems, the guessing probabilities cannot be ignored. In that case, $g_{i}$ can be taken to be $1/n$ when there are $n$ choices; see Remark 4.2 for more discussion.

4.1 Estimator of $c$

We provide two estimators of $c$ given $Q$ and $g$. One is applicable to all $Q$-matrices but computationally intensive; the other is computationally easy but requires a certain structure of $Q$. We then combine them into a single estimator.

A general estimator

We first provide an estimator of $c$ that is applicable to all $Q$-matrices. Considering that the estimator of $Q$ minimizes the objective function $S_{c,g}(Q)$, we propose the following estimator of $c$:

\tilde{c}(Q,g)=\arg\inf_{c\in[0,1]^{m}}S_{c,g}(Q).   (26)

A moment estimator

The computation of $\tilde{c}(Q,g)$ is typically intensive. When the $Q$-matrix has a certain structure, we are able to estimate $c$ consistently based on estimating equations.

For a particular item $i$, suppose that there exist items $i_{1},\ldots,i_{l}$ (different from $i$) such that

B_{Q}(I_{i}\wedge I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})=B_{Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}),   (27)

that is, the attributes required by item $i$ are a subset of the attributes required by $i_{1},\ldots,i_{l}$.

Let $c-g=(c_{1}-g_{1},\ldots,c_{m}-g_{m})$ and

\tilde{T}_{c,g}(Q)=\begin{pmatrix}\mathbf{g}&T_{c,g}(Q)\\ 1&\mathbb{E}\end{pmatrix}.

We borrow a result from the proof of Proposition 6.6 (Section 6.1): there exists a matrix $D$ (depending only on $g$) such that

D\tilde{T}_{c,g}(Q)=(\mathbf{0},T_{c-g}(Q)).

Let $a_{g}$ and $a_{*g}$ be the row vectors of $D$ corresponding to $I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}$ and $I_{i}\wedge I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}$ (in $T_{c-g}(Q)$), respectively.

Then,

\frac{a_{\ast g}^{\top}\binom{\alpha}{1}}{a_{g}^{\top}\binom{\alpha}{1}}=\frac{a_{\ast g}^{\top}\tilde{T}_{c,g}(Q)\binom{p_{\mathbf{0}}^{*}}{\mathbf{p}^{*}}}{a_{g}^{\top}\tilde{T}_{c,g}(Q)\binom{p_{\mathbf{0}}^{*}}{\mathbf{p}^{*}}}+\mathrm{o}_{p}(1)=\frac{B_{c-g,Q}(I_{i}\wedge I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})\mathbf{p}^{*}}{B_{c-g,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})\mathbf{p}^{*}}+\mathrm{o}_{p}(1)\mathop{\rightarrow}^{p}c_{i}-g_{i},   (28)

where the vectors $a_{g}$ and $a_{*g}$ only depend on $g$.

Therefore, the corresponding estimator of $c_{i}$ would be

\bar{c}_{i}(Q,g)=g_{i}+\frac{a^{\top}_{*g}\binom{\alpha}{1}}{a^{\top}_{g}\binom{\alpha}{1}}.   (29)

Note that the computation of $\bar{c}_{i}(Q,g)$ only consists of affine transformations and therefore is very fast.
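In the special case $g=0$, the estimator (29) reduces to a simple conditional frequency: by (27), $B_{c,Q}(I_{i}\wedge I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})\mathbf{p}^{*}=c_{i}\,B_{c,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})\mathbf{p}^{*}$. A sketch of this special case only (the general-$g$ version requires the $D$-matrix from the proof of Proposition 6.6; the function name is ours):

```python
def moment_slip(R, i, others):
    """Moment estimator (29) when g = 0: under (27), c_i is estimated by
    the rate of solving item i among subjects solving all items in
    `others` (item indices whose required attributes cover item i's)."""
    solved_others = R[:, list(others)].min(axis=1) == 1
    return R[solved_others, i].mean()
```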

Proposition 4.1.

Suppose that condition C3 and equations (25) and (27) hold. Then $\bar{c}_{i}\rightarrow c_{i}$ in probability as $N\rightarrow\infty$.

Proof.

By the law of large numbers,

a_{\ast g}^{\top}\binom{\alpha}{1}-a_{\ast g}^{\top}\tilde{T}_{c,g}(Q)\binom{p_{\mathbf{0}}^{*}}{\mathbf{p}^{*}}\rightarrow 0,\qquad a_{g}^{\top}\binom{\alpha}{1}-a_{g}^{\top}\tilde{T}_{c,g}(Q)\binom{p_{\mathbf{0}}^{*}}{\mathbf{p}^{*}}\rightarrow 0,

in probability as $N\rightarrow\infty$. By the construction of $a_{*g}$ and $a_{g}$, we have

a_{\ast g}^{\top}\tilde{T}_{c,g}(Q)\binom{p_{\mathbf{0}}^{*}}{\mathbf{p}^{*}}=B_{c-g,Q}(I_{i}\wedge I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})\mathbf{p}^{*},\qquad a_{g}^{\top}\tilde{T}_{c,g}(Q)\binom{p_{\mathbf{0}}^{*}}{\mathbf{p}^{*}}=B_{c-g,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})\mathbf{p}^{*}.

Thanks to (27), we have

\frac{a_{\ast g}^{\top}\binom{\alpha}{1}}{a_{g}^{\top}\binom{\alpha}{1}}\rightarrow c_{i}-g_{i}. ∎

Combined estimator

Lastly, we combine $\bar{c}_{i}$ and $\tilde{c}_{i}$. For each $Q$, we write $c=(c^{*},c^{**})$, where (27) holds for each $c_{i}$ in the sub-vector $c^{*}$. Let $\bar{c}^{*}(Q,g)$ be defined by (29) (element by element). For $c^{**}$, we let $\tilde{c}^{**}(Q,g)=\arg\inf_{c^{**}}S_{(\bar{c}^{*}(Q,g),c^{**}),g}(Q)$. Finally, let $\hat{c}(Q,g)=(\bar{c}^{*}(Q,g),\tilde{c}^{**}(Q,g))$. Furthermore, each element of $\hat{c}(Q,g)$ greater than one is set to one and each element less than zero is set to zero; equivalently, we impose the constraint that $\hat{c}(Q,g)\in[0,1]^{m}$.

4.2 Consistency result

Theorem 4.2.

Suppose that $g$ is known and the conditions in Theorem 3.1 hold. Let

\hat{Q}_{\hat{c}}(g)=\arg\inf_{Q^{\prime}}S_{\hat{c}(Q^{\prime},g),g}(Q^{\prime}),\qquad\tilde{\mathbf{p}}_{\hat{c}}(g)=\arg\inf_{\mathbf{p}}\bigl|T_{\hat{c}(\hat{Q},g),g}(\hat{Q}_{\hat{c}}(g))\mathbf{p}+p_{\mathbf{0}}\mathbf{g}-\alpha\bigr|.

The second optimization is subject to the constraint that $\sum_{\mathbf{A}}p_{\mathbf{A}}=1$. Then,

\lim_{N\rightarrow\infty}P\bigl(\hat{Q}_{\hat{c}}(g)\sim Q\bigr)=1.

Furthermore, if the estimator $\tilde{c}(Q,g)$, defined in (26), is consistent, then, by appropriately rearranging the columns of $\hat{Q}_{\hat{c}}(g)$, for any $\varepsilon>0$,

\lim_{N\rightarrow\infty}P\bigl(|\tilde{\mathbf{p}}_{\hat{c}}(g)-\mathbf{p}^{*}|\leq\varepsilon\bigr)=1.
Remark 4.1.

The consistency of $\hat{Q}_{\hat{c}}(g)$ does not rely on the consistency of $\tilde{c}(Q,g)$, mainly because of the central intermediate result in Proposition 6.6. The consistency of $\tilde{c}(Q,g)$ is, however, a necessary condition for the consistency of $\tilde{\mathbf{p}}_{\hat{c}}(g)$.

In most usual situations, $(\mathbf{p}^{*},c)$ is estimable from the data given a correctly specified $Q$-matrix. Nonetheless, there are some rare occasions on which nonidentifiability does exist. We provide one example, explained at an intuitive level, to illustrate that it is not always possible to consistently estimate $c$ and $\mathbf{p}^{*}$; this example simply justifies that the existence of a consistent estimator of $c$ in the above theorem is not an empty assumption. Consider the complete matrix $Q=\mathcal{I}_{k}$. The number of degrees of freedom of a $k$-way binary table is $2^{k}-1$. On the other hand, the dimension of the parameters $(\mathbf{p}^{*},c)$ is $2^{k}-1+k$. Therefore, $\mathbf{p}^{*}$ and $c$ cannot be consistently identified without additional information. This problem is typically tackled by introducing additional parametric assumptions, such as $\mathbf{p}^{*}$ satisfying a certain functional form, or, in the Bayesian setting, (weakly) informative prior distributions [6]. Given that the emphasis of this paper is the inference of the $Q$-matrix, we do not further investigate the identifiability of $(\mathbf{p}^{*},c)$. Nonetheless, estimation of $(\mathbf{p}^{*},c)$ is definitely an important issue.

Remark 4.2.

The assumption that the guessing probabilities $g_{i}$ are known is somewhat strong. In complicated situations, such as multiple choice problems in which the incorrect choices do not look “equally incorrect”, the guessing probability is typically not $1/n$. In Theorem 4.2, we make this assumption mostly for technical reasons.

One can certainly give the unknown guessing probabilities the same treatment as the slipping probabilities, by plugging in a consistent estimator of $g_{i}$ or profiling it out (like $\tilde{c}$). However, the rigorous establishment of the consistency results is certainly much more difficult, and additional technical conditions may be needed. We leave the analysis of the problem with unknown guessing probabilities to future study.

5 Discussion

This paper provides basic theoretical results for the estimation of the $Q$-matrix, a key element in modern cognitive diagnosis. Under the conjunctive model assumption, sufficient conditions are developed for the $Q$-matrix to be identifiable up to an equivalence relation, and the corresponding consistent estimators are constructed. The equivalence relation defines a natural partition of the space of $Q$-matrices and may be viewed as the finest “resolution” that is possibly distinguishable based on the data, unless there is additional information about the specific meaning of each attribute. Our results provide the first steps for statistical inference about $Q$-matrices by explicitly specifying the conditions under which two $Q$-matrices lead to different response distributions. We believe that these results, especially the intermediate results in Section 6, can also be applied to general conjunctive models.

There are several directions along which further exploration may be pursued. First, some conditions may be modified to reflect practical circumstances. For instance, if the population is not fully diversified, meaning that certain attribute profiles may never exist, then condition C4 cannot be expected to hold. To ensure identifiability, we will need to impose certain structures on the $Q$-matrix. In the addition-multiplication example of Section 2.3, if individuals capable of multiplication are also capable of addition, then we may need to impose the natural constraint that every item that requires multiplication should also require addition, which also implies that the $Q$-matrix is never complete.

Second, when a priori “expert” knowledge of the $Q$-matrix is available, we may wish to incorporate such information into the estimation. This could be in the form of an additive penalty function attached to the objective function $S$. Such information, if correct, not only improves estimation accuracy but also reduces the computational complexity – one can just perform a minimization of $S(Q)$ in a neighborhood around the expert’s $Q$-matrix.

Third, throughout this paper we assume that the number of attributes (the dimension) is known. In practice, it would be desirable to develop a data driven way to estimate the dimension, not only to deal with the situation of unknown dimension, but also to check whether the assumed dimension is correct. One possible way to tackle the problem is to introduce a penalty function similar to that of BIC [24], which would give a consistent estimator of the $Q$-matrix even if the dimension is unknown.

Fourth, one issue of both theoretical and practical importance is the inference of the parameters other than the $Q$-matrix, such as the slipping ($s=1-c$) and guessing ($g$) parameters and the attribute distribution $\mathbf{p}^{*}$. In the current paper, given that the main parameter of interest is the $Q$-matrix, the estimation of $\mathbf{p}^{*}$ and $c$ is treated as a by-product of the main results. On the other hand, given a known $Q$, the identifiability and estimation of these parameters are important topics. In the previous discussion, we provided a few examples of potential identifiability issues. Further careful investigation is definitely of great importance and challenge.

Fifth, the rate of convergence of the estimator $\hat{Q}$ is of more than theoretical importance. From a practical point of view, it is crucial to study the rate of convergence as the scale of the problem becomes large in terms of the number of attributes and the number of items.

Lastly, the optimization of $S(Q)$ over the space of $m\times k$ binary matrices is a nontrivial problem. It consists of evaluating the function $S$ a total of $2^{m\times k}$ times. This is a substantial computational load if $m$ and $k$ are reasonably large. As mentioned previously, this computation might be reduced by additional information about the $Q$-matrix or by splitting the $Q$-matrix into small sub-matrices. Nevertheless, it would be highly desirable to explore the structures of the $Q$-matrix and the function $S$ so as to compute $\hat{Q}$ more efficiently.

6 Proofs of the theorems

6.1 Several propositions and lemmas

To make the discussion smooth, we postpone several long proofs to the Appendix.

Proposition 6.1.

Suppose that $Q$ is complete and the matrix $T(Q)$ is saturated. Then, we are able to arrange the columns and rows of $Q$ and $T(Q)$ such that $T(Q)_{1:(2^{k}-1)}$ has full rank and $T(Q)$ has full column rank.

Proof.

Provided that $Q$ is complete, without loss of generality we assume that the $i$th row vector of $Q$ is $e_{i}^{\top}$ for $i=1,\ldots,k$, that is, item $i$ only requires attribute $i$ for each $i=1,\ldots,k$. Let the first $2^{k}-1$ rows of $T(Q)$ be associated with $\{I_{1},\ldots,I_{k}\}$. In particular, we let the first $k$ rows correspond to $I_{1},\ldots,I_{k}$ and the first $k$ columns of $T(Q)$ correspond to $\mathbf{A}$’s that only have one attribute. We further arrange the next $C^{k}_{2}$ rows of $T(Q)$ to correspond to combinations of two items, $I_{i}\wedge I_{j}$, $i\neq j$. The next $C^{k}_{2}$ columns of $T(Q)$ correspond to $\mathbf{A}$’s that have exactly two positive attributes. Similarly, we arrange $T(Q)$ for combinations of three, four, and up to $k$ items. Therefore, the first $2^{k}-1$ rows of $T(Q)$ admit a block upper triangular form. In addition, we are able to further arrange the columns within each block such that the diagonal blocks are identities, so that $T(Q)$ has the form

\begin{array}{c}I_{1},I_{2},\ldots\\ I_{1}\wedge I_{2},I_{1}\wedge I_{3},\ldots\\ I_{1}\wedge I_{2}\wedge I_{3},\ldots\\ \vdots\end{array}\begin{pmatrix}\mathcal{I}_{k}&\ast&\ast&\ast&\cdots\\ 0&\mathcal{I}_{C_{2}^{k}}&\ast&\ast&\\ 0&0&\mathcal{I}_{C_{3}^{k}}&\ast&\\ \vdots&\vdots&\vdots&&\end{pmatrix}.   (30)

Note that $T(Q)$ has $2^{k}-1$ columns and $T(Q)_{1:(2^{k}-1)}$ obviously has full rank; therefore $T(Q)$ has full column rank. ∎

From now on, we assume that $Q_{1:k}=\mathcal{I}_{k}$ and that the first $2^{k}-1$ rows of $T(Q)$ are arranged in the order given in (30).

Proposition 6.2.

Suppose that $Q$ is complete, $T(Q)$ is saturated, and $c\ncong\mathbf{0}$. Then, $T_{c}(Q)$ and $T_{c}(Q)_{1:(2^{k}-1)}$ have full column rank.

Proof.

By Proposition 6.1, (17) and the fact that $D_{c}$ is a diagonal matrix of full rank as long as $c\ncong\mathbf{0}$,

T_{c}(Q)=D_{c}T(Q)

is of full column rank. ∎

The following two propositions, which compare the column spaces of $T_{c}(Q)$ and $T_{c^{\prime}}(Q^{\prime})$, are central to the proofs of all the theorems. Their proofs are deferred to the Appendix.

The first proposition discusses the case where $Q^{\prime}_{1:k}$ is complete. We can always rearrange the columns of $Q^{\prime}$ so that $Q_{1:k}=Q^{\prime}_{1:k}$. In addition, according to the proof of Proposition 6.1, the last column vector of $T_{c}(Q)$ corresponds to the attribute profile $\mathbf{A}=(1,\ldots,1)^{\top}$; therefore, this column vector has all nonzero entries.

Proposition 6.3.

Assume that $Q$ is a complete matrix and $T(Q)$ is saturated. Without loss of generality, let $Q_{1:k}=\mathcal{I}_{k}$. Assume that the first $k$ rows of $Q^{\prime}$ form a complete matrix and, further, that $Q_{1:k}=Q^{\prime}_{1:k}=\mathcal{I}_{k}$. If $Q^{\prime}\neq Q$ and $c\ncong\mathbf{0}$, then, under the conditions in Theorem 4.2, $T_{c}(Q)\mathbf{p}^{*}$ is not in the column space $C(T_{c^{\prime}}(Q^{\prime}))$ for any $c^{\prime}\in\mathbb{R}^{m}$.

The next proposition discusses the case where $Q^{\prime}_{1:k}$ is incomplete.

Proposition 6.4.

Assume that $Q$ is a complete matrix and $T(Q)$ is saturated. Without loss of generality, let $Q_{1:k}=\mathcal{I}_{k}$. If $c\ncong\mathbf{0}$ and $Q^{\prime}_{1:k}$ is incomplete, then, under the conditions in Theorem 4.2, $T_{c}(Q)\mathbf{p}^{*}$ is not in the column space $C(T_{c^{\prime}}(Q^{\prime}))$ for any $c^{\prime}\in\mathbb{R}^{m}$.

The next result is a direct corollary of these two propositions. It follows by setting $c_{i}=1$ and $g_{i}=0$ for all $i=1,\ldots,m$.

Corollary 6.5.

If $Q\nsim Q^{\prime}$, then, under the conditions of Theorem 4.2, $T_{c}(Q)\mathbf{p}^{*}$ is not in the column space $C(T_{c^{\prime}}(Q^{\prime}))$ for any $c^{\prime}\in[0,1]^{m}$.

To obtain a similar proposition for the cases where the $g_{i}$’s are nonzero, we need to expand $T_{c,g}(Q)$ as follows. As previously defined, let

\tilde{T}_{c,g}(Q)=\begin{pmatrix}\mathbf{g}&T_{c,g}(Q)\\ 1&\mathbb{E}\end{pmatrix}.   (31)

The last row of $\tilde{T}_{c,g}(Q)$ consists entirely of ones. The vector $\mathbf{g}$ is defined as in (23).

Proposition 6.6.

Suppose that $Q$ is a complete matrix, $Q^{\prime}\nsim Q$, $T$ is saturated and $c\ncong g$. Let $\mathbf{p}^{*}_{0}=(p^{*}_{\mathbf{0}},(\mathbf{p}^{*})^{\top})^{\top}$. Under the conditions of Theorem 4.2, $\tilde{T}_{c,g}(Q)\mathbf{p}^{*}_{0}$ is not in the column space $C(\tilde{T}_{c^{\prime},g}(Q^{\prime}))$ for any $c^{\prime}\in[0,1]^{m}$. In addition, $\tilde{T}_{c,g}(Q)$ is of full column rank.

To prove Proposition 6.6, we will need the following lemma.

Lemma 6.7.

Consider two matrices $T_{1}$ and $T_{2}$ of the same dimension. If $T_{1}\mathbf{p}\in C(T_{2})$, then for any matrix $D$ of appropriate dimension for multiplication, we have

DT_{1}\mathbf{p}\in C(DT_{2}).

Conversely, if for some $D$, $DT_{1}\mathbf{p}$ does not belong to $C(DT_{2})$, then $T_{1}\mathbf{p}$ does not belong to $C(T_{2})$.

Proof.

Note that $DT_{i}$ is just a linear row transform of $T_{i}$ for $i=1,2$. The conclusion is immediate by basic linear algebra. ∎

Proof of Proposition 6.6. Thanks to Lemma 6.7, we only need to find a matrix $D$ such that $D\tilde{T}_{c,g}(Q)\mathbf{p}^{*}_{0}$ does not belong to the column space of $D\tilde{T}_{c^{\prime},g}(Q^{\prime})$ for any $c^{\prime}\in[0,1]^{m}$.

We define

c-g=(c_{1}-g_{1},\ldots,c_{m}-g_{m}),\qquad c^{\prime}-g=(c_{1}^{\prime}-g_{1},\ldots,c_{m}^{\prime}-g_{m}).

We claim that there exists a matrix $D$ such that

D\tilde{T}_{c,g}(Q)=(\mathbf{0},T_{c-g}(Q))

and

D\tilde{T}_{c^{\prime},g}(Q^{\prime})=(\mathbf{0},T_{c^{\prime}-g}(Q^{\prime})),

where the choice of $D$ does not depend on $c$ or $c^{\prime}$. In the rest of the proof, we construct such a $D$-matrix for $\tilde{T}_{c,g}(Q)$; the verification for $\tilde{T}_{c^{\prime},g}(Q^{\prime})$ is completely analogous. Note that each row in $D\tilde{T}_{c,g}(Q)$ is just a linear combination of rows of $\tilde{T}_{c,g}(Q)$. Therefore, it suffices to show that every row vector of the form

\bigl(0,B_{c-g,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})\bigr)

can be written as a linear combination of the row vectors of $\tilde{T}_{c,g}(Q)$. We prove this by induction. First note that for each $1\leq i\leq m$,

(0,B_{c-g,Q}(I_{i}))=(c_{i}-g_{i})(0,B_{Q}(I_{i}))=(g_{i},B_{c,g,Q}(I_{i}))-g_{i}\mathbb{E}.   (32)

Suppose that all rows of the form

\bigl(0,B_{c-g,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})\bigr)

for all $1\leq l\leq j$ can be written as linear combinations of the row vectors of $\tilde{T}_{c,g}(Q)$ with coefficients depending only on $g_{1},\ldots,g_{m}$. Thanks to (32), the case of $j=1$ holds. Suppose the statement holds for some general $j$; we consider the case of $j+1$. By definition,

\bigl(g_{i_{1}}\cdots g_{i_{j+1}},B_{c,g,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{j+1}})\bigr)=\Upsilon_{h=1}^{j+1}(g_{i_{h}},B_{c,g,Q}(I_{i_{h}}))=\Upsilon_{h=1}^{j+1}\bigl(g_{i_{h}}\mathbb{E}+(0,B_{c-g,Q}(I_{i_{h}}))\bigr).   (6.1)

Let “$\ast$” denote element-by-element multiplication. For every generic vector $V^{\prime}$ of appropriate length,

\mathbb{E}\ast V^{\prime}=V^{\prime}.

We expand the right-hand side of (6.1). The last term would be

\bigl(0,B_{c-g,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{j+1}})\bigr)=\Upsilon_{h=1}^{j+1}(0,B_{c-g,Q}(I_{i_{h}})).

From the induction assumption and definition (18), the other terms on both sides of (6.1) belong to the row space of $\tilde{T}_{c,g}(Q)$. Therefore, $(0,B_{c-g,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{j+1}}))$ is also in the row space of $\tilde{T}_{c,g}(Q)$. In addition, all the corresponding coefficients only consist of the $g_{i}$’s. Therefore, one can construct a $(2^{m}-1)\times 2^{m}$ matrix $D$ such that

D\tilde{T}_{c,g}(Q)=(\mathbf{0},T_{c-g}(Q)).

Because $D$ is free of $c$ and $Q$, we have

D\tilde{T}_{c^{\prime},g}(Q^{\prime})=(\mathbf{0},T_{c^{\prime}-g}(Q^{\prime})).

In addition, thanks to Propositions 6.3 and 6.4, $D\tilde{T}_{c,g}(Q)\mathbf{p}^{*}_{0}=T_{c-g}(Q)\mathbf{p}^{*}$ is not in the column space $C(T_{c^{\prime}-g}(Q^{\prime}))=C(D\tilde{T}_{c^{\prime},g}(Q^{\prime}))$ for any $c^{\prime}\in[0,1]^{m}$. Therefore, by Lemma 6.7, $\tilde{T}_{c,g}(Q)\mathbf{p}^{*}_{0}$ is not in the column space $C(\tilde{T}_{c^{\prime},g}(Q^{\prime}))$ for any $c^{\prime}\in[0,1]^{m}$.

In addition,

\begin{pmatrix}D\\ e_{2^{m}}^{\top}\end{pmatrix}\tilde{T}_{c,g}(Q)

is of full column rank, where $e^{\top}_{2^{m}}$ is a $2^{m}$ dimensional row vector whose last element is one and whose other elements are zero. Therefore, $\tilde{T}_{c,g}(Q)$ is also of full column rank. ∎

6.2 Proofs of the theorems

Using the results of the previous propositions and lemmas, we now proceed to prove our theorems.

Proof of Theorem 2.4. Consider $Q^{\prime}\nsim Q$ and $T(\cdot)$ saturated. Recall that $\hat{\mathbf{p}}$ is the vector containing the $\hat{p}_{\mathbf{A}}$’s with $\mathbf{A}\neq\mathbf{0}$, where

\hat{p}_{\mathbf{A}}=\frac{1}{N}\sum_{r=1}^{N}\mathbf{1}(\mathbf{A}_{r}=\mathbf{A}).

For any $\mathbf{p}^{*}\succ\mathbf{0}$, since $\hat{\mathbf{p}}\rightarrow\mathbf{p}^{*}$ almost surely, $\alpha=T(Q)\hat{\mathbf{p}}$ by (5), and $T(Q)\mathbf{p}^{*}\notin C(T(Q^{\prime}))$ according to Corollary 6.5, there exists $\delta>0$ such that

\lim_{N\rightarrow\infty}P\Bigl(\inf_{\mathbf{p}\in[0,1]^{2^{k}-1}}|T(Q^{\prime})\mathbf{p}-\alpha|>\delta\Bigr)=1

and

P\Bigl(\inf_{\mathbf{p}\in[0,1]^{2^{k}-1}}|T(Q)\mathbf{p}-\alpha|=0\Bigr)=1.

Given that there are finitely many $m\times k$ binary matrices, $P(\hat{Q}\sim Q)\rightarrow 1$ as $N\rightarrow\infty$. In fact, we can arrange the columns of $\hat{Q}$ such that $P(\hat{Q}=Q)\rightarrow 1$ as $N\rightarrow\infty$.

Note that $\hat{\mathbf{p}}$ satisfies the identity

T(Q)\hat{\mathbf{p}}=\alpha.

In addition, since $T(Q)$ is of full column rank (Proposition 6.1), the solution to the above linear equation is unique. Therefore, the solution to the optimization problem $\inf_{\mathbf{p}}|T(Q)\mathbf{p}-\alpha|$ is unique and equals $\hat{\mathbf{p}}$. Notice that when $\hat{Q}=Q$, $\tilde{\mathbf{p}}=\arg\inf_{\mathbf{p}}|T(\hat{Q})\mathbf{p}-\alpha|=\hat{\mathbf{p}}$. Therefore,

\lim_{N\rightarrow\infty}P(\tilde{\mathbf{p}}=\hat{\mathbf{p}})=1.

Together with the consistency of $\hat{\mathbf{p}}$, the conclusion of the theorem follows immediately. ∎

Proof of Theorem 3.1. Note that for all $Q'$,

\[
T_{c,g}(Q')\mathbf{p}+p_{\mathbf{0}}\mathbf{g}-\alpha=(\mathbf{g},T_{c,g}(Q'))\pmatrix{p_{\mathbf{0}}\cr\mathbf{p}}-\alpha.
\]

By the law of large numbers,

\[
\bigl|T_{c,g}(Q)\mathbf{p}^{\ast}+p_{\mathbf{0}}^{\ast}\mathbf{g}-\alpha\bigr|=\biggl|(\mathbf{g},T_{c,g}(Q))\pmatrix{p_{\mathbf{0}}^{\ast}\cr\mathbf{p}^{\ast}}-\alpha\biggr|\rightarrow 0
\]

almost surely as $N\rightarrow\infty$. Therefore,

\[
S_{c,g}(Q)\rightarrow 0
\]

almost surely as $N\rightarrow\infty$.

For any $Q'\nsim Q$, note that

\[
\pmatrix{\alpha\cr 1}\rightarrow\tilde{T}_{c,g}(Q)\pmatrix{p_{\mathbf{0}}^{\ast}\cr\mathbf{p}^{\ast}}.
\]

According to Proposition 6.6 and the fact that $\mathbf{p}^{\ast}\succ\mathbf{0}$, there exists $\delta(c')>0$, continuous in $c'$, such that

\[
\inf_{\mathbf{p},p_{\mathbf{0}}}\biggl|\tilde{T}_{c',g}(Q')\pmatrix{p_{\mathbf{0}}\cr\mathbf{p}}-\tilde{T}_{c,g}(Q)\pmatrix{p_{\mathbf{0}}^{\ast}\cr\mathbf{p}^{\ast}}\biggr|>\delta(c').
\]

By elementary calculus,

\[
\delta\triangleq\inf_{c'\in[0,1]^m}\delta(c')>0
\]

and

\[
\inf_{c',\mathbf{p},p_{\mathbf{0}}}\biggl|\tilde{T}_{c',g}(Q')\pmatrix{p_{\mathbf{0}}\cr\mathbf{p}}-\tilde{T}_{c,g}(Q)\pmatrix{p_{\mathbf{0}}^{\ast}\cr\mathbf{p}^{\ast}}\biggr|>\delta.
\]

Therefore,

\[
P\biggl(\inf_{c',\mathbf{p},p_{\mathbf{0}}}\biggl|\tilde{T}_{c',g}(Q')\pmatrix{p_{\mathbf{0}}\cr\mathbf{p}}-\pmatrix{\alpha\cr 1}\biggr|>\delta/2\biggr)\rightarrow 1
\]

as $N\rightarrow\infty$. For the same $\delta$, we have

\[
P\biggl(\inf_{c',\mathbf{p},p_{\mathbf{0}}}\biggl|(\mathbf{g},T_{c',g}(Q'))\pmatrix{p_{\mathbf{0}}\cr\mathbf{p}}-\alpha\biggr|>\delta/2\biggr)=P\Bigl(\inf_{c'}S_{c',g}(Q')>\delta/2\Bigr)\rightarrow 1.
\]

The minimization on the left-hand side of the above equation is subject to the constraint

\[
\sum_{\mathbf{A}\in\{0,1\}^k}p_{\mathbf{A}}=1.
\]

Together with the fact that there are only finitely many $m\times k$ binary matrices, we have

\[
P\bigl(\hat{Q}(c,g)\sim Q\bigr)\rightarrow 1
\]

as $N\rightarrow\infty$. We arrange the columns of $\hat{Q}(c,g)$ so that $P(\hat{Q}(c,g)=Q)\rightarrow 1$ as $N\rightarrow\infty$.

Now we proceed to the proof of consistency for $\tilde{\mathbf{p}}(c,g)$. Note that

\begin{eqnarray*}
\biggl|\tilde{T}_{c,g}(\hat{Q}(c,g))\pmatrix{\tilde{p}_{\mathbf{0}}(c,g)\cr\tilde{\mathbf{p}}(c,g)}-\pmatrix{\alpha\cr 1}\biggr| &\mathop{\rightarrow}\limits^{p}& 0,\\
\biggl|\tilde{T}_{c,g}(Q)\pmatrix{p_{\mathbf{0}}^{\ast}\cr\mathbf{p}^{\ast}}-\pmatrix{\alpha\cr 1}\biggr| &\mathop{\rightarrow}\limits^{p}& 0.
\end{eqnarray*}

Since $\tilde{T}_{c,g}(Q)$ is of full column rank and $P(\hat{Q}(c,g)=Q)\rightarrow 1$, we have $\tilde{\mathbf{p}}(c,g)\rightarrow\mathbf{p}^{\ast}$ in probability. ∎
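This last step is computationally transparent: on the event $\hat{Q}(c,g)=Q$, whose probability tends to one, $\tilde{\mathbf{p}}(c,g)$ solves an overdetermined linear system whose matrix has full column rank, so ordinary least squares recovers it. A minimal sketch, assuming $\tilde{T}_{c,g}(Q)$ and $\alpha$ are available as arrays:

import numpy as np

def recover_p(T_tilde, alpha):
    # Least-squares solution of T_tilde (p_0, p)' ~ (alpha, 1)'.
    # The solution is unique because T_tilde has full column rank.
    rhs = np.concatenate([alpha, [1.0]])
    sol, *_ = np.linalg.lstsq(T_tilde, rhs, rcond=None)
    return sol[0], sol[1:]          # (p_0, p)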

Proof of Theorem 4.2. Assuming $g$ is known, note that

\[
\inf_{p_{\mathbf{0}},\mathbf{p}}\biggl|\tilde{T}_{c,g}(Q)\pmatrix{p_{\mathbf{0}}\cr\mathbf{p}}-\pmatrix{\alpha\cr 1}\biggr|
\]

is a continuous function of $c$. According to Proposition 4.1, the definition in (26), and the definition of $\hat{c}$ in Section 4.1, we obtain that

\[
\inf_{p_{\mathbf{0}},\mathbf{p}}\biggl|\tilde{T}_{\hat{c}(Q,g),g}(Q)\pmatrix{p_{\mathbf{0}}\cr\mathbf{p}}-\pmatrix{\alpha\cr 1}\biggr|\rightarrow 0
\]

in probability as $N\rightarrow\infty$. In addition, thanks to Proposition 6.6 and by an argument similar to that in the proof of Theorem 3.1, $\hat{Q}_{\hat{c}}(g)$ is a consistent estimator.

Furthermore, if $\tilde{c}(Q,g)$ is a consistent estimator, then so is $\hat{c}(Q,g)$. The consistency of $\tilde{\mathbf{p}}_{\hat{c}}(g)$ then follows from the facts that $\hat{Q}_{\hat{c}}(g)$ is consistent and $\tilde{T}_{\hat{c},g}(Q)$ is of full column rank. ∎
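Computationally, this proof corresponds to a profiled objective: for each candidate $c$, minimize over $(p_{\mathbf{0}},\mathbf{p})$, and then minimize the resulting continuous function of $c$ over $[0,1]^m$. A hedged sketch follows, with build_T_tilde a hypothetical stand-in for the model-specific map $(c,g,Q)\mapsto\tilde{T}_{c,g}(Q)$; the appended $1$ in the target plays the role of the constraint $\sum_{\mathbf{A}}p_{\mathbf{A}}=1$.

import numpy as np
from scipy.optimize import lsq_linear, minimize

def build_T_tilde(c, g, Q):
    # Hypothetical placeholder for the model-specific matrix T~_{c,g}(Q).
    raise NotImplementedError

def profiled_objective(c, g, Q, alpha):
    # Inner minimization over (p_0, p) in [0,1]^{2^k}.
    rhs = np.concatenate([alpha, [1.0]])
    return lsq_linear(build_T_tilde(c, g, Q), rhs, bounds=(0.0, 1.0)).cost

def fit_c(g, Q, alpha, m):
    # Outer minimization over c in [0,1]^m; the profiled objective is
    # continuous in c, as noted above.
    res = minimize(profiled_objective, np.full(m, 0.5), args=(g, Q, alpha),
                   bounds=[(0.0, 1.0)] * m)
    return res.x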

Appendix: Technical proofs

Proof of Proposition 6.3. Note that $Q_{1:k}=Q'_{1:k}=\mathcal{I}_{k}$. Let $T(\cdot)$ be arranged as in (30). Then $T(Q)_{1:(2^k-1)}=T(Q')_{1:(2^k-1)}$. Given that $Q\neq Q'$, we have $T(Q)\neq T(Q')$. We assume that $T(Q)_{li}\neq T(Q')_{li}$, where $T(Q)_{li}$ is the entry in the $l$th row and $i$th column. Since $T(Q)_{1:(2^k-1)}=T(Q')_{1:(2^k-1)}$, it is necessary that $l\geq 2^k$.

Suppose that the $l$th row of $T(Q')$ corresponds to an item that requires attributes $i_1,\ldots,i_{l'}$. Then, we consider $1\leq h\leq 2^k-1$ such that the $h$th row of $T(Q')$ is $B_{Q'}(I_{i_1}\wedge\cdots\wedge I_{i_{l'}})$. Then, the $h$th and the $l$th row vectors of $T(Q')$ are identical.

Since $T(Q)_{1:(2^k-1)}=T(Q')_{1:(2^k-1)}$, we have $T(Q)_{hj}=T(Q')_{hj}=T(Q')_{lj}$ for $j=1,\ldots,2^k-1$. If $T(Q)_{li}=0$ and $T(Q')_{li}=1$, the matrices $T(Q)$ and $T(Q')$ look like

\[
T(Q')=\left(\begin{array}{ccccc}
\mathcal{I}&\ast&\ldots&\ast&\ldots\\
\vdots&\vdots&&\ldots&\ldots\\
\vdots&\vdots&\mathcal{I}&\ldots&\ldots\\
\vdots&\vdots&\vdots&&\\
\ast&1&\ast&&\\
\ast&\ast&\ast&&
\end{array}\right)
\qquad\mbox{and}\qquad
T(Q)=\left(\begin{array}{ccccc}
\mathcal{I}&\ast&\ldots&\ast&\ldots\\
\vdots&\vdots&&\ldots&\ldots\\
\vdots&\vdots&\mathcal{I}&\ldots&\ldots\\
\vdots&\vdots&\vdots&&\\
\ast&0&\ast&&\\
\ast&\ast&\ast&&
\end{array}\right),
\]

where the displayed $1$ and $0$ mark the entry in row $l$ and column $i$ at which $T(Q')$ and $T(Q)$ differ, and row $h$ lies in the upper block with $h\leq 2^k-1$.

We consider the following two cases.

  • Case 1. Either the $h$th or the $l$th row vector of $T_{c'}(Q')$ is a zero vector. The conclusion is immediate because all the entries of $T_c(Q)\mathbf{p}^*$ are nonzero.

  • Case 2. The $h$th and the $l$th row vectors of $T_{c'}(Q')$ are nonzero. Suppose that the $l$th row corresponds to an item. There are three different situations according to the true $Q$-matrix: (a) the item in row $l$ requires strictly more attributes than that in row $h$; (b) the item in row $l$ requires strictly fewer attributes than that in row $h$; (c) otherwise. We consider these three situations respectively.

    (a) Under the true $Q$-matrix, there are two types of sub-populations in consideration: people who are able to answer the item(s) in row $h$ only ($p_1$) and people who are able to answer the items in both row $h$ and row $l$ ($p_2$). Then, the sub-matrices of $T_c(Q)$ and $T_{c'}(Q')$ look like

\[
\begin{array}{c|cc}
T_c(Q) & p_1 & p_2\\
\hline
\mbox{row } h & c_h & c_h\\
\mbox{row } l & 0 & c_l
\end{array}
\qquad\qquad
\begin{array}{c|cc}
T_{c'}(Q') & p_1 & p_2\\
\hline
\mbox{row } h & c_h' & c_h'\\
\mbox{row } l & c_l' & c_l'
\end{array}
\]

We now claim that $c_l$ and $c_l'$ must be equal (otherwise the conclusion holds), for the following reason.

Consider the following two rows of $T(Q)$: row A, corresponding to the combination that contains all the items, and row B, corresponding to the combination that contains all the items except for the one in row $l$.

Rows A and B are in fact identical in $T(Q)$. This is because all the attributes are used at least twice (condition C5); the attributes in row $l$ are then also required by some other item(s), and rows A and B require the same combination of attributes. Thus, the corresponding entries of every column vector of $T_c(Q)$ differ by a factor of $c_l$.

For $T(Q')$, rows A and B are also identical, because row $h$ and row $l$ have identical attribute requirements. Thus, the corresponding entries of every column vector of $T_{c'}(Q')$ differ by a factor of $c_l'$. Hence, $c_l'$ and $c_l$ must be identical; otherwise $T_c(Q)\mathbf{p}^*$ is not in the column space of $T_{c'}(Q')$.

Similarly, we obtain that $c_h=c_h'$. Given that $c_h=c_h'$ and $c_l=c_l'$, we now consider row $h$ and row $l$. Notice that every column vector of $T_{c'}(Q')$ has its entries in row $h$ and row $l$ differing by a factor of $c_h/c_l$. On the other hand, the $h$th and $l$th entries of $T_c(Q)\mathbf{p}^*$ do NOT differ by a factor of $c_h/c_l$ as long as the proportion $p_1$ is positive (a small numerical illustration of this ratio argument is given after the proof). Thereby, we conclude this case.

    (b) Consider the following two types of sub-populations: people who are able to answer the item(s) in row $l$ only ($p_1$) and people who are able to answer the items in both row $h$ and row $l$ ($p_2$). Similar to the analysis of (a), the sub-matrices look like

\[
\begin{array}{c|cc}
T_c(Q) & p_1 & p_2\\
\hline
\mbox{row } h & 0 & c_h\\
\mbox{row } l & c_l & c_l
\end{array}
\qquad\qquad
\begin{array}{c|cc}
T_{c'}(Q') & p_1 & p_2\\
\hline
\mbox{row } h & 0 & c_h'\\
\mbox{row } l & 0 & c_l'
\end{array}
\]

With exactly the same argument as in (a), we conclude that $c_l=c_l'$ and $c_h=c_h'$, and further that $T_c(Q)\mathbf{p}^*$ is not in the column space of $T_{c'}(Q')$.

    (c) Consider the following three types of sub-populations: people who are able to answer the item(s) in row $l$ only ($p_1$), people who are able to answer the item(s) in row $h$ only ($p_2$), and people who are able to answer the items in both row $h$ and row $l$ ($p_3$). The sub-matrices look like

\[
\begin{array}{c|ccc}
T_c(Q) & p_1 & p_2 & p_3\\
\hline
\mbox{row } h & 0 & c_h & c_h\\
\mbox{row } l & c_l & 0 & c_l\\
\mbox{row } l\wedge h & 0 & 0 & c_h c_l
\end{array}
\qquad\qquad
\begin{array}{c|ccc}
T_{c'}(Q') & p_1 & p_2 & p_3\\
\hline
\mbox{row } h & 0 & c_h' & c_h'\\
\mbox{row } l & 0 & c_l' & c_l'\\
\mbox{row } l\wedge h & 0 & c_h'c_l' & c_h'c_l'
\end{array}
\]

With the same argument, we have that $c_l=c_l'$ and $c_h=c_h'$. Considering row $h$ and row $l\wedge h$, we conclude that $T_c(Q)\mathbf{p}^*$ is not in the column space of $T_{c'}(Q')$. ∎
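The contradiction used above, for instance in case (a), can be checked numerically: every column of $T_{c'}(Q')$ has its row-$h$ and row-$l$ entries in the fixed ratio $c_h/c_l$, whereas the corresponding entries of $T_c(Q)\mathbf{p}^*$ are $c_h(p_1+p_2)$ and $c_l p_2$, whose ratio varies with $p_1$. A toy check under these assumed parameter values:

import numpy as np

c_h, c_l = 0.8, 0.7        # item parameters (shared, since c_h = c_h', c_l = c_l')
p1, p2 = 0.3, 0.5          # sub-population proportions with p1 > 0

# Row-h and row-l entries of T_c(Q) p*, from the case (a) sub-matrix.
entry_h = c_h * (p1 + p2)  # row h: (c_h, c_h) . (p1, p2)
entry_l = c_l * p2         # row l: (0,   c_l) . (p1, p2)

# Any vector in the column space of T_{c'}(Q') has these two entries
# in the fixed ratio c_h / c_l; T_c(Q) p* does not.
assert not np.isclose(entry_h / entry_l, c_h / c_l)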

Proof of Proposition 6.4. Let $T(\cdot)$ be arranged as in (30). Consider $Q'$ such that $Q'_{1:k}$ is incomplete. We discuss the following situations.

  1. There are two row vectors, say the $h$th and the $l$th row vectors ($1\leq h,l\leq k$), in $Q'_{1:k}$ that are identical. Equivalently, two items require exactly the same attributes according to $Q'$. With exactly the same argument as in the previous proof, under condition C5, we have that $c_h=c_h'$ and $c_l=c_l'$. We now consider the rows corresponding to $l$ and $l\wedge h$. Note that the elements corresponding to row $l$ and row $l\wedge h$ of all the vectors in the column space of $T_{c'}(Q')$ differ by a factor of $c_h$. However, the corresponding elements of the vector $T_c(Q)\mathbf{p}^*$ do NOT differ by a factor of $c_h$ as long as the population is fully diversified.

  2. No two row vectors in $Q'_{1:k}$ are identical. Then, among the first $k$ rows of $Q'$, there is at least one row vector containing two or more nonzero entries. That is, there exists $1\leq i\leq k$ such that

\[
\sum_{j=1}^{k}Q'_{ij}>1.
\]

    This is because if each of the first $k$ items required only one attribute and $Q'_{1:k}$ were incomplete, then at least two items would require the same attribute; there would then be two identical row vectors in $Q'_{1:k}$, which belongs to the first situation. We define

\[
a_i=\sum_{j=1}^{k}Q'_{ij},
\]

    the number of attributes required by item $i$ according to $Q'$.

    Without loss of generality, assume $a_i>1$ for $i=1,\ldots,n$ and $a_i=1$ for $i=n+1,\ldots,k$. Equivalently, among the first $k$ items, only the first $n$ items require more than one attribute, while the $(n+1)$th through the $k$th items each require a single attribute, all of which are distinct. Without loss of generality, we assume $Q'_{ii}=1$ for $i=n+1,\ldots,k$ and $Q'_{ij}=0$ for $i=n+1,\ldots,k$ and $i\neq j$.

    (2a) $n=1$. Since $a_1>1$, there exists $l>1$ such that $Q'_{1l}=1$. We now consider rows $1$ and $l$. With the same argument as before (i.e., the attribute required by row $l$ is also required by item 1 in $Q'$), we have that $c_l=c_l'$ (note that we cannot claim that $c_1=c_1'$). We now consider rows $1$ and $1\wedge l$. Note that in $T_{c'}(Q')$ these two rows differ by a factor of $c_l$, while the corresponding entries in $T_c(Q)\mathbf{p}^*$ do NOT differ by a factor of $c_l$. Thereby, we conclude the result in this situation.

    (2b) $n>1$ and there exist $j>n$ and $i\leq n$ such that $Q'_{ij}=1$. The argument is identical to that in (2a).

    (2c) $n>1$ and, for each $j>n$ and $i\leq n$, $Q'_{ij}=0$. Let the $i^{\ast}$th row in $T(Q')$ correspond to $I_1\wedge\cdots\wedge I_n$. Let the $i_h^{\ast}$th row in $T(Q')$ correspond to $I_1\wedge\cdots\wedge I_{h-1}\wedge I_{h+1}\wedge\cdots\wedge I_n$ for $h=1,\ldots,n$.

    We claim that there exists an $h$ such that the $i^{\ast}$th row and the $i_h^{\ast}$th row are identical in $T(Q')$, that is,

\[
B_{Q'}(I_1\wedge\cdots\wedge I_{h-1}\wedge I_{h+1}\wedge\cdots\wedge I_n)=B_{Q'}(I_1\wedge\cdots\wedge I_n). \qquad (1)
\]

    If the above claim is true, then the attributes required by item $h$ are also required by some other items, and we conclude that $c_h$ and $c_h'$ must be identical. In addition, the rows in $T_{c'}(Q')$ corresponding to $I_1\wedge\cdots\wedge I_{h-1}\wedge I_{h+1}\wedge\cdots\wedge I_n$ and $I_1\wedge\cdots\wedge I_n$ differ by a factor of $c_h$. On the other hand, the corresponding entries in $T_c(Q)\mathbf{p}^*$ do NOT differ by a factor of $c_h$. Then, we are able to conclude the results for all the cases.

    In what follows, we prove the claim in (1) by contradiction. Suppose that there does not exist such an $h$. This is equivalent to saying that for each $j\leq n$ there exists an $\alpha_j$ such that $Q'_{j\alpha_j}=1$ and $Q'_{i\alpha_j}=0$ for all $1\leq i\leq n$ with $i\neq j$. Equivalently, for each $j\leq n$, item $j$ requires at least one attribute that is not required by any other of the first $n$ items. Consider

\[
\mathcal{C}_i=\bigl\{j\colon\mbox{there exists } i\leq i'\leq n \mbox{ such that } Q'_{i'j}=1\bigr\}.
\]

    Let $\#(\cdot)$ denote the cardinality of a set. Since $Q'_{ij}=0$ for each $i\leq n$ and $j>n$, we have that $\#(\mathcal{C}_1)\leq n$. Note that $Q'_{1\alpha_1}=1$ and $Q'_{i\alpha_1}=0$ for all $2\leq i\leq n$. Therefore, $\alpha_1\in\mathcal{C}_1$ and $\alpha_1\notin\mathcal{C}_2$, so $\#(\mathcal{C}_2)\leq n-1$. By a similar argument and induction, we have that $a_n=\#(\mathcal{C}_n)\leq 1$. This contradicts the fact that $a_n>1$. Therefore, there exists an $h$ such that (1) is true (a brute-force check of this claim for small $n$ is given after the proof). As for $T(Q)$, we have that

\[
B_Q(I_1\wedge\cdots\wedge I_{h-1}\wedge I_{h+1}\wedge\cdots\wedge I_n)\neq B_Q(I_1\wedge\cdots\wedge I_n).
\]

Summarizing the cases in 1, (2a), (2b) and (2c), we conclude the proof. ∎
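The combinatorial claim (1) can also be verified exhaustively for small $n$: whenever each of the first $n$ items requires at least two attributes, all lying among the first $n$ attributes as in case (2c), some item's attribute set is covered by the union of the others'. A brute-force check under these assumptions:

import itertools

def has_redundant_row(rows):
    # True if some row's attribute set is covered by the union of the
    # others, which is exactly the identity asserted in claim (1).
    full = set().union(*rows)
    return any(set().union(*(rows[:h] + rows[h + 1:])) == full
               for h in range(len(rows)))

n = 3  # exhaustive over all n x n binary matrices; feasible for small n
for bits in itertools.product([0, 1], repeat=n * n):
    rows = [{j for j in range(n) if bits[i * n + j]} for i in range(n)]
    if all(len(r) > 1 for r in rows):      # a_i > 1 for every i <= n
        assert has_redundant_row(rows)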

Acknowledgements

This research was supported in part by Grants NSF CMMI-1069064, SES-1123698, Institute of Education Sciences R305D100017 and NIH R37 GM047845.
