Theory of self-learning $Q$ -matrix

Jingchen Liulabel=e1,mark]jcliu@stat.columbia.edu [ Gongjun Xulabel=e2,mark]gongjun@stat.columbia.edu [ Zhiliang Yinglabel=e3,mark]zying@stat.columbia.edu [ Department of Statistics, Columbia University, 1255 Amsterdam Avenue, New York, NY 10027, USA.

(2013; 3 2011; 11 2011)

Abstract

Cognitive assessment is a growing area in psychological and educational measurement, where tests are given to assess mastery/deficiency of attributes or skills. A key issue is the correct identification of attributes associated with items in a test. In this paper, we set up a mathematical framework under which theoretical properties may be discussed. We establish sufficient conditions to ensure that the attributes required by each item are learnable from the data.

classification model,

cognitive assessment,

consistency,

diagnostic,

Q

-matrix,

self-learning,

doi:

10.3150/12-BEJ430

keywords:

^†^†volume: 19^†^†issue: 5A

, and

1 Introduction

Cognitive diagnosis has recently gained prominence in educational assessment, psychiatric evaluation, and many other disciplines. A key task is the correct specification of item-attribute relationships. A widely used mathematical formulation is the well known $Q$ -matrix [27]. Under the setting of the $Q$ -matrix, a typical modeling approach assumes a latent variable structure in which each subject possesses a vector of $k$ attributes and responds to $m$ items. The so-called $Q$ -matrix is an $m\times k$ binary matrix establishing the relationship between responses and attributes by indicating the required attributes for each item. The entry in the $i$ th row and $j$ th column indicates if item $i$ requires attribute $j$ (see Example 2.3 for a demonstration of a $Q$ -matrix). A short list of further developments of cognitive diagnosis models (CDMs) based on the $Q$ -matrix includes the rule space method [28, 29], the reparameterized unified/fusion model (RUM) [5, 7, 30], the conjunctive (noncompensatory) DINA and NIDA models [12, 26, 4, 31, 3], the compensatory DINO and NIDO models [32, 31], the attribute hierarchy method [13], and clustering methods [1]; see also [11, 33, 23] for more approaches to cognitive diagnosis.

Statistical analysis with CDMs typically assumes a known $Q$ -matrix provided by experts such as those who developed the questions [20, 10, 19, 25]. Such a priori knowledge when correct is certainly very helpful for both model estimation and eventually identification of subjects’ latent attributes. On the other hand, model fitting is usually sensitive to the choice of $Q$ -matrix and its misspecification could seriously affect the goodness of fit. This is one of the main sources for lack of fit. Various diagnostic tools and testing procedures have been developed [21, 2, 8, 14, 9]. A comprehensive review of diagnostic classification models can be found in [22].

Despite the importance of the $Q$ -matrix in cognitive diagnosis, its estimation problem is largely an unexplored area. Unlike typical inference problems, the inference for the $Q$ -matrix is particularly challenging for the following reasons. First, in many cases, the $Q$ -matrix is simply nonidentifiable. One typical situation is that multiple $Q$ -matrices lead to an identical response distribution. Therefore, we only expect to identify the $Q$ -matrix up to some equivalence relation (Definition 2.2). In other words, two $Q$ -matrices in the same equivalence class are not distinguishable based on data. Our first task is to define a meaningful and identifiable equivalence class. Second, the $Q$ -matrix lives on a discrete space – the set of $m\times k$ matrices with binary entries. This discrete nature makes analysis particularly difficult because calculus tools are not applicable. In fact, most analyses are combinatorics based. Third, the model makes explicit distributional assumptions on the (unobserved) attributes, dictating the law of observed responses. The dependence of responses on attributes via $Q$ -matrix is a highly nonlinear discrete function. The nonlinearity also adds to the difficulty of the analysis.

The primary purpose of this paper is to provide theoretical analyses on the learnability of the underlying $Q$ -matrix. In particular, we obtain definitive answers to the identifiability of $Q$ -matrix for one of the most commonly used models – the DINA model – by specifying a set of sufficient conditions under which the $Q$ -matrix is identifiable up to an explicitly defined equivalence class. We also present the corresponding consistent estimators. We believe that the results (especially the intermediate results) and analysis strategies can be extended to other conjunctive models [15, 12, 31, 32, 18].

The rest of this paper is organized as follows. In Section 2, we present the basic inference result for $Q$ -matrices in a conjunctive model with no slipping or guessing. In addition, we introduce all the necessary terminologies and technical conditions. In Section 3, we extend the results in Section 2 to the DINA model with known slipping and guessing parameters. In Section 4, we further generalize the results to the DINA model with unknown slipping parameters. Further discussion is provided in Section 5. Proofs are given in Section 6. Lastly, the proofs of two key propositions are given in the Appendix.

2 Model specifications and basic results

We start the discussion with a simplified situation, under which the responses depend on the attribute profile deterministically (with no uncertainty). We describe our estimation procedure under this simple scenario. The results for the general cases are given in Sections 3 and 4.

2.1 Basic model specifications

The model specifications consist of the following concepts.

Attributes: subject’s (unobserved) mastery of certain skills. Consider that there are $k$ attributes. Let $\mathbf{A}=(A^{1},\ldots,A^{k})^{\top}$ be the vector of attributes and $A^{j}\in\{0,1\}$ be the indicator of the presence or absence of the $j$ th attribute.

Responses: subject’s binary responses to items. Consider that there are $m$ items. Let $\mathbf{R}=(R^{1},\ldots,R^{m})^{\top}$ be the vector of responses and $R^{i}\in\{0,1\}$ be the response to the $i$ th item.

Both $\mathbf{A}$ and $\mathbf{R}$ are subject specific. We assume that the integers $m$ and $k$ are known.

$Q$ -matrix: the link between item responses and attributes. We define an $m\times k$ matrix $Q=(Q_{ij})_{m\times k}$ . For each $i$ and $j$ , $Q_{ij}=1$ when item $i$ requires attribute $j$ and $0$ otherwise.

Furthermore, we define

\xi^{i}=\prod_{j=1}^{k}(A^{j})^{Q_{ij}}={\mathbf{1}}(A^{j}\geq Q_{ij}\mathchoice{\nobreak\,\colon}{\nobreak\,\colon}{\nobreak\,\colon\;}{\nobreak\,\colon\;}j=1,\ldots,k),

(1)

which indicates whether a subject with attribute $\mathbf{A}$ is capable of providing a positive response to item $i$ . This model is conjunctive, meaning that it is necessary and sufficient to master all the necessary skills to be capable of solving one problem. Possessing additional attributes does not compensate for the absence of necessary attributes. In this section, we consider the simplest situation that there is no uncertainty in the response, that is,

R^{i}=\xi^{i}

(2)

for $i=1,\ldots,m$ . Therefore, the responses are completely determined by the attributes. We assume that all items require at least one attribute. Equivalently, the $Q$ -matrix does not have zero row vectors. Subjects who do not possess any attribute are not capable of responding positively to any item.

We use subscripts to indicate different subjects. For instance, $\mathbf{R}_{r}=(R^{1}_{r},\ldots,R^{m}_{r})^{\top}$ is the response vector of subject $r$ . Similarly, $\mathbf{A}_{r}$ is the attribute vector of subject $r$ . We observe $\mathbf{R}_{1},\ldots,\mathbf{R}_{N}$ , where we use $N$ to denote sample size. The attributes $\mathbf{A}_{r}$ are not observed. Our objective is to make inference on the $Q$ -matrix based on the observed responses.

2.2 Estimation of $Q$ -matrix

We first introduce a few quantities for the presentation of an estimator.

$T$ -matrix

In order to provide an estimator of $Q$ , we first introduce one central quantity, the $T$ -matrix, which connects the $Q$ -matrix with the response and attribute distributions. Matrix $T(Q)$ has $2^{k}-1$ columns each of which corresponds to one nonzero attribute vector, $\mathbf{A}\in\{0,1\}^{k}\setminus\{(0,\ldots,0)\}$ . Instead of labeling the columns of $T(Q)$ by ordinal numbers, we label them by binary vectors of length $k$ . For instance, the $\mathbf{A}$ th column of $T(Q)$ is the column that corresponds to attribute $\mathbf{A}$ , for all $\mathbf{A}\neq(0,\ldots,0)$ .

Let $I_{i}$ be a generic notation for positive responses to item $i$ . Let “ $\wedge$ ” stand for “and” combination. For instance, $I_{i_{1}}\wedge I_{i_{2}}$ denotes positive responses to both items $i_{1}$ and $i_{2}$ . Each row of $T(Q)$ corresponds to one item or one “and” combination of items, for instance, $I_{i_{1}}$ , $I_{i_{1}}\wedge I_{i_{2}}$ or $I_{i_{1}}\wedge I_{i_{2}}\wedge I_{i_{3}}$ , …. If $T(Q)$ contains all the single items and all “and” combinations, $T(Q)$ contains $2^{m}-1$ rows. We will later say that such a $T(Q)$ is saturated (Definition 2.1 in Section 2.4).

We now describe each row vector of $T(Q)$ . We define that $B_{Q}(I_{i})$ is a $2^{k}-1$ dimensional row vector. Using the same labeling system as that of the columns of $T(Q)$ , the $\mathbf{A}$ th element of $B_{Q}(I_{i})$ is defined as $\prod_{j=1}^{k}(A^{j})^{Q_{ij}}$ , which indicates if a subject with attribute $\mathbf{A}$ is able to solve item $i$ .

Using a similar notation, we define that

B_{Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})=\Upsilon_{h=1}^{l}B_{Q}(I_{i_{h}}),

(3)

where the operator “ $\Upsilon_{h=1}^{l}$ ” is element-by-element multiplication from $B_{Q}(I_{i_{1}})$ to $B_{Q}(I_{i_{l}})$ . For instance,

W=\Upsilon_{h=1}^{l}V_{h}

means that $W^{j}=\prod_{h=1}^{l}V_{h}^{j}$ , where $W=(W^{1},\ldots,W^{2^{k}-1})$ and $V_{h}=(V_{h}^{1},\ldots,V_{h}^{2^{k}-1})$ . Therefore, $B_{Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})$ is the vector indicating the attributes that are capable of responding positively to items $i_{1},\ldots,i_{l}$ . The row in $T(Q)$ corresponding to $I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}$ is $B_{Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})$ .

$\alpha$ -vector

We let $\alpha$ be a column vector the length of which equals to the number of rows of $T(Q)$ . Each element of $\alpha$ corresponds to one row vector of $T(Q)$ . The element in $\alpha$ corresponding to $I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}$ is defined as $N_{I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}}/N$ , where $N_{I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}}$ denotes the number of people who have positive responses to items $i_{1},\ldots,i_{l}$ , that is

N_{I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}}=\sum_{r=1}^{N}I(R_{r}^{i_{j}}=1\mathchoice{\nobreak\,\colon}{\nobreak\,\colon}{\nobreak\,\colon\;}{\nobreak\,\colon\;}j=1,\ldots,l).

For each $\mathbf{A}\in\{0,1\}^{k}$ , we let

\hat{p}_{\mathbf{A}}=\frac{1}{N}\sum_{r=1}^{N}I(\mathbf{A}_{r}=\mathbf{A}).

(4)

If (2) is strictly respected, then

T(Q)\hat{\mathbf{p}}=\alpha,

(5)

where $\hat{\mathbf{p}}=(\hat{p}_{\mathbf{A}}\mathchoice{\nobreak\,\colon}{\nobreak\,\colon}{\nobreak\,\colon\;}{\nobreak\,\colon\;}\mathbf{A}\in\{0,1\}^{k}\setminus\{(0,\ldots,0)\})$ is arranged in the same order as the columns of $T(Q)$ . This is because each row of $T(Q)$ indicates the attribute profiles corresponding to subjects capable of responding positively to that set of item(s). Vector $\hat{\mathbf{p}}$ contains the proportions of subjects with each attribute profile. For each set of items, matrix multiplication sums up the proportions corresponding to each attribute profile capable of responding positively to that set of items, giving us the total proportion of subjects who respond positively to the corresponding items.

The estimator of the $Q$ -matrix

For each $m\times k$ binary matrix $Q^{\prime}$ , we define

S(Q^{\prime})=\inf_{\mathbf{p}\in[0,1]^{2^{k}-1}}|T(Q^{\prime})\mathbf{p}-\alpha|,

(6)

where $\mathbf{p}=(p_{\mathbf{A}}\mathchoice{\nobreak\,\colon}{\nobreak\,\colon}{\nobreak\,\colon\;}{\nobreak\,\colon\;}\mathbf{A}\neq(0,\ldots,0))$ . The above minimization is subject to the constraint that $\sum_{\mathbf{A}\neq\mathbf{(}0,\ldots,0)}p_{\mathbf{A}}\in[0,1]$ . $|\cdot|$ is the Euclidean distance. An estimator of $Q$ can be obtained by minimizing $S(Q^{\prime})$ ,

\hat{Q}=\arg\inf_{Q^{\prime}}S(Q^{\prime}),

(7)

where “ $\arg\inf$ ” is the minimizer of the minimization problem over all $m\times k$ binary matrices. Note that the minimizers are not unique. We will later prove that the minimizers are in the same meaningful equivalence class. Because of (5), the true $Q$ -matrix is always among the minimizers because $S(Q)=0$ .

2.3 Example

We illustrate the above construction by one simple example. We emphasize that this example is discussed to explain the estimation procedure for a concrete and simple example. The proposed estimator is certainly able to handle much larger $Q$ -matrices. We consider the following $3\times 2$ $Q$ -matrix,

2+3105×201(2+3)×211Q=\,\begin{tabular}[]{@{}lll@{}}\hline\cr&Addition&Multiplication\\ \hline\cr$2+3$&$1$&$0$\\ $5\times 2$&$0$&$1$\\ $(2+3)\times 2$&$1$&$1$\\ \hline\cr\end{tabular}

	$Q=\,\begin{tabular}[]{@{}lll@{}}\hline\cr&Addition&Multiplication\\ \hline\cr$2+3$&$1$&$0$\\ $5\times 2$&$0$&$1$\\ $(2+3)\times 2$&$1$&$1$\\ \hline\cr\end{tabular}$		Addition	Multiplication		(8)

There are two attributes and three items. We consider the contingency table of attributes,

	Multiplication
Addition	$\hat{p}_{00}$	$\hat{p}_{01}$
Addition	$\hat{p}_{10}$	$\hat{p}_{11}$

In the above table, $\hat{p}_{00}$ is the proportional of people who do not master either addition or multiplication. Similarly, we define $\hat{p}_{01}$ , $\hat{p}_{10}$ and $\hat{p}_{11}$ . $\{\hat{p}_{ij};j=0,1\}$ is not observed.

Just for illustration, we construct a simple nonsaturated $T$ -matrix. Suppose the relationship in (2) is strictly respected. Then, we should be able to establish the following identities:

N(\hat{p}_{10}+\hat{p}_{11})=N_{I_{1}},\qquad N(\hat{p}_{01}+\hat{p}_{11})=N_{I_{2}},\qquad N\hat{p}_{11}=N_{I_{3}}.

(9)

Therefore, if we let $\hat{\mathbf{p}}=(\hat{p}_{10},\hat{p}_{01},\hat{p}_{11})$ , the above display imposes three linear constraints on the vector $\hat{\mathbf{p}}$ . Together with the natural constraint that $\sum_{ij}\hat{p}_{ij}=1$ , $\hat{\mathbf{p}}$ solves linear equation,

T(Q)\hat{\mathbf{p}}=\alpha,

(10)

subject to the constraints that $\hat{\mathbf{p}}\in[0,1]^{3}$ and $\hat{p}_{10}+\hat{p}_{01}+\hat{p}_{11}\in[0,1]$ , where

T(Q)=\pmatrix{1&0&1\cr 0&1&1\cr 0&0&1},\qquad\alpha=\pmatrix{N_{I_{1}}/N\cr N_{I_{2}}/N\cr N_{I_{3}}/N}.

(11)

Each column of $T(Q)$ corresponds to one attribute profile. The first column corresponds to $\mathbf{A}=(1,0)$ , the second column to $\mathbf{A}=(0,1)$ , and the third column to $\mathbf{A}=(1,1)$ . The first row corresponds to item $2+3$ , the second row to $5\times 2$ and the last row to $(2+3)\times 2$ . For this particular situation, $T(Q)$ has full rank and there exists one unique solution to (10). In fact, we would not expect the constrained solution to the linear equation in (10) to always exist unless (2) is strictly followed. This is the topic of the next section.

The identities in (9) only consider the marginal rate of each question. There are additional constraints if one considers “combinations” among items. For instance,

N\hat{p}_{11}=N_{I_{1}\wedge I_{2}}.

People who are able to solve problem 3 must have both attributes and therefore are able to solve both problems 1 and 2. Again, if (2) is not strictly followed, this is not necessarily respected in the real data, though it is a logical conclusion. The DINA in the next section handles such a case. Upon considering $I_{1}$ , $I_{2}$ , $I_{3}$ and $I_{1}\wedge I_{2}$ , the new $T$ -matrix is

T(Q)=\pmatrix{1&0&1\cr 0&1&1\cr 0&0&1\cr 0&0&1},\qquad\alpha=\pmatrix{N_{I_{1}}/N\cr N_{I_{2}}/N\cr N_{I_{3}}/N\cr N_{I_{1}\wedge I_{2}}/N}.

(12)

The last row is added corresponding to $I_{1}\wedge I_{2}$ . With (2) in force, we have

S(Q)=\inf_{\mathbf{p}\in[0,1]^{3}}|T(Q)\mathbf{p}-\alpha|=|T(Q)\hat{\mathbf{p}}-\alpha|=0,

(13)

if $Q$ is the true matrix.

2.4 Basic results

Before stating the main result, we provide a list of notations, which will be used in the discussions.

•

Linear space spanned by vectors $V_{1},\ldots,V_{l}$ :

\mathcal{L}(V_{1},\ldots,V_{l})=\Biggl{\{}\sum_{j=1}^{l}a_{j}V_{j}\mathchoice{\nobreak\,\colon}{\nobreak\,\colon}{\nobreak\,\colon\;}{\nobreak\,\colon\;}a_{j}\in\mathbb{R}\Biggr{\}}.

•

For a matrix $M$ , $M_{1:l}$ denotes the submatrix containing the first $l$ rows and all columns of $M$ .
•

Vector $e_{i}$ denotes a column vector such that the $i$ th element is one and the rest are zero. When there is no ambiguity, we omit the length index of $e_{i}$ .
•

Matrix $\mathcal{I}_{l}$ denotes the $l\times l$ identity matrix.
•

For a matrix $M$ , $C(M)$ is the linear space generated by the column vectors of $M$ . It is usually called the column space of $M$ .
•

$C_{M}$ denotes the set of column vectors of $M$ .
•

$R_{M}$ denotes the set of row vectors of $M$ .
•

Vector $\mathbf{0}$ denotes the zero vector, $(0,\ldots,0)$ . When there is no ambiguity, we omit the index of length.
•

Scalar $p_{\mathbf{A}}$ denotes the probability that a subject has attribute profile $\mathbf{A}$ . For instance, $p_{10}$ is the probability that a subject has attribute one but not attribute two.

•

Define a $2^{k}-1$ dimensional vector

\mathbf{p}=(p_{\mathbf{A}}\mathchoice{\nobreak\,\colon}{\nobreak\,\colon}{\nobreak\,\colon\;}{\nobreak\,\colon\;}\mathbf{A}\in\{0,1\}^{k}\setminus\{\mathbf{0}\}).

•

Let $c$ and $g$ be two $m$ dimensional vectors. We write $c\succ g$ if $c_{i}>g_{i}$ for all $1\leq i\leq m$ .
•

We write $c\ncong g$ if $c_{i}\neq g_{i}$ for all $i=1,\ldots,m$ .
•

Matrix $Q$ denotes the true matrix and $Q^{\prime}$ denotes a generic $m\times k$ binary matrix.

The following definitions will be used in subsequent discussions.

Definition 2.1.

We say that $T(Q)$ is saturated if all combinations of form $I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}$ , for $l=1,\ldots,m$ , are included in $T(Q)$ .

Definition 2.2.

We write $Q\sim Q^{\prime}$ if and only if $Q$ and $Q^{\prime}$ have identical column vectors, which could be arranged in different orders; otherwise, we write $Q\nsim Q^{\prime}$ .

Remark 2.1.

It is not hard to show that “ $\sim$ ” is an equivalence relation. $Q\sim Q^{\prime}$ if and only if they are identical after an appropriate permutation of the columns. Each column of $Q$ is interpreted as an attribute. Permuting the columns of $Q$ is equivalent to relabeling the attributes. For $Q\sim Q^{\prime}$ , we are not able to distinguish $Q$ from $Q^{\prime}$ based on data.

Definition 2.3.

A $Q$ -matrix is said to be complete if $\{e_{i}\mathchoice{\nobreak\,\colon}{\nobreak\,\colon}{\nobreak\,\colon\;}{\nobreak\,\colon\;}i=1,\ldots,k\}\subset R_{Q}$ ( $R_{Q}$ is the set of row vectors of $Q$ ); otherwise, we say that $Q$ is incomplete.

A $Q$ -matrix is complete if and only if for each attribute there exists an item only requiring that attribute. Completeness implies that $m\geq k$ . We will show that completeness is among the sufficient conditions to identify $Q$ .

Remark 2.2.

One of the main objectives of cognitive assessment is to identify the subjects’ attributes; see [22] for other applications. It has been established in [1] that the completeness of the $Q$ -matrix is a sufficient and necessary condition for a set of items to consistently identify attributes if (2) is strictly followed. Thus, it is usually recommended to use a complete $Q$ -matrix. For a precise formulation, see [1].

Listed below are assumptions which will be used in subsequent development.

[C3]
C1

$Q$ is complete.
C2

$T(Q)$ is saturated.
C3

$\mathbf{A}_{1},\ldots,\mathbf{A}_{N}$ are i.i.d. random vectors following distribution

$P(\mathbf{A}_{r}=\mathbf{A})=p^{*}_{\mathbf{A}}.$

We further let $\mathbf{p}^{*}=(p^{*}_{\mathbf{A}}\mathchoice{\nobreak\,\colon}{\nobreak\,\colon}{\nobreak\,\colon\;}{\nobreak\,\colon\;}\mathbf{A}\in\{0,1\}\setminus\{\mathbf{0}\})$ .

[C5]
C4

$(p_{\mathbf{0}}^{*},\mathbf{p}^{*})\succ\mathbf{0}$ .
C5

Each attribute has been required by at least two items.

With these preparations, we are ready to introduce the first theorem, the proof of which is given in Section 6.

Theorem 2.4.

Assume that conditions C1–C5 are in force. Suppose that for subject $r$ the response corresponding to item $i$ follows

R^{i}_{r}=\xi^{i}_{r}=\prod_{j=1}^{k}(A^{j}_{r})^{Q_{ij}}.

Let $\hat{Q}$ , defined in (7), be a minimizer of $S(Q^{\prime})$ among all $m\times k$ binary matrices, where $S(Q^{\prime})$ is defined in (6). Then,

\lim_{N\rightarrow\infty}P(\hat{Q}\sim Q)=1.

(14)

Further, let

\tilde{\mathbf{p}}=\arg\inf_{\mathbf{p}}|T(\hat{Q})\mathbf{p}-\alpha|^{2}.

(15)

With an appropriate rearrangement of the columns of $\hat{Q}$ , for any $\varepsilon>0$

\lim_{N\rightarrow\infty}P(|\tilde{\mathbf{p}}-\mathbf{p}^{*}|\leq\varepsilon)=1.

Remark 2.3.

If $Q_{1}\sim Q_{2}$ , the two matrices only differ by a column permutation and will be considered to be the “same”. Therefore, we expect to identify the equivalence class that $Q$ belongs to. Also, note that $S(Q_{1})=S(Q_{2})$ if $Q_{1}\sim Q_{2}$ .

Remark 2.4.

In order to obtain the consistency of $\hat{Q}$ (subject to a column permutation), it is necessary to have $\mathbf{p}^{*}$ not living on some sub-manifold. To see a counter example, suppose that $P(\mathbf{A}_{r}=(1,\ldots,1)^{\top})=p^{*}_{1\ldots 1}=1$ . Then, for all $Q$ , $P(\mathbf{R}_{r}=(1,\ldots,1)^{\top})=1$ , that is, all subjects are able to solve all problems. Therefore, the distribution of $\mathbf{R}$ is independent of $Q$ . In other words, the $Q$ -matrix is not identifiable. More generally, if there exit $A_{r}^{i}$ and $A_{r}^{j}$ such that $P(A_{r}^{i}=A_{r}^{j})=1$ , then the $Q$ -matrix is not identifiable based on the data. This is because one cannot tell if an item requires attribute $i$ alone, attribute $j$ alone, or both; see [16, 17] for similar cases for the multidimensional IRT models.

Remark 2.5.

Note that the estimator of the attribute distribution, $\tilde{\mathbf{p}}$ , in (15) depends on the order of columns of $\hat{Q}$ . In order to achieve consistency, we will need to arrange the columns of $\hat{Q}$ such that $\hat{Q}=Q$ whenever $\hat{Q}\sim Q$ .

Remark 2.6.

One practical issue associated with the proposed procedure is the computation. For a specific $Q$ , the computation of $S(Q)$ only involves a constraint minimization of a quadratic function. However, if $m$ or $k$ is large, the computation overhead of searching the minimizer of $S(Q)$ over the space of $m\times k$ matrices could be substantial. One practical solution is to break the $Q$ -matrix into smaller sub-matrices. For instance, one may divide the $m$ items in to $l$ groups (possibly with nonempty overlap across different groups). Then apply the proposed estimator to each of the $l$ group of items. This is equivalent to breaking a big $m$ by $k$ $Q$ -matrix into several smaller matrices and estimating each of them separately. Lastly, combine the $l$ estimated sub-matrices together to form a single estimate. The consistency results can be applied to each of the $l$ sub-matrices and therefore the combined matrix is also a consistent estimator. A similar technique has been discussed in Chapter 8.6 of [29].

Remark 2.7.

Conditions C1 and C2 are imposed to guarantee consistency. They may not be always necessary. Furthermore, constructing a saturated $T$ -matrix is sometimes computationally not feasible, especially when the number of items is large. In practice, one may include the combinations of one item, two items, …, $j$ items. The choice of $j$ depends on the sample size and the computation resources. The condition C5 is required for technical purposes. Nonetheless, one can in fact construct counterexamples, that is, the $Q$ -matrix is not identifiable up to the relationship “ $\sim$ ”, if C5 is violated.

3 DINA model with known slipping and guessing parameters

3.1 Model specification

In this section, we extend the inference results in the previous section to the situation under which the responses do not deterministically depend on the attributes. In particular, we consider the DINA (Deterministic Input, Noisy Output “AND” gate) model [12]. We would like to introduce two parameters: the slipping parameter ( $s_{i}$ ) and the guessing parameter $(g_{i})$ . Here $1-s_{i}$ ( $g_{i}$ ) is the probability of a subject’s responding positively to item $i$ given that s/he is capable of solving that problem. To simplify the notations, we denote $1-s_{i}$ by $c_{i}$ . An extension of (2) to include slipping and guessing specifies the response probabilities as

P(R^{i}=1|\xi^{i})=c_{i}^{\xi^{i}}g_{i}^{1-\xi^{i}},

(16)

where $\xi^{i}$ is the capability indicator defined in (1). In addition, conditional on $\{\xi^{1},\ldots,\xi^{m}\}$ , $\{R^{1},\ldots,R^{m}\}$ are jointly independent.

In this context, the $T$ -matrix needs to be modified accordingly. Throughout this section, we assume that both $c_{i}$ ’s and $g_{i}$ ’s are known. We discuss the case that $c_{i}$ ’s are unknown in the next section.

We first consider the case that $g_{i}=0$ for all $i=1,\ldots,m$ . We introduce a diagonal matrix $D_{c}$ . If the $h$ th row of matrix $T_{c}(Q)$ corresponds to $I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}$ , then the $h$ th diagonal element of $D_{c}$ is $c_{i_{1}}\times\cdots\times c_{i_{l}}$ . Then, we let

T_{c}(Q)=D_{c}T(Q),

(17)

where $T(Q)$ is the binary matrix defined previously. In other words, we multiply each row of $T(Q)$ by a common factor and obtain $T_{c}(Q)$ . Note that in absence of slipping ( $c_{i}=1$ for each $i$ ) we have that $T_{c}(Q)=T(Q)$ .

There is another equivalent way of constructing $T_{c}(Q)$ . We define

B_{c,Q}(I_{j})=c_{j}B_{Q}(I_{j})

and

B_{c,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})=\Upsilon_{h=1}^{l}B_{c,Q}(I_{i_{h}}),

(18)

where “ $\Upsilon$ ” refers to element by element multiplication. Let the row vector in $T_{c}(Q)$ corresponding to $I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}$ be $B_{c,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})$ .

For instance, with $c=(c_{1},c_{2},c_{3})$ , the $T_{c}(Q)$ corresponding to the $T$ -matrix in (12) would be

T_{c}(Q)=\pmatrix{c_{1}&0&c_{1}\cr 0&c_{2}&c_{2}\cr 0&0&c_{3}\cr 0&0&c_{1}c_{2}}.

(19)

Lastly, we consider the situation that both the probability of making a mistake and the probability of guessing correctly could be strictly positive. By this, we mean that the probability that a subject responds positively to item $i$ is $c_{i}$ if s/he is capable of doing so; otherwise the probability is $g_{i}$ . We create a corresponding $T_{c,g}(Q)$ by slightly modifying $T_{c}(Q)$ . We define row vector

\mathbb{E}=(1,\ldots,1).

When there is no ambiguity, we omit the length index of $\mathbb{E}$ . Now, let

B_{c,g,Q}(I_{i})=g_{i}\mathbb{E}+(c_{i}-g_{i})B_{Q}(I_{i})

and

B_{c,g,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})=\Upsilon_{h=1}^{l}B_{c,g,Q}(I_{i_{h}}).

(20)

Let the row vector in $T_{c,g}(Q)$ corresponding to $I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}$ be $B_{c,g,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})$ . For instance, the matrix $T_{c,g}$ corresponding to the $T_{c}(Q)$ in (19) is

T_{c,g}(Q)=\pmatrix{c_{1}&g_{1}&c_{1}\cr g_{2}&c_{2}&c_{2}\cr g_{3}&g_{3}&c_{3}\cr c_{1}g_{2}&g_{1}c_{2}&c_{1}c_{2}}.

(21)

3.2 Estimation of the $Q$ -matrix and consistency results

Having concluded our preparations, we are now ready to introduce our estimators for $Q$ . Given $c$ and $g$ , we define

S_{c,g}(Q)=\inf_{\mathbf{p}^{\prime}\in[0,1]^{2^{k}-1}}|T_{c,g}(Q)\mathbf{p}^{\prime}+p_{\mathbf{0}}^{\prime}\mathbf{g}-\alpha|,

(22)

where $\mathbf{p}^{\prime}=(p^{\prime}_{\mathbf{A}}\mathchoice{\nobreak\,\colon}{\nobreak\,\colon}{\nobreak\,\colon\;}{\nobreak\,\colon\;}\mathbf{A}\in\{0,1\}^{k}\setminus\{\mathbf{0}\})$ , $p^{\prime}_{\mathbf{0}}=p^{\prime}_{0\ldots 0}$ and

\mathbf{g}=\pmatrix{g_{1}\cr\vdots\cr g_{k}\cr g_{1}g_{2}\cr\vdots\cr g_{k-1}g_{k}\cr g_{1}g_{2}g_{3}\cr\vdots}\begin{array}[]{l}I_{1}\\ \vdots\\ I_{k}\\ I_{1}\wedge I_{2}\\ \vdots\\ I_{k-1}\wedge I_{k}\\ I_{1}\wedge I_{2}\wedge I_{3}\\ \vdots\end{array}

(23)

The labels to the right of the vector indicate the corresponding row vectors in $T_{c,g}(Q)$ . The minimization in (22) is subject to constraints that

p^{\prime}_{\mathbf{A}}\in[0,1]\quad\mbox{and}\quad\sum_{\mathbf{A}}p_{\mathbf{A}}^{\prime}=1.

The vector $\mathbf{g}$ contains the probabilities of providing positive responses to items simply by guessing. We propose an estimator of the $Q$ -matrix through a minimization problem, that is,

\hat{Q}(c,g)=\arg\inf_{Q^{\prime}}S_{c,g}(Q^{\prime}).

(24)

We write $c$ and $g$ in the argument to emphasize that the estimator depends on $c$ and $g$ . The computation of the minimization in (22) consists of minimizing a quadratic function subject to finitely many linear constraints. Therefore, it can be done efficiently.

Theorem 3.1.

Suppose that $c$ and $g$ are known and that conditions C1–C5 are in force. For subject $r$ , the responses are generated independently such that

P(R_{r}^{i}=1|\xi^{i}_{r})=c_{i}^{\xi^{i}_{r}}g_{i}^{1-\xi^{i}_{r}},

(25)

where $\xi^{i}_{r}$ is defined as in Theorem 2.4. Let $\hat{Q}(c,g)$ be defined as in (24). If $c_{i}\neq g_{i}$ for all $i$ and $T_{c-g}(Q)\mbox{$\mathbf{p}$}^{*}$ does not have zero elements, then

\lim_{N\rightarrow\infty}P\bigl{(}\hat{Q}(c,g)\sim Q\bigr{)}=1.

Furthermore, let

\tilde{\mathbf{p}}(c,g)=\arg\inf_{\mathbf{p}}|T_{c,g}(\hat{Q}(c,g))\mathbf{p}+p_{\mathbf{0}}\mathbf{g}-\alpha|^{2},

subject to constraint that $\sum_{\mathbf{A}}p_{\mathbf{A}}=1$ . Then, with an appropriate rearrangement of the columns of $\hat{Q}$ , for any $\varepsilon>0$ ,

\lim_{N\rightarrow\infty}P\bigl{(}|\tilde{\mathbf{p}}(c,g)-\mathbf{p}^{*}|\leq\varepsilon\bigr{)}=1.

Remark 3.1.

There are various metrics one can employ to measure the distance between the vectors $T_{c,g}(\hat{Q}(c,g))\mathbf{p}+p_{\mathbf{0}}\mathbf{g}$ and $\alpha$ . In fact, any metric that generates the same topology as the Euclidian metric is sufficient to obtain the consistency results in the theorem. For instance, a principled choice of objective function would be the likelihood with $\mathbf{p}$ profiled out. The reason we prefer the Euclidian metric (versus, for instance, the full likelihood) is that the evaluation of $S(Q)$ is easier than the evaluation based on other metrics. More specifically, the computation of current $S(Q)$ consists of quadratic programming types of well oiled optimization techniques.

4 Extension to the situation with unknown slipping probabilities

In this section, we further extend our results to the situation where the slipping probabilities are unknown and the guessing probabilities are known. In the context of standard exams, the guessing probabilities can typically be set to zero for open problems. For instance, the chance of guessing the correct answer to $``(3+2)\times 2=\,$ ?” is very small. On the other hand, for multiple choice problems, the guessing probabilities cannot be ignored. In that case, $g_{i}$ can be considered as $1/n$ when there are $n$ choices; see Remark 4.2 for more description.

4.1 Estimator of $c$

We provide two estimators of $c$ given $Q$ and $g$ . One is applicable to all $Q$ -matrices, but computationally intensive. The other is computationally easy, but requires certain structures of $Q$ . Then, we combine them into a single estimator.

A general estimator

We first provide an estimator of $c$ that is applicable to all $Q$ -matrices. Considering that the estimator of $Q$ minimizes the objective function $S_{c,g}(Q)$ , we propose the following estimator of $c$ :

\tilde{c}(Q,g)=\arg\inf_{c\in[0,1]^{m}}S_{c,g}(Q).

(26)

A moment estimator

The computation of $\tilde{c}(Q,g)$ is typically intensive. When the $Q$ -matrix has a certain structure, we are able to estimate $c$ consistently based on estimating equations.

For a particular item $i$ , suppose that there exist items ${i_{1}},\ldots,{i_{l}}$ (different from $i$ ) such that

B_{Q}(I_{i}\wedge I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})=B_{Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}}),

(27)

that is, the attributes required by item $i$ are a subset of the attributes required by ${i_{1}},\ldots,{i_{l}}$ .

Let ${c-g}=(c_{1}-g_{1},\ldots,c_{m}-g_{m})$ and

\tilde{T}_{c,g}(Q)=\pmatrix{\mathbf{g}&T_{c,g}(Q)\cr 1&\mathbb{E}}.

We borrow a result which will be given in the proof of Proposition 6.6 (Section 6.1) to say that there exists a matrix $D$ (only depending on $g$ ) such that

D\tilde{T}_{c,g}(Q)=(\mathbf{0},T_{{c-g}}(Q)).

Let $a_{g}$ and $a_{*g}$ be the row vectors in $D$ corresponding to $I_{i_{1}}\wedge\cdots\wedge I_{i_{i}}$ and $I_{i}\wedge I_{i_{1}}\wedge\cdots\wedge I_{i_{i}}$ (in $T_{{c-g}}(Q)$ ).

Then,

	$\displaystyle\frac{a_{\ast g}^{\top}{\alpha\choose 1}}{a_{g}^{\top}{\alpha\choose 1}}$	$\displaystyle=$	$\displaystyle\frac{a_{\ast g}^{\top}\tilde{T}_{c,g}(Q){p_{\mathbf{0}}^{}\choose\mathbf{p}^{}}}{a_{g}^{\top}\tilde{T}_{c,g}(Q){p_{\mathbf{0}}^{}\choose\mathbf{p}^{}}}+\mathrm{o}_{p}(1)$
		$\displaystyle=$	$\displaystyle\frac{B_{{c-g},Q}(I_{i}\wedge I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})\mathbf{p}^{}}{B_{{c-g},Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})\mathbf{p}^{}}+\mathrm{o}_{p}(1)\mathop{\rightarrow}^{p}(c_{i}-g_{i}),$

where the vectors $a_{g}$ and $a_{*g}$ only depend on $g$ .

Therefore, the corresponding estimator of $c_{i}$ would be

\bar{c}_{i}(Q,g)=g_{i}+\frac{a^{\top}_{*g}{\alpha\choose 1}}{a^{\top}_{g}{\alpha\choose 1}}.

(29)

Note that the computation of $\bar{c}_{i}(Q,g)$ only consists of affine transformations and therefore is very fast.

Proposition 4.1.

Suppose conditions C3, (25) and (27) are true. Then $\bar{c}_{i}\rightarrow c_{i}$ in probability as $N\rightarrow\infty$ .

Proof.

By the law of large numbers,

a_{\ast g}^{\top}\pmatrix{\alpha\cr 1}-a_{\ast g}^{\top}\tilde{T}_{c,g}(Q)\pmatrix{p_{\mathbf{0}}^{*}\vskip 2.0pt\cr\mathbf{p}^{*}}\rightarrow 0,\qquad a_{g}^{\top}\pmatrix{\alpha\cr 1}-a_{g}^{\top}\tilde{T}_{c,g}(Q)\pmatrix{p_{\mathbf{0}}^{*}\vskip 2.0pt\cr\mathbf{p}^{*}}\rightarrow 0,

in probability as $N\rightarrow\infty$ . By the construction of $a_{*g}$ and $a_{g}$ , we have

	$\displaystyle a_{\ast g}^{\top}\tilde{T}_{c,g}(Q)\pmatrix{p_{\mathbf{0}}^{}\vskip 2.0pt\cr\mathbf{p}^{}}$	$\displaystyle=$	$\displaystyle B_{{c-g},Q}(I_{i}\wedge I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})\mathbf{p}^{*},$
	$\displaystyle a_{g}^{\top}\tilde{T}_{c,g}(Q)\pmatrix{p_{\mathbf{0}}^{}\vskip 2.0pt\cr\mathbf{p}^{}}$	$\displaystyle=$	$\displaystyle B_{{c-g},Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})\mathbf{p}^{*}.$

Thanks to (27), we have

\frac{a_{\ast g}^{\top}{\alpha\choose 1}}{a_{g}^{\top}{\alpha\choose 1}}\rightarrow c_{i}-g_{i}.

\upqed

∎

Combined estimator

Lastly, we combine $\bar{c}_{i}$ and $\tilde{c}_{i}$ . For each $Q$ , we write $c=(c^{*},c^{**})$ . For each $c_{i}$ in the sub-vector $c^{*}$ , (27) holds. Let $\bar{c}^{*}(Q,g)$ be defined in (29) (element by element). For $c^{**}$ , we let $\tilde{c}^{**}(Q,g)=\arg\inf_{c^{**}}S_{(\bar{c}^{*}(Q,g),c^{**}),g}(Q)$ . Finally, let $\hat{c}(Q,g)=(\bar{c}^{*}(Q,g),\tilde{c}^{**}(Q,g))$ . Furthermore, each element of $\hat{c}(Q,g)$ greater than one is set to be one and each element less than zero is set to be zero. Equivalently, we impose the constraint that $\hat{c}(Q,g)\in[0,1]^{m}$ .

4.2 Consistency result

Theorem 4.2.

Suppose that $g$ is known and the conditions in Theorem 3.1 hold. Let

\hat{Q}_{\hat{c}}(g)=\arg\inf_{Q^{\prime}}S_{\hat{c}(Q^{\prime},g),g}(Q^{\prime}),\qquad\tilde{\mathbf{p}}_{\hat{c}}(g)=\arg\inf_{\mathbf{p}}\bigl{|}T_{\hat{c}(\hat{Q},g),g}(\hat{Q}_{\hat{c}}(g))\mathbf{p}+p_{\mathbf{0}}\mathbf{g}-\alpha\bigr{|}.

The second optimization is subject to constraint that $\sum_{\mathbf{A}}p_{\mathbf{A}}=1$ . Then,

\lim_{N\rightarrow\infty}P\bigl{(}\hat{Q}_{\hat{c}}(g)\sim Q\bigr{)}=1.

Furthermore, if the estimator $\tilde{c}(Q,g)$ , defined in (26), is consistent, then by appropriately rearranging the columns of $\hat{Q}_{\hat{c}}(g)$ , for any $\varepsilon>0$ ,

\lim_{N\rightarrow\infty}P\bigl{(}|\tilde{\mathbf{p}}_{\hat{c}}(g)-\mathbf{p}^{*}|\leq\varepsilon\bigr{)}=1.

Remark 4.1.

The consistency of $\hat{Q}_{\hat{c}}(g)$ does not rely on the consistency of $\tilde{c}(Q,g)$ , which is mainly because of the central intermediate result in Proposition 6.6. The consistency of $\tilde{c}(Q,g)$ is a necessary condition for the consistency of $\tilde{\mathbf{p}}_{\hat{c}}(g)$ .

For most usual situations, $(\mathbf{p}^{*},c)$ is estimable based on the data given a correctly specified $Q$ -matrix. Nonetheless, there are some rare occasions in which nonidentifiability does exist. We provide one example, explained at the intuitive level, to illustrate that it is not always possible to consistently estimate $c$ and $\mathbf{p}^{*}$ . This example is simply to justify that the existence of the consistent estimator for $c$ in the above theorem is not an empty assumption. Consider a complete matrix $Q=\mathcal{I}_{k}$ . The degrees of freedom of a $k$ -way binary table is $2^{k}-1$ . On the other hand, the dimension of parameters $(\mathbf{p}^{*},c)$ is $2^{k}-1+k$ . Therefore, $\mathbf{p}^{*}$ and $c$ cannot be consistently identified without additional information. This problem is typically tackled by introducing addition parametric assumptions such as $\mathbf{p}^{*}$ satisfying certain functional form or in the Bayesian setting (weakly) informative prior distributions [6]. Given that the emphasis of this paper is the inference of $Q$ -matrix, we do not further investigate the identifiability of $(\mathbf{p}^{*},c)$ . Nonetheless, estimation for $(\mathbf{p}^{*},c)$ is definitely an important issue.

Remark 4.2.

Assuming that the guessing probability $g_{i}$ being known is somewhat strong. For complicated situations, such as for multiple choice problems the incorrect choices do not look “equally incorrect”, the guessing probability is typically not $1/n$ . In Theorem 4.2, we make this assumption mostly for technical reasons.

One can certainly provide the same treatment to the unknown guessing probabilities just as to the slipping probabilities by plugging in a consistent estimator of $g_{i}$ or profiling it out (like $\tilde{c}$ ). However, the rigorous establishment of the consistency results is certainly much more difficult and additional technical conditions may be needed. We leave the analysis of the problem with unknown guessing probability to the future study.

5 Discussion

This paper provides basic theoretical results of the estimation of $Q$ -matrix, a key element in modern cognitive diagnosis. Under the conjunctive model assumption, sufficient conditions are developed for the $Q$ -matrix to be identifiable up to an equivalence relation and the corresponding consistent estimators are constructed. The equivalence relation defines a natural partition of the space of $Q$ -matrices and may be viewed as the finest “resolution” that is possibly distinguishable based on the data, unless there is additional information about the specific meaning of each attribute. Our results provide the first steps for statistical inference about $Q$ -matrices by explicitly specifying the conditions under which two $Q$ -matrices lead to different response distributions. We believe that these results, especially the intermediate results in Section 6, can also be applied to general conjunctive models.

There are several directions along which further exploration may be pursued. First, some conditions may be modified to reflect practical circumstance. For instance, if the population is not fully diversified, meaning that certain attribute profiles may never exist, then condition C4 cannot be expected to hold. To ensure identifiability, we will need to impose certain structures on the $Q$ -matrix. In the addition-multiplication example of Section 2.3, if individuals capable of multiplication are also capable of addition, then we may need to impose the natural constraint that every item that requires multiplication should also require addition, which also implies that the $Q$ -matrix is never complete.

Second, when an a priori “expert’s” knowledge of the $Q$ -matrix is available, we may wish to incorporate such information into the estimation. This could be in the form of an additive penalty function attached to the objective function $S$ . Such information, if correct, not only improves estimation accuracy but also reduces the computational complexity – one can just perform a minimization of $S(Q)$ in a neighborhood around the expert’s $Q$ -matrix.

Third, throughout this paper we assume that the number of attributes (dimension) is known. In practice, it would be desirable to develop a data driven way to estimate the dimension, not only to deal with the situation of unknown dimension, but also to check if the assumed dimension is correct. One possible way to tackle the problem is to introduce a penalty function similar to that of BIC [24] which would give a consistent estimator of the $Q$ -matrix even if the dimension is unknown.

Fourth, one issue of both theoretical and practical importance is the inference of the parameters additional to the $Q$ -matrix, such as the slipping ( $s=1-c$ ), guessing ( $g$ ) parameters and the attribute distribution $\mathbf{p}^{*}$ . In the current paper, given that the main interesting parameter is the $Q$ -matrix, the estimations of $\mathbf{p}^{*}$ and $c$ are treated as by-product of the main results. On the other hand, given a known $Q$ , the identifiability and estimation of these parameters are important topics. In the previous discussion, we provided a few examples for potential identifiability issues. Further careful investigation is definitely of great importance and challenges.

Fifth, the rate of convergence of the estimator $\hat{Q}$ is not only of theoretical importance. From a practical point of few, it is crucial to study the rate of convergence as the scale of the problem becomes large in terms of the number of attributes and number of items.

Lastly, the optimization of $S(Q)$ over the space of $m\times k$ binary matrices is a nontrivial problem. It consists of evaluating the function $S$ $2^{m\times k}$ times. This is a substantial computational load if $m$ and $k$ are reasonably large. As mentioned previously, this computation might be reduced by additional information about the $Q$ -matrix or splitting the $Q$ -matrix into small sub-matrices. Nevertheless, it would be highly desirable to explore the structures of the $Q$ -matrix and the function $S$ so as to compute $\hat{Q}$ more efficiently.

6 Proofs of the theorems

6.1 Several propositions and lemmas

To make the discussion smooth, we postpone several long proofs to the Appendix.

Proposition 6.1.

Suppose that $Q$ is complete and matrix $T(Q)$ is saturated. Then, we are able to arrange the columns and rows of $Q$ and $T(Q)$ such that $T(Q)_{1:(2^{k}-1)}$ has full rank and $T(Q)$ has full column rank.

Proof.

Provided that $Q$ is complete, without loss of generality we assume that the $i$ th row vector of $Q$ is $e_{i}^{\top}$ for $i=1,\ldots,k$ , that is, item $i$ only requires attribute $i$ for each $i=1,\ldots,k$ . Let the first $2^{k}-1$ rows of $T(Q)$ be associated with $\{I_{1},\ldots,I_{k}\}$ . In particular, we let the first $k$ rows correspond to $I_{1},\ldots,I_{k}$ and the first $k$ columns of $T(Q)$ correspond to $\mathbf{A}$ ’s that only have one attribute. We further arrange the next $C^{k}_{2}$ rows of $T(Q)$ to correspond to combinations of two items, $I_{i}\wedge I_{j}$ , $i\neq j$ . The next $C^{k}_{2}$ columns of $T(Q)$ correspond to $\mathbf{A}$ ’s that only have two positive attributes. Similarly, we arrange $T(Q)$ for combinations of three, four, and up to $k$ items. Therefore, the first $2^{k}-1$ rows of $T(Q)$ admit a block upper triangle form. In addition, we are able to further arrange the columns within each block such that the diagonal matrices are identities, so that $T(Q)$ has form

\begin{array}[]{c}I_{1},I_{2},\ldots\\ I_{1}\wedge I_{2},I_{1}\wedge I_{3},\ldots\\ I_{1}\wedge I_{2}\wedge I_{3},\ldots\\ \vdots\end{array}\pmatrix{\mathcal{I}_{k}&\ast&\ast&\ast&\cdots&\cr 0&\mathcal{I}_{C_{2}^{k}}&\ast&\ast&&\cr 0&0&\mathcal{I}_{C_{3}^{k}}&\ast&&\cr\vdots&\vdots&\vdots&&&}.

(30)

Note that $T(Q)$ has $2^{k}-1$ columns and $T(Q)_{1:(2^{k}-1)}$ obviously has full rank, therefore $T(Q)$ has full column rank. ∎

From now on, we assume that $Q_{1:k}=\mathcal{I}_{k}$ and the first $2^{k}-1$ rows of $T(Q)$ are arranged in the order as in (30).

Proposition 6.2.

Suppose that $Q$ is complete, $T(Q)$ is saturated, and $c\ncong\mathbf{0}$ . Then, $T_{c}(Q)$ and $T_{c}(Q)_{1:(2^{k}-1)}$ have full column rank.

Proof.

By Proposition 6.1, (17) and the fact that $D_{c}$ is a diagonal matrix of full rank as long as $c\ncong\mathbf{0}$ ,

T_{c}(Q)=D_{c}T(Q),

is of full column rank. ∎

The following two propositions, which compare the column spaces of $T_{c}(Q)$ and $T_{c}(Q^{\prime})$ , are central to the proof of all the theorems. Their proofs are delayed to the Appendix.

The first proposition discusses the case where $Q^{\prime}_{1:k}$ is complete. We can always rearrange the columns of $Q^{\prime}$ so that $Q_{1:k}=Q^{\prime}_{1:k}$ . In addition, according to the proof of Proposition 6.1, the last column vector of $T_{c}(Q)$ corresponds to attribute $\mathbf{A}=(1,\ldots,1)^{\top}$ . Therefore, this column vector is all of nonzero entries.

Proposition 6.3.

Assume that $Q$ is a complete matrix and $T(Q)$ is saturated. Without loss of generality, let $Q_{1:k}=\mathcal{I}_{k}$ . Assume that the first $k$ rows of $Q^{\prime}$ form a complete matrix. Further, assume that $Q_{1:k}=Q^{\prime}_{1:k}=\mathcal{I}_{k}$ . If $Q^{\prime}\neq Q$ and $c\ncong\mathbf{0}$ , under the conditions in Theorem 4.2, $T_{c}(Q)\mbox{$\mathbf{p}$}^{*}$ is not in the column space $C(T_{c^{\prime}}(Q^{\prime}))$ for all $c^{\prime}\in\mathbb{R}^{m}$ .

The next proposition discusses the case where $Q^{\prime}_{1:k}$ is incomplete.

Proposition 6.4.

Assume that $Q$ is a complete matrix and $T(Q)$ is saturated. Without loss of generality, let $Q_{1:k}=\mathcal{I}_{k}$ . If $c\ncong\mathbf{0}$ and $Q^{\prime}_{1:k}$ is incomplete, under the conditions in Theorem 4.2, $T_{c}(Q)\mbox{$\mathbf{p}$}^{*}$ is not in the column space $C(T_{c^{\prime}}(Q^{\prime}))$ for all $c^{\prime}\in R^{m}$ .

The next result is a direct corollary of these two propositions. It follows by setting $c_{i}=1$ and $g_{i}=0$ for all $i=1,\ldots,m$ .

Corollary 6.5.

If $Q\nsim Q^{\prime}$ , under the conditions of Theorem 4.2, $T_{c}(Q)\mbox{$\mathbf{p}$}^{*}$ is not in the column space $C(T_{c^{\prime}}(Q^{\prime}))$ for all $c^{\prime}\in[0,1]^{m}$ .

To obtain a similar proposition for the cases where the $g_{i}$ ’s are nonzero, we will need to expand the $T_{c,g}(Q)$ as follows. As previously defined, let

\tilde{T}_{c,g}(Q)=\pmatrix{\mathbf{g}&T_{c,g}(Q)\cr 1&\mathbb{E}}.

(31)

The last row of $\tilde{T}_{c,g}(Q)$ consists entirely of ones. Vector $\mathbf{g}$ is defined as in (23).

Proposition 6.6.

Suppose that $Q$ is a complete matrix, $Q^{\prime}\nsim Q$ , $T$ is saturated and $c\ncong g$ . Let $\mbox{$\mathbf{p}$}^{*}_{0}=(p^{*}_{\mathbf{0}},(\mbox{$\mathbf{p}$}^{*})^{\top})^{\top}$ . Under the conditions of Theorem 4.2, $\tilde{T}_{c,g}(Q)\mbox{$\mathbf{p}$}^{*}_{0}$ is not in the column space $C(\tilde{T}_{c^{\prime},g}(Q^{\prime}))$ for all $c^{\prime}\in[0,1]^{m}$ . In addition, $\tilde{T}_{c,g}(Q)$ is of full column rank.

To prove Proposition 6.6, we will need the following lemma.

Lemma 6.7.

Consider two matrices $T_{1}$ and $T_{2}$ of the same dimension. If $T_{1}\mbox{$\mathbf{p}$}\in C(T_{2})$ , then for any matrix $D$ of appropriate dimension for multiplication, we have

DT_{1}\mbox{$\mathbf{p}$}\in C(DT_{2}).

Conversely, if for some $D$ , $DT_{1}\mbox{$\mathbf{p}$}$ does not belong to $C(DT_{2})$ , then $T_{1}\mbox{$\mathbf{p}$}$ does not belong to $C(T_{2})$ .

Proof.

Note that $DT_{i}$ is just a linear row transform of $T_{i}$ for $i=1,2$ . The conclusion is immediate by basic linear algebra. ∎

{pf*}

Proof of Proposition 6.6 Thanks to Lemma 6.7, we only need to find a matrix $D$ such that $D\tilde{T}_{c,g}(Q)\mbox{$\mathbf{p}$}^{*}_{0}$ does not belong to the column space of $D\tilde{T}_{c^{\prime},g}(Q^{\prime})$ for all $c^{\prime}\in[0,1]^{m}$ .

We define

	$\displaystyle{c-g}$	$\displaystyle=$	$\displaystyle(c_{1}-g_{1},\ldots,c_{m}-g_{m}),$
	$\displaystyle c^{\prime}-{g}$	$\displaystyle=$	$\displaystyle(c_{1}^{\prime}-g_{1},\ldots,c_{m}^{\prime}-g_{m}).$

We claim that there exists a matrix $D$ such that

D\tilde{T}_{c,g}(Q)=(0,T_{{c-g}}(Q))

and

D\tilde{T}_{c^{\prime},g}(Q^{\prime})=(0,T_{c^{\prime}-{g}}(Q^{\prime})),

where the choice of $D$ does not depends on $c$ or $c^{\prime}$ . In the rest of the proof, we construct such a $D$ -matrix for $\tilde{T}_{c,g}(Q)$ . The verification for $\tilde{T}_{c^{\prime},g}(Q^{\prime})$ is completely analogous. Note that each row in $D\tilde{T}_{c,g}(Q)$ is just a linear combination of rows of $\tilde{T}_{c,g}(Q)$ . Therefore, it suffices to show that every row vector of the form

\bigl{(}0,B_{{c-g},Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})\bigr{)}

can be written as a linear combination of the row vectors of $\tilde{T}_{c,g}(Q)$ . We prove this by induction. First note that for each $1\leq i\leq m$ ,

(0,B_{{c-g},Q}(I_{i}))=(c_{i}-g_{i})(0,B_{Q}(I_{i}))=(g_{i},B_{c,g,Q}(I_{i}))-g_{i}\mathbb{E}.

(32)

Suppose that all rows of the form

\bigl{(}0,B_{{c-g},Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l}})\bigr{)}

for all $1\leq l\leq j$ can be written as linear combinations of the row vectors of $\tilde{T}_{c,g}(Q)$ with coefficients only depending on $g_{1},\ldots,g_{m}$ . Thanks to (32), the case of $j=1$ holds. Suppose the statement holds for some general $j$ . We consider the case of $j+1$ . By definition,

	$\displaystyle\bigl{(}g_{i_{1}}\ldots g_{i_{j+1}},B_{c,g,Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{j+1}})\bigr{)}$	$\displaystyle=$	$\displaystyle\Upsilon_{h=1}^{j+1}(g_{i_{h}},B_{c,g,Q}(I_{i_{h}}))$
		$\displaystyle=$	$\displaystyle\Upsilon_{h=1}^{j+1}\bigl{(}g_{i_{h}}\mathbb{E}+(0,B_{{c-g},Q}(I_{i_{h}}))\bigr{)}.$

Let “ $\ast$ ” denote element-by-element multiplication. For every generic vector $V^{\prime}$ of appropriate length,

\mathbb{E}\ast V^{\prime}=V^{\prime}.

We expand the right-hand side of (6.1). The last term would be

\bigl{(}0,B_{{c-g},Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{j+1}})\bigr{)}=\Upsilon_{h=1}^{j+1}(0,B_{{c-g},Q}(I_{i_{h}})).

From the induction assumption and definition (18), the other terms on both sides of (6.1) belong to the row space of $\tilde{T}_{c,g}(Q)$ . Therefore, $(0,B_{{c-g},Q}(I_{i_{1}}\wedge\cdots\wedge I_{i_{j+1}}))$ is also in the row space of $\tilde{T}_{c,g}(Q)$ . In addition, all the corresponding coefficients only consist of $g_{i}$ . Therefore, one can construct a $(2^{m}-1)\times 2^{m}$ matrix $D$ such that

D\tilde{T}_{c,g}(Q)=(0,T_{{c-g}}(Q)).

Because $D$ is free of $c$ and $Q$ , we have

D\tilde{T}_{c^{\prime},g}(Q^{\prime})=(0,T_{c^{\prime}-g}(Q^{\prime})).

In addition, thanks to Propositions 6.3 and 6.4, $D\tilde{T}_{c,g}(Q)\mbox{$\mathbf{p}$}^{*}_{0}=T_{c-g}(Q)\mbox{$\mathbf{p}$}^{*}$ is not in the column space $C(T_{c^{\prime}-g}(Q^{\prime}))=C(D\tilde{T}_{c^{\prime},g}(Q^{\prime}))$ for all $c^{\prime}\in[0,1]^{m}$ . Therefore, by Lemma 6.7, $\tilde{T}_{c,g}(Q)\mbox{$\mathbf{p}$}^{*}_{0}$ is not in the column space $C(\tilde{T}_{c^{\prime},g}(Q^{\prime}))$ for all $c^{\prime}\in[0,1]^{m}$ .

In addition,

\pmatrix{D\cr e_{2^{m}}^{\top}}\tilde{T}_{c,g}(Q)

is of full column rank, where $e^{\top}_{2^{m}}$ is a $2^{m}$ dimension row vector with last element being one and rest being zero. Therefore, $\tilde{T}_{c,g}(Q)$ is also of full column rank.

6.2 Proof of the theorems

Using the results of the previous propositions and lemmas, we now proceed to prove our theorems.

{pf*}

Proof of Theorem 2.4 Consider $Q^{\prime}\nsim Q$ and $T(\cdot)$ saturated. Recall that $\hat{\mathbf{p}}$ is the vector containing $\hat{p}_{\mathbf{A}}$ ’s with $\mathbf{A}\ncong\mathbf{0}$ , where

\hat{p}_{\mathbf{A}}=\frac{1}{N}\sum_{r=1}^{N}\mathbf{1}(\mathbf{A}_{r}=\mathbf{A}).

For any $\mathbf{p}^{*}\succ\mathbf{0}$ , since $\hat{\mathbf{p}}\rightarrow\mathbf{p}^{*}$ almost surely, according to Corollary 6.5, $\alpha=T(Q)\hat{\mathbf{p}}$ by (5), and $T(Q)\mathbf{p}^{*}\notin C(T(Q^{\prime}))$ , there exists $\delta>0$ such that,

\lim_{N\rightarrow\infty}P\Bigl{(}\inf_{\mathbf{p}\in[0,1]^{2^{k}-1}}|T(Q^{\prime})\mathbf{p}-\alpha|>\delta\Bigr{)}=1

and

P\Bigl{(}\inf_{\mathbf{p}\in[0,1]^{2^{k}-1}}|T(Q)\mathbf{p}-\alpha|=0\Bigr{)}=1.

Given that there are finitely many $m\times k$ binary matrices, $P(\hat{Q}\sim Q)\rightarrow 1$ as $N\rightarrow\infty$ . In fact, we can arrange the columns of $\hat{Q}$ such that $P(\hat{Q}=Q)\rightarrow 1$ as $N\rightarrow\infty$ .

Note that $\hat{\mathbf{p}}$ satisfies the identity

T(Q)\hat{\mathbf{p}}=\alpha.

In addition, since $T(Q)$ is of full rank (Proposition 6.1), the solution to the above linear equation is unique. Therefore, the solution to the optimization problem $\inf_{\mathbf{p}}|T(Q)\mathbf{p}-\alpha|$ is unique and is $\hat{\mathbf{p}}$ . Notice that when $\hat{Q}=Q$ , $\tilde{\mathbf{p}}=\arg\inf_{\mathbf{p}}|T(\hat{Q})\mathbf{p}-\alpha|=\hat{\mathbf{p}}$ . Therefore,

\lim_{N\rightarrow\infty}P(\tilde{\mathbf{p}}=\hat{\mathbf{p}})=1.

Together with the consistency of $\hat{\mathbf{p}}$ , the conclusion of the theorem follows immediately.

{pf*}

Proof of Theorem 3.1 Note that for all $Q^{\prime}$

T_{c,g}(Q^{\prime})\mathbf{p}+p_{\mathbf{0}}\mathbf{g}-\alpha=(\mathbf{g},T_{c,g}(Q^{\prime}))\pmatrix{p_{\mathbf{0}}\cr\mathbf{p}}-\alpha.

By the law of large numbers,

|T_{c,g}(Q)\mathbf{p}^{\ast}+p_{\mathbf{0}}^{\ast}\mathbf{g}-\alpha|=\biggl{|}(\mathbf{g},T_{c,g}(Q))\pmatrix{p_{\mathbf{0}}^{\ast}\vskip 2.0pt\cr\mathbf{p}^{\ast}}-\alpha\biggr{|}\rightarrow 0

almost surely as $N\rightarrow\infty$ . Therefore,

S_{c,g}(Q)\rightarrow 0

almost surely as $N\rightarrow\infty$ .

For any $Q^{\prime}\nsim Q$ , note that

\pmatrix{\alpha\cr 1}\rightarrow\tilde{T}_{c,g}(Q)\pmatrix{p_{\mathbf{0}}^{\ast}\vskip 2.0pt\cr\mathbf{p}^{\ast}}.

According to Proposition 6.6 and the fact that $\mathbf{p}^{\ast}\succ\mathbf{0}$ , there exists $\delta(c^{\prime})>0$ such that $\delta(c^{\prime})$ is continuous in $c^{\prime}$ and

\inf_{\mathbf{p},p_{\mathbf{0}}}\biggl{|}\tilde{T}_{c^{\prime},g}(Q^{\prime})\pmatrix{p_{\mathbf{0}}\cr\mathbf{p}}-\tilde{T}_{c,g}(Q)\pmatrix{p_{\mathbf{0}}^{\ast}\vskip 2.0pt\cr\mathbf{p}^{\ast}}\biggr{|}>\delta(c^{\prime}).

By elementary calculus,

\delta\triangleq\inf_{c^{\prime}\in[0,1]^{m}}\delta(c^{\prime})>0

and

\inf_{c^{\prime},\mathbf{p},p_{\mathbf{0}}}\biggl{|}\tilde{T}_{c^{\prime},g}(Q^{\prime})\pmatrix{p_{\mathbf{0}}\cr\mathbf{p}}-\tilde{T}_{c,g}(Q)\pmatrix{p_{\mathbf{0}}^{\ast}\vskip 2.0pt\cr\mathbf{p}^{\ast}}\biggr{|}>\delta.

Therefore,

P\biggl{(}\inf_{c^{\prime},\mathbf{p},p_{\mathbf{0}}}\biggl{|}\tilde{T}_{c^{\prime},g}(Q^{\prime})\pmatrix{p_{\mathbf{0}}\cr\mathbf{p}}-\pmatrix{\alpha\cr 1}\biggr{|}>\delta/2\biggr{)}\rightarrow 1,

as $N\rightarrow\infty$ . For the same $\delta$ , we have

P\biggl{(}\inf_{c^{\prime},\mathbf{p},p_{\mathbf{0}}}\biggl{|}(\mathbf{g},T_{c^{\prime},g}(Q^{\prime}))\pmatrix{p_{\mathbf{0}}\cr\mathbf{p}}-\alpha\biggr{|}>\delta/2\biggr{)}=P\Bigl{(}\inf_{c^{\prime}}S_{c^{\prime},g}(Q^{\prime})>\delta/2\Bigr{)}\rightarrow 1.

The above minimization on the left of the equation is subject to the constraint that

\sum_{\mathbf{A}\in\{0,1\}^{k}}p_{\mathbf{A}}=1.

Together with the fact that there are only finitely many $m\times k$ binary matrices, we have

P\bigl{(}\hat{Q}(c,g)\sim Q\bigr{)}=1.

We arrange the columns of $\hat{Q}(c,g)$ so that $P(\hat{Q}(c,g)=Q)\rightarrow 1$ as $N\rightarrow\infty$ .

Now we proceed to the proof of consistency for $\tilde{\mathbf{p}}(c,g)$ . Note that

	$\displaystyle\biggl{\|}\tilde{T}_{c,g}(\hat{Q}(c,g))\pmatrix{\tilde{p}_{\mathbf{0}}(c,g)\cr\tilde{\mathbf{p}}(c,g)}-\pmatrix{\alpha\cr 1}\biggr{\|}$	$\displaystyle\mathop{\rightarrow}\limits^{p}$	$\displaystyle 0,$
	$\displaystyle\biggl{\|}\tilde{T}_{c,g}(Q)\pmatrix{p_{\mathbf{0}}^{\ast}\vskip 2.0pt\cr\mathbf{p}^{\ast}}-\pmatrix{\alpha\cr 1}\biggr{\|}$	$\displaystyle\mathop{\rightarrow}\limits^{p}$	$\displaystyle 0.$

Since $\tilde{T}_{c,g}(Q)$ is a full column rank matrix and $P(\hat{Q}(c,g)=Q)\rightarrow 1$ , $\tilde{\mathbf{p}}(c,g)\rightarrow\mathbf{p}^{\ast}$ in probability.

{pf*}

Proof of Theorem 4.2 Assuming $g$ is known, note that

\inf_{p_{\mathbf{0}},\mathbf{p}}\biggl{|}\tilde{T}_{c,g}(Q)\pmatrix{p_{\mathbf{0}}\cr\mathbf{p}}-\pmatrix{\alpha\cr 1}\biggr{|}

is a continuous function of $c$ . According to the results of Proposition 4.1, the definition in (26), and the definition of $\hat{c}$ in Section 4.1, we obtain that

\inf_{p_{\mathbf{0}},\mathbf{p}}\biggl{|}\tilde{T}_{\hat{c}(Q,g),g}(Q)\pmatrix{p_{\mathbf{0}}\cr\mathbf{p}}-\pmatrix{\alpha\cr 1}\biggr{|}\rightarrow 0,

in probability as $N\rightarrow\infty$ . In addition, thanks to Proposition 6.6 and with a similar argument as in the proof of Theorem 3.1, $\hat{Q}_{\hat{c}}(g)$ is a consistent estimator.

Furthermore, if $\tilde{c}(Q,g)$ is a consistent estimator, then $\hat{c}(Q,g)$ is also consistent. Then, the consistency of $\tilde{\mathbf{p}}_{\hat{c}}(g)$ follows from the facts that $\hat{Q}_{\hat{c}}(g)$ is consistent and $\tilde{T}_{\hat{c},g}(Q)$ is of full column rank.

Appendix: Technical proofs

{pf*}

Proof of Proposition 6.3 Note that $Q_{1:k}=Q_{1:k}^{\prime}=\mathcal{I}_{k}$ . Let $T(\cdot)$ be arranged as in (30). Then, $T(Q)_{1:(2^{k}-1)}=T(Q^{\prime})_{1:(2^{k}-1)}$ . Given that $Q\neq Q^{\prime}$ , we have $T(Q)\neq T(Q^{\prime})$ . We assume that $T(Q)_{li}\neq T(Q^{\prime})_{li}$ , where $T(Q)_{li}$ is the entry in the $l$ th row and $i$ th column. Since $T(Q)_{1:(2^{k}-1)}=T(Q^{\prime})_{1:(2^{k}-1)}$ , it is necessary that $l\geq 2^{k}$ .

Suppose that the $l$ th row of the $T(Q^{\prime})$ corresponds to an item that requires attributes $i_{1},\ldots,i_{l^{\prime}}$ . Then, we consider $1\leq h\leq 2^{k}-1$ , such that the $h$ th row of $T(Q^{\prime})$ is $B_{Q^{\prime}}(I_{i_{1}}\wedge\cdots\wedge I_{i_{l^{\prime}}})$ . Then, the $h$ th row vector and the $l$ th row vector of $T(Q^{\prime})$ are identical.

Since $T(Q)_{1:(2^{k}-1)}=T(Q^{\prime})_{1:(2^{k}-1)}$ , we have $T(Q)_{hj}=T(Q^{\prime})_{hj}=T(Q^{\prime})_{lj}$ for $j=1,\ldots,\allowbreak 2^{k}-1$ . If $T(Q)_{li}=0$ and $T(Q^{\prime})_{li}=1$ , the matrices $T(Q)$ and $T(Q^{\prime})$ look like

			$\displaystyle\begin{array}[]{ccccccccccccccccccc}&&&&&&&&&&&&&&&&&&\begin{array}[]{ccccc}&&&&\mbox{column }i\\ &&&&\downarrow\end{array}\end{array}$
	$\displaystyle T(Q^{\prime})$	$\displaystyle=$	$\displaystyle\begin{array}[]{c}\\ \\ \mbox{row }h\rightarrow\\ \\ \\[3.0pt] \mbox{row }l\rightarrow\end{array}\left(\begin{array}[]{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c}\mathcal{I}&\ast&\ldots&\ast&\ldots\\ \vdots&\vdots&&\ldots&\ldots\\ \vdots&\vdots&\mathcal{I}&\ldots&\ldots\\ \vdots&\vdots&\vdots&&\\ \ast&1&\ast&&\\ \ast&\ast&\ast&&\end{array}\right)$

and

			$\displaystyle\begin{array}[]{ccccccccccccccccccc}&&&&&&&&&&&&&&&&&&\begin{array}[]{ccccc}&&&&\mbox{column }i\\ &&&&\downarrow\end{array}\end{array}$
	$\displaystyle T(Q)$	$\displaystyle=$	$\displaystyle\begin{array}[]{c}\\ \\ \mbox{row }h\rightarrow\\ \\ \\[3.0pt] \mbox{row }l\rightarrow\\ \end{array}\left(\begin{array}[]{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c}\mathcal{I}&\ast&\ldots&\ast&\ldots\\ \vdots&\vdots&&\ldots&\ldots\\ \vdots&\vdots&\mathcal{I}&\ldots&\ldots\\ \vdots&\vdots&\vdots&&\\ \ast&0&\ast&&\\ \ast&\ast&\ast&&\end{array}\right).$

ase 2]
Case 1.

Either the $h$ th or $l$ th row vector of $T_{c^{\prime}}(Q^{\prime})$ is a zero vector. The conclusion is immediate because all the entries of $T_{c}(Q)\mbox{$\mathbf{p}$}^{*}$ are nonzero.

Case 2.

The $h$ th and $l$ th row vectors of $T_{c^{\prime}}(Q^{\prime})$ are nonzero vectors. Suppose that the $l$ th row corresponds to an item. There are three different situations: according to the true $Q$ -matrix (a) the item in row $l$ requires strictly more attributes than row $h$ , (b) the item in row $l$ requires strictly fewer attributes than row $h$ , (c) otherwise. We consider these three situations, respectively.

[(b)]
(a)

Under the true $Q$ -matrix, there are two types of sub-populations in consideration: people who are able to answer item(s) in row $h$ ( $p_{1}$ ) only and people who are able to answer items in both row $h$ and row $l$ ( $p_{2}$ ). Then, the sub-matrix of $T_{c}(Q)$ and $T_{c^{\prime}}(Q)$ are like

$T_{c}(Q)$

$p_{1}$ $p_{2}$

row $h$ $c_{h}$ $c_{h}$

row $l$ $0$ $c_{l}$

$T_{c^{\prime}}(Q^{\prime})$

$p_{1}$ $p_{2}$

row $h$ $c_{h}^{\prime}$ $c_{h}^{\prime}$

row $l$ $c_{l}^{\prime}$ $c_{l}^{\prime}$

We now claim that $c_{l}$ and $c_{l}^{\prime}$ must be equal (otherwise the conclusion hold) for the following reason.

Consider the following two rows of $T(Q)$ : row A corresponding to the combination that contains all the items; row B corresponding to the row that contains all the items except for the one in row $l$ .

Rows A and B are in fact identical in $T(Q)$ . This is because all the attributes are used at least twice (condition C5). Then, the attributes in row $l$ are also required by some other item(s) and rows A and B require the same combination of items. Thus, the corresponding entries of all the column vectors of $T_{c}(Q)$ are different by a factor of $c_{l}$ .

For $T(Q^{\prime})$ , rows A and B are also identical. This is because row $h$ and row $l$ have identical attribute requirements. Then, thus, the corresponding entries of all the column vectors of $T_{c^{\prime}}(Q)$ are different by a factor of $c_{l}^{\prime}$ . Thus, $c^{\prime}_{l}$ and $c_{l}$ must be identical otherwise $T_{c}(Q)\mbox{$\mathbf{p}$}^{*}$ is not in the column space of $T_{c^{\prime}}(Q)$ .

Similarly, we obtain that $c_{h}=c^{\prime}_{h}$ . Given that $c_{h}=c_{h}^{\prime}$ and $c_{l}=c_{l}^{\prime}$ , we now consider row $h$ and row $l$ . Notice that all the column vectors in $T_{c^{\prime}}(Q^{\prime})$ have their entries in row $h$ and row $l$ different by a factor of $c_{h}/c_{l}$ . On the other hand, the $h$ and $l$ th entries of $T_{c}(Q)\mbox{$\mathbf{p}$}^{*}$ are NOT different by a factor of $c_{h}/c_{l}$ as long as the proportion of $p_{1}$ is positive. Thereby, we conclude this case.
(b)

Consider the following two types of sub-populations: people who are able to answer item(s) in row $l$ ( $p_{1}$ ) only and people who are able to answer items in both row $h$ and row $l$ ( $p_{2}$ ). Similar to the analysis of (a), the sub-matrices look like:

$T_{c}(Q)$

$p_{1}$ $p_{2}$

row $h$ $0$ $c_{h}$

row $l$ $c_{l}$ $c_{l}$

$T_{c^{\prime}}(Q^{\prime})$

$p_{1}$ $p_{2}$

row $h$ $0$ $c_{h}^{\prime}$

row $l$ $0$ $c_{l}^{\prime}$

With exactly the same argument as in (a), we conclude that $c_{j}=c^{\prime}_{j}$ , $c_{h}=c^{\prime}_{h}$ , and further $T_{c}(Q)\mbox{$\mathbf{p}$}^{*}$ is not in the column space of $T_{c^{\prime}}(Q^{\prime})$ .

(c)

Consider the following three types of sub-populations: people who are able to answer item(s) in row $l$ only ( $p_{1}$ ), people who are able to answer item(s) in row $h$ only ( $p_{2}$ ), and people who are able to answer items in both row $h$ and row $l$ ( $p_{3}$ ). The sub-matrices look like:

$T_{c}(Q)$
	$p_{1}$	$p_{2}$	$p_{3}$
row $h$	$0$	$c_{h}$	$c_{h}$
row $l$	$c_{l}$	$0$	$c_{l}$
row $l\wedge h$	$0$	$0$	$c_{h}c_{l}$

$T_{c^{\prime}}(Q^{\prime})$
	$p_{1}$	$p_{2}$	$p_{3}$
row $h$	$0$	$c_{h}^{\prime}$	$c_{h}^{\prime}$
row $l$	$0$	$c_{l}^{\prime}$	$c_{l}^{\prime}$
row $l\wedge h$	$0$	$c_{h}^{\prime}c_{l}^{\prime}$	$c_{h}^{\prime}c_{l}^{\prime}$

With the same argument, we have that $c_{l}=c^{\prime}_{l}$ and $c_{h}=c^{\prime}_{h}$ . On considering row $h$ and row $l\wedge h$ , we conclude that $T_{c}(Q)\mbox{$\mathbf{p}$}^{*}$ is not in the column space of $T_{c^{\prime}}(Q^{\prime})$ . ∎

\noqed

{pf*}

Proof of Proposition 6.4 $T(\cdot)$ is arranged as in (30). Consider $Q^{\prime}$ such that $Q^{\prime}_{1:k}$ is incomplete. We discuss the following situations.

1.

There are two row vectors, say the $h$ th and $l$ th row vectors ( $1\leq i,j\leq k$ ), in $Q_{1:k}^{\prime}$ that are identical. Equivalently, two items require exactly the same attributes according to $Q^{\prime}$ . With exactly the same argument as in the previous proof, under condition C5, we have that $c_{h}=c^{\prime}_{h}$ and $c_{l}=c_{l}^{\prime}$ . We now consider the rows corresponding to $l$ and $l\wedge h$ . Note that the elements corresponding to row $l$ and row $l\wedge h$ for all the vectors in the column space of $T_{c^{\prime}}(Q^{\prime})$ are different by a factor of $c_{h}$ . However, the corresponding elements in the vector $T_{c}(Q)\mbox{$\mathbf{p}$}^{*}$ are NOT different by a factor of $c_{h}$ as long as the population is fully diversified.

No two row vectors in $Q_{1:k}^{\prime}$ are identical. Then, among the first $k$ rows of $Q^{\prime}$ there is at least one row vector containing two or more nonzero entries. That is, there exists $1\leq i\leq k$ such that

\sum_{j=1}^{k}Q_{ij}^{\prime}>1.

This is because if each of the first $k$ items requires only one attribute and $Q_{1:k}^{\prime}$ is not complete, there are at least two items that require the same attribute. Then, there are two identical row vectors in $Q_{1:k}^{\prime}$ and it belongs to the first situation. We define

a_{i}=\sum_{j=1}^{k}Q_{ij}^{\prime},

the number of attributes required by item $i$ according to $Q^{\prime}$ .

Without loss of generality, assume $a_{i}>1$ for $i=1,\ldots,n$ and $a_{i}=1$ for $i=n+1,\ldots,k$ . Equivalently, among the first $k$ items, only the first $n$ items require more than one attribute while the $(n+1)$ th through the $k$ th items require only one attribute each, all of which are distinct. Without loss of generality, we assume $Q_{ii}^{\prime}=1$ for $i=n+1,\ldots,k$ and $Q_{ij}=0$ for $i=n+1,\ldots,k$ and $i\neq j$ .

[(b)]
(a)

$n=1$ . Since $a_{1}>1$ , there exists an $l>1$ such that $Q^{\prime}_{1l}=1$ . We now consider rows $1$ and $l$ . With the same argument as before (i.e., the attribute required by row $l$ is also required by item 1 in $Q^{\prime}$ ), we have that $c_{l}=c_{l}^{\prime}$ (be careful that we cannot claim that $c_{1}=c_{1}^{\prime}$ ). We now consider the rows 1 and $1\wedge l$ . Note that in $T_{c^{\prime}}(Q^{\prime})$ these two rows are different by a factor of $c_{l}$ ; while the corresponding entries in $T_{c}(Q)\mbox{$\mathbf{p}$}^{*}$ are NOT different by a factor of $c_{l}$ . Thereby, we conclude the result in this situation.
(b)

$n>1$ and there exists $j>n$ and $i\leq n$ such that $Q^{\prime}_{ij}=1$ . The argument is identical to that in (2a).

(c)

$n>1$ and for each $j>n$ and $i\leq n$ , $Q^{\prime}_{ij}=0$ . Let the $i^{\ast}$ th row in $T(Q^{\prime})$ correspond to $I_{1}\wedge\cdots\wedge I_{n}$ . Let the $i_{h}^{\ast}$ th row in $T(Q^{\prime})$ correspond to $I_{1}\wedge\cdots\wedge I_{h-1}\wedge I_{h+1}\wedge\cdots\wedge I_{n}$ for $h=1,\ldots,n$ .

We claim that there exists an $h$ such that the $i^{\ast}$ th row and the $i_{h}^{\ast}$ th row are identical in $T(Q^{\prime})$ , that is

B_{Q^{\prime}}(I_{1}\wedge\cdots\wedge I_{h-1}\wedge I_{h+1}\wedge\cdots\wedge I_{n})=B_{Q^{\prime}}(I_{1}\wedge\cdots\wedge I_{n}).

(1)

If the above claim is true, then the attributes required by item $h$ have been required by some other items. Then, we conclude that $c_{h}$ and $c^{\prime}_{h}$ must be identical. In addition, rows in $T_{c^{\prime}}(Q^{\prime})$ corresponding to $I_{1}\wedge\cdots\wedge I_{h-1}\wedge I_{h+1}\wedge\cdots\wedge I_{n}$ and $I_{1}\wedge\cdots\wedge I_{n}$ are different by a factor of $c_{h}$ . On the other hand, the corresponding entries in $T_{c}(Q)\mbox{$\mathbf{p}$}^{*}$ are NOT different by a factor of $c_{h}$ . Then, we are able to conclude the results for all the cases.

In what follows, we prove the claim in (1) by contradiction. Suppose that there does not exist such an $h$ . This is equivalent to saying that for each $j\leq n$ there exists an $\alpha_{j}$ such that $Q^{\prime}_{j\alpha_{j}}=1$ and $Q^{\prime}_{i\alpha_{j}}=0$ for all $1\leq i\leq n$ and $i\neq j$ . Equivalently, for each $j\leq n$ , item $j$ requires at least one attribute that is not required by other first $n$ items. Consider

\mathcal{C}_{i}=\{j\mathchoice{\nobreak\,\colon}{\nobreak\,\colon}{\nobreak\,\colon\;}{\nobreak\,\colon\;}\mbox{there exists $i\leq i^{\prime}\leq n$ such that $Q^{\prime}_{i^{\prime}j}=1$}\}.

Let $\#(\cdot)$ denote the cardinality of a set. Since for each $i\leq n$ and $j>n$ , $Q^{\prime}_{ij}=0$ , we have that $\#(\mathcal{C}_{1})\leq n$ . Note that $Q^{\prime}_{1\alpha_{1}}=1$ and $Q^{\prime}_{i\alpha_{1}}=0$ for all $2\leq i\leq n$ . Therefore, $\alpha_{1}\in\mathcal{C}_{1}$ and $\alpha_{1}\notin\mathcal{C}_{2}$ . Therefore, $\#(\mathcal{C}_{2})\leq n-1$ . By a similar argument and induction, we have that $a_{n}=\#(\mathcal{C}_{n})\leq 1$ . This contradicts the fact that $a_{n}>1$ . Therefore, there exists an $h$ such that (1) is true. As for $T(Q)$ , we have that

B_{Q}(I_{1}\wedge\cdots\wedge I_{h-1}\wedge I_{h+1}\wedge\cdots\wedge I_{n})\neq B_{Q}(I_{1}\wedge\cdots\wedge I_{n}).

Summarizing the cases in 1, (2a), (2b) and (2c), we conclude the proof.

Acknowledgements

This research was supported in part by Grants NSF CMMI-1069064, SES-1123698, Institute of Education Sciences R305D100017 and NIH R37 GM047845.

References

[1] {barticle}[mr] \bauthor\bsnmChiu, \bfnmChia-Yi\binitsC.Y., \bauthor\bsnmDouglas, \bfnmJeffrey A.\binitsJ.A. &\bauthor\bsnmLi, \bfnmXiaodong\binitsX. (\byear2009). \btitleCluster analysis for cognitive diagnosis: Theory and applications. \bjournalPsychometrika \bvolume74 \bpages633–665. \biddoi=10.1007/s11336-009-9125-0, issn=0033-3123, mr=2565331 \bptokimsref \endbibitem
[2] {barticle}[author] \bauthor\bparticlede la \bsnmTorre, \bfnmJ.\binitsJ. (\byear2008). \btitleAn empirically-based method of $Q$ -matrix balidation for the DINA model: Development and applications. \bjournalJournal of Educational Measurement \bvolume45 \bpages343–362. \bptokimsref \endbibitem
[3] {bmisc}[author] \bauthor\bparticlede la \bsnmTorre, \bfnmJ.\binitsJ. (\byear2008). \bhowpublishedThe generalized DINA model. In International Meeting of the Psychometric Society, Durham, NH. \bptokimsref \endbibitem
[4] {barticle}[mr] \bauthor\bparticlede la \bsnmTorre, \bfnmJimmy\binitsJ. &\bauthor\bsnmDouglas, \bfnmJeffrey A.\binitsJ.A. (\byear2004). \btitleHigher-order latent trait models for cognitive diagnosis. \bjournalPsychometrika \bvolume69 \bpages333–353. \biddoi=10.1007/BF02295640, issn=0033-3123, mr=2272454 \bptokimsref \endbibitem
[5] {bincollection}[author] \bauthor\bsnmDiBello, \bfnmL. V.\binitsL.V., \bauthor\bsnmStout, \bfnmW. F.\binitsW.F. &\bauthor\bsnmRoussos, \bfnmL.\binitsL. (\byear1995). \btitleUnified cognitive psychometric assessment likelihood-based classification techniques. In \bbooktitleCognitively Diagnostic Assessment \bpages361–390. \baddressHillsdale, NJ: \bpublisherErlbaum. \bptokimsref \endbibitem
[6] {barticle}[mr] \bauthor\bsnmGelman, \bfnmAndrew\binitsA., \bauthor\bsnmJakulin, \bfnmAleks\binitsA., \bauthor\bsnmPittau, \bfnmMaria Grazia\binitsM.G. &\bauthor\bsnmSu, \bfnmYu-Sung\binitsY.S. (\byear2008). \btitleA weakly informative default prior distribution for logistic and other regression models. \bjournalAnn. Appl. Stat. \bvolume2 \bpages1360–1383. \biddoi=10.1214/08-AOAS191, issn=1932-6157, mr=2655663 \bptokimsref \endbibitem
[7] {bmisc}[author] \bauthor\bsnmHartz, \bfnmS. M.\binitsS.M. (\byear2002). \bhowpublishedA Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality. Ph.D. dissertation, Univ. Illinois, Urbana–Champaign. \bptokimsref \endbibitem
[8] {barticle}[mr] \bauthor\bsnmHenson, \bfnmRobert\binitsR. &\bauthor\bsnmDouglas, \bfnmJeff\binitsJ. (\byear2005). \btitleTest construction for cognitive diagnosis. \bjournalAppl. Psychol. Meas. \bvolume29 \bpages262–277. \biddoi=10.1177/0146621604272623, issn=0146-6216, mr=2142701 \bptokimsref \endbibitem
[9] {barticle}[mr] \bauthor\bsnmHenson, \bfnmRobert\binitsR., \bauthor\bsnmRoussos, \bfnmLouis\binitsL., \bauthor\bsnmDouglas, \bfnmJeff\binitsJ. &\bauthor\bsnmHe, \bfnmXuming\binitsX. (\byear2008). \btitleCognitive diagnostic attribute-level discrimination indices. \bjournalAppl. Psychol. Meas. \bvolume32 \bpages275–288. \biddoi=10.1177/0146621607302478, issn=0146-6216, mr=2432210 \bptokimsref \endbibitem
[10] {bmisc}[author] \bauthor\bsnmHenson, \bfnmR. A.\binitsR.A. &\bauthor\bsnmTemplin, \bfnmJ. L.\binitsJ.L. (\byear2005). \bhowpublishedHierarchical log-linear modeling of the skill joint distribution. Technical report, External Diagnostic Research Group. \bptokimsref \endbibitem
[11] {bmisc}[author] \bauthor\bsnmJunker, \bfnmB. W.\binitsB.W. (\byear1999). \bhowpublishedSome statistical models and computational methods that may be useful for cognitively-relevant assessment. Technical report. Available at http://www.stat.cmu.edu/~brian/nrc/cfa/documents/final.pdf. \bptokimsref \endbibitem
[12] {barticle}[mr] \bauthor\bsnmJunker, \bfnmBrian W.\binitsB.W. &\bauthor\bsnmSijtsma, \bfnmKlaas\binitsK. (\byear2001). \btitleCognitive assessment models with few assumptions, and connections with nonparametric item response theory. \bjournalAppl. Psychol. Meas. \bvolume25 \bpages258–272. \biddoi=10.1177/01466210122032064, issn=0146-6216, mr=1842982 \bptokimsref \endbibitem
[13] {barticle}[author] \bauthor\bsnmLeighton, \bfnmJ. P.\binitsJ.P., \bauthor\bsnmGierl, \bfnmM. J.\binitsM.J. &\bauthor\bsnmHunka, \bfnmS. M.\binitsS.M. (\byear2004). \btitleThe attribute hierarchy model for cognitive assessment: A variation on Tatsuoka’s rule-space approach. \bjournalJournal of Educational Measurement \bvolume41 \bpages205–237. \bptokimsref \endbibitem
[14] {bmisc}[author] \bauthor\bsnmLiu, \bfnmY\binitsY., \bauthor\bsnmDouglas, \bfnmJ. A.\binitsJ.A. &\bauthor\bsnmHenson, \bfnmR.\binitsR. (\byear2007). \bhowpublishedTesting person fit in cognitive diagnosis. In The Annual Meeting of the National Council on Measurement in Education (NCME), Chicago, IL. \bptokimsref \endbibitem
[15] {barticle}[mr] \bauthor\bsnmMaris, \bfnmE.\binitsE. (\byear1999). \btitleEstimating multiple classification latent class models. \bjournalPsychometrika \bvolume64 \bpages187–212. \biddoi=10.1007/BF02294535, issn=0033-3123, mr=1700708 \bptokimsref \endbibitem
[16] {bmisc}[author] \bauthor\bsnmReckase, \bfnmMark D.\binitsM.D. (\byear1990). \bhowpublishedUnidimensional data from multidimensional tests and multidimensional data from unidimensional tests. In The Annual Meeting of the American Educational Research Association, Boston, MA, April 16–20. \bptokimsref \endbibitem
[17] {bbook}[author] \bauthor\bsnmReckase, \bfnmMark D.\binitsM.D. (\byear2009). \btitleMultidimensional Item Response Theory. \baddressNew York: \bpublisherSpringer. \bptokimsref \endbibitem
[18] {bmisc}[author] \bauthor\bsnmRoussos, \bfnmL.\binitsL., \bauthor\bsnmDiBello, \bfnmL. V.\binitsL.V., \bauthor\bsnmStout, \bfnmW.\binitsW., \bauthor\bsnmHartz, \bfnmS.\binitsS., \bauthor\bsnmHenson, \bfnmR.\binitsR. &\bauthor\bsnmTemplin, \bfnmJ.\binitsJ. (\byear2007). \bhowpublishedThe fusion model skills diagnosis system. In Cognitively Diagnostic Assessment for Education: Theory and Practice (J. P. Leighton and M. J. Gierl, eds.) 275–318. New York: Cambridge Univ. Press. \bptokimsref \endbibitem
[19] {barticle}[author] \bauthor\bsnmRoussos, \bfnmLouis A.\binitsL.A., \bauthor\bsnmTemplin, \bfnmJonathan L.\binitsJ.L. &\bauthor\bsnmHenson, \bfnmRobert A.\binitsR.A. (\byear2007). \btitleSkills diagnosis using IRT-based latent class models. \bjournalJournal of Educational Measurement \bvolume44 \bpages293–311. \bptokimsref \endbibitem
[20] {barticle}[author] \bauthor\bsnmRupp, \bfnmA. A.\binitsA.A. (\byear2002). \btitleFeature selection for choosing and assembling measurement models: A building-block-based organization. \bjournalPsychometrika \bvolume2 \bpages311–360. \bptokimsref \endbibitem
[21] {barticle}[author] \bauthor\bsnmRupp, \bfnmA. A.\binitsA.A. &\bauthor\bsnmTemplin, \bfnmJ. L.\binitsJ.L. (\byear2008). \btitleEffects of $Q$ -matrix misspecification on parameter estimates and misclassification rates in the DINA model. \bjournalEducational and Psychological Measurement \bvolume68 \bpages78–98. \bptokimsref \endbibitem
[22] {barticle}[author] \bauthor\bsnmRupp, \bfnmA. A.\binitsA.A. &\bauthor\bsnmTemplin, \bfnmJ. L.\binitsJ.L. (\byear2008). \btitleUnique characteristics of diagnostic classification models: A comprehensive review of the current state-of-the-art. \bjournalMeasurement: Interdisciplinary Research and Perspective \bvolume6 \bpages219–262. \bptokimsref \endbibitem
[23] {bbook}[author] \bauthor\bsnmRupp, \bfnmA. A.\binitsA.A., \bauthor\bsnmTemplin, \bfnmJ. L.\binitsJ.L. &\bauthor\bsnmHenson, \bfnmR. A.\binitsR.A. (\byear2010). \btitleDiagnostic Measurement: Theory, Methods, and Applications. \baddressNew York: \bpublisherGuilford Press. \bptokimsref \endbibitem
[24] {barticle}[mr] \bauthor\bsnmSchwarz, \bfnmGideon\binitsG. (\byear1978). \btitleEstimating the dimension of a model. \bjournalAnn. Statist. \bvolume6 \bpages461–464. \bidissn=0090-5364, mr=0468014 \bptokimsref \endbibitem
[25] {barticle}[author] \bauthor\bsnmStout, \bfnmW.\binitsW. (\byear2007). \btitleSkills diagnosis using IRT-based continuous latent trait models. \bjournalJournal of Educational Measurement \bvolume44 \bpages313–324. \bptokimsref \endbibitem
[26] {barticle}[mr] \bauthor\bsnmTatsuoka, \bfnmCurtis\binitsC. (\byear2002). \btitleData analytic methods for latent partially ordered classification models. \bjournalJ. Roy. Statist. Soc. Ser. C \bvolume51 \bpages337–350. \biddoi=10.1111/1467-9876.00272, issn=0035-9254, mr=1920801 \bptokimsref \endbibitem
[27] {barticle}[author] \bauthor\bsnmTatsuoka, \bfnmK. K.\binitsK.K. (\byear1983). \btitleRule space: An approch for dealing with misconceptions based on item response theory. \bjournalJournal of Educational Measurement \bvolume20 \bpages345–354. \bptokimsref \endbibitem
[28] {barticle}[author] \bauthor\bsnmTatsuoka, \bfnmK. K.\binitsK.K. (\byear1985). \btitleA probabilistic model for diagnosing misconceptions in the pattern classification approach. \bjournalJournal of Educational Statistics \bvolume12 \bpages55–73. \bptokimsref \endbibitem
[29] {bbook}[author] \bauthor\bsnmTatsuoka, \bfnmK. K.\binitsK.K. (\byear2009). \btitleCognitive Assessment: An Introduction to the Rule Space Method. \baddressFlorence, KY: \bpublisherRoutledge. \bptokimsref \endbibitem
[30] {bmisc}[author] \bauthor\bsnmTemplin, \bfnmJ.\binitsJ., \bauthor\bsnmHe, \bfnmX.\binitsX., \bauthor\bsnmRoussos, \bfnmL.\binitsL. &\bauthor\bsnmStout, \bfnmW.\binitsW. (\byear2003). \bhowpublishedThe pseudo-item method: A simple technique for analysis of polytomous data with the fusion model. Technical report, External Diagnostic Research Group. \bptokimsref \endbibitem
[31] {bmisc}[author] \bauthor\bsnmTemplin, \bfnmJ. L.\binitsJ.L. (\byear2006). \bhowpublishedCDM: Cognitive diagnosis modeling with Mplus. Computer software. Available at http://jtemplin.coe.uga.edu/research/. \bptokimsref \endbibitem
[32] {barticle}[pbm] \bauthor\bsnmTemplin, \bfnmJonathan L.\binitsJ.L. &\bauthor\bsnmHenson, \bfnmRobert A.\binitsR.A. (\byear2006). \btitleMeasurement of psychological disorders using cognitive diagnosis models. \bjournalPsychol. Methods \bvolume11 \bpages287–305. \biddoi=10.1037/1082-989X.11.3.287, issn=1082-989X, pii=2006-11159-006, pmid=16953706 \bptokimsref \endbibitem
[33] {bmisc}[author] \bauthor\bparticlevon \bsnmDavier, \bfnmM.\binitsM. (\byear2005). \bhowpublishedA general diagnosis model applied to language testing data. Research report, Educational Testing Service. \bptokimsref \endbibitem

Theory of self-learning QQ-matrix

Abstract

doi:

keywords:

1 Introduction

2 Model specifications and basic results

2.1 Basic model specifications

2.2 Estimation of QQ-matrix

TT-matrix

α\alpha-vector

The estimator of the QQ-matrix

2.3 Example

2.4 Basic results

Definition 2.1.

Definition 2.2.

Remark 2.1.

Definition 2.3.

Remark 2.2.

Theorem 2.4.

Remark 2.3.

Remark 2.4.

Remark 2.5.

Remark 2.6.

Remark 2.7.

3 DINA model with known slipping and guessing parameters

3.1 Model specification

3.2 Estimation of the QQ-matrix and consistency results

Theorem 3.1.

Remark 3.1.

4 Extension to the situation with unknown slipping probabilities

4.1 Estimator of cc

A general estimator

A moment estimator

Proposition 4.1.

Proof.

Combined estimator

4.2 Consistency result

Theorem 4.2.

Remark 4.1.

Remark 4.2.

5 Discussion

6 Proofs of the theorems

6.1 Several propositions and lemmas

Proposition 6.1.

Proof.

Proposition 6.2.

Proof.

Proposition 6.3.

Proposition 6.4.

Corollary 6.5.

Proposition 6.6.

Lemma 6.7.

Proof.

6.2 Proof of the theorems

Appendix: Technical proofs

Acknowledgements

References

Theory of self-learning $Q$ -matrix

2.2 Estimation of $Q$ -matrix

$T$ -matrix

$\alpha$ -vector

The estimator of the $Q$ -matrix

3.2 Estimation of the $Q$ -matrix and consistency results

4.1 Estimator of $c$