Quasi-Akaike information criterion of structural equation modeling with latent variables for diffusion processes
Abstract.
We consider a model selection problem for structural equation modeling (SEM) with latent variables for diffusion processes based on high-frequency data. First, we propose the quasi-Akaike information criterion (QAIC) of the SEM and study its asymptotic properties. Next, we consider the situation where the set of competing models includes some misspecified parametric models, and show that the probability of choosing the misspecified models converges to zero. Furthermore, examples and simulation results are given.
Key words and phrases:
Structural equation modeling; Quasi-Akaike information criterion; Quasi-likelihood analysis; High-frequency data; Stochastic differential equation.
1. Introduction
We consider a model selection problem for structural equation modeling (SEM) with latent variables for diffusion processes. First, we define the true model of the SEM. The stochastic processes and are defined by the factor models as follows:
(1.1)
(1.2)
where and are and -dimensional observable vector processes, and are and -dimensional latent common factor vector processes, and are and -dimensional latent unique factor vector processes, respectively. and are constant loading matrices. Both and are not zero, , , and are fixed, and . Let . Suppose that , and satisfy the following stochastic differential equations:
(1.3)
(1.4)
(1.5)
where , , , , , , , , and , and are , and -dimensional standard Wiener processes, respectively. Moreover, we express the relationship between and as follows:
(1.6)
where is a constant loading matrix, whose diagonal elements are zero, and is a constant loading matrix. Define , where denotes the identity matrix of size . We assume that is a full column rank matrix and is non-singular. is a -dimensional latent unique factor vector process defined by the following stochastic differential equation:
(1.7)
where , , and is an -dimensional standard Wiener process. Set , , and , where denotes the transpose. It is supposed that and are positive definite matrices, and , , and are independent standard Wiener processes on a stochastic basis with usual conditions . Let . Set as the variance of . If there is no misunderstanding, we simply write as . are discrete observations, where , , is fixed, and , , and are independent of . We consider the situation where as . We cannot estimate all the elements of , , , , , , and . Thus, some elements may be assumed to be zero to satisfy an identifiability condition; see, e.g., Everitt [6]. Note that these constraints and the number of factors and are determined from the theoretical viewpoint of each research field.
A model selection problem among the following parametric models is considered. We define the parametric model of Model as follows. Set as the parameter of Model , where is a convex compact space. It is assumed that has locally Lipschitz boundary; see, e.g., Adams and Fournier [1]. The stochastic processes and are defined as the following factor models:
(1.8)
(1.9)
where and are and -dimensional observable vector processes, and are and -dimensional latent common factor vector processes, and are and -dimensional latent unique factor vector processes, respectively. and are constant loading matrices. Assume that , and satisfy the following stochastic differential equations:
(1.10)
(1.11)
(1.12)
where , and . Furthermore, the relationship between and is expressed as follows:
(1.13)
where is a constant loading matrix, whose diagonal elements are zero, and is a constant loading matrix. Set . It is supposed that is a full column rank matrix and is non-singular. is a -dimensional latent unique factor vector process defined by the following stochastic differential equation:
(1.14)
where . Let , , and . It is assumed that and are positive definite matrices. Define . Set
as the variance of , where
It is supposed that there exists such that , and Model satisfies an identifiability condition.
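To fix ideas, the implied variance in SEM with latent variables has a well-known closed form. In standard LISREL-type notation, with measurement equations $X=\Lambda_x\xi+\delta$, $Y=\Lambda_y\eta+\varepsilon$ and structural equation $\eta=B\eta+\Gamma\xi+\zeta$, $\Psi=I-B$ (these symbols are illustrative placeholders, not necessarily those of the model above), one has

```latex
\Sigma =
\begin{pmatrix}
\Lambda_x \Phi \Lambda_x^{\top} + \Theta_{\delta}
  & \Lambda_x \Phi \Gamma^{\top} \Psi^{-\top} \Lambda_y^{\top} \\
\Lambda_y \Psi^{-1} \Gamma \Phi \Lambda_x^{\top}
  & \Lambda_y \Psi^{-1} \bigl( \Gamma \Phi \Gamma^{\top} + \Theta_{\zeta} \bigr) \Psi^{-\top} \Lambda_y^{\top} + \Theta_{\varepsilon}
\end{pmatrix},
```

where $\Phi=\mathrm{Var}(\xi)$ and $\Theta_{\delta}$, $\Theta_{\varepsilon}$, $\Theta_{\zeta}$ are the variances of the unique factors.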
Structural equation modeling (SEM) with latent variables is a method for analyzing the relationships between latent variables that cannot be observed; see, e.g., Jöreskog [10], Everitt [6], Mueller [15] and references therein. In SEM, a researcher often has several candidate models, which are usually specified to express different hypotheses. The goodness-of-fit test based on the likelihood ratio is widely used for model evaluation in SEM. Akaike [4] proposed the use of the Akaike information criterion (AIC) in a factor model; using AIC, we can choose the optimal number of factors in terms of prediction. AIC is also widely used in SEM, as in factor analysis, to choose the optimal model; see, e.g., Huang [9].
Thanks to the development of measuring devices, high-frequency data such as stock prices can now be obtained easily, so that many researchers have studied parametric estimation of diffusion processes based on high-frequency data; see, e.g., Yoshida [18], Genon-Catalot and Jacod [7], Kessler [11], Uchida and Yoshida [17] and references therein. Recently, in the field of financial econometrics, factor models based on high-frequency data have been extensively studied. Aït-Sahalia and Xiu [3] proposed a continuous-time latent factor model for high-dimensional data using principal component analysis. Kusano and Uchida [12] proposed classical factor analysis for diffusion processes. This method enables us to analyze the relationships between low-dimensional observed variables sampled at high frequency and latent variables. For instance, based on high-frequency stock price data, we can analyze latent variables such as a world market factor and factors related to a certain industry (Figure 1). On the other hand, few researchers have examined the relationships between these latent variables based on high-frequency data. Kusano and Uchida [13] proposed SEM with latent variables for diffusion processes, with which one can examine the relationships between latent variables based on high-frequency data. For example, this method enables us to study the relationship between the world market factor and the Japanese financial factor (Figure 2). SEM with latent variables may be regarded as regression analysis between latent variables: while both explanatory and response variables are observable in regression analysis, both are latent in SEM with latent variables. For regression analysis and market models based on high-frequency data, see, e.g., Aït-Sahalia et al. [2].
The model selection problem for diffusion processes based on discrete observations has been actively studied. Uchida [16] proposed the contrast-based information criterion for ergodic diffusion processes and obtained an asymptotic result for the difference between contrast-based information criteria. Eguchi and Masuda [5] studied the model comparison problem for semiparametric Lévy driven SDEs and proposed the Gaussian quasi-AIC. Since information criteria are important in SEM, as mentioned above, we propose the quasi-AIC (QAIC) of SEM with latent variables for diffusion processes and study its asymptotic properties. In this paper, we consider the non-ergodic case; for the ergodic case, see Appendix 6.3.
The paper is organized as follows. In Section 2, we introduce the notation and assumptions. In Section 3, the QAIC of SEM with latent variables for diffusion processes is considered. Moreover, the situation where the set of competing models includes some (not all) misspecified parametric models is studied. It is shown that the probability of choosing the misspecified models converges to zero. In Section 4, we give examples and simulation results. In Section 5, the results described in Section 3 are proved.
2. Notation and assumptions
First, we prepare the following notations and definitions. For any vector , , is the -th element of , and is the diagonal matrix, whose -th diagonal element is . For any matrix , , and is the -th element of . For matrices and of the same size, . For any matrix and vectors , we define . For a positive definite matrix , we write . For any symmetric matrix , , and are the vectorization of , the half-vectorization of and the duplication matrix respectively, where . Note that ; see, e.g., Harville [8]. For any matrix , stands for the Moore-Penrose inverse of . Set as the sets of all real-valued positive definite matrices. For any positive sequence , denotes the short notation for functions which satisfy for some . Let be the space of all functions satisfying the following conditions:
- (i) is continuously differentiable with respect to up to order .
- (ii) and all its derivatives are of polynomial growth in , i.e., is of polynomial growth in if .
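As an illustration of the vectorization notation above: for a symmetric $2\times 2$ matrix,

```latex
A = \begin{pmatrix} a_{11} & a_{21} \\ a_{21} & a_{22} \end{pmatrix}, \qquad
\operatorname{vech}(A) = \begin{pmatrix} a_{11} \\ a_{21} \\ a_{22} \end{pmatrix}, \qquad
\operatorname{vec}(A) = D_{2} \operatorname{vech}(A), \qquad
D_{2} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
```

so the duplication matrix simply restores the duplicated off-diagonal entry.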
The symbols and denote convergence in probability and convergence in distribution, respectively. For any process , . Set
denotes the expectation under . Next, we make the following assumptions.
- [A]
  - (a)
    - (i) There exists a constant such that
    for any .
    - (ii) For all , .
    - (iii) .
  - (b)
    - (i) There exists a constant such that
    for any .
    - (ii) For all , .
    - (iii) .
  - (c)
    - (i) There exists a constant such that
    for any .
    - (ii) For all , .
    - (iii) .
  - (d)
    - (i) There exists a constant such that
    for any .
    - (ii) For all , .
    - (iii) .
Remark 1
Assumption [A] is standard for diffusion processes; see, e.g., Kessler [11].
3. QAIC of SEM for diffusion processes
Using a locally Gaussian approximation, we obtain the following quasi-likelihood of Model from (1.8)-(1.14):
See Appendix 8.1 in Kusano and Uchida [14] for details of the quasi-likelihood. Define the quasi-likelihood based on the discrete observations as follows:
The quasi-maximum likelihood estimator is defined by
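To sketch the construction in generic notation (ours, for illustration only): writing $\Delta_i Z = Z_{t_i} - Z_{t_{i-1}}$ for the observed increments, $h_n$ for the discretization step and $\Sigma(\theta)$ for the model-implied variance, a locally Gaussian quasi-log-likelihood and the associated estimator take the form

```latex
\mathbb{H}_n(\theta)
= -\frac{1}{2} \sum_{i=1}^{n} \Bigl\{ \log\det\bigl( 2\pi h_n \Sigma(\theta) \bigr)
+ \frac{1}{h_n} (\Delta_i Z)^{\top} \Sigma(\theta)^{-1} (\Delta_i Z) \Bigr\},
\qquad
\hat{\theta}_n \in \operatorname*{argmax}_{\theta \in \Theta} \mathbb{H}_n(\theta).
```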
Set as an i.i.d. copy of . Let us consider the following Kullback-Leibler divergence between the transition density of the true model (1.1)-(1.7) and the quasi-likelihood :
where is the expectation under the law of . Our purpose is to find the model which minimizes . Since does not depend on the model, it is sufficient to consider the model which maximizes
(3.1)
so that we need to estimate (3.1). Set
and
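To fix ideas, with $p_n$ denoting the transition density of the true model and $q_n(\cdot\,;\theta)$ the quasi-likelihood of a candidate model (generic symbols, for illustration), the divergence decomposes as

```latex
\mathrm{KL}\bigl( p_n \,\big\|\, q_n(\cdot\,;\theta) \bigr)
= \mathbf{E}\bigl[ \log p_n(\mathbf{Z}_n) \bigr]
- \mathbf{E}\bigl[ \log q_n(\mathbf{Z}_n;\theta) \bigr],
```

and the first term is common to all candidate models, so minimizing the divergence over models amounts to maximizing the expected quasi-log-likelihood (3.1).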
Moreover, the following assumptions are made.
- [B1]
  - (a) There exists a constant such that
  for all .
  - (b) .
Remark 2
By the following theorem, we obtain an asymptotically unbiased estimator of (3.1).
Theorem 1
Let . Under [A] and [B1], as ,
We define the quasi-Akaike information criterion as
(3.2)
Since it holds from Theorem 1 that is an asymptotically unbiased estimator of
we select the optimal model among competing models by
(3.3)
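Schematically, writing $\mathbb{H}_n^{(m)}$ for the quasi-log-likelihood of Model $m$ and assuming an AIC-type penalty equal to twice the parameter dimension (an illustration; the precise bias correction is the content of Theorem 1), (3.2) and (3.3) read

```latex
\mathrm{QAIC}^{(m)} = -2\, \mathbb{H}_n^{(m)}\bigl( \hat{\theta}_n^{(m)} \bigr) + 2 \dim(\Theta_m),
\qquad
\hat{m} = \operatorname*{argmin}_{m = 1, \dots, M} \mathrm{QAIC}^{(m)}.
```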
Remark 3
Since is not the exact likelihood but the quasi-likelihood, all the competing models are misspecified. Note that we consider a model selection problem among the quasi-likelihood models; see, e.g., Eguchi and Masuda [5].
Remark 4
Next, we consider the situation where the set of competing models includes some (not all) misspecified parametric models; that is, there exists such that
for all . Set
and . The optimal parameter is defined as
where
Note that for . Furthermore, we make the following assumption.
- [B2] .
implies that ; see Lemma 36 in Kusano and Uchida [14]. The following asymptotic result of defined in (3.3) holds.
Theorem 2
Under [A] and [B2], as ,
Theorem 2 shows that the probability of choosing the misspecified models converges to zero as .
4. Simulation results
4.1. True model
The stochastic process is defined by the following factor model :
where is a six-dimensional observable vector process, is a two-dimensional latent common factor vector process, and is a six-dimensional latent unique factor vector process. The stochastic process is defined by the factor model as follows:
where is a two-dimensional observable vector process, is a one-dimensional latent common factor vector process, and is a two-dimensional latent unique factor vector process. Furthermore, the relationship between and is expressed as follows:
where is a one-dimensional latent unique factor vector process. It is supposed that is the two-dimensional OU process as follows:
where is a two-dimensional standard Wiener process. is defined by the six-dimensional OU process as follows:
where , , , and is a six-dimensional standard Wiener process. satisfies the following two-dimensional OU process:
where is a two-dimensional standard Wiener process. is defined by the following one-dimensional OU process:
where is a one-dimensional standard Wiener process. Figure 3 shows the path diagram of the true model at time .
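For readers who wish to generate such data, the following minimal R sketch implements an Euler–Maruyama scheme for a $d$-dimensional OU process $dX_t = -A(X_t - \mu)\,dt + S\,dW_t$; the function name and all inputs are illustrative, and the parameter values of the true model above are not assumed.

```r
## Euler-Maruyama sketch for a d-dimensional OU process
## dX_t = -A (X_t - mu) dt + S dW_t; all inputs are illustrative.
simulate_ou <- function(n, h, x0, A, mu, S) {
  d <- length(x0)
  X <- matrix(0, nrow = n + 1, ncol = d)
  X[1, ] <- x0
  for (i in 1:n) {
    dW <- rnorm(d, mean = 0, sd = sqrt(h))  # Wiener increment over a step of size h
    X[i + 1, ] <- X[i, ] - drop(A %*% (X[i, ] - mu)) * h + drop(S %*% dW)
  }
  X
}

## Example: a two-dimensional path with n = 1000 steps of size h = 1/1000.
## set.seed(1)
## path <- simulate_ou(1000, 1 / 1000, c(0, 0), diag(2), c(0, 0), diag(2))
```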
4.2. Competing models
4.2.1. Model 1
Set the parameter as . Let , , and . Assume
and
where , , , , , and are not zero. It is supposed that and satisfy
and
where is not zero. Moreover, and are assumed to satisfy
and . Set
It holds that , so that Model is a correctly specified model. There exists a constant such that
(4.1)
for all . For the proof of (4.1), see Appendix 6.2. Figure 4 shows the path diagram of Model at time .
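As a side note on how such zero constraints can be coded in practice, the following hypothetical R sketch maps a free-parameter vector into a $6 \times 2$ loading matrix with a fixed zero/one pattern; the pattern shown is purely illustrative and is not the one used in this simulation.

```r
## Hypothetical sketch: embed free parameters into a 6 x 2 loading matrix
## whose fixed zeros and ones encode an identifiability condition.
make_lambda <- function(theta) {
  L <- matrix(0, nrow = 6, ncol = 2)
  L[1, 1] <- 1             # loading fixed to one to set the scale of factor 1
  L[2:3, 1] <- theta[1:2]  # free loadings on factor 1
  L[4, 2] <- 1             # loading fixed to one to set the scale of factor 2
  L[5:6, 2] <- theta[3:4]  # free loadings on factor 2
  L
}
```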
4.2.2. Model 2
The parameter is defined as . Set , , and . Suppose
and
where , , , , , , and are not zero. and are assumed to satisfy
and
where is not zero. Furthermore, we suppose that and satisfy
and . Let
Since , Model is a correctly specified model. In a similar way to the proof of (4.1), we can prove that there exists a constant such that
for all . Figure 5 shows the path diagram of Model at time .
4.2.3. Model 3
Set the parameter as . Let , , and . Assume
and
where , , , , , and are not zero. We assume that and satisfy and
Moreover, it is supposed that and satisfy
and . For any , one has , so that Model is a misspecified model. Figure 6 shows the path diagram of Model at time .
4.3. Simulation results
In the simulation, we use optim() with the BFGS method in the R language. The initial parameter is chosen as . The number of iterations is 10000. Set and consider the case where . Table 1 shows the number of models selected by QAIC. Since Model 3 is never selected, the simulation result is consistent with Theorem 2 in this example. Furthermore, we see from this result that QAIC is not consistent; in other words, the over-fitted model (Model 2) is selected with non-negligible probability. This behavior is natural since QAIC chooses the best model in terms of prediction.
Table 1. The number of models selected by QAIC.
| | | | | |
|---|---|---|---|---|
| Model 1 | 8394 | 8417 | 8461 | 8410 |
| Model 2 | 1606 | 1583 | 1539 | 1590 |
| Model 3 | 0 | 0 | 0 | 0 |
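The selection step itself is a small wrapper around optim(); the sketch below assumes hypothetical objects qll[[m]] (the quasi-log-likelihood of Model m) and init[[m]] (its initial value), and uses the AIC-type penalty of Section 3.

```r
## Minimal sketch of the selection rule (3.3); qll and init are hypothetical.
qaic <- function(qll_m, init_m) {
  fit <- optim(init_m, fn = function(theta) -qll_m(theta), method = "BFGS")
  2 * fit$value + 2 * length(init_m)  # -2 * maximized quasi-log-lik + 2 * dim
}

## scores <- sapply(seq_along(qll), function(m) qaic(qll[[m]], init[[m]]))
## which.min(scores)  # model selected by QAIC
```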
5. Proof
In this section, we may omit the model index, and we use instead of . Moreover, we simply write , , , , and as , , , , and , respectively. For any process and , we set . Without loss of generality, we suppose that . Set
for . Let
Set , where
Define the random field
Let be the random field as follows:
for , where
Set and . denotes the variance under the law of . Write and . Define as a -dimensional standard normal random variable.
Lemma 1
Under [A], as ,
and
Lemma 2
Under [A] and [B1], as ,
and
Lemma 3
Under [A], for all ,
(5.1)
(5.2)
(5.3)
and
(5.4)
Proof.
First, we will prove (5.1). Lemmas 14-15 in Kusano and Uchida [14] imply
(5.5)
for , where
Since
it follows from Lemmas 16-18 in Kusano and Uchida [14] that
(5.6)
for , where
Lemma 20 in Kusano and Uchida [14] shows
for all , so that
(5.7)
Similarly, we see from Lemma 20 in Kusano and Uchida [14] that
(5.8)
for any . Thus, it holds from (5.7) and (5.8) that for all ,
which yields (5.1). Using (5.5) and (5.7), one gets
(5.9)
for all . In an analogous manner, (5.6) and (5.8) deduce
(5.10)
for any . Consequently, we see from (5.9) and (5.10) that
for all , which yields (5.2). It follows from (5.5) that
(5.11)
for any . In a similar way, (5.6) implies
(5.12)
for all . Hence, it holds from (5.11) and (5.12) that
for all , so that (5.3) holds. Next, we consider (5.4). Since it follows from (5.5) and Lemma 21 in Kusano and Uchida [14] that
for , we see
for any . In an analogous manner, one has
for and , and
for , so that we get
and
for all . Therefore, it is shown that
for any , which yields (5.4). ∎
Lemma 4
Under [A], for all ,
for .
Proof of Lemma 4.
Note that
for . Since
we have
for , where
and
First, we will prove
(5.13)
for any . Set
for , where
for . Since
one has
so that is a discrete-time martingale with respect to . Note that is the terminal value of :
Using the Burkholder inequality and
we have
for all , which yields
(5.14)
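For reference, the Burkholder inequality invoked here states, for a discrete-time martingale $(M_k)_{k \le n}$ with $M_0 = 0$ and $p \ge 2$, that

```latex
\mathbf{E}\bigl[ |M_n|^{p} \bigr]
\le C_p\, \mathbf{E}\Bigl[ \Bigl( \sum_{k=1}^{n} |M_k - M_{k-1}|^{2} \Bigr)^{p/2} \Bigr]
```

for a constant $C_p$ depending only on $p$.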
Moreover, it follows from Lemma 3 and the Cauchy-Schwarz inequality that
for any , so that it holds from (5.14) that
which implies (5.13). Next, we will prove
(5.15)
for all . In an analogous manner to the proof of Lemma 3, one has
(5.16)
for all . Lemma 3 and (5.16) show
for any . Since as , we obtain (5.15). Consequently, for all , it holds from (5.13) and (5.15) that
Therefore, it is shown that for all ,
for . ∎
Lemma 5
Under [A], for all and ,
for .
Proof of Lemma 5.
Note that
and
for . Since
we have
so that a decomposition is given by
where
and
Let
for , where
for . In a similar way to the proof of Lemma 4, is a discrete-time martingale with respect to , and is the terminal value of :
In a similar way to the proof of Lemma 4, it follows from the Burkholder inequality that
for all , which yields
For any , it is shown that
in an analogous manner to the proof of Lemma 4, which yields
Since , we have as , so that
(5.17)
for all . Furthermore, we see from Lemma 3 and (5.16) that for all ,
and as , which implies
(5.18)
Hence, it holds from (5.17) and (5.18) that
for all . Therefore, for all , we obtain
for . ∎
Lemma 6
Under [A], for all and ,
Proof of Lemma 6.
Since
and
one has a decomposition
where
and
In an analogous manner to Lemma 5, one has
and
for all . Consequently, it holds from the Sobolev inequality that
for any , which yields
(5.19)
In a similar way, it is shown that
(5.20)
for all . Thus, we see from (5.19) and (5.20) that
for any . Therefore, one gets
for all and . ∎
Lemma 7
Under [A], for all ,
for .
Proof of Lemma 7.
Lemma 8
Under [A] and [B1], for all , there exists such that
for all and .
Proof.
It is enough to check the regularity conditions [A1′′], [A4′], [A6], [B1] and [B2] of Theorem 3 (c) in Yoshida [19]. It is supposed that , , , and satisfy [A4′]:
where . For example, we can take , , , and . For any , it follows from Lemmas 4 and 6 that
and
where and , which satisfies [A6]. Furthermore, we see from Lemmas 5 and 7 that
and
for all , where and . Hence, [A1′′] is satisfied. It follows from Lemma 35 in Kusano and Uchida [14] and [B1] (b) that is a positive definite matrix, which satisfies [B1]. Moreover, [B1] (a) yields [B2]. ∎
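For context, the conclusion supplied by Theorem 3 (c) in Yoshida [19] is a polynomial type large deviation inequality of the generic form

```latex
P\Bigl[ \sup_{u} \mathbb{Z}_n(u) \ge e^{-r} \Bigr] \le \frac{C_L}{r^{L}}, \qquad r > 0,
```

where the supremum ranges over the relevant region of the parameter space and $C_L$ does not depend on $n$ (a schematic statement in our notation).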
denotes the expectation under the probability measure on the probability space on which is realized.
Lemma 9
Under [A] and [B1], for all ,
and for ,
as .
Proof.
Proof of Theorem 1.
Let us consider the following decomposition:
where
First of all, we will prove
(5.24)
as . Using the Taylor expansion, one has
where and
First, we consider the expectation of . Set
Note that as . By using the Taylor expansion, one gets
for on , so that we have
(5.25)
where
Let
Since
it holds from Lemmas 5, 7 and 9 that
(5.26)
Consequently, we see from (5.26) and Lemma 9 that
as , which yields
(5.27)
as . Set
Using Lemmas 4 and 9, we obtain
which implies
(5.28)
It follows from (5.28) and Lemma 9 that
as , so that one gets
(5.29)
as . Hence, it holds from (5.27) and (5.29) that
(5.30)
as . Let
for . Since , we see from Lemma 9 that
(5.31)
as . Therefore, (5.25), (5.30) and (5.31) show
(5.32)
as . Next, the expectation of is considered. Note that
where
for . By using Lemmas 5 and 9, it is shown that
as . Thus, it follows from (5.31) that
(5.33)
as . Note that
where
for . It holds from Lemmas 7 and 9 that
as , which yields
(5.34)
as . Hence, (5.32), (5.33) and (5.34) show (5.24). Next, we will prove
(5.35)
as . By using the Taylor expansion, one gets
where
Since it holds from Lemmas 1, 4 and 9 that
and
as , we have
(5.36)
as . Note that
Let
for . Lemmas 1, 4 and 9 deduce
and
as , which implies
(5.37)
as . Moreover, we note that
where
for . Since
it follows from Lemmas 7 and 9 that
as , so that one has
(5.38)
as . Consequently, we see from (5.36), (5.37) and (5.38) that
as , which yields (5.35). Furthermore, we have
(5.39)
since and have the same distribution. Therefore, it holds from (5.24), (5.35) and (5.39) that
as . ∎
Lemma 10
Under [A] and [B2], as ,
Proof of Lemma 10.
Proof of Theorem 2.
Fix . From the definition of , one has
(5.40)
For all , it follows from Lemma 10 that
(5.41)
as , where
Define the function :
Note that has the unique maximum point at . Since
we obtain
for any , which yields . Consequently, it holds from (5.41) that for all ,
as , which implies
(5.42)
as . Therefore, we see from (5.40) and (5.42) that
as . ∎
References
- [1] Adams, R. A. and Fournier, J. J. (2003). Sobolev spaces. Elsevier.
- [2] Aït-Sahalia, Y., Kalnina, I. and Xiu, D. (2020). High-frequency factor models and regressions. Journal of Econometrics, 216(1), 86-105.
- [3] Aït-Sahalia, Y. and Xiu, D. (2017). Using principal component analysis to estimate a high dimensional factor model with high-frequency data. Journal of Econometrics, 201(2), 384-399.
- [4] Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52, 317-332.
- [5] Eguchi, S. and Masuda, H. (2023). Gaussian quasi-information criteria for ergodic Lévy driven SDE. Annals of the Institute of Statistical Mathematics, 1-47.
- [6] Everitt, B. (1984). An introduction to latent variable models. Springer Science & Business Media.
- [7] Genon-Catalot, V. and Jacod, J. (1993). On the estimation of the diffusion coefficient for multidimensional diffusion processes. Annales de l’Institut Henri Poincaré (B) Probabilités et Statistiques, 29, 119-151.
- [8] Harville, D. A. (1998). Matrix algebra from a statistician’s perspective. Taylor & Francis.
- [9] Huang, P. H. (2017). Asymptotics of AIC, BIC, and RMSEA for model selection in structural equation modeling. Psychometrika, 82(2), 407-426.
- [10] Jöreskog, K. G. (1970). A general method for analysis of covariance structures. Biometrika, 57(2), 239-251.
- [11] Kessler, M. (1997). Estimation of an ergodic diffusion from discrete observations. Scandinavian Journal of Statistics, 24(2), 211-229.
- [12] Kusano, S. and Uchida, M. (2023). Statistical inference in factor analysis for diffusion processes from discrete observations. Journal of Statistical Planning and Inference, (Version of Record). DOI: https://doi.org/10.1016/j.jspi.2023.07.009
- [13] Kusano, S. and Uchida, M. (2023). Sparse inference of structural equation modeling with latent variables for diffusion processes. Japanese Journal of Statistics and Data Science, (Version of Record). DOI: https://doi.org/10.1007/s42081-023-00230-1
- [14] Kusano, S. and Uchida, M. (2023). Structural equation modeling with latent variables for diffusion processes and its application to sparse estimation. arXiv preprint arXiv:2305.02655v2.
- [15] Mueller, R. O. (1999). Basic principles of structural equation modeling: An introduction to LISREL and EQS. Springer Science & Business Media.
- [16] Uchida, M. (2010). Contrast-based information criterion for ergodic diffusion processes from discrete observations. Annals of the Institute of Statistical Mathematics, 62, 161-187.
- [17] Uchida, M. and Yoshida, N. (2012). Adaptive estimation of an ergodic diffusion process based on sampled data. Stochastic Processes and their Applications, 122(8), 2885-2924.
- [18] Yoshida, N. (1992). Estimation for diffusion processes from discrete observation. Journal of Multivariate Analysis, 41, 220–242.
- [19] Yoshida, N. (2011). Polynomial type large deviation inequalities and quasi-likelihood analysis for stochastic differential equations. Annals of the Institute of Statistical Mathematics, 63(3), 431-479.
6. Appendix
6.1. Proofs of Lemmas
Proof of Lemma 1.
Proof of Lemma 2.
[B1] deduces
For all , there exists such that
In an analogous manner to Lemma 33 in Kusano and Uchida [14], we obtain
Since it holds from the definition of that
we see
as , which yields
(6.1)
as . Using the Taylor expansion, we have
where . Note that
on and as , where
In a similar manner to Theorem 2 in Kusano and Uchida [13], it holds from Lemma 2 and (6.1) that
as . Therefore, we see from Lemma 2 that
as . ∎
6.2. Proof of (4.1)
Proof.
In an analogous manner to Appendix 6.2 in Kusano and Uchida [14], it is shown that
(6.2)
For , we define
Note that has the unique minimum point at . Since
it holds from (6.2) that has the unique maximum point at , which yields
(6.3)
for all . The Taylor expansion of around is given by
(6.4)
where . In a similar way to Theorem 2 in Kusano and Uchida [13], it is shown that
as , which yields
as . Hence, for all , there exists such that
so that we see from (6.4) that
for , where
Note that it holds from the proof of Lemma 1 that . Since is arbitrary, one has
as for . Recalling that is a positive definite matrix, we have
where and are the minimum and maximum eigenvalues of . There exists such that
(6.5)
for . Let
Since
we see from (6.3) that
for , so that there exists such that
(6.6)
for . Therefore, it follows from (6.5) and (6.6) that
for , where . ∎
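For reference, the eigenvalue bound used in the proof above is the standard Rayleigh quotient inequality: for a positive definite matrix $V$ with minimum and maximum eigenvalues $\lambda_{\min}(V)$ and $\lambda_{\max}(V)$,

```latex
\lambda_{\min}(V)\, |x|^{2} \;\le\; x^{\top} V x \;\le\; \lambda_{\max}(V)\, |x|^{2}, \qquad x \in \mathbb{R}^{d}.
```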
6.3. Ergodic case
In this section, we consider the ergodic case. The following assumptions are made.
- [C]
  - (a) The diffusion process is ergodic with its invariant measure . For any -integrable function , it holds that
  as .
  - (b) The diffusion process is ergodic with its invariant measure . For any -integrable function , it holds that
  as .
  - (c) The diffusion process is ergodic with its invariant measure . For any -integrable function , it holds that
  as .
  - (d) The diffusion process is ergodic with its invariant measure . For any -integrable function , it holds that
  as .
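In generic notation, (a)-(d) state that time averages converge to space averages: for an ergodic diffusion $X$ with invariant measure $\pi$ and a $\pi$-integrable function $f$,

```latex
\frac{1}{T} \int_{0}^{T} f(X_t)\, dt \;\xrightarrow{P}\; \int f(x)\, \pi(dx) \qquad (T \to \infty).
```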
In the ergodic case, we have the following results similar to the non-ergodic case.
Theorem 3
Let . Under [A], [B1] and [C], as , and ,
Theorem 4
Under [A], [B2] and [C], as , and ,
Proofs of Theorems 3-4.
Since and , we can prove the results in the same way as the proofs of Theorems 1-2. ∎