On computing HITS ExpertRank via lumping the hub matrix

This work was supported by the National Natural Science Foundation of China (Grant Nos. 12001363, 71671125).
Abstract
Dangling nodes are nodes with no out-links in the web graph. Lumping all dangling nodes into a single node saves considerable computational cost and many operations. In this paper, motivated by the large number of dangling nodes in the web graph, we develop theoretical results for HITS via the lumping method. We have three main findings. First, the HITS model can be lumped even though the matrix involved is not stochastic. Second, the hub vector of the nondangling nodes can be computed separately from that of the dangling nodes, but not vice versa. Third, the authoritative vector of the nondangling nodes is difficult to compute separately from that of the dangling nodes. Therefore, it is better to compute the hub vector of the hub matrix first, rather than the authoritative vector of the authoritative matrix or both simultaneously.
keywords:
HITS, Lumping, Nondangling nodes, Dangling nodes, Similarity transformation

MSC [2020]: 65F10, 65F50, 15A18, 15A21, 68P20

1 Introduction
The PageRank of Page and Brin, the Hyperlink-Induced Topic Search (HITS) of Kleinberg, and the stochastic approach for link-structure analysis (SALSA) of Lempel and Moran all use dominant eigenvectors of non-negative matrices to rank web pages [1, 2, 3]. PageRank is used in Google, HITS is used in the ask.com search engine, and SALSA is a combination of PageRank and HITS [3, 4]. Since the late 20th century, HITS has been another extremely successful application of dominant eigenvectors in modern web information retrieval. The two ranking vectors produced by HITS (the authoritative vector and the hub vector) provide the ExpertRanks. The HITS method has broad applications, such as product quality ranking and similarity ranking. For discussions of HITS, together with the literature on modifications that overcome its weaknesses, we refer readers to [5].
The eigenproblems arising in web information retrieval and data mining can be of huge dimension. Because of computer memory constraints, the power method has become the dominant method for solving the HITS and PageRank eigenproblems [4, 5]. The web is enormous: it contains a vast number of pages and grows quickly and dynamically, so computing a large web ranking vector can take hours or even days. For search-dependent HITS, the computation involves only the nodes related to a user's query and is relatively small, which is why there has been comparatively little acceleration work on HITS. For search-independent HITS, however, the matrices involved are usually of tremendous dimension, and effective numerical acceleration is highly desirable [6].
As is well known, the power method loses efficiency when the ratio between the second largest eigenvalue $\lambda_2$ (in magnitude) and the largest eigenvalue $\lambda_1$ is close to 1. The famous Krylov subspace methods can converge faster than the power method, but they are not well suited to such web information problems because of their relatively large storage and subspace dimension requirements [4, 5, 7]. Therefore, many acceleration methods for information retrieval models have been developed, including aggregation methods [5], extrapolation methods [8, 9], two-stage acceleration methods [10, 11], and other contributions [4, 7]. Most of them target the difficult case in which the gap ratio approaches 1. By lumping the Google matrix, Ipsen and Selee analyzed the relationship between the rankings of nondangling nodes and the rankings of dangling nodes [12]. To improve the computational efficiency of the HITS ExpertRank, a filtered power method combining Chebyshev polynomials has been proposed [6]. For more theoretical and numerical results on web information retrieval models, see [4, 5, 7, 9, 13]. A natural question arises: can the HITS model still enjoy similar lumping results, so that the computational cost can be reduced, even though the matrix involved is not stochastic?
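The power method mentioned above can be sketched in a few lines. This is a minimal dense-matrix illustration with our own function and variable names, not the paper's implementation; real web matrices are sparse and are applied matrix-free.

```python
import numpy as np

def power_method(M, tol=1e-10, max_iter=1000):
    """Power iteration for the dominant eigenpair of a nonnegative matrix M.
    Convergence is governed by the gap ratio |lambda_2| / lambda_1."""
    n = M.shape[0]
    x = np.full(n, 1.0 / n)         # uniform, 1-norm-normalized starting vector
    lam = 0.0
    for _ in range(max_iter):
        y = M @ x
        lam = np.linalg.norm(y, 1)  # for nonnegative iterates, this estimates lambda_1
        y /= lam
        if np.linalg.norm(y - x, 1) < tol:
            break
        x = y
    return lam, y
```

For $M = LL^{T}$, each step costs two sparse matrix-vector products when $M$ is kept in factored form rather than formed explicitly.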
2 HITS model
The HITS model uses an adjacency matrix $L$ to describe the web link structure graph. The eigenvectors of $L^{T}L$ and $LL^{T}$ are employed to reveal the relative importance (rank) of the corresponding web pages, yielding their authoritative vectors and hub vectors. Kleinberg [1] invented new matrices defined by
(1)  $L^{T}L \qquad \text{and} \qquad LL^{T},$
respectively, where $L = (l_{ij})$ is the adjacency matrix whose entries are given by $l_{ij} = 1$ if page $i$ links to page $j$, and $l_{ij} = 0$ otherwise.
In (1), if web page $i$ has no out-links (e.g., image files, or PDF files with no links to other pages), it is called a dangling node; otherwise it is called a nondangling node.
The HITS method updates the authoritative vector $x$ and the hub vector $y$ iteratively from some initial vectors $x^{(0)}$ and $y^{(0)}$:
(2)  $x^{(k)} = L^{T}y^{(k-1)}, \qquad y^{(k)} = Lx^{(k)}, \qquad k = 1, 2, \ldots,$ each update being followed by normalization.
Once one of $x^{(k)}$ and $y^{(k)}$ has converged, the other vector is obtained by one more multiplication by $L^{T}$ or $L$. From (2), we have the following expression: the authoritative vector $x$ of the authoritative matrix $L^{T}L$ and the hub vector $y$ of the hub matrix $LL^{T}$ are defined as the principal eigenvectors satisfying
(3)  $L^{T}Lx = \lambda x \qquad \text{and} \qquad LL^{T}y = \lambda y,$
respectively. Each of $L^{T}L$ and $LL^{T}$ is a symmetric positive semi-definite matrix and thus has nonnegative eigenvalues. The ExpertRanks are provided by the authoritative and hub vectors from HITS.
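The alternating updates (2) can be illustrated with a toy dense implementation (function and variable names are our own; real web matrices are sparse):

```python
import numpy as np

def hits(L, tol=1e-12, max_iter=1000):
    """Alternating HITS updates: x = L^T y (authorities), y = L x (hubs),
    each followed by 1-norm normalization."""
    n = L.shape[0]
    y = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        x = L.T @ y
        x /= np.linalg.norm(x, 1)
        y_new = L @ x
        y_new /= np.linalg.norm(y_new, 1)
        if np.linalg.norm(y_new - y, 1) < tol:
            y = y_new
            break
        y = y_new
    return x, y

# Tiny graph: page 0 links to 1 and 2, page 1 links to 2; page 2 is dangling.
L = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
x, y = hits(L)   # x: authoritative vector, y: hub vector
```

On this graph, page 2 (two in-links) receives the top authoritative score and page 0 (two out-links) receives the top hub score, matching the intuition behind (3).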
The formulation (3) cannot guarantee the uniqueness of $\lambda$, $x$, or $y$. To ensure uniqueness, we modify $L^{T}L$ and $LL^{T}$ so that they become primitive matrices. The authoritative matrix $A$ and the hub matrix $H$ are defined by
(4)  $A = \xi L^{T}L + \frac{1-\xi}{n}ee^{T} \qquad \text{and} \qquad H = \xi LL^{T} + \frac{1-\xi}{n}ee^{T}, \qquad 0 < \xi < 1,$
respectively. Accordingly, the authoritative vector and the hub vector are defined by
(5)  $Ax = \lambda x \qquad \text{and} \qquad Hy = \mu y, \qquad x, y \ge 0, \quad \|x\|_{1} = \|y\|_{1} = 1,$
respectively, where $e$ is a vector of all ones of suitable length. In this paper, we therefore mainly discuss the computation problem (5). In the following section, we develop theoretical and practical contributions for computational purposes.
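Assuming a PageRank-style rank-one primitive modification $H = \xi LL^{T} + \frac{1-\xi}{n}ee^{T}$ (our assumption for illustration), the hub vector in (5) can be computed without ever forming $H$ explicitly, since $Hy = \xi L(L^{T}y) + \frac{1-\xi}{n}(e^{T}y)e$:

```python
import numpy as np

def modified_hub_vector(L, xi=0.85, tol=1e-12, max_iter=5000):
    """Power method on H = xi * L L^T + (1 - xi)/n * e e^T, applied matrix-free:
    H y = xi * L (L^T y) + (1 - xi)/n * (e^T y) e."""
    n = L.shape[0]
    y = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # the scalar (1 - xi)/n * sum(y) broadcasts, realizing the e e^T term
        z = xi * (L @ (L.T @ y)) + (1.0 - xi) / n * y.sum()
        z /= z.sum()               # z stays nonnegative, so its sum is its 1-norm
        if np.linalg.norm(z - y, 1) < tol:
            return z
        y = z
    return y
```

The point of the factored form is that each iteration costs two sparse matrix-vector products plus $O(n)$ work, whereas $H$ itself is completely dense.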
3 Lumping and related theorems
The adjacency matrix is lumpable, in the sense that all the dangling nodes can be lumped into a single node [10, 12]. According to (1), the adjacency matrix admits the structure
(6)  $PLP^{T} = \begin{pmatrix} L_{11} & L_{12} \\ 0 & 0 \end{pmatrix},$
where $P$ is a suitable permutation matrix, $L_{11} \in \mathbb{R}^{k \times k}$, and $k$ is the number of nondangling nodes. Then we have
(7)  $PLL^{T}P^{T} = \begin{pmatrix} L_{11}L_{11}^{T} + L_{12}L_{12}^{T} & 0 \\ 0 & 0 \end{pmatrix}.$
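This block structure is easy to check numerically: after permuting the dangling nodes to the end, every dangling row and column of $LL^{T}$ vanishes, because a dangling node contributes a zero row to the permuted adjacency matrix. A toy sketch (variable names are our own):

```python
import numpy as np

# Toy adjacency matrix: rows 1 and 3 are all zero, i.e. dangling nodes.
L = np.array([[0, 1, 0, 1],
              [0, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)

# Permute nondangling nodes first, dangling nodes last (this realizes P L P^T).
nondangling = np.where(L.any(axis=1))[0]
dangling = np.where(~L.any(axis=1))[0]
perm = np.concatenate([nondangling, dangling])
Lp = L[perm][:, perm]

# Hub matrix of the permuted graph: the dangling rows of Lp are zero,
# so the corresponding rows AND columns of Lp Lp^T vanish.
H = Lp @ Lp.T
k = len(nondangling)
assert not H[k:, :].any() and not H[:, k:].any()
```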
After the primitive modification, we obtain
(8)  $PHP^{T} = \xi \begin{pmatrix} L_{11}L_{11}^{T} + L_{12}L_{12}^{T} & 0 \\ 0 & 0 \end{pmatrix} + \frac{1-\xi}{n}ee^{T},$
where is a suitable length vector of all ones. By using the lumping method and the similarity transformation matrix
where $I$ denotes the identity matrix of the appropriate order, and $e_{i}$ denotes its $i$th column vector [13]. Define
(9)
then we have
(10)
Thus, we have proved the following theorem.
Theorem 3.1.
With the above notation, let
(11)
and be a suitable permutation matrix. Then
(12)
where
the blocks are defined as above. The lumped matrix has the same nonzero eigenvalues as the hub matrix $H$.
The following theorem establishes the relationship between the hub ranking vector of $H$ and the stationary distribution of the lumped matrix. The leading $k$ elements represent the hub scores of the nondangling nodes, and the trailing elements stand for the hub scores associated with the dangling nodes; thus the relationship between the rankings of dangling nodes and those of nondangling nodes is derived. To ease the following proof, we display the structure of the submatrix, separating its first leading row and column:
(13)
To validate the analytic relationship between the ranking vector of the nondangling nodes and that of the dangling nodes in the hub matrix model, we now present our main lumping results for HITS.
Theorem 3.2.
Proof.
According to Theorem 3.1, the lumped matrix has the same nonzero eigenvalues as $H$. From (12) and (14), we obtain an eigenvector of the lumped matrix associated with the dominant eigenvalue. Therefore,
(16)
is an eigenvector of $H$ associated with the dominant eigenvalue. Since the two matrices have the same nonzero eigenvalues, and the principal eigenvalue of $H$ is simple, the corresponding normalized nonnegative eigenvector of $H$ is unique. We repartition
(17)
Multiplying out and applying (12) and (13), we have
(18)
due to the fact that
Hence,
(19)
where $P$ is a suitable permutation matrix satisfying (6). As discussed above, this eigenvector is unique, and the conclusion follows. ∎
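Under a rank-one primitive modification of the form $H = \xi LL^{T} + \frac{1-\xi}{n}ee^{T}$ (assumed here for illustration), the separation of dangling scores can be sanity-checked numerically: every dangling row of $H$ equals $\frac{1-\xi}{n}e^{T}$, so all dangling hub scores coincide and follow from the dominant eigenvalue and the total mass $e^{T}y = 1$ alone. A toy check with our own matrix and names:

```python
import numpy as np

# Two nondangling nodes (0, 1) and two dangling nodes (2, 3).
L = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]], dtype=float)
n, xi = 4, 0.85
H = xi * L @ L.T + (1 - xi) / n * np.ones((n, n))  # assumed rank-one modification

w, V = np.linalg.eigh(H)              # symmetric: eigenvalues in ascending order
lam, y = w[-1], np.abs(V[:, -1])      # dominant eigenpair (Perron vector is positive)
y /= y.sum()

# Each dangling row of H is (1 - xi)/n * e^T, so lam * y_d = (1 - xi)/n * (e^T y)
# for every dangling node d: dangling scores are equal and determined by lam.
assert np.isclose(y[2], y[3])
assert np.isclose(y[2], (1 - xi) / n / lam)
```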
Remark 3.1.
Remark 3.2.
Remark 3.3.
Remark 3.4.
Although one can permute $L$ so that (6) holds, the web adjacency matrix is very sparse (usually only about ten entries per row), and one can likewise permute $L^{T}$ so that a structure analogous to (6) holds; this phenomenon can be verified with web data matrices from the online SuiteSparse (Florida) matrix collection (https://sparse.tamu.edu/). In particular, the number of all-zero columns may exceed the number of all-zero rows. In that case, computing the authoritative ranking vector first is recommended. Note, however, that this case is very rare.
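The comparison in this remark reduces to counting all-zero rows versus all-zero columns of the adjacency matrix. A toy sketch (the matrix and names are our own):

```python
import numpy as np

# Toy pattern: two all-zero rows (dangling nodes) but three all-zero columns.
L = np.array([[0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0]])

zero_rows = int((~L.any(axis=1)).sum())   # dangling nodes: lumpable in L L^T
zero_cols = int((~L.any(axis=0)).sum())   # unreferenced pages: lumpable in L^T L

# When all-zero columns outnumber all-zero rows, lumping the authoritative
# matrix removes more nodes, so the authoritative vector should go first.
order = "authoritative-first" if zero_cols > zero_rows else "hub-first"
```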
4 Conclusion
In this paper, we have studied a HITS computation approach that lumps the dangling nodes of the hub matrix into a single node. We have thereby answered the HITS computation question raised in the introduction.
HITS can be computed by the lumping approach even though the matrices involved are not stochastic. The approach we discuss is useful whether the HITS model is search-dependent or not. For the hub vector, Theorem 3.2 shows that the rankings of the nondangling nodes can be computed independently of those of the dangling nodes, while the rankings of the dangling nodes depend on the rankings of the nondangling nodes. According to Remark 3.3, the authoritative vector is relatively difficult to compute compared with the hub vector. We therefore suggest computing the hub vector first, rather than the authoritative vector or both simultaneously.
Further research may include how to compute SALSA by the lumping method. Questions such as the relationship between the ranking vectors of dangling and nondangling nodes in the SALSA model are also worth studying.
References
- Page et al. [1999] L. Page, S. Brin, R. Motwani, T. Winograd, The PageRank citation ranking: Bringing order to the web, Stanford Digital Libraries, 1999 (available online from http://dbpubs.stanford.edu:8090/pub/1999-66).
- Kleinberg [1999] J. M. Kleinberg, Authoritative Sources in a Hyperlinked Environment, J. ACM 46 (1999) 604–632.
- Lempel and Moran [2000] R. Lempel, S. Moran, The stochastic approach for link-structure analysis (SALSA) and the TKC effect, Computer Networks 33 (2000) 387–401.
- Langville and Meyer [2005] A. N. Langville, C. D. Meyer, A survey of eigenvector methods for web information retrieval, SIAM review 47 (2005) 135–161.
- Langville and Meyer [2006] A. N. Langville, C. D. Meyer, Google’s PageRank and Beyond: The Science of Search Engine Rankings, Princeton University Press, 2006.
- Zhou [2012] Y.-K. Zhou, Practical acceleration for computing the HITS ExpertRank vectors, Journal of Computational and Applied Mathematics 236 (2012) 4398–4409.
- Eldén [2007] L. Eldén, Matrix methods in data mining and pattern recognition, SIAM, 2007.
- Brezinski and Redivo-Zaglia [2006] C. Brezinski, M. Redivo-Zaglia, The PageRank Vector: Properties, Computation, Approximation, and Acceleration, SIAM Journal on Matrix Analysis and Applications 28 (2006) 551–575.
- Feng et al. [2021] Y.-H. Feng, J.-X. You, Y.-X. Dong, An Extrapolation Iteration and Its Lumped Type Iteration for Computing PageRank, Bulletin of the Iranian Mathematical Society (2021) 1–18.
- Lee et al. [2007] C. P. Lee, G. H. Golub, S. A. Zenios, A Two-Stage Algorithm for Computing PageRank and Multistage Generalizations, Internet Mathematics 4 (2007) 299–327.
- Dong et al. [2017] Y.-X. Dong, C.-Q. Gu, Z.-B. Chen, An Arnoldi-Inout method accelerated with a two-stage matrix splitting iteration for computing PageRank, Calcolo 54 (2017) 1–23.
- Ipsen and Selee [2007] I. C. F. Ipsen, T. M. Selee, PageRank computation with special attention to dangling nodes, SIAM Journal on Matrix Analysis and Applications 29 (2007) 1281–1296.
- Dong et al. [2021] Y.-X. Dong, Y.-H. Feng, J.-X. You, J.-R. Guan, Comments on lumping the Google matrix, arXiv preprint arXiv:2107.11080 (2021).