
On computing HITS ExpertRank via lumping the hub matrix
(This work was supported by the National Natural Science Foundation of China, Grant Nos. 12001363 and 71671125.)

Yongxin Dong, Yuehua Feng (yhfeng@sues.edu.cn), Jianxin You
School of Economics and Management, Tongji University, Shanghai 200092, China
School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
Abstract

Dangling nodes are nodes with no out-links in the web graph. Lumping all dangling nodes into a single node saves a great deal of computation. In this paper, motivated by the large number of dangling nodes in the web graph, we develop theoretical results for HITS via the lumping method. We have three main findings. First, the HITS model can be lumped although the matrix involved is not stochastic. Second, the hub vector of the nondangling nodes can be computed separately from that of the dangling nodes, but not vice versa. Third, the authoritative vector of the nondangling nodes is difficult to compute separately from that of the dangling nodes. Therefore, it is better to compute the hub vector of the hub matrix first, rather than the authoritative vector of the authoritative matrix or both simultaneously.

keywords:
HITS, Lumping, Nondangling nodes, Dangling nodes, Similarity transformation
MSC:
[2020] 65F10, 65F50, 15A18, 15A21, 68P20

1 Introduction

The PageRank algorithm of Page and Brin, the Hyperlink-Induced Topic Search (HITS) of Kleinberg, and the stochastic approach for link-structure analysis (SALSA) of Lempel and Moran all use the dominant eigenvector of nonnegative matrices to rank web pages [1, 2, 3]. PageRank is used by Google, HITS is used by the ask.com search engine, and SALSA is a combination of PageRank and HITS [3, 4]. Since the late 20th century, HITS has been another extremely successful application of the dominant eigenvector in modern web information retrieval. The two ranking vectors produced by HITS (the authority vector and the hub vector) provide the ExpertRanks. The HITS method has broad applications, such as product quality ranking and similarity ranking. For discussions of HITS, together with the literature on modifications that overcome its weaknesses, we refer readers to [5].

The eigenproblems arising in web information retrieval and data mining can be of huge dimension. Because of computer memory constraints, the power method has become the dominant method for solving the HITS and PageRank eigenproblems [4, 5]. The web is enormous: it may contain more than $10^{9}$ pages, and it grows quickly and dynamically, so computing a web-scale ranking vector can take hours or even days. For search-dependent HITS, the computation involves only the nodes related to a user's query and is therefore relatively small; this is why there is comparatively little acceleration work on HITS. For search-independent HITS, however, the matrices involved are usually of tremendous dimension, and effective numerical acceleration is highly desirable [6].

As is well known, the power method loses efficiency when the magnitude of the second largest eigenvalue $\lambda_{2}$ is close to that of the largest eigenvalue $\lambda_{1}$. Krylov subspace methods can converge faster than the power method, but they are not well suited to such web information problems because of their relatively large storage requirements and subspace dimensions [4, 5, 7]. Therefore, many acceleration methods for information retrieval computations have been developed, including aggregation methods [5], extrapolation methods [8, 9], two-stage acceleration methods [10, 11], and other contributions [4, 7]. Most of them target the difficult case in which the gap ratio $\frac{|\lambda_{2}|}{|\lambda_{1}|}$ approaches 1. By lumping the Google matrix, Ipsen and Selee analyzed the relationship between the rankings of nondangling nodes and those of dangling nodes [12]. To improve the computational efficiency of the HITS ExpertRank, a filtered power method combining Chebyshev polynomials has been proposed [6]. For more theoretical and numerical results on web information retrieval models, see [4, 5, 7, 9, 13]. A natural question arises: can the HITS model be lumped in a similar way, so that the computational cost is reduced even though the matrix involved is not stochastic?

The rest of this paper is organized as follows. Section 2 briefly describes the HITS model. Section 3 derives the main approach and theorems for lumping the HITS model. Finally, Section 4 gives a short conclusion.

2 HITS model

The HITS model uses an adjacency matrix $L\in\mathbb{R}^{n\times n}$ to describe the web link-structure graph. The eigenvectors of $L^{T}L$ and $LL^{T}$ are employed to reveal the relative importance (rank) of the corresponding web pages as authorities and as hubs, respectively. Kleinberg [1] introduced the matrices

L^{T}L\in\mathbb{R}^{n\times n},\quad LL^{T}\in\mathbb{R}^{n\times n}, (1)

respectively, where $L$ is the adjacency matrix given by

L_{ij}=\begin{cases}1,&\text{if page $i$ links to page $j$,}\\ 0,&\text{otherwise.}\end{cases}

In (1), if web page $i$ has no out-links (e.g., an image file or a PDF file with no links to other pages), it is called a dangling node; otherwise it is called a nondangling node.
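To make the definitions concrete, here is a minimal Python sketch (using numpy and scipy.sparse; the toy link structure is our own illustration, not data from the paper) that builds a small adjacency matrix $L$ and identifies the dangling nodes as its all-zero rows.

import numpy as np
import scipy.sparse as sp

# Toy web graph (illustration only): pages 0-4, where pages 3 and 4
# have no out-links and are therefore dangling nodes.
rows = [0, 0, 1, 2, 2]   # page i ...
cols = [1, 2, 2, 3, 4]   # ... links to page j, i.e. L[i, j] = 1
L = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(5, 5))

out_degree = np.asarray(L.sum(axis=1)).ravel()
dangling = np.flatnonzero(out_degree == 0)      # all-zero rows of L
nondangling = np.flatnonzero(out_degree > 0)
print(dangling)                                  # -> [3 4]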

The HITS method updates $v_{a}$ and $v_{h}$ iteratively from some initial vectors $v_{a}^{(0)}$ and $v_{h}^{(0)}$,

v_{a}^{(k)}=L^{T}v_{h}^{(k-1)},\quad v_{h}^{(k)}=Lv_{a}^{(k)},\quad k=1,2,\cdots. (2)

Once one of $v_{a}$ and $v_{h}$ has converged, the other vector is obtained by a single multiplication by $L$ or $L^{T}$. From (2), the authoritative vector $\pi_{a}$ of the authoritative matrix $L^{T}L$ and the hub vector $\pi_{h}$ of the hub matrix $LL^{T}$ are defined as the principal eigenvectors of $L^{T}L$ and $LL^{T}$,

\lambda_{\max}\pi_{a}=L^{T}L\pi_{a},\quad\lambda_{\max}\pi_{h}=LL^{T}\pi_{h},\quad\text{where}~\pi_{a}\geq 0,~\pi_{h}\geq 0,~\|\pi_{a}\|_{1}=1,~\|\pi_{h}\|_{1}=1, (3)

respectively. The matrices $L^{T}L$ and $LL^{T}$ are symmetric positive semi-definite and thus have nonnegative eigenvalues. The ExpertRanks are provided by the authoritative and hub vectors from HITS.
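A minimal sketch of the iteration (2) in Python follows; the normalization to unit 2-norm at each step (needed in practice to keep the iterates bounded), the tolerance, and the function name are our own choices, not prescribed by the paper.

import numpy as np

def hits(L, tol=1e-10, max_iter=1000):
    """Power iteration (2): alternate authority and hub updates,
    normalizing at each step so the iterates stay bounded."""
    n = L.shape[0]
    v_h = np.ones(n) / np.sqrt(n)        # initial hub vector
    v_a = None
    for _ in range(max_iter):
        v_a = L.T @ v_h                  # authority update
        v_a = v_a / np.linalg.norm(v_a)
        v_h_new = L @ v_a                # hub update
        v_h_new = v_h_new / np.linalg.norm(v_h_new)
        if np.linalg.norm(v_h_new - v_h) < tol:
            v_h = v_h_new
            break
        v_h = v_h_new
    return v_a, v_h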

The formulation (3) cannot guarantee the uniqueness of $\lambda_{\max}$, $\pi_{a}$, or $\pi_{h}$. To ensure uniqueness, we modify $L^{T}L$ and $LL^{T}$ so that they become primitive matrices. The authoritative matrix $A$ and the hub matrix $H$ are defined by

A=\xi L^{T}L+\frac{1-\xi}{n}ee^{T},\quad H=\xi LL^{T}+\frac{1-\xi}{n}ee^{T},\quad\text{where}~0<\xi<1,~e=\begin{bmatrix}1&\cdots&1\end{bmatrix}^{T}, (4)

respectively. Accordingly, the authoritative and hub vectors are defined by

\pi_{a}^{T}A=\pi_{a}^{T},\quad\pi_{h}^{T}H=\pi_{h}^{T},\quad\text{where}~\pi_{a}\geq 0,~\pi_{h}\geq 0,~\pi_{a}^{T}e=1,~\pi_{h}^{T}e=1, (5)

respectively, where $e$ is a vector of all ones of suitable length. In this paper, we thus mainly discuss the computation problem (5). In the following section, we develop theoretical and practical contributions for this computation.
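Because $ee^{T}$ is completely dense, neither $A$ nor $H$ in (4) should ever be formed explicitly; their action on a vector can be computed from $L$ alone. A hedged sketch (the function name and the default value $\xi=0.9$ are our assumptions):

import numpy as np

def apply_H(L, x, xi=0.9):
    """Return H @ x for H = xi * L L^T + (1 - xi)/n * e e^T without
    forming H; H is symmetric, so this also realizes x^T H."""
    n = L.shape[0]
    y = xi * (L @ (L.T @ x))             # two sparse matvecs, O(nnz(L))
    y = y + (1.0 - xi) / n * x.sum()     # rank-one part: e * (e^T x) / n
    return y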

3 Lumping and related theorems

The adjacency matrix is lumpable in the sense that all dangling nodes can be lumped into a single node [10, 12]. After reordering the nodes so that the nondangling nodes come first, the adjacency matrix admits the structure

PLP^{T}=\begin{bmatrix}L_{11}&L_{12}\\ 0&0\end{bmatrix}\quad\text{and}\quad PL^{T}P^{T}=\begin{bmatrix}L_{11}^{T}&0\\ L_{12}^{T}&0\end{bmatrix}, (6)

where $P$ is a suitable permutation matrix, $L_{11}\in\mathbb{R}^{k\times k}$, $L_{12}\in\mathbb{R}^{k\times(n-k)}$, and $k$ is the number of nondangling nodes. Then we have

PLL^{T}P^{T}=PLP^{T}PL^{T}P^{T}=\begin{bmatrix}L_{11}&L_{12}\\ 0&0\end{bmatrix}\begin{bmatrix}L_{11}^{T}&0\\ L_{12}^{T}&0\end{bmatrix}=\begin{bmatrix}L_{11}L_{11}^{T}+L_{12}L_{12}^{T}&0\\ 0&0\end{bmatrix}. (7)

After the primitive modification, we obtain

PHP^{T}=P\left(\xi LL^{T}+\frac{1-\xi}{n}ee^{T}\right)P^{T}=\xi\begin{bmatrix}L_{11}L_{11}^{T}+L_{12}L_{12}^{T}&0\\ 0&0\end{bmatrix}+\frac{1-\xi}{n}ee^{T}=\begin{bmatrix}\xi(L_{11}L_{11}^{T}+L_{12}L_{12}^{T})+\frac{1-\xi}{n}ee^{T}&\frac{1-\xi}{n}ee^{T}\\ \frac{1-\xi}{n}ee^{T}&\frac{1-\xi}{n}ee^{T}\end{bmatrix}, (8)

where $e$ is a vector of all ones of suitable length. Using the lumping method and the similarity transformation matrix

Y=I_{n-k}-\widehat{e}e_{1}^{T},\quad\text{with}\quad\widehat{e}=e-e_{1}=\begin{bmatrix}0&1&\cdots&1\end{bmatrix}^{T}\in\mathbb{R}^{n-k},

where $I_{n-k}=\begin{bmatrix}e_{1}&\cdots&e_{n-k}\end{bmatrix}$ denotes the identity matrix of order $n-k$ and $e_{i}$ ($i=1,2,\cdots,n-k$) is its $i$-th column [13]. Note that $e_{1}^{T}\widehat{e}=0$, so $Y^{-1}=I_{n-k}+\widehat{e}e_{1}^{T}$. Define

X=\begin{bmatrix}I_{k}&0\\ 0&Y\end{bmatrix}, (9)

then we have

XPHP^{T}X^{-1}
=\begin{bmatrix}\xi(L_{11}L_{11}^{T}+L_{12}L_{12}^{T})+\frac{1-\xi}{n}ee^{T}&\frac{1-\xi}{n}ee^{T}Y^{-1}\\ \frac{1-\xi}{n}Yee^{T}&\frac{1-\xi}{n}Yee^{T}Y^{-1}\end{bmatrix}
=\begin{bmatrix}\xi(L_{11}L_{11}^{T}+L_{12}L_{12}^{T})+\frac{1-\xi}{n}ee^{T}&\frac{1-\xi}{n}ee^{T}Y^{-1}e_{1}&\frac{1-\xi}{n}ee^{T}Y^{-1}\begin{bmatrix}e_{2}&\cdots&e_{n-k}\end{bmatrix}\\ \frac{1-\xi}{n}e_{1}^{T}Yee^{T}&\frac{1-\xi}{n}e_{1}^{T}Yee^{T}Y^{-1}e_{1}&\frac{1-\xi}{n}e_{1}^{T}Yee^{T}Y^{-1}\begin{bmatrix}e_{2}&\cdots&e_{n-k}\end{bmatrix}\\ \frac{1-\xi}{n}\begin{bmatrix}e_{2}^{T}\\ \vdots\\ e_{n-k}^{T}\end{bmatrix}Yee^{T}&\frac{1-\xi}{n}\begin{bmatrix}e_{2}^{T}\\ \vdots\\ e_{n-k}^{T}\end{bmatrix}Yee^{T}Y^{-1}e_{1}&\frac{1-\xi}{n}\begin{bmatrix}e_{2}^{T}\\ \vdots\\ e_{n-k}^{T}\end{bmatrix}Yee^{T}Y^{-1}\begin{bmatrix}e_{2}&\cdots&e_{n-k}\end{bmatrix}\end{bmatrix}
=\begin{bmatrix}\xi(L_{11}L_{11}^{T}+L_{12}L_{12}^{T})+\frac{1-\xi}{n}ee^{T}&\frac{1-\xi}{n}ee^{T}Y^{-1}e_{1}&\frac{1-\xi}{n}ee^{T}Y^{-1}\begin{bmatrix}e_{2}&\cdots&e_{n-k}\end{bmatrix}\\ \frac{1-\xi}{n}e_{1}^{T}Yee^{T}&\frac{1-\xi}{n}e_{1}^{T}Yee^{T}Y^{-1}e_{1}&\frac{1-\xi}{n}e_{1}^{T}Yee^{T}Y^{-1}\begin{bmatrix}e_{2}&\cdots&e_{n-k}\end{bmatrix}\\ 0&0&0\end{bmatrix}
=\begin{bmatrix}\xi(L_{11}L_{11}^{T}+L_{12}L_{12}^{T})+\frac{1-\xi}{n}ee^{T}&\frac{1-\xi}{n}(n-k)e&\frac{1-\xi}{n}ee^{T}Y^{-1}\begin{bmatrix}e_{2}&\cdots&e_{n-k}\end{bmatrix}\\ \frac{1-\xi}{n}e^{T}&\frac{1-\xi}{n}(n-k)&\frac{1-\xi}{n}e^{T}Y^{-1}\begin{bmatrix}e_{2}&\cdots&e_{n-k}\end{bmatrix}\\ 0&0&0\end{bmatrix}, (10)

where we have used the facts that $Y^{-1}e_{1}=e_{1}+\widehat{e}=e$ and that $e_{i}^{T}Ye=(e_{i}-e_{1})^{T}e=0$ for $i=2,\cdots,n-k$, which makes the last block row vanish. Thus, we have proved the following theorem.

Theorem 3.1.

With the above notation, let

X=\begin{bmatrix}I_{k}&0\\ 0&Y\end{bmatrix},\quad\text{where}\quad Y=I_{n-k}-\widehat{e}e_{1}^{T}\quad\text{and}\quad\widehat{e}=e-e_{1}=\begin{bmatrix}0&1&\cdots&1\end{bmatrix}^{T}, (11)

and let $P$ be a suitable permutation matrix. Then

XPHP^{T}X^{-1}=\begin{bmatrix}H^{(1)}&H^{(2)}\\ 0&0\end{bmatrix}, (12)

where

H^{(1)}=\begin{bmatrix}H^{(1)}_{11}&\frac{1-\xi}{n}(n-k)e\\ \frac{1-\xi}{n}e^{T}&\frac{1-\xi}{n}(n-k)\end{bmatrix}\quad\text{and}\quad H^{(2)}=\begin{bmatrix}\frac{1-\xi}{n}ee^{T}Y^{-1}\begin{bmatrix}e_{2}&\cdots&e_{n-k}\end{bmatrix}\\ \frac{1-\xi}{n}e^{T}Y^{-1}\begin{bmatrix}e_{2}&\cdots&e_{n-k}\end{bmatrix}\end{bmatrix},

with $H^{(1)}\in\mathbb{R}^{(k+1)\times(k+1)}$, $H^{(2)}\in\mathbb{R}^{(k+1)\times(n-k-1)}$, and $H^{(1)}_{11}=\xi(L_{11}L_{11}^{T}+L_{12}L_{12}^{T})+\frac{1-\xi}{n}ee^{T}$. The matrix $H^{(1)}$ has the same nonzero eigenvalues as $H$.
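Theorem 3.1 is easy to verify numerically on a toy example. The sketch below is ours and makes two simplifying assumptions: the nodes are already ordered with the $k$ nondangling nodes first (so $P=I$ in (6)), and the graph is the five-node example from Section 2.

import numpy as np

n, k, xi = 5, 3, 0.9
L = np.zeros((n, n))
L[0, 1] = L[0, 2] = L[1, 2] = L[2, 3] = L[2, 4] = 1.0  # rows 3, 4 are zero
H = xi * L @ L.T + (1 - xi) / n * np.ones((n, n))       # H from (4)

# Similarity transformation (11): Y = I_{n-k} - ehat e1^T, X = diag(I_k, Y)
Y = np.eye(n - k)
Y[1:, 0] = -1.0
X = np.block([[np.eye(k), np.zeros((k, n - k))],
              [np.zeros((n - k, k)), Y]])
T = X @ H @ np.linalg.inv(X)

print(np.allclose(T[k + 1:, :], 0.0))   # last n-k-1 rows vanish, as in (12)
H1 = T[:k + 1, :k + 1]                  # the lumped matrix H^(1)
ev_H = np.sort(np.linalg.eigvals(H).real)[::-1]
ev_H1 = np.sort(np.linalg.eigvals(H1).real)[::-1]
print(np.allclose(ev_H[:k + 1], ev_H1)) # same nonzero eigenvalues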

The following theorem establishes the relationship between the hub ranking vector $\pi_{h}$ of $H$ and the principal left eigenvector $\sigma$ of $H^{(1)}$. As we shall see, the leading $k$ elements of $P\pi_{h}$ give the hub scores of the nondangling nodes, and the trailing $n-k$ elements give the hub scores of the dangling nodes; this yields the relationship between the ranking of the dangling nodes and that of the nondangling nodes. To ease the proof, we display the structure of the submatrix $Y$. Separating the first row and column,

Y=\begin{bmatrix}1&0\\ -e&I_{n-k-1}\end{bmatrix}. (13)

Motivated by the goal of establishing this analytic relationship between the ranking vector of the nondangling nodes and that of the dangling nodes in the hub matrix model, we now present our main lumping result for HITS.

Theorem 3.2.

With the above notation, let

\sigma^{T}H^{(1)}=\lambda_{\max}\sigma^{T},\quad\sigma\in\mathbb{R}^{k+1},~\sigma\geq 0,~\|\sigma\|_{1}=1, (14)

where $H^{(1)}$ is defined by (12), and partition $\sigma^{T}=\begin{bmatrix}\sigma^{T}_{1:k}&\sigma_{k+1}\end{bmatrix}$, where $\sigma_{k+1}$ is a scalar and $\lambda_{\max}$ is the largest eigenvalue of $H$ (equivalently, of $H^{(1)}$). Then the hub vector of $H$ equals

\pi_{h}^{T}=\begin{bmatrix}\sigma^{T}_{1:k}&\frac{1}{\lambda_{\max}}\sigma^{T}\begin{bmatrix}\frac{1-\xi}{n}ee^{T}\\ \frac{1-\xi}{n}e^{T}\end{bmatrix}\end{bmatrix}P, (15)

where PP is a suitable permutation matrix satisfying (6).

Proof.

According to Theorem 3.1, the matrix $H^{(1)}$ of order $k+1$ has the same nonzero eigenvalues as $H$. From (12) and (14), we obtain that $\begin{bmatrix}\sigma^{T}&\frac{1}{\lambda_{\max}}\sigma^{T}H^{(2)}\end{bmatrix}$ is a left eigenvector of $XPHP^{T}X^{-1}$ associated with the eigenvalue $\lambda_{\max}$. Therefore,

\widetilde{\pi}^{T}=\begin{bmatrix}\sigma^{T}&\frac{1}{\lambda_{\max}}\sigma^{T}H^{(2)}\end{bmatrix}XP (16)

is a left eigenvector of $H$ associated with $\lambda_{\max}$. Since $H$ and $H^{(1)}$ have the same nonzero eigenvalues and the principal eigenvalue of $H$ is simple, the normalized principal left eigenvector $\sigma$ of $H^{(1)}$ is unique. We repartition

\widetilde{\pi}^{T}=\begin{bmatrix}\sigma_{1:k}^{T}&\begin{bmatrix}\sigma_{k+1}&\frac{1}{\lambda_{\max}}\sigma^{T}H^{(2)}\end{bmatrix}\end{bmatrix}\begin{bmatrix}I_{k}&0\\ 0&Y\end{bmatrix}P. (17)

Multiplying out gives $\widetilde{\pi}^{T}=\begin{bmatrix}\sigma_{1:k}^{T}&\begin{bmatrix}\sigma_{k+1}&\frac{1}{\lambda_{\max}}\sigma^{T}H^{(2)}\end{bmatrix}Y\end{bmatrix}P$. Hence, by (12) and (13), we have

\begin{bmatrix}\sigma_{k+1}&\frac{1}{\lambda_{\max}}\sigma^{T}H^{(2)}\end{bmatrix}\begin{bmatrix}1&0\\ -e&I_{n-k-1}\end{bmatrix}
=\begin{bmatrix}\sigma_{k+1}-\frac{1}{\lambda_{\max}}\sigma^{T}H^{(2)}e&\frac{1}{\lambda_{\max}}\sigma^{T}H^{(2)}\end{bmatrix}
=\begin{bmatrix}\frac{1}{\lambda_{\max}}\sigma^{T}\begin{bmatrix}\frac{1-\xi}{n}(n-k)e\\ \frac{1-\xi}{n}(n-k)\end{bmatrix}-\frac{1}{\lambda_{\max}}\sigma^{T}H^{(2)}e&\frac{1}{\lambda_{\max}}\sigma^{T}\begin{bmatrix}\frac{1-\xi}{n}ee^{T}Y^{-1}\begin{bmatrix}e_{2}&\cdots&e_{n-k}\end{bmatrix}\\ \frac{1-\xi}{n}e^{T}Y^{-1}\begin{bmatrix}e_{2}&\cdots&e_{n-k}\end{bmatrix}\end{bmatrix}\end{bmatrix}
=\begin{bmatrix}\frac{1}{\lambda_{\max}}\sigma^{T}\begin{bmatrix}\frac{1-\xi}{n}e(e^{T}e)\\ \frac{1-\xi}{n}(e^{T}e)\end{bmatrix}-\frac{1}{\lambda_{\max}}\sigma^{T}\begin{bmatrix}\frac{1-\xi}{n}ee^{T}\widehat{e}\\ \frac{1-\xi}{n}e^{T}\widehat{e}\end{bmatrix}&\frac{1}{\lambda_{\max}}\sigma^{T}\begin{bmatrix}\frac{1-\xi}{n}ee^{T}\\ \frac{1-\xi}{n}e^{T}\end{bmatrix}\begin{bmatrix}e_{2}&\cdots&e_{n-k}\end{bmatrix}\end{bmatrix}
=\begin{bmatrix}\frac{1}{\lambda_{\max}}\sigma^{T}\begin{bmatrix}\frac{1-\xi}{n}ee^{T}\\ \frac{1-\xi}{n}e^{T}\end{bmatrix}e_{1}&\frac{1}{\lambda_{\max}}\sigma^{T}\begin{bmatrix}\frac{1-\xi}{n}ee^{T}\\ \frac{1-\xi}{n}e^{T}\end{bmatrix}\begin{bmatrix}e_{2}&\cdots&e_{n-k}\end{bmatrix}\end{bmatrix}
=\frac{1}{\lambda_{\max}}\sigma^{T}\begin{bmatrix}\frac{1-\xi}{n}ee^{T}\\ \frac{1-\xi}{n}e^{T}\end{bmatrix}, (18)

due to the facts that

Y^{-1}\begin{bmatrix}e_{2}&\cdots&e_{n-k}\end{bmatrix}=\begin{bmatrix}e_{2}&\cdots&e_{n-k}\end{bmatrix}\quad\text{and}\quad\sigma_{k+1}=\frac{1}{\lambda_{\max}}\sigma^{T}\begin{bmatrix}\frac{1-\xi}{n}(n-k)e\\ \frac{1-\xi}{n}(n-k)\end{bmatrix}.

Hence,

\widetilde{\pi}^{T}=\begin{bmatrix}\sigma^{T}_{1:k}&\frac{1}{\lambda_{\max}}\sigma^{T}\begin{bmatrix}\frac{1-\xi}{n}ee^{T}\\ \frac{1-\xi}{n}e^{T}\end{bmatrix}\end{bmatrix}P, (19)

where $P$ is a suitable permutation matrix satisfying (6). Since $\pi_{h}$ is unique, as discussed above, we conclude that $\widetilde{\pi}=\pi_{h}$ once $\widetilde{\pi}$ is normalized so that $e^{T}\widetilde{\pi}=1$. ∎

Remark 3.1.

Theorem 3.2 shows that we can compute the ranking vector of the small matrix $H^{(1)}$, which is derived from the hub matrix, and then recover the full hub vector according to (15).
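A hedged end-to-end sketch of the procedure in Remark 3.1 follows. It assumes the nodes are ordered with the $k$ nondangling nodes first (so $P=I$ in (6)); the function name, the power method on $H^{(1)}$, and the tolerances are our own choices. The iteration applies $\sigma^{T}H^{(1)}$ blockwise using only $L_{11}$ and $L_{12}$, then recovers the trailing entries of $\pi_{h}$ from (15), which are all equal.

import numpy as np

def lumped_hits_hub(L11, L12, xi=0.9, tol=1e-12, max_iter=5000):
    """Hub vector of H in (4) via the lumped matrix H^(1) (Theorem 3.2),
    assuming nondangling nodes come first, i.e. P = I in (6)."""
    k = L11.shape[0]
    n = k + L12.shape[1]
    c = (1.0 - xi) / n
    # Power method for the principal left eigenvector of H^(1), eq. (14).
    # The (1,1) block of H^(1) is symmetric, so sigma^T H^(1) is computed
    # blockwise with sparse matvecs only.
    sigma = np.ones(k + 1) / (k + 1)
    lam = 1.0
    for _ in range(max_iter):
        s, t = sigma[:k], sigma[k]
        top = xi * (L11 @ (L11.T @ s) + L12 @ (L12.T @ s)) + c * (s.sum() + t)
        bottom = c * (n - k) * (s.sum() + t)
        new = np.append(top, bottom)
        lam = np.linalg.norm(new, 1)        # Rayleigh-type estimate (new >= 0)
        new = new / lam
        if np.linalg.norm(new - sigma, 1) < tol:
            sigma = new
            break
        sigma = new
    # Recover pi_h by (15): the trailing n-k entries are all equal.
    tail_value = c * (sigma[:k].sum() + sigma[k]) / lam
    pi_h = np.concatenate([sigma[:k], np.full(n - k, tail_value)])
    return pi_h / pi_h.sum()                # normalize so e^T pi_h = 1

Only $L_{11}$ and $L_{12}$ enter the iteration, so each step costs $O(\mathrm{nnz}(L))$ operations on a vector of length $k+1$ rather than $n$.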

Remark 3.2.

The concrete invertible matrix $Y$ in (13) used in Theorems 3.1 and 3.2 can be replaced by any invertible similarity transformation matrix $\widehat{Y}$ satisfying $\widehat{Y}e=e_{1}$. For a more detailed derivation, we refer readers to [13].

Remark 3.3.

Since

PAP^{T}=PL^{T}LP^{T}=PL^{T}P^{T}PLP^{T}=\begin{bmatrix}L_{11}^{T}&0\\ L_{12}^{T}&0\end{bmatrix}\begin{bmatrix}L_{11}&L_{12}\\ 0&0\end{bmatrix}=\begin{bmatrix}L_{11}^{T}L_{11}&L_{11}^{T}L_{12}\\ L_{12}^{T}L_{11}&L_{12}^{T}L_{12}\end{bmatrix}, (20)

we remark that it is cheaper to work with the lumped form (7) than with (20), because $PAP^{T}$ is denser than $PHP^{T}$. Hence, it is better to compute with the hub matrix $LL^{T}$ first rather than the authoritative matrix $L^{T}L$, and the same holds for their primitive modifications.
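The density claim in Remark 3.3 can be probed empirically. Below is a hedged sketch on a synthetic sparse matrix with many zero rows; the sizes and density are arbitrary choices of ours, not measurements from the paper.

import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n, k = 2000, 400                         # n pages, only k of them nondangling
L = sp.random(k, n, density=0.005, random_state=rng, format="csr")
L.data[:] = 1.0                          # 0/1 adjacency pattern
L = sp.vstack([L, sp.csr_matrix((n - k, n))]).tocsr()  # zero rows = dangling

print((L @ L.T).nnz)   # hub matrix: nonzeros confined to leading k x k block
print((L.T @ L).nnz)   # authoritative matrix: nonzeros over the full n x n

On such matrices the hub product typically has far fewer nonzeros, which is exactly the sense in which $PHP^{T}$ is cheaper to work with than $PAP^{T}$.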

Remark 3.4.

One can permute $L$ so that (6) holds. However, the web adjacency matrix is very sparse (usually only about ten entries per row), and one can instead permute $L$ such that $\widetilde{P}L\widetilde{P}^{T}=\begin{bmatrix}\widetilde{L}_{11}&0\\ \widetilde{L}_{21}&0\end{bmatrix}$; this phenomenon can be verified on web data matrices from the online SuiteSparse (formerly University of Florida) sparse matrix collection (https://sparse.tamu.edu/). In particular, the number of all-zero columns may exceed the number of all-zero rows. In that case, the authoritative ranking vector should be computed first, as in the sketch below. Note, however, that this case is very rare.
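Deciding which case applies to a given crawl reduces to counting all-zero rows versus all-zero columns of $L$; a minimal sketch (loading of the actual web matrix is left abstract, and the function name is ours):

import numpy as np
import scipy.sparse as sp

def zero_counts(L):
    """Return (#all-zero rows, #all-zero columns) of a sparse 0/1 matrix L."""
    row_deg = np.asarray(L.sum(axis=1)).ravel()
    col_deg = np.asarray(L.sum(axis=0)).ravel()
    return int((row_deg == 0).sum()), int((col_deg == 0).sum())

# More zero columns than zero rows -> lump L^T L and compute the
# authoritative vector first (the rare case of Remark 3.4); otherwise
# compute the hub vector first, as recommended in Remark 3.3.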

4 Conclusion

In this paper, we have studied a HITS computation approach that lumps the dangling nodes of the hub matrix $LL^{T}$ into a single node, thereby answering the computational question raised in the introduction.

HITS can be computed by the lumping approach even though the matrices involved are not stochastic. The approach we discuss is useful whether the HITS model is search-dependent or not. For the hub vector, Theorem 3.2 shows that the rankings of the nondangling nodes can be computed independently of those of the dangling nodes, while the rankings of the dangling nodes depend on those of the nondangling nodes. According to Remark 3.3, the authoritative vector is relatively difficult to compute compared with the hub vector. We therefore suggest computing the hub vector first, rather than the authoritative vector or both simultaneously.

Further research may address how to compute SALSA by the lumping method. Questions such as the relationship between the ranking vectors of dangling and nondangling nodes in the SALSA model are also worth studying.

References

  • Page et al. [1999] L. Page, S. Brin, R. Motwani, T. Winograd, The PageRank citation ranking: Bringing order to the web, Stanford Digital Libraries, 1999 (available online from http://dbpubs.stanford.edu:8090/pub/1999-66).
  • Kleinberg [1999] J. M. Kleinberg, Authoritative Sources in a Hyperlinked Environment, J. ACM 46 (1999) 604–632.
  • Lempel and Moran [2000] R. Lempel, S. Moran, The stochastic approach for link-structure analysis (SALSA) and the TKC effect, Computer Networks 33 (2000) 387–401.
  • Langville and Meyer [2005] A. N. Langville, C. D. Meyer, A survey of eigenvector methods for web information retrieval, SIAM Review 47 (2005) 135–161.
  • Langville and Meyer [2006] A. N. Langville, C. D. Meyer, Google’s PageRank and Beyond: The Science of Search Engine Rankings, Princeton University Press, 2006.
  • Zhou [2012] Y.-K. Zhou, Practical acceleration for computing the HITS ExpertRank vectors, Journal of Computational and Applied Mathematics 236 (2012) 4398–4409.
  • Eldén [2007] L. Eldén, Matrix methods in data mining and pattern recognition, SIAM, 2007.
  • Brezinski and Redivo-Zaglia [2006] C. Brezinski, M. Redivo-Zaglia, The PageRank Vector: Properties, Computation, Approximation, and Acceleration, SIAM Journal on Matrix Analysis and Applications 28 (2006) 551–575.
  • Feng et al. [2021] Y.-H. Feng, J.-X. You, Y.-X. Dong, An Extrapolation Iteration and Its Lumped Type Iteration for Computing PageRank, Bulletin of the Iranian Mathematical Society (2021) 1–18.
  • Lee et al. [2007] C. P. Lee, G. H. Golub, S. A. Zenios, A Two-Stage Algorithm for Computing PageRank and Multistage Generalizations, Internet Mathematics 4 (2007) 299–327.
  • Dong et al. [2017] Y.-X. Dong, C.-Q. Gu, Z.-B. Chen, An Arnoldi-Inout method accelerated with a two-stage matrix splitting iteration for computing PageRank, Calcolo 54 (2017) 1–23.
  • Ipsen and Selee [2007] I. C. F. Ipsen, T. M. Selee, PageRank computation with special attention to dangling nodes, SIAM Journal on Matrix Analysis and Applications 29 (2007) 1281–1296.
  • Dong et al. [2021] Y.-X. Dong, Y.-H. Feng, J.-X. You, J.-R. Guan, Comments on lumping the Google matrix, arXiv preprint arXiv:2107.11080 (2021).