
Sublinear Time Nearest Neighbor Search over Generalized Weighted Manhattan Distance

Huan Hu
Harbin Institute of Technology, China
hit_huhuan@foxmail.com
Jianzhong Li
Harbin Institute of Technology, China
lijzh@hit.edu.cn
Abstract

Nearest Neighbor Search (NNS) over generalized weighted distances is fundamental to a wide range of applications. The problem of NNS over the generalized weighted square Euclidean distance has been studied in previous work. However, numerous studies have shown that the Manhattan distance could be more effective than the Euclidean distance for high-dimensional NNS, which indicates that the generalized weighted Manhattan distance is possibly more practical than the generalized weighted square Euclidean distance in high dimensions. To the best of our knowledge, no prior work solves the problem of NNS over the generalized weighted Manhattan distance in sublinear time. This paper achieves the goal by proposing two novel hashing schemes (d_{w}^{l_{1}},l_{2})-ALSH and (d_{w}^{l_{1}},\theta)-ALSH.

1 Introduction

Nearest Neighbor Search (NNS) over generalized weighted distances is fundamental to a wide variety of applications, such as personalized recommendation [11, 14, 18] and kNN classification [3, 19]. Given a set of n data points D\subset\mathbb{R}^{d} and a query point q\in\mathbb{R}^{d} with a weight vector w\in\mathbb{R}^{d}, NNS over a generalized weighted distance, denoted by d_{w}, is to find a point o^{*}\in D such that o^{*} is the closest data point to q under d_{w}. Formally, the goal of NNS over d_{w} is to return

o^{*}=\arg\min_{o\in D}d_{w}(o,q). (1)

Note that the weight vector w is specified along with q rather than pre-specified. Moreover, each element of w can be positive, zero, or negative.

The generalized weighted Manhattan distance, denoted by d_{w}^{l_{1}}, and the generalized weighted square Euclidean distance, denoted by d_{w}^{l_{2}}, are two typical generalized weighted distances, derived from the Manhattan distance and the Euclidean distance, respectively. For any two points o=(o_{1},o_{2},\ldots,o_{d})\in\mathbb{R}^{d} and q=(q_{1},q_{2},\ldots,q_{d})\in\mathbb{R}^{d}, the distances d_{w}^{l_{1}}(o,q) and d_{w}^{l_{2}}(o,q) are respectively computed as follows:

\begin{split}d_{w}^{l_{1}}(o,q)=&\sum_{i=1}^{d}w_{i}\left|o_{i}-q_{i}\right|\\ d_{w}^{l_{2}}(o,q)=&\sum_{i=1}^{d}w_{i}\left(o_{i}-q_{i}\right)^{2},\end{split} (2)

where w=(w_{1},w_{2},\ldots,w_{d})\in\mathbb{R}^{d}. A recent paper [16] studied the problem of NNS over d_{w}^{l_{2}} and provided two sublinear time solutions for it. However, to the best of our knowledge, there is no prior work that solves the problem of NNS over d_{w}^{l_{1}} in sublinear time. In fact, a number of studies [1, 12] have shown that the Manhattan distance could be more effective than the Euclidean distance for producing meaningful NNS results in high-dimensional spaces. This suggests that NNS over d_{w}^{l_{1}} is possibly more practical than NNS over d_{w}^{l_{2}} in many real scenarios. In this paper, we aim to propose sublinear time methods for efficiently solving the problem of NNS over d_{w}^{l_{1}}.

As a matter of fact, existing methods cannot handle NNS over d_{w}^{l_{1}} well. Specifically, the brute-force linear scan scales linearly with the data size and thus may yield unsatisfactory performance. The conventional spatial index-based methods [2, 6, 22] can only perform well for NNS in low dimensions due to the “curse of dimensionality” [5]. Locality-Sensitive Hashing (LSH) [27] is a popular approach for approximate NNS and exhibits good performance for high-dimensional cases. In the literature, a number of efficient LSH schemes [7, 9, 10, 13, 15, 17, 26, 28] have been proposed based on LSH families, and some of them can answer NNS queries even in sublinear time. Unfortunately, they cannot be applied to answer NNS queries over d_{w}^{l_{1}} unless w is fixed to an all-1 vector.

Recently, Asymmetric Locality-Sensitive Hashing (ALSH) was extended from LSH so that the problems of Maximum Inner Product Search (MIPS) and NNS over d_{w}^{l_{2}} can be addressed in sublinear time [16, 20, 21, 23, 24]. An ALSH scheme relies on an ALSH family. As far as we know, there is no ALSH family proposed for NNS over d_{w}^{l_{1}} in previous work. To provide sublinear time solutions for NNS over d_{w}^{l_{1}} in this paper, we follow the ALSH approach and propose ALSH schemes by introducing ALSH families that are suitable for NNS over d_{w}^{l_{1}}.

Outline. In Section 2, we review the approaches of LSH and ALSH. In Section 3, we show that there is no LSH or ALSH family for NNS over d_{w}^{l_{1}} over the entire space \mathbb{R}^{d} and that there is no LSH family for NNS over d_{w}^{l_{1}} over bounded spaces in \mathbb{R}^{d}. We therefore seek ALSH families for NNS over d_{w}^{l_{1}} over bounded spaces in \mathbb{R}^{d}. As a result, we propose two suitable ALSH families and further obtain two sublinear time ALSH schemes, (d_{w}^{l_{1}},l_{2})-ALSH and (d_{w}^{l_{1}},\theta)-ALSH, in Section 4.

2 Preliminaries

Before introducing our proposed solutions to the problem of NNS over d_{w}^{l_{1}}, we first present the preliminaries on LSH and ALSH.

2.1 Locality-Sensitive Hashing

Let d(\cdot,\cdot) be a distance function and \mathcal{Z} be the space on which d(\cdot,\cdot) is defined. Assume that data points and query points are located in \mathcal{X}\subseteq\mathcal{Z} and \mathcal{Y}\subseteq\mathcal{Z}, respectively. Then, an LSH family is formally defined as follows.

Definition 1 (LSH Family)

An LSH family \mathcal{H}_{(h)}=\{h:\mathcal{Z}\rightarrow BucketIDs\} is called (R_{1},R_{2},P_{1},P_{2})-sensitive if for any o\in\mathcal{X} and q\in\mathcal{Y}, the following conditions are satisfied:

  • If d(o,q)\leq R_{1}, then Pr[h(o)=h(q)]\geq P_{1};

  • If d(o,q)\geq R_{2}, then Pr[h(o)=h(q)]\leq P_{2};

  • R_{1}<R_{2} and P_{1}>P_{2}.

As we can see from Definition 1, an LSH family is essentially a set of hash functions that can hash closer points into the same bucket with higher probability. Thus, the basic idea of an LSH scheme is to use an LSH family to hash points such that only the data points that have the same hash code as the query point are likely to be retrieved to find approximate nearest neighbors. In the following, we review two popular LSH families that were proposed for the l_{2} distance (a.k.a. the Euclidean distance) and the Angular distance, respectively.

The l_{2} distance between any two points o,q\in\mathbb{R}^{d} is computed as d^{l_{2}}(o,q)=\|o-q\|_{2}, where \|\cdot\|_{2} is the l_{2}-norm of a vector. The LSH family proposed for the l_{2} distance in [7] is \mathcal{H}_{(h^{l_{2}})}=\{h^{l_{2}}:\mathbb{R}^{d}\rightarrow\mathbb{Z}\}, where

h^{l_{2}}(x)=\left\lfloor\frac{a^{T}x+b}{w}\right\rfloor, (3)

a is a d-dimensional vector where each element is chosen independently from the standard normal distribution, b is a real number chosen uniformly at random from [0,w], and w is a user-specified positive constant. Let r=d^{l_{2}}(o,q). The collision probability function is

P^{l_{2}}(r)=Pr[h^{l_{2}}(o)=h^{l_{2}}(q)]=1-2\Phi(-w/r)-\frac{2}{\sqrt{2\pi}(w/r)}(1-e^{-(w^{2}/2r^{2})}), (4)

where \Phi(\cdot) is the cumulative distribution function of the standard normal distribution [7].
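To make the family concrete, the following is a minimal Python sketch (assuming NumPy and SciPy are available; the names make_h_l2 and p_l2 are ours) of sampling one hash function from \mathcal{H}_{(h^{l_{2}})} and evaluating the collision probability in Equation 4:

```python
import numpy as np
from scipy.stats import norm

def make_h_l2(d, w, rng=np.random.default_rng()):
    # Sample one h^{l_2} from Equation 3: a has i.i.d. standard normal entries,
    # b is uniform on [0, w], and w is the user-specified bucket width.
    a = rng.standard_normal(d)
    b = rng.uniform(0.0, w)
    return lambda x: int(np.floor((a @ x + b) / w))

def p_l2(r, w):
    # Collision probability of Equation 4 for two points at l_2 distance r.
    s = w / r
    return 1.0 - 2.0 * norm.cdf(-s) - 2.0 / (np.sqrt(2.0 * np.pi) * s) * (1.0 - np.exp(-s * s / 2.0))
```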

The Angular distance between any two points o,q\in\mathbb{R}^{d} is computed as d^{\theta}(o,q)=\arccos\left(\frac{o^{T}q}{\|o\|_{2}\|q\|_{2}}\right). The LSH family proposed for the Angular distance in [4] is \mathcal{H}_{(h^{\theta})}=\{h^{\theta}:\mathbb{R}^{d}\rightarrow\{0,1\}\}, where

h^{\theta}(x)=\begin{cases}0&\text{if}\ a^{T}x<0\\ 1&\text{if}\ a^{T}x\geq 0\end{cases} (5)

and a is a d-dimensional vector where each element is chosen independently from the standard normal distribution. Let r=d^{\theta}(o,q). The collision probability function is

P^{\theta}(r)=Pr[h^{\theta}(o)=h^{\theta}(q)]=1-\frac{r}{\pi}. (6)
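Similarly, a hash function from \mathcal{H}_{(h^{\theta})} (Equation 5) is just the sign of a random projection; a minimal sketch (the names below are ours):

```python
import numpy as np

def make_h_theta(d, rng=np.random.default_rng()):
    # Sample one h^theta from Equation 5: hash to 1 if a^T x >= 0 and to 0 otherwise.
    a = rng.standard_normal(d)
    return lambda x: int(a @ x >= 0.0)

def p_theta(r):
    # Collision probability of Equation 6 for two points at Angular distance r.
    return 1.0 - r / np.pi
```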

2.2 Asymmetric Locality-Sensitive Hashing

Recent studies have shown that ALSH is an effective approach for solving the problems of MIPS and NNS over d_{w}^{l_{2}} [16, 21, 23, 24]. An ALSH scheme processes NNS queries in a similar way to an LSH scheme. It relies on an ALSH family. Formally, the definition of an ALSH family is as follows.

Definition 2 (ALSH Family)

An ALSH family \mathcal{H}_{(f,g)}=\{f:\mathcal{Z}\rightarrow BucketIDs\}\bigcup\{g:\mathcal{Z}\rightarrow BucketIDs\} is called (R_{1},R_{2},P_{1},P_{2})-sensitive if for any data point o\in\mathcal{X} and query point q\in\mathcal{Y}, the following conditions are satisfied:

  • If d(o,q)\leq R_{1}, then Pr[f(o)=g(q)]\geq P_{1};

  • If d(o,q)\geq R_{2}, then Pr[f(o)=g(q)]\leq P_{2};

  • R_{1}<R_{2} and P_{1}>P_{2}.

From Definition 2, we can see that an ALSH family \mathcal{H}_{(f,g)} consists of a set of hash functions \{f:\mathcal{Z}\rightarrow BucketIDs\} for data points and a set of hash functions \{g:\mathcal{Z}\rightarrow BucketIDs\} for query points, and it ensures that each query point can collide with closer data points with higher probability. In practice, \mathcal{H}_{(f,g)} is often implemented with an LSH family \mathcal{H}_{(h^{\prime})}=\{h^{\prime}:\mathcal{Z}^{\prime}\rightarrow BucketIDs\} and two vector functions called the Preprocessing Transformation P:\mathcal{X}\rightarrow\mathcal{X}^{\prime} and the Query Transformation Q:\mathcal{Y}\rightarrow\mathcal{Y}^{\prime} [16, 21, 23, 24] (here, \mathcal{X}^{\prime}\subseteq\mathcal{Z}^{\prime} and \mathcal{Y}^{\prime}\subseteq\mathcal{Z}^{\prime}). Thus, the hash value of each data point o\in\mathcal{X} is computed as f(o)=h^{\prime}(P(o)) and the hash value of each query point q\in\mathcal{Y} is computed as g(q)=h^{\prime}(Q(q)).

Fundamentally, both LSH and ALSH schemes obtain approximate nearest neighbors by efficiently solving the (R_{1},R_{2})-Near Neighbor Search ((R_{1},R_{2})-NNS) problem as follows.

Definition 3 ((R_{1},R_{2})-NNS)

Given a distance function d(\cdot,\cdot), two distance thresholds R_{1} and R_{2} (R_{1}<R_{2}) and a data set D\subset\mathcal{X}, for any query point q\in\mathcal{Y}, the (R_{1},R_{2})-NNS problem is to return a point o\in D satisfying d(o,q)\leq R_{2} if there exists a point o^{\prime}\in D satisfying d(o^{\prime},q)\leq R_{1}.

The theorem below indicates that the (R_{1},R_{2})-NNS problem can be solved with an LSH or ALSH scheme in sublinear time.

Theorem 1

[7, 16, 21, 23, 24] Given an (R_{1},R_{2},P_{1},P_{2})-sensitive LSH or ALSH family, one can construct a data structure for solving the problem of (R_{1},R_{2})-NNS with O(n^{\rho}d\log n) query time and O(n^{1+\rho}) space, where \rho=\frac{\log P_{1}}{\log P_{2}}<1.
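For intuition, the following is a minimal, unoptimized Python sketch of the classic construction behind Theorem 1: L hash tables, each keyed by K concatenated (A)LSH values, where the analysis takes K = O(\log n) and L = O(n^{\rho}). The parameter names and the bound max_probes on the number of inspected candidates are our own choices; make_pair must return one (f, g) pair of data-side and query-side hash functions sharing the same randomness (for an LSH family, f = g).

```python
import numpy as np

def build_tables(D, make_pair, K, L, rng=np.random.default_rng()):
    # Build L hash tables; each table keys every data point o by the tuple
    # (f_1(o), ..., f_K(o)) of K independently sampled data-side hash values.
    tables = []
    for _ in range(L):
        pairs = [make_pair(rng) for _ in range(K)]
        table = {}
        for o in D:
            key = tuple(f(o) for f, _ in pairs)
            table.setdefault(key, []).append(o)
        tables.append((pairs, table))
    return tables

def query(tables, q, dist, R2, max_probes):
    # Probe the bucket (g_1(q), ..., g_K(q)) in each table and return any
    # candidate within distance R2, giving up after max_probes candidates.
    probed = 0
    for pairs, table in tables:
        key = tuple(g(q) for _, g in pairs)
        for o in table.get(key, []):
            if dist(o, q) <= R2:
                return o
            probed += 1
            if probed >= max_probes:
                return None
    return None
```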

3 Negative Results

In this section, we present some negative results on the existence of LSH and ALSH families for NNS over d_{w}^{l_{1}}.

The following theorem indicates that it is impossible to find an LSH or ALSH family for NNS over d_{w}^{l_{1}} over \mathcal{X}=\mathcal{Y}=\mathbb{R}^{d} (d\geq 3).

Theorem 2

For any d\geq 3, R_{1}<R_{2} and P_{1}>P_{2}, there is no (R_{1},R_{2},P_{1},P_{2})-sensitive LSH or ALSH family for NNS over d_{w}^{l_{1}} over \mathcal{X}=\mathcal{Y}=\mathbb{R}^{d}.

Proof.  An LSH (or ALSH) family for NNS over d_{w}^{l_{1}} over \mathbb{R}^{d} (d>3) is also an LSH (or ALSH) family for NNS over d_{w}^{l_{1}} over a three-dimensional subspace, i.e., over \mathbb{R}^{3}. Hence, we only need to prove that there is no LSH or ALSH family for NNS over d_{w}^{l_{1}} over \mathbb{R}^{3}. Assume by contradiction that for some R_{1}<R_{2} and P_{1}>P_{2} there exists an (R_{1},R_{2},P_{1},P_{2})-sensitive LSH family \mathcal{H}_{(h)} or ALSH family \mathcal{H}_{(f,g)} for NNS over d_{w}^{l_{1}} over \mathbb{R}^{3}. Consider a set of N data points \{o^{1},o^{2},\ldots,o^{N}\}\subset\mathbb{R}^{3} and a set of N query points \{q^{1},q^{2},\ldots,q^{N}\}\subset\mathbb{R}^{3}, where for 1\leq i,j\leq N,

o^{i}=(iR_{1}-iR_{2},0,0),\qquad q^{j}=(0,jR_{1}-jR_{2},R_{1}). (7)

The weight vector specified along with each query point is set as follows:

w=\begin{cases}(1,-1,-1)&\text{if}\ R_{1}<0\\ (1,-1,1)&\text{if}\ R_{1}\geq 0.\end{cases} (8)

Thus, d_{w}^{l_{1}}(o^{i},q^{j})=(i-j)(R_{2}-R_{1})+R_{1} for 1\leq i,j\leq N. As can be seen, d_{w}^{l_{1}}(o^{i},q^{j})\leq R_{1} if 1\leq i\leq j\leq N and d_{w}^{l_{1}}(o^{i},q^{j})\geq R_{2} if 1\leq j<i\leq N. Let A\in\mathbb{R}^{N\times N} be a sign matrix where each element is

A(i,j)=\begin{cases}1&\text{if}\ d_{w}^{l_{1}}(o^{i},q^{j})\leq R_{1}\\ -1&\text{if}\ d_{w}^{l_{1}}(o^{i},q^{j})\geq R_{2}\\ 0&\text{otherwise}.\end{cases} (9)

Obviously, A is triangular with +1 on and above the diagonal and -1 below it. Consider also the matrix B\in\mathbb{R}^{N\times N} of collision probabilities B(i,j)=Pr[h(o^{i})=h(q^{j})] (for \mathcal{H}_{(h)}) or B(i,j)=Pr[f(o^{i})=g(q^{j})] (for \mathcal{H}_{(f,g)}). Let \theta=\frac{P_{1}+P_{2}}{2}<1 and \epsilon=\frac{P_{1}-P_{2}}{2}>0. It is easy to verify that A(i,j)(B(i,j)-\theta)\geq\epsilon for 1\leq i,j\leq N. That is,

A\odot\frac{B-\theta}{\epsilon}\geq 1, (10)

where \odot denotes the Hadamard (element-wise) product and the inequality holds element-wise. From [25], the margin complexity of the sign matrix A is mc(A)=\inf_{A\odot C\geq 1}\|C\|_{max}, where \|\cdot\|_{max} is the max-norm of a matrix. Since A is an N\times N triangular sign matrix, its margin complexity is bounded by mc(A)=\Omega(\log N) according to [8]. Therefore, from Equation 10, we can obtain

\left\|\frac{B-\theta}{\epsilon}\right\|_{max}=\Omega(\log N). (11)

Since B is a collision probability matrix, the max-norm of B satisfies \|B\|_{max}\leq 1 [20]. Shifting B by 0<\theta<1 changes the max-norm by at most \theta. Thus, we have

\|B-\theta\|_{max}<2. (12)

Combining Equations 11 and 12, we can derive that \epsilon=O(\frac{1}{\log N}). Since \epsilon=\frac{P_{1}-P_{2}}{2}>0 is a fixed constant, we obtain a contradiction by choosing N large enough. \square

The proof of Theorem 2 is similar to that of Theorem 3.1 in [21]. Due to space limitations, we refer the reader to http://proceedings.mlr.press/v37/neyshabur15-supp.pdf for the details of the max-norm and margin complexity used in the proof.

Actually, in real scenarios data points and query points are usually located in bounded spaces. Consider the typical case of \mathcal{X}=\mathcal{Y}=\mathcal{S}, where \mathcal{S}\subset\mathbb{R}^{d} is a bounded space. The following theorem shows the nonexistence of an LSH family for NNS over d_{w}^{l_{1}} over \mathcal{X}=\mathcal{Y}=\mathcal{S}.

Theorem 3

For any d>0, R_{1}<R_{2} and P_{1}>P_{2}, there is no (R_{1},R_{2},P_{1},P_{2})-sensitive LSH family for NNS over d_{w}^{l_{1}} over \mathcal{X}=\mathcal{Y}=\mathcal{S}.

Proof.  Assume by contradiction that for some d>0, R_{1}<R_{2} and P_{1}>P_{2} there exists an (R_{1},R_{2},P_{1},P_{2})-sensitive LSH family \mathcal{H}_{(h)} for NNS over d_{w}^{l_{1}} over \mathcal{S}. Let o,q\in\mathcal{S} with o\neq q (we ignore the trivial case that \mathcal{S} contains only a single point). Since w\in\mathbb{R}^{d}, we can always set w to a value such that d_{w}^{l_{1}}(o,q)=R_{1} and thus Pr[h(o)=h(q)]\geq P_{1}. Moreover, we can always set w to another value such that d_{w}^{l_{1}}(o,q)=R_{2} and thus Pr[h(o)=h(q)]\leq P_{2}. However, since data points must be hashed before queries arrive, \mathcal{H}_{(h)} cannot depend on w. So Pr[h(o)=h(q)] is not affected by w, which leads to a contradiction. \square

Due to the negative results in Theorems 2 and 3, we seek to propose ALSH families for NNS over d_{w}^{l_{1}} over bounded spaces in Section 4. Notice that if an ALSH family is suitable for NNS over d_{w}^{l_{1}} over \mathcal{X}=\mathcal{Y}=[M_{l},M_{u}]^{d} (M_{l}<M_{u}), it must also be suitable for NNS over d_{w}^{l_{1}} over \mathcal{X}=\mathcal{Y}=\mathcal{S} for any \mathcal{S}\subseteq[M_{l},M_{u}]^{d}. Thus, it is sufficient to deal with the case of \mathcal{X}=\mathcal{Y}=[M_{l},M_{u}]^{d}. Further, we assume \mathcal{X}=\mathcal{Y}=[0,M_{u}-M_{l}]^{d}; otherwise, this can be achieved by shifting o,q\in[M_{l},M_{u}]^{d}, which does not change the results of NNS over d_{w}^{l_{1}}.

4 Our Solutions

Let M=\lfloor(M_{u}-M_{l})t\rfloor (t>0). The following Observation 1 indicates that if we find an ALSH family for NNS over d_{w}^{l_{1}} over \mathcal{X}=\mathcal{Y}=\{0,1,\ldots,M\}^{d}, a similar ALSH family can be immediately obtained for NNS over d_{w}^{l_{1}} over \mathcal{X}=\mathcal{Y}=[0,M_{u}-M_{l}]^{d}. Thus, we only need to consider NNS over d_{w}^{l_{1}} over \mathcal{X}=\mathcal{Y}=\{0,1,\ldots,M\}^{d} in the rest of the paper. Note that in our solutions M can be an arbitrary positive integer.

Observation 1

Define a vector function u_{t}(x)=\lfloor xt\rfloor=(\lfloor x_{1}t\rfloor,\lfloor x_{2}t\rfloor,\ldots,\lfloor x_{d}t\rfloor), where x=(x_{1},x_{2},\ldots,x_{d})\in[0,M_{u}-M_{l}]^{d} and t>0. For any d>0, R_{1}<R_{2} and P_{1}>P_{2}, if \mathcal{H}_{(f,g)} is an (R_{1},R_{2},P_{1},P_{2})-sensitive ALSH family for NNS over d_{w}^{l_{1}} over \mathcal{X}=\mathcal{Y}=\{0,1,\ldots,\lfloor(M_{u}-M_{l})t\rfloor\}^{d}, then \mathcal{H}_{(f\circ u_{t},g\circ u_{t})} must be an (R_{1}^{\prime},R_{2}^{\prime},P_{1},P_{2})-sensitive ALSH family for NNS over d_{w}^{l_{1}} over \mathcal{X}=\mathcal{Y}=[0,M_{u}-M_{l}]^{d}, where R_{1}^{\prime}=(R_{1}-\sum_{i=1}^{d}\left|w_{i}\right|)/t and R_{2}^{\prime}=(R_{2}+\sum_{i=1}^{d}\left|w_{i}\right|)/t.

Proof.  Let o,q\in[0,M_{u}-M_{l}]^{d}. Then \lfloor ot\rfloor,\lfloor qt\rfloor\in\{0,1,\ldots,\lfloor(M_{u}-M_{l})t\rfloor\}^{d}. A simple calculation shows that d_{w}^{l_{1}}(\lfloor ot\rfloor,\lfloor qt\rfloor)-\sum_{i=1}^{d}\left|w_{i}\right|\leq d_{w}^{l_{1}}(ot,qt)=td_{w}^{l_{1}}(o,q)\leq d_{w}^{l_{1}}(\lfloor ot\rfloor,\lfloor qt\rfloor)+\sum_{i=1}^{d}\left|w_{i}\right| holds. Thus, d_{w}^{l_{1}}(\lfloor ot\rfloor,\lfloor qt\rfloor)\leq R_{1} if d_{w}^{l_{1}}(o,q)\leq R_{1}^{\prime}, and d_{w}^{l_{1}}(\lfloor ot\rfloor,\lfloor qt\rfloor)\geq R_{2} if d_{w}^{l_{1}}(o,q)\geq R_{2}^{\prime}. Further, since f(\lfloor ot\rfloor)=f(u_{t}(o))=(f\circ u_{t})(o) and g(\lfloor qt\rfloor)=g(u_{t}(q))=(g\circ u_{t})(q), we have that Pr[(f\circ u_{t})(o)=(g\circ u_{t})(q)]\geq P_{1} if d_{w}^{l_{1}}(o,q)\leq R_{1}^{\prime}, and Pr[(f\circ u_{t})(o)=(g\circ u_{t})(q)]\leq P_{2} if d_{w}^{l_{1}}(o,q)\geq R_{2}^{\prime}. As a result, \mathcal{H}_{(f\circ u_{t},g\circ u_{t})} is an (R_{1}^{\prime},R_{2}^{\prime},P_{1},P_{2})-sensitive ALSH family for NNS over d_{w}^{l_{1}} over \mathcal{X}=\mathcal{Y}=[0,M_{u}-M_{l}]^{d} (note that R_{1}^{\prime}<R_{2}^{\prime} always holds since R_{1}<R_{2}). \square
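The rounding map of Observation 1 is straightforward to realize; a two-line Python sketch (the name u_t is ours), which would simply be composed with the data-side and query-side hash functions:

```python
import numpy as np

def u_t(x, t):
    # Observation 1: u_t(x) = floor(x * t), applied coordinate-wise, maps
    # [0, M_u - M_l]^d into {0, 1, ..., floor((M_u - M_l) * t)}^d.
    return np.floor(np.asarray(x, dtype=float) * t).astype(int)
```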

4.1 From NNS over d_{w}^{l_{1}} to MIPS

In the following, we take two steps to convert the problem of NNS over d_{w}^{l_{1}} into the problem of MIPS. As a result of these steps, a novel preprocessing transformation and query transformation are introduced for data points and query points, respectively. The two transformations are essential to our solutions.

Step 1: Convert NNS over d_{w}^{l_{1}} into NNS over d_{w}^{H}

The generalized weighted Hamming distance d_{w}^{H} is defined on the Hamming space and computed in the same way as the generalized weighted Manhattan distance d_{w}^{l_{1}}. That is, d_{w}^{H}(o,q)=\sum_{i=1}^{d}w_{i}\left|o_{i}-q_{i}\right| for any w=(w_{1},w_{2},\ldots,w_{d})\in\mathbb{R}^{d}, o=(o_{1},o_{2},\ldots,o_{d})\in\{0,1\}^{d} and q=(q_{1},q_{2},\ldots,q_{d})\in\{0,1\}^{d}.

Inspired by [10], we complete this step by applying unary coding. Specifically, each point x=(x_{1},x_{2},\ldots,x_{d})\in\{0,1,\ldots,M\}^{d} is mapped into a binary vector v(x)=(\text{Unary}(x_{1});\text{Unary}(x_{2});\ldots;\text{Unary}(x_{d}))\in\{0,1\}^{Md}, where (;) denotes concatenation and each \text{Unary}(x_{i})=(x_{i1},x_{i2},\ldots,x_{iM}) is the unary representation of x_{i}, i.e., a sequence of x_{i} 1's followed by (M-x_{i}) 0's. Then \left|o_{i}-q_{i}\right|=\sum_{j=1}^{M}\left|o_{ij}-q_{ij}\right| for o=(o_{1},o_{2},\ldots,o_{d})\in\{0,1,\ldots,M\}^{d}, q=(q_{1},q_{2},\ldots,q_{d})\in\{0,1,\ldots,M\}^{d} and 1\leq i\leq d. Moreover, the weight vector w=(w_{1},w_{2},\ldots,w_{d}) is mapped into I(w)=(I(w_{1});I(w_{2});\ldots;I(w_{d})), where each I(w_{i}) is a sequence of M w_{i}'s. As a result, we have

\begin{split}d_{w}^{l_{1}}(o,q)=&\sum_{i=1}^{d}w_{i}\left|o_{i}-q_{i}\right|\\ =&\sum_{i=1}^{d}w_{i}\left(\sum_{j=1}^{M}\left|o_{ij}-q_{ij}\right|\right)\\ =&\sum_{i=1}^{d}\sum_{j=1}^{M}w_{i}\left|o_{ij}-q_{ij}\right|\\ =&d_{w}^{H}(v(o),v(q)),\end{split} (13)

where w=(w_{1},w_{2},\ldots,w_{d})\in\mathbb{R}^{d}, o=(o_{1},o_{2},\ldots,o_{d})\in\{0,1,\ldots,M\}^{d} and q=(q_{1},q_{2},\ldots,q_{d})\in\{0,1,\ldots,M\}^{d}. Equation 13 indicates that through the above mappings NNS over d_{w}^{l_{1}} over \mathcal{X}=\mathcal{Y}=\{0,1,\ldots,M\}^{d} is converted into NNS over d_{w}^{H} over \mathcal{X}=\mathcal{Y}=\{v(x)\mid x\in\{0,1,\ldots,M\}^{d}\}\subset\{0,1\}^{Md}.
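The mappings of Step 1 can be sketched in a few lines of Python (assuming NumPy; the helper names unary, v and I_w are ours). The last line checks Equation 13 on a random example:

```python
import numpy as np

def unary(xi, M):
    # Unary(x_i): x_i ones followed by (M - x_i) zeros.
    return np.concatenate([np.ones(xi, dtype=int), np.zeros(M - xi, dtype=int)])

def v(x, M):
    # v(x): concatenation of the unary codes of all d coordinates (length M*d).
    return np.concatenate([unary(xi, M) for xi in x])

def I_w(w, M):
    # I(w): each weight w_i repeated M times, aligned with the blocks of v(x).
    return np.repeat(np.asarray(w, dtype=float), M)

# Sanity check of Equation 13 on random points (hypothetical small example).
rng = np.random.default_rng(0)
M, d = 5, 4
o, q = rng.integers(0, M + 1, d), rng.integers(0, M + 1, d)
w = rng.standard_normal(d)
assert np.isclose(np.sum(w * np.abs(o - q)), np.sum(I_w(w, M) * np.abs(v(o, M) - v(q, M))))
```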

Step 2: Convert NNS over d_{w}^{H} into MIPS

This step is based on the following observation.

Observation 2

For any o_{ij},q_{ij}\in\{0,1\} and w_{i}\in\mathbb{R}, the equation w_{i}\left|o_{ij}-q_{ij}\right|=w_{i}-\left(\cos(\frac{\pi}{2}o_{ij}),\sin(\frac{\pi}{2}o_{ij})\right)^{T}\left(w_{i}\cos(\frac{\pi}{2}q_{ij}),w_{i}\sin(\frac{\pi}{2}q_{ij})\right) always holds.

Proof.  We only need to check two cases. Case 1: If o_{ij}=q_{ij}, then w_{i}\left|o_{ij}-q_{ij}\right|=0=w_{i}-\left(\cos(\frac{\pi}{2}o_{ij}),\sin(\frac{\pi}{2}o_{ij})\right)^{T}\left(w_{i}\cos(\frac{\pi}{2}q_{ij}),w_{i}\sin(\frac{\pi}{2}q_{ij})\right). Case 2: If o_{ij}\neq q_{ij}, then w_{i}\left|o_{ij}-q_{ij}\right|=w_{i}=w_{i}-\left(\cos(\frac{\pi}{2}o_{ij}),\sin(\frac{\pi}{2}o_{ij})\right)^{T}\left(w_{i}\cos(\frac{\pi}{2}q_{ij}),w_{i}\sin(\frac{\pi}{2}q_{ij})\right). \square

For any x=(x_{1},x_{2},\ldots,x_{d})\in\{0,1,\ldots,M\}^{d}, we define \widetilde{\cos}\left(\frac{\pi}{2}v(x)\right) and \widetilde{\sin}\left(\frac{\pi}{2}v(x)\right) as follows:

\widetilde{\cos}\left(\frac{\pi}{2}v(x)\right)=\left(\widehat{\cos}\left(\frac{\pi}{2}\text{Unary}(x_{1})\right);\widehat{\cos}\left(\frac{\pi}{2}\text{Unary}(x_{2})\right);\ldots;\widehat{\cos}\left(\frac{\pi}{2}\text{Unary}(x_{d})\right)\right) (14)
\widetilde{\sin}\left(\frac{\pi}{2}v(x)\right)=\left(\widehat{\sin}\left(\frac{\pi}{2}\text{Unary}(x_{1})\right);\widehat{\sin}\left(\frac{\pi}{2}\text{Unary}(x_{2})\right);\ldots;\widehat{\sin}\left(\frac{\pi}{2}\text{Unary}(x_{d})\right)\right), (15)

where

\widehat{\cos}\left(\frac{\pi}{2}\text{Unary}(x_{i})\right)=\left(\cos\left(\frac{\pi}{2}x_{i1}\right),\cos\left(\frac{\pi}{2}x_{i2}\right),\ldots,\cos\left(\frac{\pi}{2}x_{iM}\right)\right) (16)
\widehat{\sin}\left(\frac{\pi}{2}\text{Unary}(x_{i})\right)=\left(\sin\left(\frac{\pi}{2}x_{i1}\right),\sin\left(\frac{\pi}{2}x_{i2}\right),\ldots,\sin\left(\frac{\pi}{2}x_{iM}\right)\right). (17)

According to Observation 2, we have

\begin{split}d_{w}^{H}(v(o),v(q))=&\sum_{i=1}^{d}\sum_{j=1}^{M}w_{i}\left|o_{ij}-q_{ij}\right|\\ =&\sum_{i=1}^{d}\sum_{j=1}^{M}\left(w_{i}-\left(\cos(\frac{\pi}{2}o_{ij}),\sin(\frac{\pi}{2}o_{ij})\right)^{T}\left(w_{i}\cos(\frac{\pi}{2}q_{ij}),w_{i}\sin(\frac{\pi}{2}q_{ij})\right)\right)\\ =&M\sum_{i=1}^{d}w_{i}-d^{IP}\left(T^{1}\left(v(o)\right),T^{2}\left(v(q)\right)\right),\end{split} (18)

where w=(w_{1},w_{2},\ldots,w_{d})\in\mathbb{R}^{d}, o=(o_{1},o_{2},\ldots,o_{d})\in\{0,1,\ldots,M\}^{d}, q=(q_{1},q_{2},\ldots,q_{d})\in\{0,1,\ldots,M\}^{d}, d^{IP}(\cdot,\cdot) is the inner product of two vectors, and T^{1}(v(o)) and T^{2}(v(q)) are respectively as follows:

T^{1}(v(o))=(\widetilde{\cos}(\frac{\pi}{2}v(o));\widetilde{\sin}(\frac{\pi}{2}v(o))) (19)
T^{2}(v(q))=(I(w)\odot\widetilde{\cos}(\frac{\pi}{2}v(q));I(w)\odot\widetilde{\sin}(\frac{\pi}{2}v(q))). (20)

From Equation 18, we can see that NNS over d_{w}^{H} over \mathcal{X}=\mathcal{Y}=\{v(x)\mid x\in\{0,1,\ldots,M\}^{d}\}\subset\{0,1\}^{Md} can be converted into MIPS over \mathcal{X}=\{T^{1}(v(x))\mid x\in\{0,1,\ldots,M\}^{d}\}\subset\{0,1\}^{2Md} and \mathcal{Y}=\{T^{2}(v(x))\mid x\in\{0,1,\ldots,M\}^{d}\}\subset\mathbb{R}^{2Md}.

To sum up, after Steps 1 and 2, we convert NNS over d_{w}^{l_{1}} into MIPS by using the composite functions T^{1}(v(\cdot)) and T^{2}(v(\cdot)), which respectively map data points and query points from \{0,1,\ldots,M\}^{d} into two higher-dimensional spaces. Let P(o)=T^{1}(v(o)) and Q_{w}(q)=T^{2}(v(q)) for o,q\in\{0,1,\ldots,M\}^{d}. The vector functions P(\cdot) and Q_{w}(\cdot) are respectively the preprocessing and query transformations for the ALSH families introduced later.
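Putting the two steps together, here is a minimal Python sketch of the preprocessing transformation P(\cdot) and the query transformation Q_{w}(\cdot) (assuming NumPy; P_trans and Q_trans are our names), followed by a numerical check of the relation obtained by combining Equations 13 and 18:

```python
import numpy as np

def P_trans(o, M):
    # Preprocessing transformation P(o) = T^1(v(o)): the element-wise cos/sin
    # encoding of the unary code v(o), concatenated into a 2Md-dimensional vector.
    vo = np.concatenate([np.r_[np.ones(oi, int), np.zeros(M - oi, int)] for oi in o])
    return np.concatenate([np.cos(np.pi / 2 * vo), np.sin(np.pi / 2 * vo)])

def Q_trans(q, w, M):
    # Query transformation Q_w(q) = T^2(v(q)): the same encoding, weighted
    # coordinate-wise by I(w) (each w_i repeated M times).
    vq = np.concatenate([np.r_[np.ones(qi, int), np.zeros(M - qi, int)] for qi in q])
    Iw = np.repeat(np.asarray(w, dtype=float), M)
    return np.concatenate([Iw * np.cos(np.pi / 2 * vq), Iw * np.sin(np.pi / 2 * vq)])

# Check of Equations 13 and 18: d_w^{l1}(o, q) = M * sum(w) - <P(o), Q_w(q)>.
rng = np.random.default_rng(1)
M, d = 5, 4
o, q = rng.integers(0, M + 1, d), rng.integers(0, M + 1, d)
w = rng.standard_normal(d)
assert np.isclose(np.sum(w * np.abs(o - q)), M * np.sum(w) - P_trans(o, M) @ Q_trans(q, w, M))
```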

4.2 ALSH Schemes for NNS over d_{w}^{l_{1}}

Next, we formally present two ALSH schemes for NNS over d_{w}^{l_{1}}: the first is called (d_{w}^{l_{1}},l_{2})-ALSH and the second is called (d_{w}^{l_{1}},\theta)-ALSH. (d_{w}^{l_{1}},l_{2})-ALSH solves the problem of NNS over d_{w}^{l_{1}} by reducing it to the problem of NNS over the l_{2} distance, while (d_{w}^{l_{1}},\theta)-ALSH solves it by reducing it to the problem of NNS over the Angular distance.

4.2.1 (d_{w}^{l_{1}},l_{2})-ALSH

Based on the transformations P(\cdot) and Q_{w}(\cdot) and the LSH family \mathcal{H}_{(h^{l_{2}})} introduced in Section 2.1, (d_{w}^{l_{1}},l_{2})-ALSH uses the ALSH family \mathcal{H}_{(f^{(d_{w}^{l_{1}},l_{2})},g^{(d_{w}^{l_{1}},l_{2})})}=\{f^{(d_{w}^{l_{1}},l_{2})}:\{0,1,\ldots,M\}^{d}\rightarrow\mathbb{Z}\}\bigcup\{g^{(d_{w}^{l_{1}},l_{2})}:\{0,1,\ldots,M\}^{d}\rightarrow\mathbb{Z}\}, where f^{(d_{w}^{l_{1}},l_{2})}(x)=h^{l_{2}}(P(x)) and g^{(d_{w}^{l_{1}},l_{2})}(x)=h^{l_{2}}(Q_{w}(x)) for x\in\{0,1,\ldots,M\}^{d}. Combining Equations 13 and 18, we obtain

d_{w}^{l_{1}}(o,q)=M\sum_{i=1}^{d}w_{i}-d^{IP}(P(o),Q_{w}(q)). (21)

It is easy to verify that

\|P(o)\|_{2}^{2}=Md (22)
\|Q_{w}(q)\|_{2}^{2}=M\sum_{i=1}^{d}w_{i}^{2}. (23)

Thus, we have

\begin{split}d^{l_{2}}(P(o),Q_{w}(q))&=\|P(o)-Q_{w}(q)\|_{2}\\ &=\sqrt{\|P(o)\|_{2}^{2}+\|Q_{w}(q)\|_{2}^{2}-2d^{IP}(P(o),Q_{w}(q))}\\ &=\sqrt{M\left(d+\sum_{i=1}^{d}w_{i}^{2}\right)-2\left(M\sum_{i=1}^{d}w_{i}-d_{w}^{l_{1}}(o,q)\right)}.\end{split} (24)

Let r=d_{w}^{l_{1}}(o,q). According to Equations 4 and 24, the collision probability function with respect to \mathcal{H}_{(f^{(d_{w}^{l_{1}},l_{2})},g^{(d_{w}^{l_{1}},l_{2})})} is

\begin{split}P^{(d_{w}^{l_{1}},l_{2})}(r)&=Pr[f^{(d_{w}^{l_{1}},l_{2})}(o)=g^{(d_{w}^{l_{1}},l_{2})}(q)]\\ &=Pr[h^{l_{2}}(P(o))=h^{l_{2}}(Q_{w}(q))]\\ &=P^{l_{2}}(\|P(o)-Q_{w}(q)\|_{2})\\ &=P^{l_{2}}\left(\sqrt{M\left(d+\sum_{i=1}^{d}w_{i}^{2}\right)-2\left(M\sum_{i=1}^{d}w_{i}-r\right)}\right).\end{split} (25)

Since P^{l_{2}}(\cdot) is a decreasing function and the l_{2} distance in Equation 24 increases with d_{w}^{l_{1}}(o,q), P^{(d_{w}^{l_{1}},l_{2})}(\cdot) is decreasing, so P^{(d_{w}^{l_{1}},l_{2})}(R_{1})>P^{(d_{w}^{l_{1}},l_{2})}(R_{2}) holds for any R_{1}<R_{2}. Therefore, we obtain the following Lemma 1.

Lemma 1

\mathcal{H}_{(f^{(d_{w}^{l_{1}},l_{2})},g^{(d_{w}^{l_{1}},l_{2})})} is (R_{1},R_{2},P^{(d_{w}^{l_{1}},l_{2})}(R_{1}),P^{(d_{w}^{l_{1}},l_{2})}(R_{2}))-sensitive for any R_{1}<R_{2}.

According to Theorem 1 and Lemma 1, we have the following Theorem 4.

Theorem 4

(d_{w}^{l_{1}},l_{2})-ALSH can solve the problem of (R_{1},R_{2})-NNS over d_{w}^{l_{1}} with O(n^{\rho^{(d_{w}^{l_{1}},l_{2})}}d\log n) query time and O(n^{1+\rho^{(d_{w}^{l_{1}},l_{2})}}) space, where \rho^{(d_{w}^{l_{1}},l_{2})}=\frac{\log\left(P^{(d_{w}^{l_{1}},l_{2})}(R_{1})\right)}{\log\left(P^{(d_{w}^{l_{1}},l_{2})}(R_{2})\right)}<1.
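As an illustration, the collision probability of Equation 25 and the exponent \rho^{(d_{w}^{l_{1}},l_{2})} of Theorem 4 can be evaluated numerically with the following sketch (assuming NumPy and SciPy; here "width" stands for the bucket width w of Equation 3, renamed to avoid clashing with the weight vector w, and the function names are ours):

```python
import numpy as np
from scipy.stats import norm

def p_l2(r_l2, width):
    # Collision probability of the l_2 LSH family (Equation 4) at l_2 distance r_l2.
    s = width / r_l2
    return 1.0 - 2.0 * norm.cdf(-s) - 2.0 / (np.sqrt(2.0 * np.pi) * s) * (1.0 - np.exp(-s * s / 2.0))

def p_dwl1_l2(r, w, M, width):
    # Collision probability of Equation 25 when d_w^{l1}(o, q) = r: map r to the
    # l_2 distance between P(o) and Q_w(q) (Equation 24) and apply Equation 4.
    d = len(w)
    r_l2 = np.sqrt(M * (d + np.sum(np.square(w))) - 2.0 * (M * np.sum(w) - r))
    return p_l2(r_l2, width)

def rho_dwl1_l2(R1, R2, w, M, width):
    # Query-time exponent of Theorem 4: rho = log P(R1) / log P(R2) < 1.
    return np.log(p_dwl1_l2(R1, w, M, width)) / np.log(p_dwl1_l2(R2, w, M, width))
```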

4.2.2 (d_{w}^{l_{1}},\theta)-ALSH

Now we introduce the scheme of (d_{w}^{l_{1}},\theta)-ALSH. Based on the transformations P(\cdot) and Q_{w}(\cdot) and the LSH family \mathcal{H}_{(h^{\theta})} introduced in Section 2.1, (d_{w}^{l_{1}},\theta)-ALSH uses the ALSH family \mathcal{H}_{(f^{(d_{w}^{l_{1}},\theta)},g^{(d_{w}^{l_{1}},\theta)})}=\{f^{(d_{w}^{l_{1}},\theta)}:\{0,1,\ldots,M\}^{d}\rightarrow\{0,1\}\}\bigcup\{g^{(d_{w}^{l_{1}},\theta)}:\{0,1,\ldots,M\}^{d}\rightarrow\{0,1\}\}, where f^{(d_{w}^{l_{1}},\theta)}(x)=h^{\theta}(P(x)) and g^{(d_{w}^{l_{1}},\theta)}(x)=h^{\theta}(Q_{w}(x)) for x\in\{0,1,\ldots,M\}^{d}. According to Equations 21, 22 and 23, the relationship between d_{w}^{l_{1}}(o,q) and d^{\theta}(P(o),Q_{w}(q)) is as follows:

\begin{split}d^{\theta}(P(o),Q_{w}(q))&=\arccos\left(\frac{P(o)^{T}Q_{w}(q)}{\|P(o)\|_{2}\|Q_{w}(q)\|_{2}}\right)\\ &=\arccos\left(\frac{d^{IP}(P(o),Q_{w}(q))}{\|P(o)\|_{2}\|Q_{w}(q)\|_{2}}\right)\\ &=\arccos\left(\frac{M\sum_{i=1}^{d}w_{i}-d_{w}^{l_{1}}(o,q)}{M\sqrt{d\sum_{i=1}^{d}w_{i}^{2}}}\right).\end{split} (26)

Let r=d_{w}^{l_{1}}(o,q). From Equations 6 and 26, it can be seen that the collision probability function with respect to \mathcal{H}_{(f^{(d_{w}^{l_{1}},\theta)},g^{(d_{w}^{l_{1}},\theta)})} is

\begin{split}P^{(d_{w}^{l_{1}},\theta)}(r)&=Pr[f^{(d_{w}^{l_{1}},\theta)}(o)=g^{(d_{w}^{l_{1}},\theta)}(q)]\\ &=Pr[h^{\theta}(P(o))=h^{\theta}(Q_{w}(q))]\\ &=1-\frac{1}{\pi}\arccos\left(\frac{P(o)^{T}Q_{w}(q)}{\|P(o)\|_{2}\|Q_{w}(q)\|_{2}}\right)\\ &=1-\frac{1}{\pi}\arccos\left(\frac{M\sum_{i=1}^{d}w_{i}-r}{M\sqrt{d\sum_{i=1}^{d}w_{i}^{2}}}\right).\end{split} (27)

It is easy to verify that P^{(d_{w}^{l_{1}},\theta)}(\cdot) is a decreasing function. Thus, P^{(d_{w}^{l_{1}},\theta)}(R_{1})>P^{(d_{w}^{l_{1}},\theta)}(R_{2}) holds for any R_{1}<R_{2}. Then we obtain the following Lemma 2.

Lemma 2

\mathcal{H}_{(f^{(d_{w}^{l_{1}},\theta)},g^{(d_{w}^{l_{1}},\theta)})} is (R_{1},R_{2},P^{(d_{w}^{l_{1}},\theta)}(R_{1}),P^{(d_{w}^{l_{1}},\theta)}(R_{2}))-sensitive for any R_{1}<R_{2}.

Combining Theorem 1 and Lemma 2, we have the following Theorem 5.

Theorem 5

(d_{w}^{l_{1}},\theta)-ALSH can solve the problem of (R_{1},R_{2})-NNS over d_{w}^{l_{1}} with O(n^{\rho^{(d_{w}^{l_{1}},\theta)}}d\log n) query time and O(n^{1+\rho^{(d_{w}^{l_{1}},\theta)}}) space, where \rho^{(d_{w}^{l_{1}},\theta)}=\frac{\log\left(P^{(d_{w}^{l_{1}},\theta)}(R_{1})\right)}{\log\left(P^{(d_{w}^{l_{1}},\theta)}(R_{2})\right)}<1.
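Analogously, a sketch of the collision probability of Equation 27 and the exponent \rho^{(d_{w}^{l_{1}},\theta)} of Theorem 5 (assuming NumPy; the function names are ours, and for valid points and weights the argument of arccos lies in [-1, 1] by Equation 26):

```python
import numpy as np

def p_dwl1_theta(r, w, M):
    # Collision probability of Equation 27 when d_w^{l1}(o, q) = r.
    d = len(w)
    cos_angle = (M * np.sum(w) - r) / (M * np.sqrt(d * np.sum(np.square(w))))
    return 1.0 - np.arccos(cos_angle) / np.pi

def rho_dwl1_theta(R1, R2, w, M):
    # Query-time exponent of Theorem 5: rho = log P(R1) / log P(R2) < 1.
    return np.log(p_dwl1_theta(R1, w, M)) / np.log(p_dwl1_theta(R2, w, M))
```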

4.2.3 Efficient Implementation of (d_{w}^{l_{1}},l_{2})-ALSH and (d_{w}^{l_{1}},\theta)-ALSH

The scheme of (d_{w}^{l_{1}},l_{2})-ALSH (or (d_{w}^{l_{1}},\theta)-ALSH) needs to compute the hash values h^{l_{2}}(P(o)) and h^{l_{2}}(Q_{w}(q)) (or h^{\theta}(P(o)) and h^{\theta}(Q_{w}(q))). The running time of computing h^{l_{2}}(P(o)) (or h^{\theta}(P(o))) is dominated by the cost of obtaining a^{T}P(o), and the running time of computing h^{l_{2}}(Q_{w}(q)) (or h^{\theta}(Q_{w}(q))) is dominated by the cost of obtaining a^{T}Q_{w}(q), where a is a 2Md-dimensional vector whose entries are chosen independently from the standard normal distribution. The naive approach to obtaining a^{T}P(o) or a^{T}Q_{w}(q) is to compute the inner product of the two corresponding vectors. However, this requires 2Md multiplications and 2Md-1 additions, which is expensive when M is large.

Next, we show how to obtain a^{T}P(o) with only 2d-1 additions and a^{T}Q_{w}(q) with only 2d-1 additions and d multiplications. Suppose a=(a_{1};a_{2};\ldots;a_{d};a_{d+1};a_{d+2};\ldots;a_{2d}), where a_{i}=(a_{i1},a_{i2},\ldots,a_{iM})\in\mathbb{R}^{M}. According to Equations 14-17, 19 and 20, we have a^{T}P(o)=\sum_{i=1}^{d}a_{i}^{T}\widehat{\cos}(\frac{\pi}{2}\text{Unary}(o_{i}))+\sum_{i=1}^{d}a_{d+i}^{T}\widehat{\sin}(\frac{\pi}{2}\text{Unary}(o_{i})) and a^{T}Q_{w}(q)=\sum_{i=1}^{d}a_{i}^{T}\widehat{\cos}(\frac{\pi}{2}\text{Unary}(q_{i}))w_{i}+\sum_{i=1}^{d}a_{d+i}^{T}\widehat{\sin}(\frac{\pi}{2}\text{Unary}(q_{i}))w_{i}. Since \widehat{\cos}(\frac{\pi}{2}\text{Unary}(o_{i})) is a sequence of o_{i} 0's followed by (M-o_{i}) 1's and \widehat{\sin}(\frac{\pi}{2}\text{Unary}(o_{i})) is a sequence of o_{i} 1's followed by (M-o_{i}) 0's, it is easy to see that a_{i}^{T}\widehat{\cos}(\frac{\pi}{2}\text{Unary}(o_{i})) is the sum of the last M-o_{i} elements of a_{i} and a_{d+i}^{T}\widehat{\sin}(\frac{\pi}{2}\text{Unary}(o_{i})) is the sum of the first o_{i} elements of a_{d+i}. Thus, we preprocess the vector a to obtain a^{\prime}=(a_{1}^{\prime};a_{2}^{\prime};\ldots;a_{d}^{\prime};a_{d+1}^{\prime};a_{d+2}^{\prime};\ldots;a_{2d}^{\prime}), where a_{i}^{\prime}=(a_{i1}^{\prime},a_{i2}^{\prime},\ldots,a_{iM}^{\prime},a_{i(M+1)}^{\prime}) and

a_{ij}^{\prime}=\begin{cases}\sum_{k=j}^{M}a_{ik}&\text{if}\ 1\leq i\leq d\text{ and }1\leq j\leq M\\ 0&\text{if}\ 1\leq i\leq d\text{ and }j=M+1\\ 0&\text{if}\ d+1\leq i\leq 2d\text{ and }j=1\\ \sum_{k=1}^{j-1}a_{ik}&\text{if}\ d+1\leq i\leq 2d\text{ and }2\leq j\leq M+1.\end{cases} (28)

Then we have a^{T}P(o)=\sum_{i=1}^{d}a_{i(o_{i}+1)}^{\prime}+\sum_{i=1}^{d}a_{(d+i)(o_{i}+1)}^{\prime}. It can be seen that a^{T}P(o) can be obtained with 2d-1 additions by using a^{\prime}. Similarly, we have a^{T}Q_{w}(q)=\sum_{i=1}^{d}w_{i}a_{i(q_{i}+1)}^{\prime}+\sum_{i=1}^{d}w_{i}a_{(d+i)(q_{i}+1)}^{\prime}=\sum_{i=1}^{d}w_{i}(a_{i(q_{i}+1)}^{\prime}+a_{(d+i)(q_{i}+1)}^{\prime}). Therefore, a^{T}Q_{w}(q) can be obtained with 2d-1 additions and d multiplications by using a^{\prime}.
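The preprocessing of a and the resulting hash evaluation in O(d) time can be sketched as follows (assuming NumPy; 0-based indexing is used, so the entry written a'_{i(o_i+1)} in the text corresponds to column o_i below, and the function names are ours):

```python
import numpy as np

def preprocess_a(a, M, d):
    # Build the table a' of Equation 28 from the 2Md-dimensional projection vector a:
    # suffix sums over each of the first d blocks, prefix sums over the last d blocks.
    blocks = a.reshape(2 * d, M)
    a_prime = np.zeros((2 * d, M + 1))
    a_prime[:d, :M] = np.cumsum(blocks[:d, ::-1], axis=1)[:, ::-1]  # suffix sums, a'_{i(M+1)} = 0
    a_prime[d:, 1:] = np.cumsum(blocks[d:], axis=1)                 # prefix sums, a'_{i1} = 0
    return a_prime

def aT_P(a_prime, o, d):
    # a^T P(o) with 2d - 1 additions: one precomputed suffix sum and one prefix sum per coordinate.
    idx = np.asarray(o)
    return a_prime[np.arange(d), idx].sum() + a_prime[d + np.arange(d), idx].sum()

def aT_Q(a_prime, q, w, d):
    # a^T Q_w(q) with 2d - 1 additions and d multiplications.
    idx = np.asarray(q)
    return float(np.sum(np.asarray(w) * (a_prime[np.arange(d), idx] + a_prime[d + np.arange(d), idx])))
```

With a' precomputed once per hash function, each hash evaluation of (d_{w}^{l_{1}},l_{2})-ALSH or (d_{w}^{l_{1}},\theta)-ALSH thus costs O(d) instead of O(Md) operations.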

5 Conclusion

This paper studies the fundamental problem of Nearest Neighbor Search (NNS) over the generalized weighted Manhattan distance d_{w}^{l_{1}}. As far as we know, there is no prior work that solves the problem in sublinear time. In this paper, we first prove that there is no LSH or ALSH family for d_{w}^{l_{1}} over the entire space \mathbb{R}^{d}. Then, we prove that there is still no LSH family suitable for d_{w}^{l_{1}} over a bounded space. After that, we propose two ALSH families for d_{w}^{l_{1}} over a bounded space. Based on these ALSH families, two ALSH schemes, (d_{w}^{l_{1}},l_{2})-ALSH and (d_{w}^{l_{1}},\theta)-ALSH, are proposed for solving NNS over d_{w}^{l_{1}} in sublinear time.

References

  • [1] Charu C. Aggarwal, Alexander Hinneburg, and Daniel A. Keim. On the surprising behavior of distance metrics in high dimensional spaces. In ICDT, 2001.
  • [2] Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 1975.
  • [3] Gautam Bhattacharya, Koushik Ghosh, and Ananda S. Chowdhury. Granger causality driven AHP for feature weighted knn. Pattern Recognit., 2017.
  • [4] Moses Charikar. Similarity estimation techniques from rounding algorithms. In STOC, 2002.
  • [5] Lei Chen. Curse of dimensionality. In Encyclopedia of Database Systems. 2009.
  • [6] King Lum Cheung and Ada Wai-Chee Fu. Enhanced nearest neighbour search on the r-tree. SIGMOD Rec., 1998.
  • [7] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In SCG, 2004.
  • [8] Jürgen Forster, Niels Schmitt, Hans Ulrich Simon, and Thorsten Suttorp. Estimating the optimal margins of embeddings in euclidean half spaces. Mach. Learn., 2003.
  • [9] Junhao Gan, Jianlin Feng, Qiong Fang, and Wilfred Ng. Locality-sensitive hashing scheme based on dynamic collision counting. In SIGMOD, 2012.
  • [10] Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Similarity search in high dimensions via hashing. In VLDB, 1999.
  • [11] Yupeng Gu, Bo Zhao, David Hardtke, and Yizhou Sun. Learning global term weights for content-based recommender systems. In WWW, 2016.
  • [12] Alexander Hinneburg, Charu C. Aggarwal, and Daniel A. Keim. What is the nearest neighbor in high dimensional spaces? In VLDB, 2000.
  • [13] Qiang Huang, Jianlin Feng, Qiong Fang, Wilfred Ng, and Wei Wang. Query-aware locality-sensitive hashing scheme for l_{p} norm. VLDB J., 2017.
  • [14] Chein-Shung Hwang, Yi-Ching Su, and Kuo-Cheng Tseng. Using genetic algorithms for personalized recommendation. In ICCCI, Lecture Notes in Computer Science. Springer, 2010.
  • [15] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In STOC, 1998.
  • [16] Yifan Lei, Qiang Huang, Mohan S. Kankanhalli, and Anthony K. H. Tung. Sublinear time nearest neighbor search over generalized weighted space. In ICML, 2019.
  • [17] Kejing Lu, Hongya Wang, Wei Wang, and Mineichi Kudo. VHP: approximate nearest neighbor search via virtual hypersphere partitioning. Proc. VLDB Endow., 2020.
  • [18] Julian J. McAuley, Christopher Targett, Qinfeng Shi, and Anton van den Hengel. Image-based recommendations on styles and substitutes. In SIGIR, 2015.
  • [19] Alejandro Moreo, Andrea Esuli, and Fabrizio Sebastiani. Learning to weight for text classification. IEEE Trans. Knowl. Data Eng., 2020.
  • [20] Behnam Neyshabur, Yury Makarychev, and Nathan Srebro. Clustering, hamming embedding, generalized LSH and the max norm. In ALT, 2014.
  • [21] Behnam Neyshabur and Nathan Srebro. On symmetric and asymmetric lshs for inner product search. In ICML, 2015.
  • [22] Hanan Samet. Foundations of multidimensional and metric data structures. Morgan Kaufmann, 2006.
  • [23] Anshumali Shrivastava and Ping Li. Asymmetric LSH (ALSH) for sublinear time maximum inner product search (MIPS). In NeurIPS, 2014.
  • [24] Anshumali Shrivastava and Ping Li. Improved asymmetric locality sensitive hashing (ALSH) for maximum inner product search (MIPS). In UAI, 2015.
  • [25] Nathan Srebro and Adi Shraibman. Rank, trace-norm and max-norm. In COLT, Lecture Notes in Computer Science. Springer, 2005.
  • [26] Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. TODS, 2010.
  • [27] Jingdong Wang, Heng Tao Shen, Jingkuan Song, and Jianqiu Ji. Hashing for similarity search: A survey. CoRR, 2014.
  • [28] Bolong Zheng, Xi Zhao, Lianggui Weng, Nguyen Quoc Viet Hung, Hang Liu, and Christian S. Jensen. PM-LSH: A fast and accurate LSH framework for high-dimensional approximate NN search. Proc. VLDB Endow., 2020.