A Note on Graph-Based Nearest Neighbor Search
Abstract
Nearest neighbor search has found numerous applications in machine learning, data mining and massive data processing systems. The past few years have witnessed the popularity of the graph-based nearest neighbor search paradigm because of its superiority over the space-partitioning algorithms. While many empirical studies demonstrate the efficiency of graph-based algorithms, not much attention has been paid to a more fundamental question: why do graph-based algorithms work so well in practice? And which data property affects their efficiency, and how? In this paper, we try to answer these questions. Our insight is that "the probability that the neighbors of a point tend to be neighbors in the NN graph" is a crucial data property for query efficiency. For a given dataset, such a property can be quantitatively measured by the clustering coefficient of the NN graph.
To show how the clustering coefficient affects performance, we identify that the local connectivity around a given query, rather than the global connectivity, has the more direct impact on recall. Specifically, we observed that a high clustering coefficient makes most of the nearest neighbors of the query sit in a maximum strongly connected component (SCC) in the graph. From the algorithmic point of view, we show that the search procedure is actually composed of two phases - one outside the maximum SCC and the other inside it, which differs from the widely accepted single or multiple path search models. We prove that the commonly used graph-based search algorithm is guaranteed to traverse the maximum SCC once it visits any point in it. Our analysis reveals that a high clustering coefficient leads to a large maximum SCC, and thus provides good answer quality with the help of the two-phase search procedure. Extensive empirical results over a comprehensive collection of datasets validate our findings.
I Introduction
Nearest neighbor search among database vectors for a query is a key building block to solve problems such as large-scale image search and information retrieval, recommendation, entity resolution, and sequence matching. As database size and vector dimensionality increase, exact nearest neighbor search becomes expensive and often is considered impractical due to the long search latency. To reduce the search cost, approximate nearest neighbor (ANN) search is used, which provides a better tradeoff among accuracy, latency, and memory overhead.
Roughly speaking, the existing ANN methods can be classified into space-partitioning algorithms and graph-based ones (note that this categorization is not fixed or unique). The space-partitioning methods further fall into three categories - tree-based, product quantization (PQ) and locality sensitive hashing (LSH) [1, 2]. Recent empirical studies show that graph-based ANN search algorithms are more efficient than space-partitioning methods such as PQ and LSH, and they have thus been adopted in many commercial applications at Facebook, Microsoft, Taobao, etc. [3, 4, 5].
While many empirical studies validate the efficiency of graph-based ANN search algorithms, not much attention has been paid to a more fundamental question: why are graph-based ANN search algorithms so efficient? And which data property affects their efficiency, and how? Two recent papers analyze the asymptotic performance of graph-based methods for datasets uniformly distributed on a d-dimensional Euclidean sphere [6, 7]. Their worst-case analysis shows that the asymptotic behavior of a greedy graph-based search only matches the optimal hash-based algorithm [8], which is far worse than the practical performance of graph-based algorithms and thus cannot answer these questions.
A few conceptual graph models such as the Monotonic Search Network Model [9], Delaunay Graph Model [10, 11] and Navigable Small World Model [12, 13] have been proposed to inspire the construction of ANN search graphs. As will be discussed in Section II, none of them can explain the success of graph-based algorithms either. Actually, the vast majority (if not all) of practical ANN search graphs use an approximate NN graph as the index structure instead of the conceptual models, due to time or space constraints, and are thus fully devoid of the theoretical guarantees provided by these models.
In this paper, we argue that, for a specific dataset, the clustering coefficient [14] of its NN graph is an important indicator of how efficiently graph-based algorithms work. The clustering coefficient of the NN graph is the probability that the neighbors of a point are also neighbors of each other. Comprehensive experimental results reveal that the higher the clustering coefficient, the more efficiently the graph-based algorithms perform. Since the clustering coefficient is data dependent, graph-based algorithms perform rather poorly on datasets such as Random with a very small clustering coefficient, whereas they do well on datasets such as Sift and Audio with greater ones.
We also study how the clustering coefficient affects performance. The analysis of complex networks indicates that a large clustering coefficient leads to high global connectivity [15]. Our insight is that, instead of the global connectivity, the local graph structure is more crucial for high ANN search recall. In particular, we observed that, for datasets with a large clustering coefficient, most of the NN of a given query (which may or may not be in the dataset) lie in the maximum strongly connected component (SCC) of the subgraph composed of these NN. Moreover, we show that the search procedure actually consists of two phases, one outside the maximum SCC and one inside it, in contrast to the common wisdom of single or multiple path search models. Then, we prove that the commonly used graph search algorithm is guaranteed to visit all NN in the maximum SCC under a mild condition, which suggests that the size of the maximum SCC determines the answer quality of NN search. This sheds light on the strong positive correlation between the clustering coefficient and the result quality, and thus answers the two aforementioned questions.
To sum up, the main contributions of this paper are:
• We introduce a new quantitative measure, the clustering coefficient of the NN graph, for the difficulty of graph-based nearest neighbor search. To the best of our knowledge, this is the first measure that can explain the efficiency of this important class of ANN search algorithms.
• Conceptual models such as MSNETs and Delaunay graphs claim that the NN can be found by walking along a single path. Instead, we find that the search procedure is actually composed of two phases. In the second phase, the algorithm traverses the maximum SCC of the NN of a query, whose size is a determining factor for answer quality, i.e., recall.
• We prove that the graph-based search algorithm is guaranteed to visit all points in the maximum SCC once it enters it. Extensive empirical results over a comprehensive collection of datasets validate our observations.
We believe that this note could provide a different perspective on graph-based ANN search methods and might inspire more interesting work along this line.
II Graph-Based Nearest Neighbor Search
II-A Graph Construction and Search Algorithms
In the sequel, we will use nodes, points and vertices interchangeably without ambiguity. A directed graph consists of a nonempty vertex set and an edge set such that each edge is assigned an ordered pair (u, v) of distinct vertices. Most graph-based algorithms build a directed graph to navigate the NN search. To the best of our knowledge, the idea of using graphs to process ANN search can be traced back to the Pathfinder project, which was initiated to support efficient search, classification, and clustering in computer vision systems [9]. In this project, Dearholt et al. designed and implemented the monotonic search network to retrieve the best match of an entity in a collection of geometric objects. Since then, researchers from different communities such as theory, databases and pattern recognition have explored different ways to construct search graphs, inspired by various graph/network models such as the relative neighborhood graph [16, 17], Delaunay graph [18, 10, 19], KNN graph [20, 21] and navigable small world network [13, 22, 23]. Thanks to its appealing practical performance, the graph-based ANN search paradigm has become an active research direction, and quite a few new approaches have been developed recently [5, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 3, 34].
While motivated by different graph/network models, most (if not all) practical graph-based algorithms essentially use approximate NN graphs as the core index structure. A specific algorithm distinguishes itself from the others mainly in the edge selection heuristics, i.e., the way to select a subset of links between any point and its neighbors. Algorithm 1 depicts the general framework of index construction for graph-based ANN search.
For almost all graph-based methods, the ANN search procedure is based on the same principle. For a query q, start at an initial vertex chosen arbitrarily or by some sophisticated selection rule. Move along an edge to the adjacent vertex with minimum distance to q. Repeat this step until the current vertex is closer to q than all of its neighbors, and then report the current vertex as the NN of q. We call this the single path search model. Figure 1 illustrates the search procedure for a query in a sample graph, where dashed lines indicate the search path from the starting point. At the end of the search, the NN of q is identified.
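The single path model can be sketched in a few lines. The following is our own minimal illustration (not the authors' implementation), with the graph given as an adjacency dictionary over point indices and plain Euclidean distance; all names are ours:

```python
import math

def greedy_search(graph, points, query, start):
    """Single-path greedy walk: repeatedly move to the neighbor closest
    to the query; stop when the current vertex is a local optimum."""
    def dist(i):
        return math.dist(points[i], query)

    current = start
    while True:
        # Pick the out-neighbor closest to the query.
        best = min(graph[current], key=dist, default=current)
        if dist(best) >= dist(current):
            return current  # no neighbor is closer: report current vertex
        current = best
```

If no monotonic path to the query exists, the walk simply stops at a local optimum, which is exactly the failure mode that motivates backtracking.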

For practical ANN search graphs, e.g., HNSW and NSG, there is no guarantee that a monotonic search path always exists for any given query [22, 5]. As a result, the search can easily get trapped in a local optimum, meaning that the reported vertex is not the NN of q. To address this issue, backtracking is employed - we go back to visited vertices and follow another outgoing link to restart the procedure. We call this the multiple path search model. Algorithm 2 sketches the commonly adopted search algorithm that allows for backtracking, which will be discussed in detail in Section IV. Figure 2 illustrates a search path with backtracking: the search first reaches a local optimum, then backtracks to an earlier vertex that is farther from q, and finally finds the true NN of q.

Please note that the search parameter controlling the size of the candidate pool is often set greater than the number of requested neighbors to achieve better answer quality. For ease of presentation, we assume the two are equal throughout this paper unless stated otherwise.
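The candidate-pool search with backtracking can be sketched as follows. This is a hedged illustration of the general best-first scheme, not the paper's Algorithm 2 pseudocode; the pool size l plays the role of the search parameter discussed above, and all names are ours:

```python
import heapq
import math

def best_first_search(graph, points, query, start, k=1, l=10):
    """Best-first graph search with backtracking (multiple path model).
    Keeps a pool of the l best vertices seen; always expands the closest
    unexpanded candidate, which may lie far from the current best path."""
    def dist(i):
        return math.dist(points[i], query)

    visited = {start}
    candidates = [(dist(start), start)]   # min-heap: unexpanded vertices
    pool = [(-dist(start), start)]        # max-heap: l best vertices seen
    while candidates:
        d, v = heapq.heappop(candidates)
        if len(pool) >= l and d > -pool[0][0]:
            break  # closest candidate is worse than the worst pooled result
        for u in graph[v]:
            if u not in visited:
                visited.add(u)
                heapq.heappush(candidates, (dist(u), u))
                heapq.heappush(pool, (-dist(u), u))
                if len(pool) > l:
                    heapq.heappop(pool)   # evict the farthest vertex
    return sorted((-d, u) for d, u in pool)[:k]
```

Setting l greater than k is what gives the search room to back out of local optima; with l = 1 the procedure roughly degenerates to the single path model.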
II-B Review of Graph Search Models and Their Limitations
While empirical studies demonstrate that graph-based ANN search algorithms are very competitive, it is widely recognized that graph-based methods are mostly based on heuristics and not well understood quantitatively [31, 5]. As an exception, two recent papers take the first step toward analyzing the asymptotic performance of graph-based methods for datasets uniformly distributed on a d-dimensional Euclidean sphere [6, 7]. The worst-case analysis shows that the asymptotic behavior of a greedy graph-based search only matches the optimal hashing-based algorithm [8].
It was experimentally observed that the graph-based methods are orders of magnitude faster than the hashing-based algorithms [26]. Thus, though interesting from a pure theoretical perspective, their theory fails to explain the salient practical performance of the graph-based algorithms. Next, we will review several graph/network models that inspire practical graph-based algorithms, and then point out their limitations.
Monotonic Search Network Model. The monotonic search networks (MSNETs) are defined as a family of graphs such that, for any two vertices in the graph, there exists at least one monotonic search path between them [9]. If the query point happens to be equal to a point of the dataset, then a simple greedy search will succeed in locating it along a path of monotonically decreasing distance to the query point. The original MSNET is not practical, even for datasets of moderate size, because of its indexing complexity and unbounded average out-degree [9]. A recent proposal, the monotonic relative neighborhood graph, reduces the graph construction time, but this still does not make MSNETs applicable in practice.
Delaunay Graph Model. Given a set of elements in a Euclidean space, the Voronoi diagram associates a Voronoi region with each element, which gives rise to a notion of neighborhood. The significance of this neighborhood is that if a query is closer to a database element than to all of that element's neighbors, then we have found the nearest element in the whole database [10, 11]. The Delaunay graph is the dual of the Voronoi diagram, where each element is connected to all elements that share a Voronoi edge with it. Using the Delaunay graph, Algorithm 2 is guaranteed to find the NN of q. Unfortunately, the worst-case combinatorial complexity of the Voronoi diagram grows exponentially with the dimension [35]. In addition, the Delaunay graph quickly tends toward the complete graph as the dimension grows, making it infeasible for NN search in high-dimensional spaces [18].
Navigable Small World Model. Networks with logarithmic or polylogarithmic complexity of greedy graph search are known as navigable small world networks [12, 23, 13]. Inspired by this model, Malkov et al. proposed the navigable small world graph (NSW) and the hierarchical NSW (HNSW) by introducing "long" links during approximate NN graph construction, expecting that the greedy routing achieves polylogarithmic complexity for NN search [22]. They demonstrate experimentally that the number of hops during graph routing is polylogarithmic with respect to the network size on a collection of real-life datasets. However, unlike the ideal navigable small world model, no rigorous theoretical analysis is provided for NSW and HNSW.
We argue that these conceptual models are inadequate to explain why, in most cases, the search procedure quickly converges to the nearest neighbor, for the following reasons:
• For the ideal models, the MSNET alone gives no hint as to how graph-based methods generalize to out-of-sample queries, i.e., queries that are not in the dataset. The Delaunay graph supports out-of-sample queries, but does not guarantee that the NN can be found for a query q. For example, suppose Algorithm 2 can reach some vertex by traversing one monotonic search path; we actually have no idea whether that vertex is the NN of q at all, because there may be multiple monotonic search paths and the NN of q may lie on some other path. The navigable small world model only gives an intuitive explanation for the existence of short search paths, but offers no quantitative justification of why the NN of q can be found.
• More importantly, the vast majority of graph-based algorithms use the approximate NN graph or its variants, instead of the aforementioned conceptual models, as the index structure. By limiting the maximum out-degree, approximate NN graphs are far sparser than MSNETs and Delaunay graphs, which makes them fully devoid of the nice theoretical properties, i.e., the existence of monotonic search paths or Voronoi neighborhoods.
To sum up, the existing models fail to illuminate the intuitive appeal of the graph-based methods. We view this as a significant gap between the theory and practice of the graph-based search paradigm. In this paper, we try to explain more quantitatively, from a different perspective, why the approximate NN graph-based methods work so well in practice.
III Clustering Coefficient of NN Graph and Its Impact on Search Performance
In graph theory, the clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together, and it has been used successfully across a wide range of applications in complex networks. To name a few, Burt uses local clustering as a probe for the existence of so-called "structural holes" in a network, and Dorogovtsev et al. found that the local clustering coefficient of a vertex falls off approximately as the inverse of its degree for certain models of scale-free networks [15].
There are different ways to define the clustering coefficient. In this paper, we adopt the commonly used definition given by Watts and Strogatz [14]. The local clustering coefficient of a vertex v with degree k_v is defined as

C_v = (number of edges between neighbors of v) / (k_v(k_v - 1)/2).   (1)
To calculate C_v, we go through all distinct pairs of vertices that are neighbors of v in the network, count the number of such pairs that are connected to each other, and divide by the total number of pairs, k_v(k_v - 1)/2. Figure 3 illustrates the definition of the local clustering coefficient: the degree of the vertex is 4 and there are two edges between its neighbors, hence by definition its local clustering coefficient is 2/6 = 1/3.

The clustering coefficient for the whole network of n vertices is the average

C = (1/n) Σ_v C_v.   (2)
The local clustering coefficient of a vertex v describes the likelihood that the neighbours of v are also connected, i.e., the probability that two randomly selected neighbors of v are neighbors of each other. Roughly speaking, it tells how well the neighborhood of the node is connected. If the neighborhood is fully connected, the local clustering coefficient is 1, and a value close to 0 means that there are hardly any connections in the neighborhood. If most of the nodes in the network have a high clustering coefficient, then the network will probably have many edges connecting neighboring nodes to each other.
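Equations (1) and (2) translate directly into code. The sketch below is our own illustration, assuming an undirected graph stored as a dictionary mapping each vertex to its set of neighbors:

```python
from itertools import combinations

def local_clustering(graph, v):
    """Eq. (1): fraction of pairs of v's neighbors that are connected."""
    neighbors = graph[v]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return links / (k * (k - 1) / 2)

def clustering_coefficient(graph):
    """Eq. (2): average of the local coefficients over all vertices."""
    return sum(local_clustering(graph, v) for v in graph) / len(graph)
```

For the configuration of Figure 3 (a vertex of degree 4 with two edges among its neighbors), `local_clustering` returns 2/6 = 1/3.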
The clustering coefficient of a NN graph depends on k, the number of neighbors per node, and on the intrinsic features of the dataset. Table I lists the clustering coefficients for various k on three typical datasets. As one can see, the larger k is, the greater the clustering coefficient. Moreover, the relative order of the clustering coefficients across datasets is stable, independent of k. In the sequel, we will use the clustering coefficient at a small default k, since k cannot be too large due to the index space constraint.
Dataset | clustering coefficient for increasing k |
Sift | 0.1159 | 0.1249 | 0.1371 | 0.1419 | 0.1468 |
Glove | 0.0881 | 0.1029 | 0.1289 | 0.1358 | 0.1427 |
Random | 0.00047 | 0.00074 | 0.00092 | 0.00114 | 0.00139 |
Our key observation is that the clustering coefficient of the NN graph is an informative measure of the efficiency of graph-based ANN search methods. In this paper, a NN graph is defined as a graph in which each vertex has bi-directional edges to its k nearest neighbors. This model is reasonable because practical graph-based algorithms such as HNSW and NSG always add bi-directional links between a point and its NN as much as possible [22, 5]. Table II lists the statistics of the datasets, the clustering coefficients of the NN graph in increasing order, the recalls of the top-k query and the average number of hops in the graph for a collection of datasets under HNSW and NSG, the two state-of-the-art graph-based algorithms (statistics are collected over 1,000 random queries). For both methods, the maximum out-degree (MOD) is 70 and the search parameter, which controls the number of hops in the graph, is set to 50. Interesting observations can be made as follows:
• NSG consistently outperforms HNSW in recall with a slightly greater average number of hops, which approximately translates to the number of distance evaluations since the MODs are identical for both algorithms. This observation agrees with the results reported in [5].
• A more interesting observation is that, with around the same average number of hops in the graph, the clustering coefficient and recall are strongly correlated. In particular, the Pearson correlation coefficients between the clustering coefficient and recall for NSG and HNSW are 0.794 and 0.762, respectively. Moreover, independent of data cardinality and dimensionality, a high clustering coefficient (greater than 0.12) often leads to high recall, whereas a low clustering coefficient (below 0.1) results in low recall. As an extreme example, the clustering coefficient of the Random dataset is only 0.00074, which makes graph-based algorithms very inefficient. One reason the recall of NSG is greater than that of HNSW is that the quality of the neighbors in NSG is better, that is, NSG is much closer to an exact NN graph than HNSW. Please note that the datasets are comprehensive in terms of size, dimensionality and data type (images, text, audio and synthetic). Detailed descriptions of these datasets can be found in [24] (https://github.com/DBWangGroupUNSW/).
Dataset | Size | Dim | Clustering coefficient | HNSW Recall | HNSW # of Hops | NSG Recall | NSG # of Hops
Random | 1,000,000 | 128 | 0.00074 | 0.0049 | 61.3 | 0.02 | 64.8 |
Gist | 1,000,000 | 960 | 0.080 | 0.5984 | 54.9 | 0.7688 | 54.7 |
NUSWIDE | 268,643 | 500 | 0.096 | 0.4343 | 58.0 | 0.5430 | 59.8 |
GLOVE | 1,192,514 | 100 | 0.103 | 0.4903 | 60.3 | 0.694 | 56.1 |
ImageNet | 2,340,373 | 150 | 0.114 | 0.6643 | 53.3 | 0.8608 | 55.5 |
Sift | 1,000,000 | 128 | 0.125 | 0.8667 | 52.0 | 0.9453 | 54.2 |
Sun | 79,106 | 512 | 0.140 | 0.8941 | 51.1 | 0.9562 | 52.0 |
Cifar | 50,000 | 512 | 0.141 | 0.9196 | 51.0 | 0.9685 | 51.5 |
Deep | 1,000,000 | 256 | 0.144 | 0.8205 | 52.7 | 0.9078 | 54.7 |
MillionSong | 992,272 | 420 | 0.163 | 0.5984 | 51.4 | 0.9608 | 55.1 |
Ukbench | 1,097,907 | 128 | 0.189 | 0.8893 | 51.7 | 0.9545 | 54.5 |
Enron | 94,987 | 1369 | 0.209 | 0.7599 | 52.3 | 0.9421 | 53.3 |
Trevi | 99,900 | 4096 | 0.215 | 0.8845 | 51.1 | 0.9498 | 52.8 |
AUDIO | 53,387 | 192 | 0.253 | 0.9553 | 51.0 | 0.9815 | 52.5 |
MINIST | 69,000 | 784 | 0.286 | 0.9728 | 51.7 | 0.9878 | 53.2 |
Notre | 332,668 | 128 | 0.287 | 0.9248 | 52.4 | 0.9674 | 53.8 |
These observations suggest that the clustering coefficient is a promising measure of the efficiency of graph-based algorithms. Intuitively, the higher the clustering coefficient of the NN graph (where k should be as small as possible to reduce the memory footprint and improve query efficiency), the better the graph is connected, which means that graph connectivity has a significant impact on the result quality of ANN search. To gain an in-depth understanding of how connectivity affects search performance, we scrutinized the graph traversal steps of a sample of queries and found that the local connectivity, instead of the global one, is the determining factor. To formally characterize the local connectivity, we propose the notion of the maximum strongly connected neighborhood as follows.
Definition 1.
A directed graph is strongly connected if there is a path between every pair of vertices. A strongly connected component (SCC) of a directed graph is a maximal strongly connected subgraph of this graph.
Definition 2.
The m-neighborhood of a point q, denoted by N_m(q), is the set of the m nearest elements of q in the dataset.
Please note that the only requirement is that the m nearest neighbors of q belong to the dataset; q itself may be a point in the dataset or not. This definition makes our analysis support out-of-sample queries.
Definition 3.
A subgraph is the m-neighborhood subgraph associated with a vertex q if its vertex set is the m-neighborhood of q and its edge set consists of all edges of the original graph between these vertices.
Definition 4.
The maximum strongly connected neighborhood of q, denoted by SCN(q), is the SCC of largest size among all SCCs of the m-neighborhood subgraph of q.
Please note that k and m have totally different meanings - k is the number of links per node of the NN graph and is determined at graph construction, while m is a search parameter and thus may vary according to the user's requirements.
Figure 4 illustrates these definitions with a simple example. Five points form the top-5 NN of query q, and there are three undirected edges (equivalent to six directed edges) in q's 5-neighborhood subgraph. Three SCCs exist in this subgraph, and the maximum SCC is composed of three of the points.
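The maximum strongly connected neighborhood can be extracted from the m-neighborhood subgraph with any linear-time SCC algorithm. The sketch below uses Kosaraju's two-pass algorithm and is our own illustration, not the paper's code:

```python
def strongly_connected_components(adj):
    """Kosaraju's algorithm. adj maps every vertex to its out-neighbors."""
    order, seen = [], set()

    def dfs(v, graph, out):
        # Iterative DFS that appends vertices in finish order.
        stack = [(v, iter(graph[v]))]
        seen.add(v)
        while stack:
            u, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(graph[w])))
                    break
            else:
                stack.pop()
                out.append(u)

    for v in adj:                      # pass 1: finish order on the graph
        if v not in seen:
            dfs(v, adj, order)
    reverse = {v: [] for v in adj}     # pass 2: DFS on the reversed graph
    for v, outs in adj.items():
        for u in outs:
            reverse[u].append(v)
    components, seen = [], set()
    for v in reversed(order):
        if v not in seen:
            component = []
            dfs(v, reverse, component)
            components.append(component)
    return components

def maximum_scn(neighborhood_adj):
    """The largest SCC of the m-neighborhood subgraph."""
    return max(strongly_connected_components(neighborhood_adj), key=len)
```

For a 5-neighborhood like the one in Figure 4, a directed cycle among three of the points yields a maximum SCC of size 3.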

To show the impact of the maximum strongly connected neighborhood on algorithm performance, Table III lists the 3 largest SCCs for 100 random NN queries with neighborhood size m = 50 on three typical datasets. As we can see, the ratios of the size of the maximum strongly connected neighborhood (SCC1) to m are very close to the recalls listed in Table II for the three datasets, respectively. Other state-of-the-art algorithms, such as HNSW, exhibit similar trends.
SCC-id | Sift | Glove | Random | |||
size | ratio | size | ratio | size | ratio | |
SCC1 | 47.8 | 95.6% | 33.8 | 67.6% | 2.4 | 4.8% |
SCC2 | 0.8 | 1.6% | 1.8 | 3.6% | 1.4 | 2.8% |
SCC3 | 0.2 | 0.4% | 1.8 | 3.6% | 1.2 | 2.4% |
To eliminate the bias caused by specific graph construction algorithms, we studied the exact NN graph and found similar results. Table IV lists, for 100 random top-50 NN queries, the average sizes of the top-3 SCCs over Sift, Glove and Random. In this experiment, we only put a directed link from a point to each of its k nearest neighbors and no link is added manually in the reverse direction, i.e., the NN graph is directed; k is set to 50. From Table IV we can see that, independent of the specific graph-based algorithm, the clustering coefficient also has a significant impact on the size of the maximum strongly connected neighborhood.
SCC-id | Sift | Glove | Random | |||
size | ratio | size | ratio | size | ratio | |
SCC1 | 36.8 | 73.6% | 30.2 | 60.4% | 1.2 | 2.4% |
SCC2 | 3.6 | 7.2% | 2.4 | 4.8% | 1 | 2% |
SCC3 | 1.6 | 3.2% | 1.6 | 3.2% | 1 | 2% |
We also examined the undirected NN graph, where a bi-directional link is added manually between a point and each of its NN. The trend listed in Table V is very similar to that of Table IV, except that the sizes of the maximum strongly connected neighborhoods are larger. This is because more links are added to the graph. In fact, practical graph-based algorithms lie somewhere between the undirected and directed NN graph, since they always try to add bi-directional links as long as the memory budget allows. Please note that exact NN graphs are not practical because of their unaffordable construction time and unbounded maximum out-degree, which translates into too much memory cost.
SCC-id | Sift | Glove | Random | |||
size | ratio | size | ratio | size | ratio | |
SCC1 | 48.8 | 97.6% | 46.2 | 92.4% | 6 | 12% |
SCC2 | 0.2 | 0.4% | 1 | 2% | 2.2 | 4.4% |
SCC3 | 0 | 0% | 0.2 | 0.4% | 1.4 | 2.8% |
In a nutshell, all these experiments demonstrate that the clustering coefficient of the NN graph is an informative measure of the size of the maximum strongly connected neighborhood and of the performance of graph-based algorithms on a specific dataset. Next, we will analyze how the maximum strongly connected neighborhood affects the recall for a given query q. In particular, we will show that Algorithm 2, the central algorithmic component of graph search, can effectively reach the maximum strongly connected neighborhood and identify all NN in it. This explains why a greater clustering coefficient, and the resulting larger maximum strongly connected neighborhood, leads to better performance.
IV Two Phase NN Search in Graphs
Table VI: Sift
Query ID | # of Hops in Phase 1 | # of Hops in Phase 2 | Size of max SCC | Fraction of max SCC visited | # of true NN found outside max SCC
1 | 4 | 50 | 38 | 100% | 4 |
2 | 4 | 50 | 43 | 100% | 1 |
3 | 5 | 50 | 48 | 100% | 0 |
4 | 4 | 50 | 50 | 100% | 0 |
5 | 5 | 50 | 45 | 100% | 0 |
Table VII: Glove
Query ID | # of Hops in Phase 1 | # of Hops in Phase 2 | Size of max SCC | Fraction of max SCC visited | # of true NN found outside max SCC
1 | 3 | 52 | 27 | 100% | 7 |
2 | 1 | 59 | 28 | 100% | 0 |
3 | 11 | 39 | 20 | 100% | 8 |
4 | 2 | 49 | 38 | 100% | 4 |
5 | 2 | 48 | 24 | 100% | 6 |
Table VIII: Random
Query ID | # of Hops in Phase 1 | # of Hops in Phase 2 | Size of max SCC | Fraction of max SCC visited | # of true NN found outside max SCC
1 | 79 | 7 | 1 | 100% | 0 |
2 | 83 | 0 | 1 | 0% | 0 |
3 | 50 | 47 | 1 | 100% | 0 |
4 | 46 | 25 | 2 | 100% | 0 |
5 | 56 | 0 | 1 | 0% | 0 |
The common wisdom about Algorithm 2 is as follows. Starting from the entry vertex, which is chosen at random or using some auxiliary method, Algorithm 2 finds a directed path from the entry vertex to the query q, hoping that the NN of q are identified along the walk. Since only local information, i.e., the adjacent vertices of the visited vertices, is used, this class of algorithms is termed decentralized [12]. In particular, for ANN search, Algorithm 2 first follows the out-edges of the current base vertex to get its immediate neighbors, and then examines the distances from these neighbors to q. The one with the minimum distance to q is selected as the next base vertex for iteration. The same procedure is repeated at each step of the traversal until Algorithm 2 reaches a local optimum, namely, the immediate neighbors of the base vertex do not contain a vertex closer to q than the base vertex itself. Backtracking is used to jump out of the local optimum and increase the odds of finding the true NN. Recall that we call this search paradigm the multiple path search model.
Different from the traditional point of view, we observe that Algorithm 2 is actually composed of two phases. In the first phase, the algorithm starts at an initial point, walks the graph and encounters a point inside the maximum strongly connected neighborhood. In the second phase, the algorithm traverses the maximum strongly connected neighborhood and a small number of points outside it. Figure 5 depicts the two-phase search procedure. Theorem 1 proves that Algorithm 2 is guaranteed to find all points in the maximum strongly connected neighborhood under a mild condition.

Theorem 1.
Algorithm 2 is guaranteed to visit all points in the maximum strongly connected neighborhood of q once it visits any point in it.
Proof.
A directed graph is strongly connected iff all of its vertices lie on a common closed walk; note that this walk need not be a Hamiltonian cycle. Suppose all vertices not in the maximum strongly connected neighborhood but adjacent to vertices in it are farther from q than all vertices in it. Without loss of generality, suppose the first vertex of the maximum strongly connected neighborhood to be visited is v. Then Algorithm 2 will visit all of its vertices by following the closed walk, pushing every visited vertex into the candidate queue and the result queue. Since all vertices in the maximum strongly connected neighborhood are closer to q than the other vertices, the distance of the bottom element of the result queue always remains greater than that of the top element of the candidate queue until all vertices in the maximum strongly connected neighborhood are visited (all elements of the result queue are initialized to infinity at the beginning). Please note that each loop pops an element that has already been pushed into the candidate queue, which guarantees that Algorithm 2 always terminates. ∎
Theorem 1 suggests a different perspective for understanding graph-based methods. Rather than searching a single path (without backtracking) or multiple paths (with backtracking) in the graph, the search algorithm actually traverses a strongly connected neighborhood around the query. In other words, a high-quality maximum strongly connected neighborhood, together with Algorithm 2, delivers the salient performance. The analysis in Section III reveals that the quality of the maximum strongly connected neighborhood is data dependent and closely related to the clustering coefficient of the NN graph. Therefore, significant performance disparities exist across datasets, and we can use the clustering coefficient of the NN graph as a meaningful measure of the efficiency of graph-based methods.
It is possible that a few vertices adjacent to vertices in the maximum strongly connected neighborhood are not in it but are closer to q than some vertices in it. In this case, the algorithm may also visit such vertices, and the answer quality will be higher than from traversing the maximum strongly connected neighborhood alone, since closer vertices outside it are also visited.
The probability that the search algorithm gets into the maximum strongly connected neighborhood increases exponentially with t, the number of times the search is trapped in a local optimum and backtracks to a distant point before entering it. Letting p denote the probability of getting in along a single path, this is expressed as

P = 1 - (1 - p)^t.   (3)
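Under the independent-trials reading of Eq. (3), P = 1 - (1 - p)^t (our interpretation, stated here as an assumption), the entry probability compounds quickly with the number of restarts t. A tiny helper of our own makes this concrete:

```python
def entry_probability(p, t):
    """P(t) = 1 - (1 - p)**t: chance that at least one of t restart
    attempts, each entering the maximum SCC with probability p, succeeds.
    This assumes the attempts are independent."""
    return 1.0 - (1.0 - p) ** t
```

Even a modest per-path probability compounds fast: with p = 0.3, ten restarts already give an entry probability above 0.97.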
The rigorous calculation of p is infeasible. Empirically, for datasets with relatively large clustering coefficients, we observed that (1) Algorithm 2 can quickly reach the maximum strongly connected neighborhood, and (2) the path length of the first phase is far shorter than that of the second phase. Tables VI, VII and VIII list the number of hops in Phase 1 and Phase 2, the size of the maximum strongly connected neighborhood, the fraction of its points visited in Phase 2, and the number of true top NN outside it that are found during Phase 2, for Sift, Glove and Random, respectively. NSG is used and the statistics of five random queries are reported. Please note that HNSW and the exact NN graph exhibit similar trends, so we do not report results for them. Several interesting observations can be made:
• Independent of the dataset, the two-phase search model applies to all queries listed. As proved in Theorem 1, once the search algorithm enters Phase 2, all points in the maximum strongly connected neighborhood will be visited, which demonstrates the importance of its quality.
• Besides the true top NN in the maximum strongly connected neighborhood, other true top NN may also be visited during Phase 2, especially for Glove, where the size of the maximum strongly connected neighborhood is relatively small. This is mainly because the search algorithm jumps into a smaller SCC or visits NN that share only a single directed edge with the maximum SCC.
- For Sift and Glove, where the size of the maximum SCC is far greater than that of Random, the second phase dominates the search cost, and the algorithm jumps into the maximum SCC in a very small number of steps. In contrast, it is very hard for the algorithm to find a true top-k NN for Random since the size of the maximum SCC is too small (in most cases it is equal to 1). For example, some queries never enter Phase 2 and find no true NN at all. As a result, the recall on Random is very low.

To train the reader's intuition, Figure 6 illustrates the search procedure of a top-10 query on the Sift dataset with NSG. The green point is the query, and red points denote the true top-NNs in the maximum SCC, which are strongly connected. Dashed blue lines with single or double arrows represent the directed or bi-directional edges between points. The solid yellow arrowed lines depict the search path during NN search. As we can see, starting from the entry point, the algorithm jumps into the maximum SCC in three steps. After traversing the maximum SCC, which consists of six true NNs, it continues the search by visiting one true NN (in black) and four other points before the termination condition is met. Since k is small in this example, the size of the maximum SCC is not that large. This can be informally explained by random network theory [15]: connectivity increases as the number of vertices grows under the same edge connection probability.
The case of small k: Users may be interested in only a small number of nearest neighbors of the query, say k ranging from 1 to 10. In this case, the size and quality of the maximum SCC may not be good enough to achieve high recall. To get precise results, the size of the candidate list is often set greater than k, say 50-200. The net effect is that the search algorithm actually visits a larger maximum SCC, which contains most of the top-k NNs if the clustering coefficient is large enough, and then Algorithm 2 identifies the best results and outputs them.
V Related Work
V-A Measures for the Difficulty of Nearest Neighbor Search
The difficulty of (approximate) NN search in a given dataset has drawn much attention in recent years. Beyer et al. and Francois et al. show that NN search becomes meaningless when the number of dimensions goes to infinity [36, 37], respectively. However, they did not provide a non-asymptotic analysis for the case where the number of dimensions is finite. Moreover, the effect of other crucial properties, such as the sparsity of data vectors, has not been studied. To the best of our knowledge, He et al. proposed the first concrete measure, called Relative Contrast (RC), to evaluate the simultaneous influence of several data characteristics, such as dimensionality, sparsity and dataset size, on the difficulty of NN search [38]. They present a theoretical analysis of how RC determines the complexity of Locality Sensitive Hashing, a popular approximate NN search method. Relative Contrast also provides an explanation for a family of heuristic hashing algorithms based on PCA with good practical performance. However, no evidence is given that RC can directly explain the success of graph-based NN search methods.
Identifying the intrinsic dimensionality (ID) of datasets has been studied for decades because of its importance in machine learning, databases and data mining. Recently, local ID has gained much attention since it is very useful when data is composed of heterogeneous manifolds. In addition to applications in manifold learning, measures of local ID have been used to evaluate the difficulty of NN search [39]. Several local intrinsic dimensionality models have been proposed, such as the expansion dimension (ED) [40], the generalized expansion dimension (GED) [41], the minimum neighbor distance (MiND) [42], and local continuous intrinsic dimension (LID) [43]. While these measures have been shown useful in their own right, none of them explains the salient performance of graph-based methods.
V-B A Brief Review of the Existing ANN Search Methods
Approximate nearest neighbor search (ANNS) has been a hot topic for decades; it provides fundamental support for many applications in data mining, databases and information retrieval [44, 45, 46]. There is a large body of literature on algorithms for approximate nearest neighbor search, which fall mainly into the following categories: tree-structure based approaches, hashing-based approaches, quantization-based approaches, and graph-based approaches.
V-B1 tree-structure based approaches
Hierarchical structure (tree) based methods offer a natural way to continuously partition a dataset into discrete regions at multiple scales, such as the KD-tree [47], R-tree [48], and SR-tree [49]. These methods perform very well when the dimensionality of the data is relatively low, but they become inefficient when the dimensionality is high. It has been shown in [50] that when the dimensionality exceeds about 10, existing indexing data structures based on space partitioning are slower than the brute-force, linear-scan approach. Many new hierarchical-structure-based methods [51] have been proposed to address this limitation.
V-B2 hashing-based approaches
Among approximate NN search algorithms, Locality Sensitive Hashing (LSH) is the most widely used one due to its excellent theoretical guarantees and empirical performance. E2LSH, the classical LSH implementation for the Euclidean norm, cannot solve the approximate NN search problem directly. In practice, one has to either assume a "magical" radius, which can lead to arbitrarily bad outputs, or use multiple hash tables tailored for different radii, which may lead to prohibitively large space consumption in indexing. To reduce the storage cost, LSB-Forest [52] and C2LSH [53] use the so-called virtual rehashing technique, implicitly or explicitly, to avoid building physical hash tables for each search radius. The index size of LSB-Forest is far greater than that of C2LSH because the former ensures that the worst-case I/O cost is sub-linear in both the dataset size and the dimensionality, whereas the latter has no such guarantee - it only bounds the number of candidates by some constant but ignores the overhead of index access.
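The core of such schemes is a p-stable hash family. A minimal sketch in the spirit of E2LSH's Gaussian family, h(x) = floor((a . x + b) / w) with a drawn from N(0, I) and b uniform in [0, w); the dimension, bucket width and seed below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def make_hash(dim, w):
    """One hash function from the 2-stable (Gaussian) LSH family for the l2 norm."""
    a = rng.normal(size=dim)        # random projection direction
    b = rng.uniform(0.0, w)         # random offset to dequantize bucket boundaries
    return lambda x: int(np.floor((a @ x + b) / w))

dim, w = 16, 4.0
h = make_hash(dim, w)
x = rng.normal(size=dim)
# Points close in l2 distance fall into the same bucket with high probability;
# a tiny perturbation of x almost always hashes to the same value.
```

In a full scheme, many such functions are concatenated into composite keys and replicated across tables to trade precision against recall.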
Based on the idea of query-aware hashing, the two state-of-the-art algorithms QALSH and SRS further improve the efficiency over C2LSH by using different index structures and search methods, respectively. SRS stores a low-dimensional projection of each point in a tree index and transforms the approximate NN search in the original high-dimensional space into a range query in the low-dimensional projection space. The rationale is that the probability that a point is the NN of the query decreases as its projected distance from the query increases; during the search, points are accessed in increasing order of their projected distances.
Motivated by the observation that the optimal metric is application-dependent, LazyLSH [54] solves the NN search problem for fractional distance metrics with a single index. FALCONN is the state-of-the-art LSH scheme for the angular distance, both theoretically and practically [55]. Except for E2LSH and FALCONN, the algorithms above are disk-based and thus can handle datasets that do not fit into memory.
All of the aforementioned LSH algorithms provide probability guarantees on the result quality (recall and/or precision). To achieve better efficiency, many LSH extensions such as Multi-probe LSH [56], SK-LSH [57], LSH-forest [58] and Selective hashing [59] use heuristics to access more plausible buckets or re-organize datasets, and do not ensure any LSH-like theoretical guarantee.
V-B3 quantization-based approaches
The most common quantization-based method is product quantization (PQ) [2]. It performs a dimension reduction similar to hashing, but in a way that better retains information about the relative distances between points in the original vector space. Formally, a quantizer is a function q mapping a D-dimensional vector x to a vector q(x) in C = {c_i ; i in I}, where the index set I is finite: I = {0, ..., k-1}. The reproduction values c_i are called centroids. The set of vectors mapped to a given index i is referred to as a cell, and defined as V_i = {x in R^D : q(x) = c_i}.
The cells of a quantizer form a partition of R^D, so all vectors lying in the same cell are reconstructed by the same centroid c_i. Due to the huge number of samples required and the complexity of learning a quantizer with many centroids, PQ uses m distinct quantizers to quantize the subvectors separately. An input vector x is divided into m distinct subvectors u_1(x), ..., u_m(x), each of dimension D* = D/m. The vector x is then mapped as follows: x -> (q_1(u_1(x)), ..., q_m(u_m(x))),
where q_j is a low-complexity quantizer associated with the j-th subvector, and the codebook is defined as the Cartesian product C = C_1 x ... x C_m,
and a centroid of this set is the concatenation of centroids of the m subquantizers. If all subquantizers have the same finite number k* of reproduction values, the total number of centroids is k = (k*)^m.
After applying PQ, all database vectors are replaced by their reproduction values. To speed up queries, PQ uses look-up tables to obtain the distances between reproduction values and the query vector. Two methods compute an approximate Euclidean distance between these vectors: the so-called Asymmetric Distance Computation (ADC) and the Symmetric Distance Computation (SDC). See Figure 7 for an illustration. We take ADC as an example.

The database vector y is represented by q(y), but the query x is not encoded. The distance d(x, y) is approximated by d(x, q(y)), which is computed using the decomposition d(x, q(y))^2 = sum_j d(u_j(x), q_j(u_j(y)))^2,
where the squared distances d(u_j(x), c_{j,i})^2, for j = 1, ..., m and i = 0, ..., k*-1, are computed before the search. The calculation for SDC is similar to ADC, but the query vector is represented by q(x). SDC limits the memory usage associated with the queries, while ADC has a lower distance distortion for similar complexity.
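The encode-then-lookup structure of PQ with ADC can be sketched in a few lines. Here the codebooks are random placeholders rather than k-means-trained ones, and D, m, k* are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
D, m, ks = 8, 4, 16           # dimension, number of subquantizers, centroids per subquantizer
Ds = D // m                   # subvector dimension D* = D/m

# Placeholder codebooks; in practice each is learned by k-means on training subvectors.
codebooks = rng.normal(size=(m, ks, Ds))

def pq_encode(y):
    """Map each subvector of y to the index of its nearest centroid."""
    code = np.empty(m, dtype=np.int64)
    for j in range(m):
        sub = y[j * Ds:(j + 1) * Ds]
        code[j] = np.argmin(((codebooks[j] - sub) ** 2).sum(-1))
    return code

def adc_distance(x, code):
    """Asymmetric distance: raw query x vs. an encoded database vector."""
    # Per-subquantizer tables of squared distances, built once per query and reused
    # for every database code.
    tables = [((codebooks[j] - x[j * Ds:(j + 1) * Ds]) ** 2).sum(-1) for j in range(m)]
    return float(sum(tables[j][code[j]] for j in range(m)))

y = rng.normal(size=D)
code = pq_encode(y)
approx = adc_distance(y, code)   # squared distance from y to its own reconstruction
```

The table construction is what makes ADC fast: for each query, m lookup tables of k* entries are computed once, after which each database code costs only m table lookups and additions.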
PQ offers three attractive properties: (1) it compresses an input vector into a short code (e.g., 64 bits), which enables it to handle on the order of one billion data points in memory; (2) the approximate distance between a raw vector and a compressed PQ code is computed efficiently (via ADC or SDC) and is a good estimate of the original Euclidean distance; and (3) the data structure and coding algorithm are simple, which allows hybridization with other indexing structures. Because these methods avoid distance calculations on the original data vectors, they incur some loss of accuracy; when the target recall is close to 1.0, the required length of the candidate list approaches the size of the dataset. Many quantization-based methods try to reduce quantization errors to improve accuracy, such as Optimized Product Quantization (OPQ) [60] and Tree Quantization (TQ) [61].
V-B4 graph-based approaches
Recently, graph-based methods have drawn considerable attention, such as NSG [5], HNSW [22], Efanna [62], and FANNG [27]. Graph-based methods construct a NN graph offline, which can be regarded as a large proximity network over the high-dimensional points. However, the construction cost of the exact kNN graph is prohibitively high, especially for large datasets. Many researchers therefore turn to building an approximate NN graph, but this is still time-consuming. There are two main types of graphs: directed graphs and undirected graphs.
At the online search stage, these methods all use the greedy search algorithm or its variants. They require an initial point to be chosen in advance, and the easiest way is to choose it randomly. During the search, the algorithm can quickly converge from the initial point to the neighborhood of the query point. One problem with this approach is that it easily converges to a local optimum, resulting in low recall. One way to mitigate this is to provide a better initial candidate set for the query: instead of random selection, one can use the Navigating node (the approximate medoid of the dataset) and its neighbors as candidates. Another approach is to make the constructed graph monotonic. The edge selection strategy of MRNG, first proposed in [5], ensures that the graph is a Monotonic Search Network (MSNET). Ideally, the search path then proceeds from the starting point directly to the query point, which means that no backtracking occurs during the search.
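The greedy best-first search with a bounded candidate list, common to these methods, can be sketched as follows. This is a simplified single-layer variant in the spirit of the HNSW/NSG search loop, not any library's exact implementation; the toy data, graph degree and ef value are illustrative assumptions:

```python
import heapq
import numpy as np

def greedy_search(graph, X, q, entry, ef):
    """Best-first graph search keeping the ef best points seen so far."""
    dist = lambda i: float(((X[i] - q) ** 2).sum())
    visited = {entry}
    candidates = [(dist(entry), entry)]   # min-heap: frontier, nearest first
    results = [(-dist(entry), entry)]     # max-heap (negated): best ef points
    while candidates:
        d, v = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break                         # frontier is worse than everything kept
        for u in graph[v]:
            if u in visited:
                continue
            visited.add(u)
            du = dist(u)
            if len(results) < ef or du < -results[0][0]:
                heapq.heappush(candidates, (du, u))
                heapq.heappush(results, (-du, u))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst
    return sorted((-d, u) for d, u in results)

# Toy setting: directed kNN graph over random 2-D points.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
np.fill_diagonal(d2, np.inf)
graph = {i: list(np.argsort(d2[i])[:8]) for i in range(100)}
found = greedy_search(graph, X, q=np.array([0.0, 0.0]), entry=0, ef=20)
```

Setting ef greater than k, as discussed earlier, widens the kept set so that the traversal covers more of the strongly connected neighborhood before the termination condition fires.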
Because graph construction greatly affects search performance, many researchers focus on constructing index graphs. The fundamental issue is how to choose the neighbors of nodes in the graph. We introduce two state-of-the-art graph neighbor selection strategies: Relative Neighborhood Graphs (RNG) [17] and Monotonic Relative Neighborhood Graphs (MRNG) [5]. Formally, given two points p and q in the space, let B(q, r) denote the open sphere centered at q with radius r. The lune between p and q is defined as the intersection B(p, d(p, q)) ∩ B(q, d(p, q)).
FANNG [27] and HNSW [22] adopt the RNG edge selection strategy to construct their indexes. RNG is an edge selection strategy for undirected graphs; it selects edges by checking whether any point lies in the intersection of the two open spheres. In Figure 8(a), node p has prepared a set of neighbor candidates for selection. If no node lies in the lune between p and a candidate q, then p and q are linked; otherwise, there is no edge between p and q. In the figure, several candidates fall inside the corresponding lunes, so the associated edges are pruned. Although RNG can reduce the out-degree to a constant, it does not have sufficient edges to be an MSNET. NSG adopts the MRNG edge selection strategy to construct its index, which is a directed graph. As Figure 8(b) illustrates, MRNG relaxes the RNG condition: an edge between two points is kept unless the lune between them contains a point that is already linked to an endpoint, so some edges pruned by RNG are retained. The graph constructed by MRNG is an MSNET. The common purpose of these two construction methods is to reduce the average out-degree of the graph, making it sparse and reducing the search complexity. These selection strategies have achieved attractive results, which is why many graph-based methods, such as Efanna [62], KGraph, HNSW and NSG, perform well in search time.
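The RNG pruning rule is compact enough to state in code: a point r lies in the lune of (p, q) exactly when both d(p, r) and d(q, r) are smaller than d(p, q). A minimal sketch over a toy point set (the collinear example is an assumption chosen to make the pruning visible):

```python
import numpy as np

def rng_edges(X):
    """Relative Neighborhood Graph: keep edge (p, q) iff no third point r lies in
    lune(p, q) = B(p, d(p, q)) ∩ B(q, d(p, q))."""
    n = len(X)
    d = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    edges = set()
    for p in range(n):
        for q in range(p + 1, n):
            # r is inside the lune iff it is closer than d(p, q) to BOTH endpoints.
            blocked = any(max(d[p][r], d[q][r]) < d[p][q]
                          for r in range(n) if r not in (p, q))
            if not blocked:
                edges.add((p, q))
    return edges

# Three collinear points: the middle one sits in the lune of the outer pair,
# so the long edge (0, 2) is pruned while the two short edges survive.
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
edges = rng_edges(X)
```

MRNG's relaxation would keep an edge even with a point in the lune, as long as that point is not already linked to an endpoint; the lune test itself is the shared building block.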


VI Conclusion
This paper takes a first step toward explaining why graph-based search algorithms work so well in practice, and suggests that the clustering coefficient of the NN graph is an important measure of the efficiency of these algorithms. Detailed analysis shows how the clustering coefficient affects the local structure of NN graphs. A few open problems still exist; for example, formal analysis under a simplified data model is needed to gain a more rigorous understanding of the graph search procedure.
Acknowledgements
The work reported in this paper is partially supported by NSFC under grant number 61370205, and NSF of Xinjiang Key Laboratory under grant number 2019D04024.
References
- [1] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni, “Locality-sensitive hashing scheme based on p-stable distributions,” in SoCG, 2004, pp. 253–262.
- [2] H. Jégou, M. Douze, and C. Schmid, “Product quantization for nearest neighbor search,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 1, pp. 117–128, 2011.
- [3] M. Douze, A. Sablayrolles, and H. Jégou, “Link and code: Fast indexing with graphs and compact regression codes,” in CVPR, 2018, pp. 3646–3654.
- [4] Y. Dong, P. Indyk, I. P. Razenshteyn, and T. Wagner, “Learning space partitions for nearest neighbor search,” in ICLR, 2020.
- [5] C. Fu, C. Xiang, C. Wang, and D. Cai, “Fast approximate nearest neighbor search with the navigating spreading-out graph,” Proc. VLDB Endow., vol. 12, no. 5, pp. 461–474, 2019.
- [6] T. Laarhoven, “Graph-based time-space trade-offs for approximate near neighbors,” in SoCG, B. Speckmann and C. D. Tóth, Eds., pp. 57:1–57:14.
- [7] L. Prokhorenkova, “Graph-based nearest neighbor search: From practice to theory,” CoRR, vol. abs/1907.00845, 2019.
- [8] A. Andoni, T. Laarhoven, I. P. Razenshteyn, and E. Waingarten, “Optimal hashing-based time-space trade-offs for approximate near neighbors,” in SODA, 2017, pp. 47–66.
- [9] D. W. Dearholt, N. Gonzales, and G. Kurup, “Monotonic search networks for computer vision databases,” in Twenty-Second Asilomar Conference on Signals, Systems and Computers, vol. 2, 1988, pp. 548–553.
- [10] T. B. Sebastian and B. B. Kimia, “Metric-based shape retrieval in large databases,” in ICPR, 2002, pp. 291–296.
- [11] S. Morozov and A. Babenko, “Non-metric similarity graphs for maximum inner product search,” in NIPS, 2018, pp. 4726–4735.
- [12] J. M. Kleinberg, “Navigation in a small world,” Nature, vol. 406, no. 6798, p. 845, 2000.
- [13] Y. Malkov, A. Ponomarenko, A. Logvinov, and V. Krylov, “Approximate nearest neighbor algorithm based on navigable small world graphs,” Inf. Syst., vol. 45, pp. 61–68, 2014.
- [14] D. J. Watts and S. H. Strogatz, “Collective dynamics of ”small-world” networks,” Nature, vol. 393, pp. 440–442, 1998.
- [15] M. Newman, Networks: An Introduction. Oxford University Press, 2010.
- [16] S. Arya and D. M. Mount, “Approximate nearest neighbor queries in fixed dimensions,” in SODA, 1993, pp. 271–280.
- [17] J. W. Jaromczyk and G. T. Toussaint, “Relative neighborhood graphs and their relatives,” Proceedings of the IEEE, vol. 80, no. 9, pp. 1502–1517, 1992.
- [18] G. Navarro, “Searching in metric spaces by spatial approximation,” in SPIRE/CRIWG, 1999, pp. 141–148.
- [19] F. Aurenhammer, “Voronoi diagrams - A survey of a fundamental geometric data structure,” ACM Comput. Surv., vol. 23, no. 3, pp. 345–405, 1991.
- [20] R. Paredes and E. Chávez, “Using the k-nearest neighbor graph for proximity searching in metric spaces,” in SPIRE, 2005, pp. 127–138.
- [21] “KGraph.” [Online]. Available: https://github.com/aaalgo/kgraph
- [22] Y. A. Malkov and D. A. Yashunin, “Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 4, pp. 824–836, 2020.
- [23] J. M. Kleinberg, “The small-world phenomenon: an algorithmic perspective,” in STOC, 2000, pp. 163–170.
- [24] W. Li, Y. Zhang, Y. Sun, W. Wang, W. Zhang, and X. Lin, “Approximate nearest neighbor search on high dimensional data - experiments, analyses, and improvement (v1.0),” CoRR, vol. abs/1610.02455, 2016.
- [25] K. Hajebi, Y. Abbasi-Yadkori, H. Shahbazi, and H. Zhang, “Fast approximate nearest-neighbor search with k-nearest neighbor graph,” in IJCAI, 2011, pp. 1312–1317.
- [26] M. Aumüller, E. Bernhardsson, and A. J. Faithfull, “Ann-benchmarks: A benchmarking tool for approximate nearest neighbor algorithms,” Inf. Syst., vol. 87, 2020.
- [27] B. Harwood and T. Drummond, “FANNG: fast approximate nearest neighbour graphs,” in CVPR, 2016, pp. 5713–5722.
- [28] K. Aoyama, K. Saito, H. Sawada, and N. Ueda, “Fast approximate similarity search based on degree-reduced neighborhood graphs,” in SIGKDD, 2011, pp. 1055–1063.
- [29] M. Iwasaki and D. Miyazaki, “Optimization of indexing based on k-nearest neighbor graph for proximity search in high-dimensional data,” CoRR, vol. abs/1810.07355, 2018.
- [30] M. Iwasaki, “Pruned bi-directed k-nearest neighbor graph for proximity search,” in SISAP, vol. 9939, 2016, pp. 20–33.
- [31] D. Baranchuk and A. Babenko, “Towards similarity graphs constructed by deep reinforcement learning,” CoRR, vol. abs/1911.12122, 2019.
- [32] D. Baranchuk, D. Persiyanov, A. Sinitsin, and A. Babenko, “Learning to route in similarity graphs,” in ICML, vol. 97, 2019, pp. 475–484.
- [33] Z. Zhou, S. Tan, Z. Xu, and P. Li, “Möbius transformation for fast inner product search on graph,” in NeurIPS, 2019, pp. 8216–8227.
- [34] J. Wang and S. Li, “Query-driven iterated neighborhood graph search for large scale indexing,” in ACM MM, 2012, pp. 179–188.
- [35] F. P. Preparata and M. I. Shamos, Computational Geometry - An Introduction. Springer, 1985.
- [36] K. S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, “When is ”nearest neighbor” meaningful?” in ICDT, 1999, pp. 217–235.
- [37] D. François, V. Wertz, and M. Verleysen, “The concentration of fractional distances,” IEEE Trans. Knowl. Data Eng., vol. 19, no. 7, pp. 873–886, 2007.
- [38] J. He, S. Kumar, and S.-F. Chang, “On the difficulty of nearest neighbor search,” in ICML, 2012, pp. 1127–1134.
- [39] M. E. Houle and M. Nett, “Rank-based similarity search: Reducing the dimensional dependence,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 1, pp. 136–150, 2015.
- [40] D. R. Karger and M. Ruhl, “Finding nearest neighbors in growth-restricted metrics,” in STOC, 2002, pp. 741–750.
- [41] M. E. Houle, H. Kashima, and M. Nett, “Generalized expansion dimension,” in ICDM Workshops, 2012, pp. 587–594.
- [42] A. Rozza, G. Lombardi, C. Ceruti, E. Casiraghi, and P. Campadelli, “Novel high intrinsic dimensionality estimators,” Mach. Learn., vol. 89, no. 1-2, pp. 37–65, 2012.
- [43] M. E. Houle, “Dimensionality, discriminability, density and distance distributions,” in ICDM Workshops, 2013, pp. 468–473.
- [44] W. G. Aref, A. C. Catlin, J. Fan, A. K. Elmagarmid, M. A. Hammad, I. F. Ilyas, M. S. Marzouk, and X. Zhu, “A video database management system for advancing video database research,” in Multimedia Information Systems, 2002, pp. 8–17.
- [45] R. Fagin, R. Kumar, and D. Sivakumar, “Efficient similarity search and classification via rank aggregation,” in SIGMOD, 2003, pp. 301–312.
- [46] Y. Ke, R. Sukthankar, and L. Huston, “An efficient parts-based near-duplicate and sub-image retrieval system,” in ACM Multimedia, 2004, pp. 869–876.
- [47] J. L. Bentley, “K-d trees for semidynamic point sets,” in SoCG, 1990, pp. 187–197.
- [48] A. Guttman, “R-trees: A dynamic index structure for spatial searching,” in SIGMOD, 1984, pp. 47–57.
- [49] N. Katayama and S. Satoh, “The sr-tree: An index structure for high-dimensional nearest neighbor queries,” in SIGMOD. ACM Press, 1997, pp. 369–380.
- [50] R. Weber, H.-J. Schek, and S. Blott, “A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces,” in VLDB. Morgan Kaufmann, 1998, pp. 194–205.
- [51] P. Ram and K. Sinha, “Revisiting kd-tree for nearest neighbor search,” in KDD, 2019, pp. 1378–1388.
- [52] Y. Tao, K. Yi, C. Sheng, and P. Kalnis, “Quality and efficiency in high dimensional nearest neighbor search,” in SIGMOD, 2009, pp. 563–576.
- [53] J. Gan, J. Feng, Q. Fang, and W. Ng, “Locality-sensitive hashing scheme based on dynamic collision counting,” in SIGMOD, 2012, pp. 541–552.
- [54] Y. Zheng, Q. Guo, A. K. H. Tung, and S. Wu, “Lazylsh: Approximate nearest neighbor search for multiple distance functions with a single index,” in SIGMOD, 2016, pp. 2023–2037.
- [55] A. Andoni, P. Indyk, T. Laarhoven, I. P. Razenshteyn, and L. Schmidt, “Practical and optimal LSH for angular distance,” in NIPS, 2015, pp. 1225–1233.
- [56] Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li, “Multi-probe lsh: Efficient indexing for high-dimensional similarity search,” in VLDB, 2007, pp. 950–961.
- [57] Y. Liu, J. Cui, Z. Huang, H. Li, and H. T. Shen, “SK-LSH: an efficient index structure for approximate nearest neighbor search,” PVLDB, vol. 7, no. 9, pp. 745–756, 2014.
- [58] M. Bawa, T. Condie, and P. Ganesan, “LSH forest: self-tuning indexes for similarity search,” in WWW, 2005, pp. 651–660.
- [59] J. Gao, H. V. Jagadish, B. C. Ooi, and S. Wang, “Selective hashing: Closing the gap between radius search and k-nn search,” in SIGKDD, 2015, pp. 349–358.
- [60] T. Ge, K. He, Q. Ke, and J. Sun, “Optimized product quantization for approximate nearest neighbor search,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2946–2953.
- [61] A. Babenko and V. Lempitsky, “Tree quantization for large-scale similarity search and classification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 4240–4248.
- [62] C. Fu and D. Cai, “Efanna: An extremely fast approximate nearest neighbor search algorithm based on knn graph,” arXiv preprint arXiv:1609.07228, 2016.