
Optimal Dyck Reachability for Data-Dependence and Alias Analysis

Krishnendu Chatterjee (Institute of Science and Technology Austria, Am Campus 1, 3400 Klosterneuburg, Austria) krishnendu.chatterjee@ist.ac.at; Bhavya Choudhary (Indian Institute of Technology Bombay, IIT Area, Powai, Mumbai 400076, India) bhavya@cse.iitb.ac.in; and Andreas Pavlogiannis (Institute of Science and Technology Austria, Am Campus 1, 3400 Klosterneuburg, Austria) pavlogiannis@ist.ac.at
(2018)
Abstract.

A fundamental algorithmic problem at the heart of static analysis is Dyck reachability. The input is a graph whose edges are labeled with different types of opening and closing parentheses, and the reachability information is computed via paths whose parentheses are properly matched. We present new results for Dyck reachability problems with applications to alias analysis and data-dependence analysis. Our main contributions, which include improved upper bounds as well as lower bounds that establish optimality guarantees, are as follows.

First, we consider Dyck reachability on bidirected graphs, which is the standard way of performing field-sensitive points-to analysis. Given a bidirected graph with $n$ nodes and $m$ edges, we present: (i) an algorithm with worst-case running time $O(m+n\cdot\alpha(n))$, where $\alpha(n)$ is the inverse Ackermann function, improving the previously known $O(n^2)$ time bound; (ii) a matching lower bound that shows that our algorithm is optimal with respect to worst-case complexity; and (iii) an optimal average-case upper bound of $O(m)$ time, improving the previously known $O(m\cdot\log n)$ bound.

Second, we consider the problem of context-sensitive data-dependence analysis, where the task is to obtain analysis summaries of library code in the presence of callbacks. Our algorithm preprocesses libraries in almost linear time, after which the contribution of the library to the complexity of the client analysis is only linear, and only with respect to the number of call sites.

Third, we prove that combinatorial algorithms for Dyck reachability on general graphs with truly sub-cubic bounds cannot be obtained without obtaining sub-cubic combinatorial algorithms for Boolean Matrix Multiplication, which is a long-standing open problem. Thus we establish that the existing combinatorial algorithms for Dyck reachability are (conditionally) optimal for general graphs. We also show that the same hardness holds for graphs of constant treewidth.

Finally, we provide a prototype implementation of our algorithms for both alias analysis and data-dependence analysis. Our experimental evaluation demonstrates that the new algorithms significantly outperform all existing methods on the two problems, over real-world benchmarks.

Data-dependence analysis, CFL reachability, Dyck reachability, Bidirected graphs, treewidth
copyright: rights retained; doi: 10.1145/3158118; journal year: 2018; journal: PACMPL; journal volume: 2; journal number: POPL; article: 30; publication month: 1; ccs: Theory of computation, Program analysis; ccs: Theory of computation, Graph algorithms analysis

1. Introduction

In this work we present improved upper bounds, lower bounds, and experimental results for algorithmic problems related to Dyck reachability, which is a fundamental problem in static analysis. We present the problem description, its main applications, the existing results, and our contributions.

Static analysis and language reachability. Static analysis techniques obtain information about programs without running them on specific inputs. These techniques explore the program behavior for all possible inputs and all possible executions. For non-trivial programs, it is impossible to explore all the possibilities, and hence various approximations are used. A standard way to express a plethora of static analysis problems is via language reachability that generalizes graph reachability. The input consists of an underlying graph with labels on its edges from a fixed alphabet, and a language, and reachability paths between two nodes must produce strings that belong to the given language (Yannakakis, 1990; Reps, 1997).

CFL and Dyck reachability. An extremely important case of language reachability in static analysis is CFL reachability, where the input language is context-free, which can be used to model, e.g., context-sensitivity or field-sensitivity. The CFL reachability formulation has applications to a very wide range of static analysis problems, such as interprocedural data-flow analysis (Reps et al., 1995), slicing (Reps et al., 1994), shape analysis (Reps, 1995), impact analysis (Arnold, 1996), type-based flow analysis (Rehof and Fähndrich, 2001) and alias/points-to analysis (Shang et al., 2012; Sridharan and Bodík, 2006; Sridharan et al., 2005; Xu et al., 2009; Yan et al., 2011a; Zheng and Rugina, 2008). In practice, widely-used large-scale analysis tools, such as Wala (Wal, 2003) and Soot (Vallée-Rai et al., 1999; Bodden, 2012), employ CFL reachability techniques to perform such analyses. In most of the above cases, the languages used to define the problem are those of properly-matched parentheses, which are known as Dyck languages and form a proper subset of the context-free languages. Thus Dyck reachability is at the heart of many problems in static analysis.

Alias analysis. Alias analysis has been one of the major types of static analysis and a subject of extensive study (Sridharan et al., 2013; Choi et al., 1993; Landi and Ryder, 1992; Hind, 2001). The task is to decide whether two pointer variables may point to the same object during program execution. As the problem is computationally expensive (Horwitz, 1997; Ramalingam, 1994), practically relevant results are obtained via approximations. One popular way to perform alias analysis is via points-to analysis, where two variables may alias if their points-to sets intersect. Points-to analysis is typically phrased as a Dyck reachability problem on Symbolic Points-to Graphs (SPGs), which contain information about variables, heap objects and parameter passing due to method calls (Xu et al., 2009; Yan et al., 2011a). In alias analysis there is an important distinction between context and field sensitivity, which we describe below.

  • Context vs field sensitivity. Typically, Dyck parentheses are used in SPGs to specify two types of constraints. Context sensitivity refers to the requirement that reachability paths must respect the calling context due to method calls and returns. Field sensitivity refers to the requirement that reachability paths must respect field accesses of composite types in Java (Sridharan and Bodík, 2006; Sridharan et al., 2005; Xu et al., 2009; Yan et al., 2011a), or references and dereferences of pointers (Zheng and Rugina, 2008) in C. Considering both types of sensitivity makes the problem undecidable (Reps, 2000). Although one recent workaround is approximation algorithms (Zhang and Su, 2017), the standard approach has been to consider only one type of sensitivity. Field sensitivity has been reported to produce better results and to be more scalable (Lhoták and Hendren, 2006). We focus on context-insensitive, but field-sensitive points-to analysis.

Data-dependence analysis. Data-dependence analysis aims to identify the def-use chains in a program. It has many applications, including slicing (Reps et al., 1994), impact analysis (Arnold, 1996) and bloat detection (Xu et al., 2010). It is also used in compiler optimizations, where data dependencies are used to infer whether it is safe to reorder or parallelize program statements (Kuck et al., 1981). Here we focus on the distinction between library vs client analysis and the challenge of callbacks.

  • Library vs Client. Modern-day software is developed in multiple stages and is highly interdependent. The vast majority of software development relies on existing libraries and third-party components, which are typically huge and complex. At the same time, the analysis of client code is ineffective if not performed in conjunction with the library code. These dynamics give rise to the potential of analyzing library code once, in an offline stage, and creating suitable analysis summaries that are relevant to client behavior only. The benefit of such a process is two-fold. First, library code need only be analyzed once, regardless of the number of clients that link to it. Second, it offers fast client-code analysis, since the expensive cost of analyzing the huge libraries has been spent offline, in an earlier stage. Data-dependence analysis admits a nice separation between library and client code, and has been studied in (Tang et al., 2015; Palepu et al., 2017).

  • The challenge of callbacks. As pointed out recently in (Tang et al., 2015), one major obstacle to effective library summarization is the presence of callbacks. Callback functions are declared and used by the library, but are implemented by the client. Since these functions are missing when the library code is analyzed, library summarization is ineffective and the whole library needs to be reanalyzed on the client side, when callback functions become available.

Algorithmic formulations and existing results. We describe below the key algorithmic problems in the applications mentioned above, together with the existing results. We focus on data-dependence and alias analysis via Dyck reachability, which is the standard way for performing such analysis. Recall that the problem of Dyck reachability takes as input a (directed) graph, where some edges are marked with opening and closing parentheses, and the task is to compute, for every pair of nodes, whether there exists a path between them such that the parentheses along its edges are matched.

  (1)

    Points-to analysis. Context-insensitive, field-sensitive points-to analysis via Dyck reachability is phrased on an SPG $G$ with $n$ nodes and $m$ edges. Additionally, the graph is bidirected, meaning that if $G$ has an edge $(u,v)$ labeled with an opening parenthesis, then it must also have the edge $(v,u)$ labeled with the corresponding closing parenthesis. Bidirected graphs are found in most existing works on on-demand alias analysis via Dyck reachability, and their importance has been remarked in various works (Yuan and Eugster, 2009; Zhang et al., 2013).

    The best existing algorithms for the problem appear in the recent work of (Zhang et al., 2013), where two algorithms are proposed. The first has $O(n^2)$ worst-case time complexity; the second has $O(m\cdot\log n)$ average-case time complexity and $O(m\cdot n\cdot\log n)$ worst-case complexity. Note that for dense graphs $m=\Theta(n^2)$, and the first algorithm has better average-case complexity too.

  (2)

    Library/Client data-dependence analysis. The standard algorithmic formulation of context-sensitive data-dependence analysis is via Dyck reachability, where the parentheses are used to properly match method calls and returns in a context-sensitive way (Reps, 2000; Tang et al., 2015). The algorithmic approach to Library/Client Dyck reachability considers two graphs $G_1$ and $G_2$, for the library and client code respectively. The computation is split into two phases. In the preprocessing phase, the Dyck reachability problem is solved on $G_1$ (using a CFL/Dyck reachability algorithm), and some summary information is maintained, typically in the form of a subgraph $G'_1$ of $G_1$. In the query phase, Dyck reachability is solved on the combination of the two graphs $G'_1$ and $G_2$. Let $n_1$, $n_2$ and $n'_1$ be the sizes of $G_1$, $G_2$ and $G'_1$, respectively. The algorithm spends $O(n_1^3)$ time in the preprocessing phase, and $O((n'_1+n_2)^3)$ time in the query phase. Hence we obtain an improvement if $n'_1 \ll n_1$.

    In the presence of callbacks, library summarization via CFL reachability is ineffective, as $n'_1$ can be as large as $n_1$. To face this challenge, the recent work of (Tang et al., 2015) introduced TAL reachability. This approach spends $O(n_1^6)$ time on the library code (hence more than the CFL reachability algorithm), and is able to produce a summary of size $s<n_1$ even in the presence of callbacks. Afterwards, the client analysis is performed in $O((s+n_2)^6)$ time, and hence the cost due to the library only appears in terms of its summary.

  (3)

    Dyck reachability on general graphs. As we have already mentioned, Dyck reachability is a fundamental algorithmic formulation of many types of static analysis. For general graphs (not necessarily bidirected), the existing algorithms require $O(n^3)$ time, and they essentially solve the more general CFL reachability problem (Yannakakis, 1990). The current best algorithm is due to (Chaudhuri, 2008), which utilizes the well-known Four Russians' Trick to achieve complexity $O(n^3/\log n)$. The problem has been shown to be 2NPDA-hard (Heintze and McAllester, 1997), which yields a conditional cubic lower bound on its complexity.

Our contributions. Our main contributions can be characterized in three parts: (a) improved upper bounds; (b) lower bounds with optimality guarantees; and (c) experimental results. We present the details of each of them below.

Improved upper bounds. Our improved upper bounds are as follows:

  (1)

    For Dyck reachability on bidirected graphs with $n$ nodes and $m$ edges, we present an algorithm with the following bounds: (a) The worst-case complexity is $O(m+n\cdot\alpha(n))$ time and $O(m)$ space, where $\alpha(n)$ is the inverse Ackermann function, improving the previously known $O(n^2)$ time bound. Note that $\alpha(n)$ is an extremely slowly growing function, and for all practical purposes $\alpha(n)\leq 4$; hence the worst-case bound of our algorithm is practically linear. (b) The average-case complexity is $O(m)$, improving the previously known $O(m\cdot\log n)$ bound. See Table 1 for a summary.

  (2)

    For Library/Client Dyck reachability we exploit the fact that the data-dependence graphs that arise in practice have special structure, namely they contain components of small treewidth. We denote by $n_1$ and $n_2$ the sizes of the library graph and the client graph, and by $k_1$ and $k_2$ the number of call sites in the library graph and the client graph, respectively. We present an algorithm that analyzes the library graph in $O(n_1+k_1\cdot\log n_1)$ time and $O(n_1)$ space. Afterwards, the library and client graphs are analyzed together in only $O(n_2+k_1\cdot\log n_1+k_2\cdot\log n_2)$ time and $O(n_1+n_2)$ space. Hence, since typically $n_1\gg n_2$ and $n_i\gg k_i$, the cost of analyzing the large library occurs only in the preprocessing phase. When the client code needs to be analyzed, the cost incurred due to the library code is small. See Table 2 for a summary.

Lower bounds and optimality guarantees. Along with improved upper bounds, we present lower-bound and conditional lower-bound results that imply optimality guarantees. Note that optimality guarantees for graph algorithms are extremely rare, and we show that the algorithms we present have certain optimality guarantees.

  (1)

    For Dyck reachability on bidirected graphs we present a matching lower bound of $\Omega(m+n\cdot\alpha(n))$ for the worst-case time complexity. Thus we obtain matching lower and upper bounds for the worst-case complexity, and hence our algorithm is optimal with respect to worst-case complexity. Since the average-case complexity of our algorithm is linear, the algorithm is also optimal with respect to average-case complexity.

  (2)

    For Library/Client Dyck reachability, note that $k_1\leq n_1$ and $k_2\leq n_2$. Hence our algorithm for analyzing library and client code runs in almost linear time, and is therefore optimal with respect to polynomial improvements.

  (3)

    For Dyck reachability on general graphs we present a conditional lower bound. In algorithmic studies, a standard problem for showing conditional cubic lower bounds is Boolean Matrix Multiplication (BMM) (Lee, 2002; Henzinger et al., 2015; Vassilevska Williams and Williams, 2010; Abboud and Vassilevska Williams, 2014). While fast matrix multiplication algorithms exist (such as Strassen's algorithm (Strassen, 1969)), these algorithms are not "combinatorial" (non-combinatorial here means algebraic methods (Le Gall, 2014), which are algorithms with large constants; in contrast, combinatorial algorithms are discrete and non-algebraic; for a detailed discussion see (Henzinger et al., 2015)). The standard conjecture, called the BMM conjecture, is that there is no truly sub-cubic (i.e., polynomially faster, in contrast to logarithmic-factor improvements such as $O(n^3/\log n)$) combinatorial algorithm for BMM; this conjecture has been widely used in algorithmic studies for obtaining various types of hardness results (Lee, 2002; Henzinger et al., 2015; Vassilevska Williams and Williams, 2010; Abboud and Vassilevska Williams, 2014). We show that Dyck reachability on general graphs, even for a single pair, is BMM-hard. More precisely, we show that for any $\delta>0$, any algorithm that solves pair Dyck reachability on general graphs in $O(n^{3-\delta})$ time implies an algorithm that solves BMM in $O(n^{3-\delta/3})$ time. Since all known algorithms for Dyck reachability are combinatorial, this establishes a conditional hardness result (under the BMM conjecture) for general Dyck reachability. Additionally, we show that the same hardness result holds for Dyck reachability on graphs of constant treewidth. Our hardness result shows that the existing cubic algorithms are optimal (modulo logarithmic-factor improvements), under the BMM conjecture.
Existing work establishes that Dyck reachability is 2NPDA-hard (Heintze and McAllester, 1997), which yields a conditional lower bound. Our result shows that Dyck reachability is also BMM-hard, even on constant-treewidth graphs, and thus strengthens the conditional cubic lower bound for the problem.

Table 1. Comparison of our results with existing work for Dyck reachability on bidirected graphs with $n$ nodes and $m$ edges. We also prove a matching lower bound for the worst-case analysis.

| Approach | Worst-case Time | Average-case Time | Space | Reference |
| Existing | $O(n^2)$ | $O(\min(n^2, m\cdot\log n))$ | $O(m)$ | (Zhang et al., 2013) |
| Our Result | $O(m+n\cdot\alpha(n))$ | $O(m)$ | $O(m)$ | Theorem 3.6, Corollary 3.7 |
Table 2. Library/Client CFL reachability on a library graph of size $n_1$ and a client graph of size $n_2$. $s$ is the number of library summary nodes, as defined in (Tang et al., 2015); $k_1$ is the number of call sites in the library code, with $k_1<s$; $k_2$ is the number of call sites in the client code.

| Approach | Library Time | Client Time | Library Space | Client Space | Reference |
| CFL | $O(n_1^3)$ | $O((n_1+n_2)^3)$ | $O(n_1^2)$ | $O((n_1+n_2)^2)$ | (Tang et al., 2015) |
| TAL | $O(n_1^6)$ | $O((s+n_2)^6)$ | $O(n_1^4)$ | $O((s+n_2)^4)$ | (Tang et al., 2015) |
| Our Result | $O(n_1+k_1\cdot\log n_1)$ | $O(n_2+k_1\cdot\log n_1+k_2\cdot\log n_2)$ | $O(n_1)$ | $O(n_1+n_2)$ | Theorem 5.8 |

Experimental results. A key feature of our algorithms is that they are simple to implement. We present experimental results both on alias analysis (see Section 6.1) and on library/client data-dependence analysis (see Section 6.2), and show that our algorithms outperform previous approaches for these problems on real-world benchmarks.

Due to lack of space, full proofs can be found in the full version of this paper (Chatterjee et al., 2017).

1.1. Other Related Work

We have already discussed in detail the most relevant works related to language reachability, alias analysis and data-dependence analysis. We briefly discuss works related to treewidth in program analysis and verification.

Treewidth in algorithms and program analysis. In the context of programming languages, it was shown by (Thorup, 1998) that the control-flow graphs of goto-free programs in many programming languages have constant treewidth, and this has been followed by practical approaches as well (such as (Gustedt et al., 2002)). The treewidth property has received a lot of attention in the algorithms community, for NP-complete problems (Arnborg and Proskurowski, 1989; Bern et al., 1987; Bodlaender, 1988), combinatorial optimization problems (Bertele and Brioschi, 1972), and graph problems such as shortest paths (Chaudhuri and Zaroliagis, 1995; Chatterjee et al., 2016b). In the algorithmic analysis of programming languages and verification, the treewidth property has been exploited in interprocedural analysis (Chatterjee et al., 2015b), concurrent intraprocedural analysis (Chatterjee et al., 2016a), quantitative verification of finite-state graphs (Chatterjee et al., 2015a), etc. To the best of our knowledge, the constant-treewidth property has not been considered for data-dependence analysis. Our experimental results show that in practice many real-world benchmarks have the constant-treewidth property, and our algorithms for data-dependence analysis exploit this property to obtain faster running times.

2. Preliminaries

Graphs and paths. We denote by $G=(V,E)$ a finite directed graph (henceforth called simply a graph), where $V$ is a set of $n$ nodes and $E\subseteq V\times V$ is an edge relation of $m$ edges. Given a set of nodes $X\subseteq V$, we denote by $G[X]=(X,E\cap(X\times X))$ the subgraph of $G$ induced by $X$. A path $P$ from $u$ to $v$ is a sequence of edges $(e_1,\dots,e_r)$ where each $e_i=(x_i,y_i)$ is such that $x_1=u$, $y_r=v$, and for all $1\leq i\leq r-1$ we have $y_i=x_{i+1}$. The length of $P$ is $|P|=r$. A path $P$ is simple if no node repeats in $P$ (i.e., the path does not contain a cycle). Given two paths $P_1=(e_1,\dots,e_{r_1})$ and $P_2=(e'_1,\dots,e'_{\ell})$ with $e_{r_1}=(x,y)$ and $e'_1=(y,z)$, we denote by $P_1\circ P_2$ the concatenation of $P_2$ onto $P_1$. We use the notation $x\in P$ to denote that a node $x$ appears in $P$, and $e\in P$ to denote that an edge $e$ appears in $P$. Given a set $B\subseteq V$, we denote by $P\cap B$ the set of nodes of $B$ that appear in $P$. We say that a node $v$ is reachable from a node $u$ if there exists a path $P:u\rightsquigarrow v$.

Dyck Languages. Given a nonnegative integer $k\in\mathbb{N}$, we denote by $\Sigma_k=\{\epsilon\}\cup\{\alpha_i,\overline{\alpha}_i\}_{i=1}^{k}$ a finite alphabet of $k$ parenthesis types, together with a null element $\epsilon$. We denote by $\mathcal{L}_k$ the Dyck language over $\Sigma_k$, defined as the language of strings generated by the following context-free grammar $\mathcal{G}_k$:

$$\mathcal{S}\to\mathcal{S}\,\mathcal{S} \;\mid\; \mathcal{A}_1\,\overline{\mathcal{A}}_1 \;\mid\; \dots \;\mid\; \mathcal{A}_k\,\overline{\mathcal{A}}_k \;\mid\; \epsilon\,;\qquad \mathcal{A}_i\to\alpha_i\,\mathcal{S}\,;\qquad \overline{\mathcal{A}}_i\to\mathcal{S}\,\overline{\alpha}_i$$

Given a string $s$ and a non-terminal symbol $X$ of the above grammar, we write $X\vdash s$ to denote that $X$ produces $s$ according to the rules of the grammar. In the rest of the document we consider an alphabet $\Sigma_k$ and the corresponding Dyck language $\mathcal{L}_k$. We also let $\Sigma_k^O=\{\alpha_i\}_{i=1}^{k}$ and $\Sigma_k^C=\{\overline{\alpha}_i\}_{i=1}^{k}$ be the subsets of $\Sigma_k$ of only opening and only closing parentheses, respectively.
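As a concrete illustration of the definition (our own sketch, not part of the paper's formalism), membership in $\mathcal{L}_k$ can be decided with a simple stack; the tuple encoding of alphabet symbols below is an assumption made for this example.

```python
def is_dyck(word):
    """Decide membership of a string over Sigma_k in the Dyck language L_k.

    Each symbol is encoded as a pair (kind, i), where kind is 'open' or
    'close' and i identifies the parenthesis type; occurrences of the
    null symbol epsilon are simply omitted from the sequence.
    """
    stack = []
    for kind, i in word:
        if kind == 'open':
            stack.append(i)   # remember the pending opening parenthesis
        elif not stack or stack.pop() != i:
            return False      # closing symbol without a matching opening one
    return not stack          # every opening parenthesis must be closed
```

For instance, $\alpha_1\alpha_2\overline{\alpha}_2\overline{\alpha}_1$ is accepted, while $\alpha_1\overline{\alpha}_2$ is rejected.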

Labeled graphs, Dyck reachability, and Dyck SCCs (DSCCs). We denote by $G=(V,E)$ a $\Sigma_k$-labeled directed graph, where $V$ is the set of nodes and $E\subseteq V\times V\times\Sigma_k$ is the set of edges labeled with symbols from $\Sigma_k$. Hence, an edge $e$ is of the form $e=(u,v,\lambda)$, where $u,v\in V$ and $\lambda\in\Sigma_k$. We require that for every $u,v\in V$, there is a unique label $\lambda$ such that $(u,v,\lambda)\in E$. Often we will be interested only in the endpoints of an edge $e$, in which case we write $e=(u,v)$ and denote by $\lambda(e)$ the label of $e$. Given a path $P$, we define the label of $P$ as $\lambda(P)=\lambda(e_1)\dots\lambda(e_r)$. Given two nodes $u,v$, we say that $v$ is Dyck-reachable from $u$ if there exists a path $P:u\rightsquigarrow v$ such that $\lambda(P)\in\mathcal{L}_k$. In that case, $P$ is called a witness path of the reachability. A set of nodes $X\subseteq V$ is called a Dyck SCC (or DSCC) if for every pair of nodes $u,v\in X$, we have that $u$ Dyck-reaches $v$ and $v$ Dyck-reaches $u$. Note that there might exist a DSCC $X$ and a pair of nodes $u,v\in X$ such that every witness path $P:u\rightsquigarrow v$ satisfies $P\not\subseteq X$, i.e., the witness path contains nodes outside the DSCC.

3. Dyck Reachability on Bidirected Graphs

In this section we present an optimal algorithm for solving the Dyck reachability problem on $\Sigma_k$-labeled bidirected graphs $G$. First, in Section 3.1, we formally define the problem. Second, in Section 3.2, we describe an algorithm BidirectedReach that solves the problem in time $O(m+n\cdot\alpha(n))$, where $n$ is the number of nodes of $G$, $m$ is the number of edges of $G$, and $\alpha(n)$ is the inverse Ackermann function. Finally, in Section 3.3, we present an $\Omega(m+n\cdot\alpha(n))$ lower bound.

3.1. Problem Definition

We start with the problem definition of Dyck reachability on bidirected graphs. For the modeling power of bidirected graphs we refer to (Yuan and Eugster, 2009; Zhang et al., 2013) and our Experimental Section 6.1.

Bidirected Graphs. A $\Sigma_k$-labeled graph $G=(V,E)$ is called bidirected if for every pair of nodes $u,v\in V$, the following conditions hold: (1) $(u,v,\epsilon)\in E$ iff $(v,u,\epsilon)\in E$; and (2) for all $1\leq i\leq k$ we have that $(u,v,\alpha_i)\in E$ iff $(v,u,\overline{\alpha}_i)\in E$. Informally, the edge relation is symmetric, and the labels of symmetric edges are complementary with respect to opening and closing parentheses. The following remark captures a key property of bidirected graphs that can be exploited to obtain faster algorithms.

Remark 1 ((Zhang et al., 2013)).

For bidirected graphs the Dyck reachability relation forms an equivalence, i.e., for every bidirected graph $G$ and every pair of nodes $u,v$, we have that $v$ is Dyck-reachable from $u$ iff $u$ is Dyck-reachable from $v$.

Remark 2.

We consider without loss of generality that a bidirected graph $G$ has no edge $(u,v)$ such that $\lambda(u,v)=\epsilon$, i.e., there are no $\epsilon$-labeled edges. This is because in such a case, $u,v$ form a DSCC and can be merged into a single node. Merging all nodes that share an $\epsilon$-labeled edge requires only linear time, and hence can be applied as a preprocessing step at (asymptotically) no extra cost.

Dyck reachability on bidirected graphs. We are given a $\Sigma_k$-labeled bidirected graph $G=(V,E)$, and our task is to compute for every pair of nodes $u,v$ whether $v$ is Dyck-reachable from $u$. As is customary, we consider that $k=O(1)$, i.e., $k$ is fixed with respect to the input graph (Chaudhuri, 2008). In view of Remark 1, it suffices that the output is a list of DSCCs. Note that the output then has size $\Theta(n)$, instead of the $\Theta(n^2)$ that would be required for storing one bit of information per pair $u,v$. Additionally, the pair query time is $O(1)$, by testing whether the two nodes belong to the same DSCC.

3.2. An Almost Linear-time Algorithm

We present our algorithm BidirectedReach, for Dyck reachability on bidirected graphs, with almost linear-time complexity.

Informal description of BidirectedReach. We start by providing a high-level description of BidirectedReach. The main idea is that for any two distinct nodes $u,v$ to belong to some DSCC $X$, there must exist two (not necessarily distinct) nodes $x,y$ that belong to some DSCC $Y$ (possibly $X=Y$; that is, $x$ and $y$ might refer to the same node, and $X$ and $Y$ to the same DSCC) and a closing parenthesis $\overline{\alpha}_i\in\Sigma_k^C$ such that $(x,u,\overline{\alpha}_i),(y,v,\overline{\alpha}_i)\in E$. See Figure 1 for an illustration. The algorithm uses a Disjoint Sets data structure to maintain the DSCCs discovered so far. Each DSCC is represented as a tree $T$ rooted at some node $x\in V$, and $x$ is the only node of $T$ that has outgoing edges. However, any node of $T$ can have incoming edges. See Figure 2 for an illustration. Upon discovering that a root node $x$ of some tree $T$ has two or more outgoing edges $(x,u_1,\overline{\alpha}_i),(x,u_2,\overline{\alpha}_i),\dots,(x,u_r,\overline{\alpha}_i)$, for some $\overline{\alpha}_i\in\Sigma_k^C$, the algorithm uses $r$ Find operations of the Disjoint Sets data structure to determine the trees $T_i$ that the nodes $u_i$ belong to. Afterwards, a Union operation is performed on all the $T_i$ to form a new tree $T$, and the outgoing edges of the root of each $T_i$ are merged into the outgoing edges of the root of $T$.

[Figure 1: three panels (a), (b), (c) over nodes $x$, $u$, $v$, $z$ connected by $\alpha$- and $\overline{\alpha}$-labeled edges.]
Figure 1. Illustration of the merging principle of BidirectedReach. (1(a)) The nodes $u$ and $v$ are in the same DSCC, since node $x$ has an outgoing edge to each of $u$ and $v$ labeled with the closing parenthesis $\overline{\alpha}$. (1(b)) Similarly, nodes $z$ and $v$ belong to the same DSCC, since there exist two nodes $u$ and $v$ such that (i) $u$ and $v$ belong to the same DSCC, (ii) $u$ has an outgoing edge to $z$, and $v$ has an outgoing edge to itself, and (iii) both outgoing edges are labeled with the same closing parenthesis symbol. (1(c)) The final DSCC formation.
[Figure 2: a collection of trees over nodes $s,t,u,v,w,x,y,z$, with $\overline{\alpha}_1$-, $\overline{\alpha}_2$- and $\overline{\alpha}_3$-labeled edges leaving only the roots.]
Figure 2. A state of BidirectedReach consists of a set of trees, with outgoing edges emanating only from the root of each tree.

Complexity overview. The cost of every Find and Union operation is bounded by the inverse Ackermann function $\alpha(n)$ (see (Tarjan, 1975)), which, for all practical purposes, can be considered constant. Additionally, every edge-merge operation requires constant time, using a linked list for storing the outgoing edges. Although list merging in constant time creates the possibility of duplicate edges, such duplicates come at no additional complexity cost. Since every Union of $k$ trees reduces the number of existing edges by $k-1$, the overall complexity of BidirectedReach is $O(m\cdot\alpha(n))$. We later show how to obtain the $O(m+n\cdot\alpha(n))$ complexity.
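The merging principle described above can be sketched in a few lines of Python (a simplified illustration under our own encoding, not the paper's exact pseudocode): nodes are integers, each edge $(u,v,\overline{\alpha}_i)$ is given as a triple `(u, v, i)` of closing-labeled edges only (by bidirectedness, the opening-labeled reversals are implied), and a naive union-find with path halving stands in for the optimal Disjoint Sets structure.

```python
from collections import defaultdict

def bidirected_reach(n, closing_edges):
    """Sketch of DSCC computation on a bidirected Dyck graph.

    Repeatedly, whenever a root has two or more outgoing edges carrying
    the same closing parenthesis, the target trees are unioned and the
    absorbed roots' edge lists are merged into the surviving root.
    Returns a representative per node; two nodes are placed in the same
    DSCC iff they end up with the same representative.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # out[x][i] = targets of closing-parenthesis edges (x, _, i)
    out = [defaultdict(list) for _ in range(n)]
    for u, v, i in closing_edges:
        out[u][i].append(v)

    worklist = list(range(n))
    while worklist:
        x = find(worklist.pop())
        for i in list(out[x].keys()):
            roots = {find(t) for t in out[x][i]}
            if len(roots) > 1:
                # all closing-alpha_i successors of x join one DSCC
                r = x if x in roots else min(roots)
                for other in roots - {r}:
                    parent[other] = r             # Union
                    for j, ts in out[other].items():
                        out[r][j].extend(ts)      # merge outgoing-edge lists
                    out[other].clear()
                worklist.append(r)  # new root may enable further merges
    return [find(v) for v in range(n)]
```

For example, with edges $(x,u,\overline{\alpha})$ and $(x,v,\overline{\alpha})$, the nodes $u$ and $v$ receive the same representative, matching the merging principle of Figure 1.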

We are now ready to give the formal description of \mathsf{BidirectedReach}. We start by introducing the Union-Find problem, and its solution given by a disjoint-sets data structure.

The Union-Find problem. The Union-Find problem is a well-studied problem in the area of algorithms and data structures (Galil and Italiano, 1991; Cormen et al., 2001). The problem is defined over a universe X of n elements, and the task is to maintain partitions of X under set-union operations. Initially, every element x\in X belongs to a singleton set \{x\}. A union-find sequence \sigma is a sequence of m (typically m\geq n) operations of the following two types.

(1) \mathsf{Union}(x,y), for x,y\in X, performs a union of the sets that x and y belong to.

(2) \mathsf{Find}(x), for x\in X, returns the name of the unique set containing x.

The sequence \sigma is presented online, i.e., each operation must be completed before the next one is revealed. Additionally, a \mathsf{Union}(x,y) operation is allowed in the i-th position of \sigma only if the prefix of \sigma up to position i-1 places x and y in different sets. The output of the problem consists of the answers to the \mathsf{Find} operations of \sigma. It is known that the problem can be solved in O(m\cdot\alpha(n)) time by an appropriate disjoint-sets data structure (Tarjan, 1975), and that this complexity is optimal (Tarjan, 1979; Banachowski, 1980).
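The classic solution can be sketched as follows. This is a minimal illustrative implementation (not taken from the paper) of the disjoint-sets data structure with union by rank and path compression, which achieves the O(m\cdot\alpha(n)) bound cited above.

```python
class UnionFind:
    """Classic union-find with union by rank and path compression.
    Each operation runs in amortized O(alpha(n)) time."""

    def __init__(self, n):
        self.parent = list(range(n))  # parent[x] == x for roots
        self.rank = [0] * n

    def find(self, x):
        # Locate the root, then compress the path to point at it directly.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
```

A separated sequence in the sense of Section 3.3 would simply issue all `union` calls before any `find` call.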

The \mathsf{DisjointSets} data structure. We consider at our disposal a disjoint-sets data structure \mathsf{DisjointSets}, which maintains a set of subsets of V under a sequence of set-union operations. At all times, the name of each set X is a node x\in X, which is considered to be the representative of X. \mathsf{DisjointSets} provides the following operations.

(1) For a node u, \mathsf{MakeSet}(u) constructs the singleton set \{u\}.

(2) For a node u, \mathsf{Find}(u) returns the representative of the set that u belongs to.

(3) For a set of nodes S\subseteq V that are pairwise in different sets, and a distinguished node x\in S, \mathsf{Union}(S,x) performs the union of the sets that the nodes of S belong to, and makes x the representative of the new set.

The \mathsf{DisjointSets} data structure can be straightforwardly obtained from the corresponding disjoint-sets data structures used to solve the Union-Find problem (Tarjan, 1975), and has O(\alpha(n)) amortized complexity per operation. Typically, each set is stored as a rooted tree, and the root node is the representative of the set.
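The interface above differs from textbook union-find only in that \mathsf{Union}(S,x) merges several sets at once and lets the caller choose the representative. A minimal sketch of this interface (class and method names are illustrative; rank bookkeeping is omitted for brevity, so this shows the semantics rather than the exact amortized bound):

```python
class DisjointSets:
    """Sketch of the DisjointSets interface assumed in the text:
    union(S, x) merges the sets of all nodes in S and makes x
    the representative of the merged set."""

    def __init__(self):
        self.parent = {}

    def make_set(self, u):
        self.parent[u] = u

    def find(self, u):
        p = self.parent
        while p[u] != u:
            p[u] = p[p[u]]  # path halving
            u = p[u]
        return u

    def union(self, S, x):
        # Collect all roots first, then point them at x, the chosen
        # representative (x is required to be a member of S).
        roots = [self.find(u) for u in S]
        for r in roots:
            self.parent[r] = x
        self.parent[x] = x  # x is the new representative
        return x
```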

Formal description of \mathsf{BidirectedReach}. We are now ready to present \mathsf{BidirectedReach} formally in Algorithm 1. Recall that, in view of Remark 2, we assume that the input graph has no \epsilon-labeled edges. In the initialization phase, the algorithm constructs a map \mathsf{Edges}: V\times\Sigma^{C}_{k}\to V^{*}. For each node u\in V and closing parenthesis \overline{\alpha}_{i}\in\Sigma^{C}_{k}, the entry \mathsf{Edges}[u][\overline{\alpha}_{i}] stores the nodes that have been found to be reachable from u via a path P such that \overline{\mathcal{A}}_{i}\vdash\lambda(P) (i.e., the label of P has all parentheses matched except for a single trailing closing parenthesis \overline{\alpha}_{i}). Observe that all such nodes must belong to the same DSCC.

The main computation happens in the while-loop of the algorithm. The algorithm maintains a queue \mathcal{Q} that acts as a worklist and stores pairs (u,\overline{\alpha}_{i}) such that u is a node that has been found to have at least two outgoing edges labeled with \overline{\alpha}_{i}. Upon extracting an element (u,\overline{\alpha}_{i}) from the queue, the algorithm obtains the representatives of the sets of the nodes in \mathsf{Edges}[u][\overline{\alpha}_{i}]. Since all such nodes belong to the same DSCC, the algorithm chooses an element x to be the new representative, and performs a \mathsf{Union} operation on the underlying sets. The new representative x gathers the outgoing edges of all other nodes v\in\mathsf{Edges}[u][\overline{\alpha}_{i}], and afterwards \mathsf{Edges}[u][\overline{\alpha}_{i}] points only to x.

Input: A Σ_k-labeled bidirected graph G = (V, E)
Output: A DisjointSets map of DSCCs

// Initialization
1   Q ← an empty queue
2   Edges ← a map V × Σ_k^C → V^*, with each entry stored as a linked list
3   DisjointSets ← a disjoint-sets data structure over V
4   foreach u ∈ V do
5       DisjointSets.MakeSet(u)
6       for i ← 1 to k do
7           Edges[u][ᾱ_i] ← (v : (u, v, ᾱ_i) ∈ E)
8           if |Edges[u][ᾱ_i]| ≥ 2 then insert (u, ᾱ_i) in Q

// Computation
9   while Q is not empty do
10      extract (u, ᾱ_i) from Q
11      if u = DisjointSets.Find(u) then
12          S ← {DisjointSets.Find(w) : w ∈ Edges[u][ᾱ_i]}
13          if |S| ≥ 2 then
14              x ← an arbitrary element of S \ {u}
15              DisjointSets.Union(S, x)
16              for j ← 1 to k do
17                  foreach v ∈ S \ {x} do
18                      if u ≠ v or i ≠ j then
19                          move Edges[v][ᾱ_j] to Edges[x][ᾱ_j]
20                      else
21                          append (x) to Edges[x][ᾱ_j]
22                  if |Edges[x][ᾱ_j]| ≥ 2 then insert (x, ᾱ_j) in Q
23          else
24              x ← the single node in S
25          if u ∉ S or |S| = 1 then Edges[u][ᾱ_i] ← (x)
26  return DisjointSets

Algorithm 1: \mathsf{BidirectedReach}
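The algorithm can be sketched in executable form as follows. This is an illustrative Python sketch, not the paper's exact pseudocode: the edge encoding and function names are our own, and it uses path halving without the rank and list-splicing bookkeeping, so it demonstrates the merging principle and correctness rather than the precise O(m+n\cdot\alpha(n)) bound.

```python
from collections import defaultdict, deque

def bidirected_reach(nodes, edges):
    """Sketch of BidirectedReach. `edges` are triples (u, v, i): node u
    has an outgoing edge to v labeled with the closing parenthesis
    alpha-bar_i (the matching opening edge of the bidirected graph is
    implicit). Returns a `find` function mapping each node to its DSCC
    representative."""
    parent = {u: u for u in nodes}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    # out[u][i]: targets of u's outgoing alpha-bar_i-labeled edges.
    out = {u: defaultdict(list) for u in nodes}
    queue = deque()
    for u, v, i in edges:
        out[u][i].append(v)
        if len(out[u][i]) == 2:
            queue.append((u, i))

    while queue:
        u, i = queue.popleft()
        if find(u) != u:
            continue  # u was merged away; its edge lists were moved
        reps = {find(w) for w in out[u][i]}
        if len(reps) >= 2:
            rep = next(r for r in reps if r != u)
            for r in reps:
                parent[r] = rep  # union all target DSCCs under rep
            for v in reps - {rep}:
                for j in list(out[v].keys()):
                    if (v, j) != (u, i):
                        out[rep][j].extend(out[v][j])  # splice edge lists
                    else:
                        out[rep][i].append(rep)  # processed bundle collapses
                    out[v][j] = []
                    if len(out[rep][j]) >= 2:
                        queue.append((rep, j))
        else:
            rep = next(iter(reps), None)
        if rep is not None and (u not in reps or len(reps) == 1):
            out[u][i] = [rep]  # collapse parallel same-label edges
    return find
```

On the graph of Figure 1, the sketch merges u and v (two ᾱ-edges out of x), and then z and v (same-label edges out of the merged DSCC), reproducing the final DSCC formation.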

Example. Consider the state of the algorithm shown in Figure 2, representing the DSCCs of the union-find data structure \mathsf{DisjointSets} (i.e., the undirected trees in the figure), as well as the contents of the \mathsf{Edges} data structure (i.e., the directed edges in the figure). There are currently three DSCCs, with representatives s, v and z. Recall that the queue \mathcal{Q} stores (node, closing parenthesis) pairs such that the node has at least two outgoing edges labeled with the respective closing parenthesis. Observe that the nodes s and z each have at least two outgoing edges with the same parenthesis type, hence they must have been inserted in the queue \mathcal{Q} at some point. Assume that \mathcal{Q}=[(s,\overline{\alpha}_{1}),(z,\overline{\alpha}_{3})]. The algorithm exhibits the following sequence of steps.

(1) The element (z,\overline{\alpha}_{3}) is extracted from \mathcal{Q}. We have \mathsf{Edges}[z][\overline{\alpha}_{3}]=(x,y). Observe that x and y belong to the same DSCC rooted at v, hence the algorithm constructs S=\{v\}. Since |S|=1, the algorithm simply sets \mathsf{Edges}[z][\overline{\alpha}_{3}]=(v), and no new DSCC is formed.

(2) The element (s,\overline{\alpha}_{1}) is extracted from \mathcal{Q}. We have \mathsf{Edges}[s][\overline{\alpha}_{1}]=(u,z). Since u and z belong to different DSCCs, the algorithm constructs S=\{s,z\}, and performs a \mathsf{DisjointSets}.\mathsf{Union}(S,x) operation with x=z. Note that union-by-rank makes the tree of z a subtree of the tree of s, i.e., z becomes a child of s. Afterwards, the algorithm swaps the names of z and s, as required by the choice of x. Finally, the algorithm moves \mathsf{Edges}[s][\overline{\alpha}_{i}] to \mathsf{Edges}[z][\overline{\alpha}_{i}] for i=1,2. Since now |\mathsf{Edges}[z][\overline{\alpha}_{2}]|\geq 2, the algorithm inserts (z,\overline{\alpha}_{2}) in \mathcal{Q}. See Figure 3(a).

(3) The element (z,\overline{\alpha}_{2}) is extracted from \mathcal{Q}. We have \mathsf{Edges}[z][\overline{\alpha}_{2}]=(v,z). Since v and z belong to different DSCCs, the algorithm constructs S=\{v,z\}, and performs a \mathsf{DisjointSets}.\mathsf{Union}(S,x) operation with x=v. Note that union-by-rank makes the tree of v a subtree of the tree of z, i.e., v becomes a child of z. Afterwards, the algorithm swaps the names of v and z, as required by the choice of x. Finally, the algorithm moves \mathsf{Edges}[z][\overline{\alpha}_{2}] to \mathsf{Edges}[v][\overline{\alpha}_{2}]. Since now |\mathsf{Edges}[v][\overline{\alpha}_{2}]|\geq 2, the algorithm inserts (v,\overline{\alpha}_{2}) in \mathcal{Q}. See Figure 3(b).

(4) The element (v,\overline{\alpha}_{2}) is extracted from \mathcal{Q}. We have \mathsf{Edges}[v][\overline{\alpha}_{2}]=(v,t). Observe that v and t belong to the same DSCC rooted at v, hence the algorithm constructs S=\{v\}. Since |S|=1, the algorithm simply sets \mathsf{Edges}[v][\overline{\alpha}_{2}]=(v), and terminates.

Figure 3. The intermediate stages of \mathsf{BidirectedReach}, starting from the state of Figure 2.

Correctness. The correctness of \mathsf{BidirectedReach} is established in two parts, namely soundness and completeness, shown in the following two lemmas.

Lemma 3.1 (Soundness).

At the end of \mathsf{BidirectedReach}, for every pair of nodes u,v\in V, if \mathsf{DisjointSets}.\mathsf{Find}(u)=\mathsf{DisjointSets}.\mathsf{Find}(v), then u and v belong to the same DSCC.

Lemma 3.2 (Completeness).

At the end of \mathsf{BidirectedReach}, for every pair of nodes u,v\in V in the same DSCC, u and v belong to the same set of \mathsf{DisjointSets}.

Complexity. We now establish the complexity of \mathsf{BidirectedReach} in a sequence of lemmas.

Lemma 3.3.

The main while-loop of \mathsf{BidirectedReach} is executed O(n) times.

Proof.

Initially, \mathcal{Q} is populated by the initialization phase, which inserts O(n) elements, as k=O(1). Afterwards, for every \ell\leq k=O(1) elements (u,\overline{\alpha}_{j}) inserted in \mathcal{Q} during an iteration of the main loop, there is at least one node v\in S that stops being the representative of its own set in \mathsf{DisjointSets}, and thus will not appear in S in further iterations. Hence \mathcal{Q} contains O(n) elements in total, and the result follows. ∎

The sets S_{j} and S'_{j}. Consider an element (u,\overline{\alpha}_{i}) extracted from \mathcal{Q} in the j-th iteration of the main loop. We denote by S'_{j} the set \mathsf{Edges}[u][\overline{\alpha}_{i}], and by S_{j} the set S of representatives constructed in that iteration. If S was not constructed in that iteration (i.e., u is not the representative of its own set), then we let S_{j}=\emptyset. It is easy to see that |S_{j}|\leq|S'_{j}| for all j. The following crucial lemma bounds the total size of the sets S'_{j} constructed throughout the execution of \mathsf{BidirectedReach}.

Lemma 3.4.

Let r be the number of iterations of the main loop. We have \sum_{j=1}^{r}|S'_{j}|=O(m).

Proof.

By Lemma 3.3 we have r=O(n). Let J=\{j:|S'_{j}|\geq 2\}; it suffices to prove that \sum_{j\in J}|S'_{j}|=O(m).

We first argue that after a pair (u,\overline{\alpha}_{i}) has been extracted from \mathcal{Q} in some iteration j\in J, the number of edges in \mathsf{Edges} decreases by at least |S'_{j}|-1. We consider the following complementary cases, depending on whether the final reassignment \mathsf{Edges}[u][\overline{\alpha}_{i}]\leftarrow(x) is executed.

(1) If it is executed, then we have |\mathsf{Edges}[u][\overline{\alpha}_{i}]|=1 afterwards.

(2) Otherwise, we must have u\in S and |S|\geq 2, hence some x\in S\setminus\{u\} is chosen as the new representative, and all edges in \mathsf{Edges}[u] are moved to \mathsf{Edges}[x] (taking v=u in the merging loop). Hence |\mathsf{Edges}[u][\overline{\alpha}_{i}]|=0.

Note that, because of the test "u\neq v or i\neq j" in the merging loop, the edges in \mathsf{Edges}[u][\overline{\alpha}_{i}] are not moved to \mathsf{Edges}[x][\overline{\alpha}_{i}], hence all edges of \mathsf{Edges}[u][\overline{\alpha}_{i}] (except possibly one) are no longer present at the end of the iteration. Since S'_{j}=\mathsf{Edges}[u][\overline{\alpha}_{i}] at the beginning of the iteration, we obtain that the number of edges in \mathsf{Edges} decreases by at least |S'_{j}|-1.

We define a potential function \Phi:\mathbb{N}\to\mathbb{N}, such that \Phi(j) equals the number of elements in the data structure \mathsf{Edges} at the beginning of the j-th iteration of the main loop. Note that (i) initially \Phi(1)=m, (ii) \Phi(j)\geq 0 for all j, and (iii) \Phi(j+1)\leq\Phi(j) for all j, as new edges are never added to \mathsf{Edges}. Let (u,\overline{\alpha}_{i}) be an element extracted from \mathcal{Q} at the beginning of the j-th iteration, for some j\in J. As shown above, at the end of the iteration we have removed at least |S'_{j}|-1 edges from \mathsf{Edges}, and since |S'_{j}|\geq 2, we obtain \Phi(j+1)\leq\Phi(j)-|S'_{j}|/2. Summing over all j\in J (writing J=\{j_{1}<j_{2}<\dots<j_{|J|}\}), we obtain

\begin{align*}
\sum_{j\in J}|S'_{j}| &\leq 2\cdot\sum_{j\in J}\left(\Phi(j)-\Phi(j+1)\right) && \left[\text{as }\Phi(j+1)\leq\Phi(j)-|S'_{j}|/2\right]\\
&= 2\cdot\sum_{\ell=1}^{|J|}\left(\Phi(j_{\ell})-\Phi(j_{\ell}+1)\right) && \left[\text{for }j_{\ell}<j_{\ell+1}\right]\\
&\leq 2\cdot\Phi(j_{1}) && \left[\text{as }\Phi\text{ is decreasing, and thus }\Phi(j_{\ell+1})\leq\Phi(j_{\ell}+1)\right]\\
&\leq 2\cdot m && \left[\text{as }\Phi(j_{1})\leq\Phi(1)=m\right]
\end{align*}

The desired result follows. ∎

Finally, we are ready to establish the complexity of \mathsf{BidirectedReach}.

Lemma 3.5 (Complexity).

\mathsf{BidirectedReach} requires O(m\cdot\alpha(n)) time and O(m) space.

A speedup for non-sparse graphs. Observe that for sparse graphs we have m=O(n), and Lemma 3.5 yields the complexity O(n\cdot\alpha(n)). Here we describe a modification of \mathsf{BidirectedReach} that reduces the complexity from O(m\cdot\alpha(n)) to O(m+n\cdot\alpha(n)), and thus is faster for graphs with m=\omega(n\cdot\alpha(n)), i.e., where the number of edges exceeds the number of nodes by more than a factor \alpha(n). The key idea is that if a node u has more than k outgoing edges initially, then it has two distinct outgoing edges labeled with the same closing parenthesis \overline{\alpha}_{i}\in\Sigma^{C}_{k}, and hence the corresponding neighbors can be merged into a single DSCC in a preprocessing step. Once such a merging has taken place, u only needs to keep a single outgoing edge labeled with \overline{\alpha}_{i} to that DSCC. This preprocessing phase requires O(m) time for all nodes, after which only O(n) edges remain, by amortizing at most k edges per node of the original graph (recall that k=O(1)). After this preprocessing step, \mathsf{BidirectedReach} is executed on an input with O(n) edges, and by Lemma 3.5 the complexity is O(n\cdot\alpha(n)). We conclude the results of this section with the following theorem.
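The preprocessing step can be sketched as follows. This is a hypothetical helper (names are ours), using a plain dict-based union-find; it pre-merges the same-label targets of every node and keeps one edge per (node, label), so at most k edges per node survive.

```python
def sparsify(nodes, edges):
    """Sketch of the O(m) preprocessing step: for every node u and
    closing parenthesis index i, all targets of u's alpha-bar_i-labeled
    edges must lie in one DSCC, so they are unioned eagerly and u keeps
    a single alpha-bar_i edge. With k = O(1) labels, O(n) edges remain."""
    parent = {u: u for u in nodes}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    # Group the outgoing edges of each node by their label.
    by_label = {}
    for u, v, i in edges:
        by_label.setdefault((u, i), []).append(v)

    reduced = []
    for (u, i), targets in by_label.items():
        rep = find(targets[0])
        for w in targets[1:]:
            parent[find(w)] = rep  # pre-merge same-label targets
        reduced.append((u, i, rep))  # keep one edge per (node, label)
    return reduced, find
```

The reduced edge list, together with the pre-merged sets, is then handed to \mathsf{BidirectedReach}.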

Theorem 3.6 (Worst-case complexity).

Let G=(V,E) be a \Sigma_{k}-labeled bidirected graph of n nodes and m=\Omega(n) edges. \mathsf{BidirectedReach} correctly computes the DSCCs of G and requires O(m+n\cdot\alpha(n)) time and O(m) space.

Linear-time considerations. Note that \alpha(n) is an extremely slowly growing function, and for all practical purposes \alpha(n)\leq 4. Indeed, the smallest n for which \alpha(n)=5 far exceeds the estimated number of atoms in the observable universe. Additionally, since it is known that a disjoint-sets data structure operates in amortized constant expected time per operation (Doyle and Rivest, 1976; Yao, 1985), we obtain the following corollary regarding the expected time complexity of our algorithm.

Corollary 3.7 (Average-case complexity).

For bidirected graphs, the algorithm \mathsf{BidirectedReach} requires O(m) expected time for computing DSCCs.

3.3. An \Omega(m+n\cdot\alpha(n)) Lower Bound

Theorem 3.6 implies that Dyck reachability on bidirected graphs can be solved in almost-linear time. A theoretically interesting question is whether the problem can be solved in linear time in the worst case. We answer this question in the negative, by proving that every algorithm for the problem requires \Omega(m+n\cdot\alpha(n)) time, and thereby establishing that our algorithm \mathsf{BidirectedReach} is indeed optimal wrt worst-case complexity.

The Separated Union-Find problem. A sequence \sigma of \mathsf{Union}-\mathsf{Find} operations is called separated if all \mathsf{Find} operations occur at the end of \sigma. Hence \sigma=\sigma_{1}\circ\sigma_{2}, where \sigma_{1} contains all \mathsf{Union} operations of \sigma. We call \sigma_{1} a union sequence and \sigma_{2} a find sequence. The Separated Union-Find problem is the regular Union-Find problem over separated union-find sequences. Note that this version of the problem has an offline flavor, as, at the time when the algorithm needs to produce output (i.e., when the suffix of \mathsf{Find} operations starts), the input has been fixed (i.e., all \mathsf{Union} operations are known). We note that the Separated Union-Find problem is different from the Static Tree Set Union problem (Gabow and Tarjan, 1985), which restricts the type of allowed \mathsf{Union} operations, and for which a linear-time algorithm exists in the RAM model. The following lemma states a lower bound on the worst-case complexity of the problem.

Lemma 3.8.

The Separated Union-Find problem over a universe of size n and sequences of length m has worst-case complexity \Omega(m\cdot\alpha(n)).

Proof.

The proof is essentially the proof of (Tarjan, 1979, Theorem 4.4), by observing that the sequences constructed there to prove the lower bound are actually separated union-find sequences. ∎

The union graph G^{\sigma_{1}}. Let \sigma_{1} be a union sequence over some universe X. The union graph of \sigma_{1} is a \Sigma_{1}-labeled bidirected graph G^{\sigma_{1}}=(V^{\sigma_{1}},E^{\sigma_{1}}), defined as follows (see Figure 4 for an illustration).

(1) The node set is V^{\sigma_{1}}=X\cup\{z_{i}\}_{1\leq i\leq|\sigma_{1}|}, where the nodes z_{i} do not appear in X.

(2) The edge set is E^{\sigma_{1}}=\{(z_{i},x_{i},\overline{\alpha}),(z_{i},y_{i},\overline{\alpha})\}_{1\leq i\leq|\sigma_{1}|}, where x_{i},y_{i}\in X are the elements such that the i-th operation of \sigma_{1} is \mathsf{Union}(x_{i},y_{i}).
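The construction above can be sketched directly. A minimal illustrative helper (names are ours): each Union(x_i, y_i) contributes a fresh node z_i and two edges to x_i and y_i, both labeled with the single closing parenthesis of \Sigma_1.

```python
def union_graph(sigma1):
    """Construct the union graph G^{sigma_1} of a union sequence,
    given as a list of (x, y) pairs, one per Union(x, y) operation."""
    nodes, edges = set(), []
    for i, (x, y) in enumerate(sigma1, start=1):
        z = ("z", i)  # fresh auxiliary node z_i, disjoint from X
        nodes.update({x, y, z})
        edges.append((z, x, "closing"))
        edges.append((z, y, "closing"))
    return nodes, edges
```

On the sequence of Figure 4 this yields four auxiliary z-nodes and eight closing-labeled edges.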

\sigma_{1}=\mathsf{Union}(u,v),\ \mathsf{Union}(x,y),\ \mathsf{Union}(w,v),\ \mathsf{Union}(w,x)
Figure 4. A union sequence \sigma_{1} and the corresponding graph G^{\sigma_{1}}.

A lower bound for Dyck reachability on bidirected graphs. We are now ready to prove our lower bound. The proof consists of showing that no algorithm solves the problem in o(m\cdot\alpha(n)) time. Assume towards contradiction otherwise, and let A' be an algorithm that solves the problem in o(m\cdot\alpha(n)) time. We construct an algorithm A that solves the Separated Union-Find problem in the same time.

Let \sigma=\sigma_{1}\circ\sigma_{2} be a separated union-find sequence, where \sigma_{1} is a union sequence and \sigma_{2} is a find sequence. The algorithm A operates as follows. It performs no operations until the whole of \sigma_{1} has been revealed. Then, A constructs the union graph G^{\sigma_{1}}, and uses A' to solve the Dyck reachability problem on G^{\sigma_{1}}. Finally, every \mathsf{Find}(x) operation encountered in \sigma_{2} is handled by A using the answer of A' on G^{\sigma_{1}}.

It is easy to see that A handles the input sequence \sigma correctly. Indeed, for any sequence of union operations \mathsf{Union}(x_{i},y_{i}),\dots,\mathsf{Union}(x_{j},y_{j}) that brings two elements x and y into the same set, the edges (z_{i},x_{i},\overline{\alpha}),(z_{i},y_{i},\overline{\alpha}),\dots,(z_{j},x_{j},\overline{\alpha}),(z_{j},y_{j},\overline{\alpha}) must bring x and y into the same DSCC of G^{\sigma_{1}}. Finally, the algorithm A requires O(m) time for constructing G^{\sigma_{1}} and answering all queries, plus o(m\cdot\alpha(n)) time for running A' on G^{\sigma_{1}}. Hence A operates in o(m\cdot\alpha(n)) time, which contradicts Lemma 3.8.

We have thus arrived at the following theorem.

Theorem 3.9 (Lower-bound).

Any Dyck reachability algorithm for bidirected graphs with n nodes and m=\Omega(n) edges requires \Omega(m+n\cdot\alpha(n)) time in the worst case.

Theorem 3.9 together with Theorem 3.6 yield the following corollary.

Corollary 3.10 (Optimality).

The Dyck reachability algorithm \mathsf{BidirectedReach} for bidirected graphs is optimal wrt worst-case complexity.

4. Dyck Reachability on General Graphs

In this section we present a hardness result regarding the Dyck reachability problem on general graphs, as well as on graphs of constant treewidth.

Complexity of Dyck reachability. Dyck reachability on general graphs is one of the most standard algorithmic formulations of various static analyses. The problem is well known to admit a cubic-time solution, while the currently best bound is O(n^{3}/\log n), due to (Chaudhuri, 2008). Dyck reachability is also known to be 2NPDA-hard (Heintze and McAllester, 1997), which yields a conditional cubic lower bound wrt polynomial improvements. Here we investigate the complexity of Dyck reachability further, and prove that Dyck reachability is Boolean Matrix Multiplication (BMM)-hard. Note that since Dyck reachability is a combinatorial graph problem, techniques such as fast matrix multiplication (e.g., Strassen’s algorithm (Strassen, 1969)) are unlikely to be applicable. Hence we consider combinatorial (i.e., discrete, graph-theoretic) algorithms. The standard BMM conjecture (Lee, 2002; Henzinger et al., 2015; Vassilevska Williams and Williams, 2010; Abboud and Vassilevska Williams, 2014) states that there is no truly subcubic (O(n^{3-\delta}), for \delta>0) combinatorial algorithm for Boolean Matrix Multiplication. Given this conjecture, various algorithmic works establish conditional hardness results. Here we show that Dyck reachability is BMM-hard on general graphs, which yields a new conditional cubic lower bound for the problem. Additionally, we show that BMM-hardness also holds for Dyck reachability on graphs of constant treewidth. We establish these results by showing that Dyck reachability on general graphs is as hard as CFL parsing.

\mathcal{S}\to\mathcal{T}\,\mathcal{B}\qquad\mathcal{T}\to\mathcal{A}\,\mathcal{S}\qquad\mathcal{A}\to a\qquad\mathcal{B}\to b
Figure 5. (a) A grammar \mathcal{G} for the language a^{n}b^{n}. (b) The gadget graph G^{\mathcal{G}}. (c) The parse graph G^{\mathcal{G}}_{s}, given a string s=s_{1},\dots,s_{n}.

The gadget graph G^{\mathcal{G}}. Given a context-free grammar \mathcal{G} in Chomsky normal form, we construct the gadget graph G^{\mathcal{G}}=(V^{\mathcal{G}},E^{\mathcal{G}}) as follows (see Figure 5 (a), (b) for an illustration).

(1) The node set V^{\mathcal{G}} contains two distinguished nodes x,y, together with a node x_{i} for the i-th production rule p_{i}. Additionally, if p_{i} is of the form \mathcal{A}\to\mathcal{B}\ \mathcal{C}, then V^{\mathcal{G}} contains a node y_{i}.

(2) The edge set E^{\mathcal{G}} contains an edge (x,x_{i},\overline{\alpha}_{\mathcal{A}}), where \mathcal{A} is the left-hand-side symbol of the i-th production rule p_{i} of \mathcal{G}. Additionally,

(a) if p_{i} is of the form \mathcal{A}\to a, then E^{\mathcal{G}} contains an edge (x_{i},y,\alpha_{a}), else

(b) if p_{i} is of the form \mathcal{A}\to\mathcal{B}\ \mathcal{C}, then E^{\mathcal{G}} contains the edges (x_{i},y_{i},\alpha_{\mathcal{C}}) and (y_{i},y,\alpha_{\mathcal{B}}).

The parse graph G^{\mathcal{G}}_{s}. Given a grammar \mathcal{G} and an input string s=s_{1},\dots,s_{n}, we construct the parse graph G^{\mathcal{G}}_{s}=(V^{\mathcal{G}}_{s},E^{\mathcal{G}}_{s}) as follows. The graph consists of two parts. The first part is a line graph that contains the nodes v,u_{0},u_{1},\dots,u_{n}, with the edges (v,u_{0},\alpha_{\mathcal{S}}) and (u_{i-1},u_{i},\overline{\alpha}_{s_{i}}) for all 1\leq i\leq n. The second part consists of n copies of the gadget graph G^{\mathcal{G}}, numbered from 0 to n-1. Finally, we have a pair of edges (u_{i},x_{i},\epsilon), (y_{i},u_{i},\epsilon) for every 0\leq i<n, where x_{i} (resp. y_{i}) is the distinguished x node (resp. y node) of the i-th gadget copy. See Figure 5 (c) for an illustration.
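The two-part construction can be sketched as follows. This is a hypothetical helper (the encoding of labels and node names is ours): `grammar` is a list of CNF productions, each either `("A", ("B", "C"))` or `("A", "a")`, and the function returns the edge list together with the source v and target u_n of the Dyck reachability query.

```python
def parse_graph(grammar, s):
    """Build the parse graph G^G_s: a line graph spelling out the
    string s in closing parentheses, plus one gadget-graph copy per
    position, attached via epsilon edges."""
    edges = []
    n = len(s)
    # Line part: v --alpha_S--> u_0 --abar_{s_1}--> u_1 --> ... --> u_n
    edges.append(("v", ("u", 0), ("open", "S")))
    for i, c in enumerate(s, start=1):
        edges.append((("u", i - 1), ("u", i), ("close", c)))
    # One copy of the gadget graph per position 0 .. n-1.
    for p in range(n):
        x, y = ("x", p), ("y", p)
        edges.append((("u", p), x, "eps"))
        edges.append((y, ("u", p), "eps"))
        for idx, (lhs, rhs) in enumerate(grammar):
            xi = ("xp", p, idx)
            edges.append((x, xi, ("close", lhs)))
            if isinstance(rhs, tuple):  # binary production A -> B C
                yi = ("yp", p, idx)
                edges.append((xi, yi, ("open", rhs[1])))
                edges.append((yi, y, ("open", rhs[0])))
            else:                       # terminal production A -> a
                edges.append((xi, y, ("open", rhs)))
    return edges, "v", ("u", n)
```

For the grammar of Figure 5 and s = "ab", the line part contributes 3 edges and each of the two gadget copies contributes 12, so the graph has 27 edges in total.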

Lemma 4.1.

The node u_{n} is Dyck-reachable from the node v iff s is generated by \mathcal{G}.

Proof.

Given a path P, we denote by \overline{\lambda}(P) the substring of \lambda(P) that consists of all the closing-parenthesis symbols of \lambda(P). The proof follows directly from the following observation: the parse graph G^{\mathcal{G}}_{s} contains a path P:v\rightsquigarrow u_{n} with \lambda(P)\in\mathcal{L} if and only if \overline{\lambda}(P) corresponds to a pre-order traversal of a derivation tree of the string s wrt the grammar \mathcal{G}. ∎

Theorem 4.2.

If there exists a combinatorial algorithm that solves the pair Dyck reachability problem in time \mathcal{T}(n), where n is the number of nodes of the input graph, then there exists a combinatorial algorithm that solves the CFL parsing problem in time O(n+\mathcal{T}(n)).

Since CFL parsing is BMM-hard, by combining Theorem 4.2 with (Lee, 2002, Theorem 2) we obtain the following corollary.

Corollary 4.3 (BMM-hardness: Conditional cubic lower bound).

For any fixed \delta>0, if there is a combinatorial algorithm that solves the pair Dyck reachability problem in O(n^{3-\delta}) time, then there is a combinatorial algorithm that solves Boolean Matrix Multiplication in O(n^{3-\delta/3}) time.

Remark 3 (BMM hardness for low-treewidth graphs).

Note that since the size of the grammar \mathcal{G} is constant, the parse graph G^{\mathcal{G}}_{s} has constant treewidth. Hence the BMM-hardness of Corollary 4.3 also holds if we restrict our attention to Dyck reachability on graphs of constant treewidth.

5. Library/Client Dyck Reachability

In this section we present new results for library/client Dyck reachability, with applications to context-sensitive data-dependence analysis. One crucial step towards our improvements is that we consider underlying graphs that are not arbitrary, but have special structure. We start with Section 5.1, which formally defines the graph models we deal with and their structural properties. Afterwards, in Section 5.2, we present our algorithms.

5.1. Problem Definition

Here we present a formal definition of the input graphs that we consider for library/client Dyck reachability, with application to context-sensitive data-dependence analysis. Each input graph G is not an arbitrary \Sigma_{k}-labeled graph, but has two important structural properties.

(1) G can be naturally partitioned into subgraphs G_{1},\dots,G_{\ell}, such that every G_{i} has only \epsilon-labeled edges. Each such G_{i}=(V_{i},E_{i}) corresponds to a method of the input program. Only few nodes of V_{i} have incoming edges that are non-\epsilon-labeled, and similarly, only few nodes of V_{i} have outgoing edges that are non-\epsilon-labeled. These nodes correspond to the input parameters and return statements of the i-th method of the program, which are almost always only a few.

(2) Each G_{i} is a graph of low treewidth. This is an important graph-theoretic property which, informally, means that G_{i} is similar to a tree (although G_{i} is not a tree).

We now make the above structural properties formal and precise. We start with the first structural property, which captures the fact that the input graph G consists of many local graphs G_{i}, one for each method of the input program, while the parenthesis-labeled edges model context sensitivity.

Program-valid partitionings. Let G=(V,E)G=(V,E) be a Σk\Sigma_{k}-labeled graph. Given some 1ik1\leq i\leq k, we define the following sets.

Vc(αi)={u:(u,v,αi)E}\displaystyle V_{c}(\alpha_{i})=\{u:\exists(u,v,\alpha_{i})\in E\}\quad Ve(αi)={v:(u,v,αi)E}\displaystyle V_{e}(\alpha_{i})=\{v:\exists(u,v,\alpha_{i})\in E\}
Vx(α¯i)={u:(u,v,α¯i)E}\displaystyle V_{x}(\overline{\alpha}_{i})=\{u:\exists(u,v,\overline{\alpha}_{i})\in E\}\quad Vr(α¯i)={v:(u,v,α¯i)E}\displaystyle V_{r}(\overline{\alpha}_{i})=\{v:\exists(u,v,\overline{\alpha}_{i})\in E\}

In words, (i) Vc(αi)V_{c}(\alpha_{i}) contains the nodes that have an αi\alpha_{i}-labeled outgoing edge, (ii) Ve(αi)V_{e}(\alpha_{i}) contains the nodes that have an αi\alpha_{i}-labeled incoming edge, (iii) Vx(α¯i)V_{x}(\overline{\alpha}_{i}) contains the nodes that have an α¯i\overline{\alpha}_{i}-labeled outgoing edge, and (iv) Vr(α¯i)V_{r}(\overline{\alpha}_{i}) contains the nodes that have an α¯i\overline{\alpha}_{i}-labeled incoming edge. Additionally, we define the following sets.

Vc=iVc(αi)Ve=iVe(αi)Vx=iVx(α¯i)Vr=iVr(α¯i)V_{c}=\bigcup_{i}V_{c}(\alpha_{i})\quad V_{e}=\bigcup_{i}V_{e}(\alpha_{i})\quad V_{x}=\bigcup_{i}V_{x}(\overline{\alpha}_{i})\quad V_{r}=\bigcup_{i}V_{r}(\overline{\alpha}_{i})

Consider a partitioning 𝒱={V1,,V}\mathcal{V}=\{V_{1},\dots,V_{\ell}\} of the node set VV, i.e., iVi=V\bigcup_{i}V_{i}=V and ViVj=V_{i}\cap V_{j}=\emptyset for all 1i<j1\leq i<j\leq\ell. We say that 𝒱\mathcal{V} is program-valid if the following conditions hold: for every 1ik1\leq i\leq k, there exist some 1j1,j21\leq j_{1},j_{2}\leq\ell such that (i) Vc(αi),Vr(α¯i)Vj1V_{c}(\alpha_{i}),V_{r}(\overline{\alpha}_{i})\subseteq V_{j_{1}}, and (ii) Ve(αi),Vx(α¯i)Vj2V_{e}(\alpha_{i}),V_{x}(\overline{\alpha}_{i})\subseteq V_{j_{2}}. Intuitively, the parenthesis-labeled edges of GG correspond to method calls and returns, and thus model context sensitivity. Each parenthesis type models a calling context, and each G[Vi]G[V_{i}] corresponds to a single method of the program. Since a calling context is tied to two methods (the caller and the callee), conditions (i) and (ii) must hold for the partitioning.
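To make the conditions concrete, the following Python sketch checks program-validity of a given partitioning. The label encoding and all names are ours, for illustration only: an opening parenthesis αi\alpha_{i} is written ("open", i), a closing one ("close", i), and ϵ\epsilon-labeled edges carry None.

```python
from collections import defaultdict

def is_program_valid(edges, partition):
    """edges: iterable of (u, v, label); partition: list of disjoint node sets."""
    part_of = {u: j for j, block in enumerate(partition) for u in block}
    Vc, Ve = defaultdict(set), defaultdict(set)   # sources/targets of alpha_i edges
    Vx, Vr = defaultdict(set), defaultdict(set)   # sources/targets of bar-alpha_i edges
    for u, v, label in edges:
        if label is None:          # epsilon-labeled edges are unconstrained
            continue
        kind, i = label
        if kind == "open":
            Vc[i].add(u); Ve[i].add(v)
        else:
            Vx[i].add(u); Vr[i].add(v)
    for i in set(Vc) | set(Ve) | set(Vx) | set(Vr):
        # condition (i): the caller side of call site i lies in a single block
        if len({part_of[u] for u in Vc[i] | Vr[i]}) > 1:
            return False
        # condition (ii): the callee side of call site i lies in a single block
        if len({part_of[u] for u in Ve[i] | Vx[i]}) > 1:
            return False
    return True
```

For instance, a call from block {a, b} into block {p, q} is program-valid, while splitting the call and return nodes across blocks is not.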

A program-valid partitioning 𝒱={V1,,V}\mathcal{V}=\{V_{1},\dots,V_{\ell}\} is called bb-bounded, for bb\in\mathbb{N}, if for all 1j1\leq j\leq\ell we have |VeVj|,|VxVj|b|V_{e}\cap V_{j}|,|V_{x}\cap V_{j}|\leq b. Note that since 𝒱\mathcal{V} is program-valid, this condition also yields that for all 1ik1\leq i\leq k we have |Vc(αi)|,|Vr(α¯i)|b|V_{c}(\alpha_{i})|,|V_{r}(\overline{\alpha}_{i})|\leq b. In this paper we consider b=O(1)b=O(1), i.e., bb is constant wrt the size of the input graph. This holds in practice, since the sets VeVjV_{e}\cap V_{j} and VxVjV_{x}\cap V_{j} represent the input parameters and the return statements of the jj-th method in the program. Similarly, the sets Vc(αi)V_{c}(\alpha_{i}), Vr(α¯i)V_{r}(\overline{\alpha}_{i}) represent the variables that are passed as input and the variables that capture the return, respectively, of the method that the ii-th call site refers to. In all practical cases each of the above sets has constant size (or even size 11, for return variables).

Program-valid graphs. The graph GG is called program-valid if there exists a constant bb\in\mathbb{N} such that GG has a bb-bounded program-valid partitioning. Given such a partitioning 𝒱={V1,,V}\mathcal{V}=\{V_{1},\dots,V_{\ell}\}, we call each graph Gi=(Vi,Ei)=G[Vi]G_{i}=(V_{i},E_{i})=G[V_{i}] a local graph. Given a partitioning of VV into the library partition V1V^{1} and the client partition V2V^{2}, 𝒱\mathcal{V} induces a program-valid partitioning on each of the library subgraph G1=G[V1]G^{1}=G[V^{1}] and the client subgraph G2=G[V2]G^{2}=G[V^{2}]. See Figure 6 for an example.

We now present the second structural property of input graphs that we exploit in this work. Namely, for a program-valid input graph GG with a program-valid partitioning 𝒱={V1,,V}\mathcal{V}=\{V_{1},\dots,V_{\ell}\} the local graphs Gi=G[Vi]G_{i}=G[V_{i}] have low treewidth. It is known that the control-flow graphs (CFGs) of goto-free programs have small treewidth (Thorup, 1998). The local graphs GiG_{i} are not CFGs, but rather graphs defined by def-use chains. As we show in this work (see Section 6.2), the local def-use graphs of real-world benchmarks also have small treewidth. Below, we make the above notions precise.

Trees. A (rooted) tree T=(VT,ET)T=(V_{T},E_{T}) is an undirected graph with a distinguished root node ww, such that there is a unique simple path Puv:uvP_{u}^{v}:u\rightsquigarrow v for each pair of nodes u,vu,v. Given a tree TT with root ww, the level 𝖫𝗏(u)\mathsf{Lv}(u) of a node uu is the length of the simple path PuwP_{u}^{w} from uu to the root ww. Every node in PuwP_{u}^{w} is an ancestor of uu. If vv is an ancestor of uu, then uu is a descendant of vv. For a pair of nodes u,vVTu,v\in V_{T}, the lowest common ancestor (LCA) of uu and vv is the common ancestor of uu and vv with the largest level. The parent uu of vv is the unique ancestor of vv in level 𝖫𝗏(v)1\mathsf{Lv}(v)-1, and vv is a child of uu. A leaf of TT is a node with no children. For a node uVTu\in V_{T}, we denote by T(u)T(u) the subtree of TT rooted in uu (i.e., the tree consisting of all descendants of uu). The height of TT is maxu𝖫𝗏(u)\max_{u}\mathsf{Lv}(u).

Tree decompositions and treewidth (Robertson and Seymour, 1984). Given a graph GG, a tree-decomposition 𝖳𝗋𝖾𝖾(G)=(VT,ET)\mathsf{Tree}(G)=(V_{T},E_{T}) is a tree with the following properties.

  1. C1:

    VT={B1,,Bb: for all 1ib.BiV}V_{T}=\{B_{1},\dots,B_{b}:\text{ for all }1\leq i\leq b.\ B_{i}\subseteq V\} and BiVTBi=V\bigcup_{B_{i}\in V_{T}}B_{i}=V. That is, each node of 𝖳𝗋𝖾𝖾(G)\mathsf{Tree}(G) is a subset of nodes of GG, and each node of GG appears in some node of 𝖳𝗋𝖾𝖾(G)\mathsf{Tree}(G).

  2. C2:

    For all (u,v)E(u,v)\in E there exists BiVTB_{i}\in V_{T} such that u,vBiu,v\in B_{i}. That is, the endpoints of each edge of GG appear together in some node of 𝖳𝗋𝖾𝖾(G)\mathsf{Tree}(G).

  3. C3:

    For all BiB_{i}, BjB_{j} and any bag BkB_{k} that appears in the simple path BiBjB_{i}\rightsquigarrow B_{j} in 𝖳𝗋𝖾𝖾(G)\mathsf{Tree}(G), we have BiBjBkB_{i}\cap B_{j}\subseteq B_{k}. That is, every node of GG is contained in a contiguous subtree of 𝖳𝗋𝖾𝖾(G)\mathsf{Tree}(G).

To distinguish between the nodes of GG and the nodes of 𝖳𝗋𝖾𝖾(G)\mathsf{Tree}(G), the sets BiB_{i} are called bags. The width of a tree-decomposition 𝖳𝗋𝖾𝖾(G)\mathsf{Tree}(G) is the size of the largest bag minus 1 and the treewidth of GG is the width of a minimum-width tree decomposition of GG. It follows from the definition that if GG has constant treewidth, then m=O(n)m=O(n). For a node uVu\in V, we say that a bag BB is the root bag of uu if BB is the bag with the smallest level among all bags that contain uu, i.e., Bu=argminBVT:uB𝖫𝗏(B)B_{u}=\arg\min_{B\in V_{T}:\leavevmode\nobreak\ u\in B}\mathsf{Lv}\left(B\right). By definition, there is exactly one root bag for each node uu. We often write BuB_{u} for the root bag of node uu, and denote by 𝖫𝗏(u)=𝖫𝗏(Bu)\mathsf{Lv}(u)=\mathsf{Lv}\left(B_{u}\right). Additionally, we denote by B(u,v)B_{(u,v)} the bag of the largest level that is the root bag of one of uu, vv. The following well-known theorem states that tree decompositions of constant-treewidth graphs can be constructed efficiently.
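Conditions C1–C3 can be verified mechanically. The following Python sketch (our own helper, not part of the paper's algorithms) checks the three conditions for a candidate decomposition and returns its width, or None if some condition fails:

```python
from collections import defaultdict

def decomposition_width(nodes, edges, bags, tree_edges):
    """bags: bag_id -> set of graph nodes; tree_edges: edges of the tree over
    bag ids. Returns the width of a valid tree decomposition, else None."""
    # C1: every node of G appears in some bag
    if set().union(*bags.values()) != set(nodes):
        return None
    # C2: the endpoints of every edge of G appear together in some bag
    for u, v in edges:
        if not any(u in B and v in B for B in bags.values()):
            return None
    # C3: for every node, the bags containing it form a connected subtree
    adj = defaultdict(set)
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)
    for u in nodes:
        holders = {bid for bid, B in bags.items() if u in B}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:  # DFS restricted to bags that contain u
            for nb in adj[stack.pop()]:
                if nb in holders and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if seen != holders:
            return None
    return max(len(B) for B in bags.values()) - 1
```

For the path graph a–b–c, the two bags {a, b} and {b, c} joined by one tree edge form a decomposition of width 1.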

Theorem 5.1 ((Bodlaender and Hagerup, 1995)).

Given a graph G=(V,E)G=(V,E) of nn nodes and treewidth t=O(1)t=O(1), a tree decomposition 𝖳𝗋𝖾𝖾(G)\mathsf{Tree}(G) of O(n)O(n) bags, height O(logn)O(\log n) and width O(t)=O(1)O(t)=O(1) can be constructed in O(n)O(n) time.

The following crucial lemma states the key property of tree decompositions that we exploit in this work towards fast algorithms for Dyck reachability. Intuitively, every bag of a tree decomposition 𝖳𝗋𝖾𝖾(G)\mathsf{Tree}(G) acts as a separator of the graph GG.

Lemma 5.2 ((Bodlaender, 1998, Lemma 3)).

Consider a graph G=(V,E)G=(V,E), a tree-decomposition T=𝖳𝗋𝖾𝖾(G)T=\mathsf{Tree}(G), and a bag BB of TT. Let (𝒞i)i(\mathcal{C}_{i})_{i} be the components of TT created by removing BB from TT, and let ViV_{i} be the set of nodes that appear in bags of component 𝒞i\mathcal{C}_{i}. For every iji\neq j, nodes uViu\in V_{i}, vVjv\in V_{j} and path P:uvP:u\rightsquigarrow v, we have that PBP\cap B\neq\emptyset (i.e., all paths between uu and vv go through some node in BB).

Program-valid treewidth. Let G=(V,E)G=(V,E) be a Σk\Sigma_{k}-labeled program-valid graph, and 𝒱={V1,,V}\mathcal{V}=\{V_{1},\dots,V_{\ell}\} a program-valid partitioning of GG. For each 1i1\leq i\leq\ell, let Gi=(Vi,Ei)=G[Vi]G_{i}=(V_{i},E_{i})=G[V_{i}]. We define the graph Gi=(Vi,Ei)G^{\prime}_{i}=(V_{i},E^{\prime}_{i}) such that

Ei=Ei1jk(Vc(αj)Vi)×(Vr(α¯j)Vi)E^{\prime}_{i}=E_{i}\cup\bigcup_{1\leq j\leq k}{\left(V_{c}(\alpha_{j})\cap V_{i}\right)\times\left(V_{r}(\overline{\alpha}_{j})\cap V_{i}\right)}

and call GiG^{\prime}_{i} the maximal graph of GiG_{i}. In words, the graph GiG^{\prime}_{i} is identical to GiG_{i}, with the exception that GiG^{\prime}_{i} contains an extra edge for every pair of nodes u,vViu,v\in V_{i} such that uu has opening-parenthesis-labeled outgoing edges, and vv has closing-parenthesis-labeled incoming edges. We define the treewidth of 𝒱\mathcal{V} to be the smallest integer tt such that the treewidth of each GiG^{\prime}_{i} is at most tt. We define the width of the pair (G,𝒱)(G,\mathcal{V}) as the treewidth of 𝒱\mathcal{V}, and the program-valid treewidth of GG to be the smallest treewidth among its program-valid partitionings.
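The construction of the maximal graph is a direct product over the sets defined earlier. A Python sketch (illustrative names), where Vc and Vr map each parenthesis type jj to the sets Vc(αj)V_{c}(\alpha_{j}) and Vr(α¯j)V_{r}(\overline{\alpha}_{j}):

```python
def maximal_edges(E_i, V_i, Vc, Vr):
    """Edge set E'_i of the maximal graph G'_i: the local (epsilon) edges of
    G_i, plus one edge (u, v) for every u in V_c(alpha_j) ∩ V_i and
    v in V_r(bar-alpha_j) ∩ V_i, for each parenthesis type j."""
    extra = {(u, v)
             for j in set(Vc) & set(Vr)
             for u in Vc[j] & V_i
             for v in Vr[j] & V_i}
    return set(E_i) | extra
```

For example, a local graph with edge (1, 2), where node 1 makes a call of type 0 whose return is captured at node 3, gains the extra edge (1, 3).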

The Library/Client Dyck reachability problem on program-valid graphs. Here we define the algorithmic problem that we solve in this section. Let G=(V,E)G=(V,E) be a Σk\Sigma_{k}-labeled, program-valid graph and 𝒱\mathcal{V} a program-valid partitioning of GG that has constant treewidth (kk need not be constant). The set 𝒱\mathcal{V} is further partitioned into two sets, 𝒱1\mathcal{V}^{1} and 𝒱2\mathcal{V}^{2}, that correspond to the library and client partitions, respectively. We let V1=Vi𝒱1ViV^{1}=\bigcup_{V_{i}\in\mathcal{V}^{1}}V_{i} and V2=Vi𝒱2ViV^{2}=\bigcup_{V_{i}\in\mathcal{V}^{2}}V_{i}, and define the library graph G1=(V1,E1)=G[V1]G^{1}=(V^{1},E^{1})=G[V^{1}] and the client graph G2=(V2,E2)=G[V2]G^{2}=(V^{2},E^{2})=G[V^{2}].

The task is to answer Dyck reachability queries on GG, where the queries are either (i) single-source queries from some uV2u\in V^{2}, or (ii) pair queries for some pair u,vV2u,v\in V^{2}. The computation takes place in two phases. In the preprocessing phase, only the library graph G1G^{1} is revealed, and we are allowed to perform some preprocessing to compute reachability summaries. In the query phase, the whole graph GG is revealed, and our task is to handle queries fast, by utilizing the preprocessing done on G1G^{1}.

1 if y%2=1y\%2=1 then
2  zx+yz\leftarrow x+y
3else
4  zxyz\leftarrow x\cdot y
5 end if
return zz
Algorithm 2 f1(x,y)f_{1}(x,y)
1 x2x\leftarrow 2
2 y2y\leftarrow 2
3 pf(x,y)p\leftarrow f(x,y)
return pp
Algorithm 3 g()g()
1 if x%2=1x\%2=1 then
2  z2xz\leftarrow 2\cdot x
3else
4  z2x+1z\leftarrow 2\cdot x+1
5 end if
return zz
Algorithm 4 f2(x,y)f_{2}(x,y)
[Figure: the local data-dependence graphs of f1(x,y)f_{1}(x,y), f2(x,y)f_{2}(x,y) and g()g(), connected by parenthesis-labeled call/return edges for the two call sites.]
Figure 6. Example of a library/client program and the corresponding program-valid data-dependence graph. The library consists of method g()g() which has a callback function f(x,y)f(x,y). The client implements f(x,y)f(x,y) either as f1(x,y)f_{1}(x,y) or f2(x,y)f_{2}(x,y). The parenthesis-labeled edges model context-sensitive dependencies on parameter passing and return. Depending on the implementation of ff, there is a data dependence of the variable pp on yy.

5.2. Library/Client Dyck Reachability on Program-valid Graphs

We are now ready to present our method for computing library summaries on program-valid graphs in order to speed up the client-side Dyck reachability. The approach is very similar to the work of (Chatterjee et al., 2015b) for data-flow analysis of recursive state machines.

Outline of our approach. We let the input graph G=(V,E)G=(V,E) be any program-valid graph of constant treewidth, with a partitioning of VV into the library component V1V^{1} and the client component V2V^{2}. Since GG is program-valid, it has a constant-treewidth, program-valid partitioning 𝒱\mathcal{V}, and we consider 𝒱1\mathcal{V}^{1} to be the restriction of 𝒱\mathcal{V} to the set V1V^{1}. Hence 𝒱1={V1,,V}\mathcal{V}^{1}=\{V_{1},\dots,V_{\ell}\} is a program-valid partitioning of G[V1]G[V^{1}], which also has constant treewidth. Our approach consists of the following conceptual steps.

  1. (1)

    We construct a local graph Gi=(Vi,Ei)G_{i}=(V_{i},E_{i}) and the corresponding maximal local graph Gi=(Vi,Ei)G^{\prime}_{i}=(V_{i},E^{\prime}_{i}) for each Vi𝒱V_{i}\in\mathcal{V}. Recall that GiG^{\prime}_{i} is a conventional graph, since, by definition, EiE^{\prime}_{i} contains only ϵ\epsilon-labeled edges. Since 𝒱\mathcal{V} has constant treewidth, each graph GiG^{\prime}_{i} has constant treewidth, and we construct a tree decomposition 𝖳𝗋𝖾𝖾(Gi)\mathsf{Tree}(G^{\prime}_{i}).

  2. (2)

    We exploit the constant-treewidth property of each GiG^{\prime}_{i} to build a data structure 𝒟\mathcal{D} which supports the following two operations: (i) Querying whether a node vv is reachable from a node uu in GiG^{\prime}_{i}, and (ii) Updating GiG_{i} by inserting a new edge (x,y)(x,y). Moreover, each such operation is fast, i.e., it is performed in O(logni)O(\log n_{i}) time.

  3. (3)

    Recall that V1V^{1}, V2V^{2} are the library and client partitions of GG, respectively. In the preprocessing phase, we use the data structure 𝒟\mathcal{D} to preprocess G[V1]G[V^{1}] so that any pair of library nodes that is Dyck-reachable in G[V1]G[V^{1}] is discovered and can be queried fast. Hence this library-side reachability information serves as the summary on the library side.

  4. (4)

    In the query phase, we use 𝒟\mathcal{D} to process the whole graph GG, using the summaries computed in the preprocessing phase.

Step 1. Construction of the local graphs GiG_{i} and the tree decompositions. The local graphs GiG_{i} are extracted from G[V1]G[V^{1}] by means of its program-valid partitioning 𝒱1={V1,V}\mathcal{V}^{1}=\{V_{1},\dots V_{\ell}\}. We consider this partitioning as part of the input, since every local graph GiG_{i} in reality corresponds to a unique method of the input program represented by GG. Let ni=|Vi|n_{i}=|V_{i}|. The maximal local graphs Gi=(Vi,Ei)G^{\prime}_{i}=(V_{i},E^{\prime}_{i}) are constructed as defined in Section 5.1. Each tree decomposition 𝖳𝗋𝖾𝖾(Gi)\mathsf{Tree}(G^{\prime}_{i}) is constructed in O(ni)O(n_{i}) time using Theorem 5.1. Observe that since EiEiE_{i}\subseteq E^{\prime}_{i} (i.e., GiG_{i} is a subgraph of its maximal counterpart GiG^{\prime}_{i}), 𝖳𝗋𝖾𝖾(Gi)\mathsf{Tree}(G^{\prime}_{i}) is also a tree decomposition of GiG_{i}. We define 𝖳𝗋𝖾𝖾(Gi)=𝖳𝗋𝖾𝖾(Gi)\mathsf{Tree}(G_{i})=\mathsf{Tree}(G^{\prime}_{i}) for all 1i1\leq i\leq\ell.

Step 2. Description of the data structure 𝒟\mathcal{D}. Here we describe the data structure 𝒟\mathcal{D}, which is built for a conventional graph Gi=(Vi,Ei)G_{i}=(V_{i},E_{i}) (i.e., EiE_{i} has only ϵ\epsilon-labeled edges) and its tree decomposition 𝖳𝗋𝖾𝖾(Gi)\mathsf{Tree}(G_{i}). The purpose of 𝒟\mathcal{D} is to handle reachability queries on GiG_{i}. The data structure supports three operations, given in Algorithm 5, Algorithm 6 and Algorithm 7.

  1. (1)

    The 𝒟.𝖡𝗎𝗂𝗅𝖽\mathcal{D}.\mathsf{Build} (Algorithm 5) operation builds the data structure for GiG_{i}.

  2. (2)

    The 𝒟.𝖴𝗉𝖽𝖺𝗍𝖾\mathcal{D}.\mathsf{Update} (Algorithm 6) updates the graph GiG_{i} with a new edge (x,y)(x,y), provided that there exists a bag BB such that x,yBx,y\in B.

  3. (3)

    The 𝒟.𝖰𝗎𝖾𝗋𝗒\mathcal{D}.\mathsf{Query} (Algorithm 7) takes as input a pair of nodes x,yx,y and returns 𝖳𝗋𝗎𝖾\mathsf{True} iff yy is reachable from xx in GiG_{i}, considering all the update operations performed so far.

Input: A tree-decomposition 𝖳𝗋𝖾𝖾(Gi)\mathsf{Tree}(G_{i})
1 Traverse 𝖳𝗋𝖾𝖾(Gi)\mathsf{Tree}(G_{i}) bottom up
2 foreach encountered bag BB do
3  Construct the graph G(B)=(B,R(B))G(B)=(B,R(B))
4  Compute the transitive closure G(B)G^{*}(B)
5  foreach u,vBu,v\in B do
6     if uvu\rightsquigarrow v in G(B)G^{*}(B) then
7        Insert (u,v)(u,v) in RR
8    
9  end foreach
10 
11 end foreach
Algorithm 5 𝒟.𝖡𝗎𝗂𝗅𝖽\mathcal{D}.\mathsf{Build}
Input: A new edge (x,y)(x,y)
1 Insert (x,y)(x,y) in RR, and traverse 𝖳𝗋𝖾𝖾(G)\mathsf{Tree}(G) from B(x,y)B_{(x,y)} to the root
2 foreach encountered bag BB do
3  Construct the graph G(B)=(B,R(B))G(B)=(B,R(B))
4  Compute the transitive closure G(B)G^{*}(B)
5  foreach u,vBu,v\in B do
6     if uvu\rightsquigarrow v in G(B)G^{*}(B) then
7        Insert (u,v)(u,v) in RR
8    
9  end foreach
10 
11 end foreach
Algorithm 6 𝒟.𝖴𝗉𝖽𝖺𝗍𝖾\mathcal{D}.\mathsf{Update}
Input: A pair of nodes x,yx,y
1 Let X{x},Y{y}X\leftarrow\{x\},Y\leftarrow\{y\}
2 Traverse 𝖳𝗋𝖾𝖾(G)\mathsf{Tree}(G) from BxB_{x} to the root
3 foreach encountered bag BB do
4  foreach u,vBu,v\in B do
5     if uXu\in X and (u,v)R(u,v)\in R then
6        Add vv to XX
7    
8  end foreach
9 
10 end foreach
11Traverse 𝖳𝗋𝖾𝖾(G)\mathsf{Tree}(G) from ByB_{y} to the root
12 foreach encountered bag BB do
13  foreach u,vBu,v\in B do
14     if vYv\in Y and (u,v)R(u,v)\in R then
15        Add uu to YY
16    
17  end foreach
18 
19 end foreach
return 𝖳𝗋𝗎𝖾\mathsf{True} iff XYX\cap Y\neq\emptyset
Algorithm 7 𝒟.𝖰𝗎𝖾𝗋𝗒\mathcal{D}.\mathsf{Query}
Input: Method graphs (Gi=(Vi,Ei))1i(G_{i}=(V_{i},E_{i}))_{1\leq i\leq\ell}
1 foreach 1i1\leq i\leq\ell do
2  Construct 𝖳𝗋𝖾𝖾(Gi)\mathsf{Tree}(G_{i})
3  Run 𝒟.𝖡𝗎𝗂𝗅𝖽\mathcal{D}.\mathsf{Build} on 𝖳𝗋𝖾𝖾(Gi)\mathsf{Tree}(G_{i})
4 end foreach
5𝖯𝗈𝗈𝗅{G1,G}\mathsf{Pool}\leftarrow\{G_{1},\dots G_{\ell}\}
6 while 𝖯𝗈𝗈𝗅\mathsf{Pool}\neq\emptyset do
7  Extract GjG_{j} from 𝖯𝗈𝗈𝗅\mathsf{Pool}
8  foreach uVjVe,vVjVxu\in V_{j}\cap V_{e},v\in V_{j}\cap V_{x} do
9     if 𝒟.𝖰𝗎𝖾𝗋𝗒(u,v)\mathcal{D}.\mathsf{Query}(u,v) then
10        foreach x,y:(x,u,αi),(v,y,α¯i)Ex,y:(x,u,\alpha_{i}),(v,y,\overline{\alpha}_{i})\in E do
11           Let Gr=(Vr,Er)G_{r}=(V_{r},E_{r}) be the graph s.t. x,yVrx,y\in V_{r}
12           if not 𝒟.𝖰𝗎𝖾𝗋𝗒(x,y)\mathcal{D}.\mathsf{Query}(x,y) then
13              Run 𝒟.𝖴𝗉𝖽𝖺𝗍𝖾\mathcal{D}.\mathsf{Update} on 𝖳𝗋𝖾𝖾(Gr)\mathsf{Tree}(G_{r}) on (x,y)(x,y)
14              Insert GrG_{r} in 𝖯𝗈𝗈𝗅\mathsf{Pool}
15          
16        end foreach
17       
18    
19  end foreach
20 
21 end while
Algorithm 8 𝖯𝗋𝗈𝖼𝖾𝗌𝗌\mathsf{Process}

The reachability set RR. The data structure 𝒟\mathcal{D} is built by storing a reachability set RR between pairs of nodes. The set RR has the crucial property that it stores information only between pairs of nodes that appear together in some bag of 𝖳𝗋𝖾𝖾(Gi)\mathsf{Tree}(G_{i}). That is, RBB×BR\subseteq\bigcup_{B}B\times B. Given a bag BB, we denote by R(B)R(B) the restriction of RR to the nodes of BB. The reachability set is stored as a collection of 2ini2\cdot\sum_{i}n_{i} sets RF(u)R^{F}(u) and RB(u)R^{B}(u), one pair for every node uViu\in V_{i}. In turn, the set RF(u)R^{F}(u) (resp. RB(u)R^{B}(u)) stores the nodes in BuB_{u} (recall that BuB_{u} is the root bag of node uu) that have been discovered to be reachable from uu (resp., to reach uu). It follows directly from the definition of tree decompositions that if (u,v)Ei(u,v)\in E_{i} is an edge of GiG_{i}, then uBvu\in B_{v} or vBuv\in B_{u}. Given a bag BB and nodes u,vBu,v\in B, querying whether (u,v)R(u,v)\in R reduces to testing whether vRF(u)v\in R^{F}(u) or uRB(v)u\in R^{B}(v). Similarly, inserting (u,v)(u,v) to RR reduces to inserting either vv to RF(u)R^{F}(u) (if vBuv\in B_{u}), or uu to RB(v)R^{B}(v) (if uBvu\in B_{v}).
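A minimal Python sketch of this representation (the class and field names are ours; root_bag maps each node to the node set of its root bag):

```python
class ReachSet:
    """Reachability set R stored as per-node forward/backward sets.
    RF[u] holds nodes of u's root bag discovered to be reachable from u;
    RB[u] holds nodes of u's root bag discovered to reach u."""
    def __init__(self, root_bag):
        self.root_bag = root_bag
        self.RF = {u: set() for u in root_bag}
        self.RB = {u: set() for u in root_bag}

    def insert(self, u, v):
        if v in self.root_bag[u]:
            self.RF[u].add(v)
        elif u in self.root_bag[v]:
            self.RB[v].add(u)
        else:
            raise ValueError("u and v do not share a root bag")

    def query(self, u, v):
        return v in self.RF[u] or u in self.RB[v]
```

Since bags have constant size for constant-treewidth graphs, each insert and query touches only constantly many entries, matching Remark 4.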

Remark 4.

The map RR requires O(n)O(n) space. Since each GiG_{i} is a constant-treewidth graph, every insert and query operation on RR requires O(1)O(1) time.

Correctness and complexity of 𝒟\mathcal{D}. Here we establish the correctness and complexity of each operation of 𝒟\mathcal{D}.

It is rather straightforward to see that for every pair of nodes (u,v)R(u,v)\in R, we have that vv is reachable from uu. The following lemma states a kind of weak completeness: if vv is reachable from uu via a path of a specific type, then (u,v)R(u,v)\in R. Although this is different from strong completeness, which would require that (u,v)R(u,v)\in R whenever vv is reachable from uu, it is sufficient for ensuring completeness of the 𝒟.𝖰𝗎𝖾𝗋𝗒\mathcal{D}.\mathsf{Query} algorithm.

Left-right-contained paths. We introduce the notion of left-right-contained paths, which is crucial for stating the correctness of the data structure 𝒟\mathcal{D}. Given a bag BB of 𝖳𝗋𝖾𝖾(Gi)\mathsf{Tree}(G_{i}), we say that a path P:xyP:x\rightsquigarrow y is left-contained in BB if for every node wPw\in P with wxw\neq x, we have BwT(B)B_{w}\in T(B). Similarly, PP is right-contained in BB if for every node wPw\in P with wyw\neq y, we have BwT(B)B_{w}\in T(B). Finally, PP is left-right-contained in BB if it is both left-contained and right-contained in BB.

Lemma 5.3.

The data structure 𝒟\mathcal{D} maintains the following invariant. For every bag BB and pair of nodes u,vBu,v\in B, if there is a path Puv:uvP_{u}^{v}:u\rightsquigarrow v which is left-right-contained in BB, then after 𝒟.𝖡𝗎𝗂𝗅𝖽\mathcal{D}.\mathsf{Build} has processed BB, we have (u,v)R(u,v)\in R.

It is rather straightforward that at the end of 𝒟.𝖰𝗎𝖾𝗋𝗒\mathcal{D}.\mathsf{Query}, for every node wXw\in X (resp. wYw\in Y) we have that ww is reachable from xx (resp. yy is reachable from ww). This guarantees that if 𝒟.𝖰𝗎𝖾𝗋𝗒\mathcal{D}.\mathsf{Query} returns 𝖳𝗋𝗎𝖾\mathsf{True}, then yy is indeed reachable from xx, via some node wXYw\in X\cap Y (recall that the intersection is not empty, due to Line 7). The following two lemmas state completeness, namely that if yy is reachable from xx, then 𝒟.𝖰𝗎𝖾𝗋𝗒\mathcal{D}.\mathsf{Query} will return 𝖳𝗋𝗎𝖾\mathsf{True}, and the complexity of 𝒟\mathcal{D} operations.

Lemma 5.4.

On input x,yx,y, if yy is reachable from xx, then 𝒟.𝖰𝗎𝖾𝗋𝗒\mathcal{D}.\mathsf{Query} returns 𝖳𝗋𝗎𝖾\mathsf{True}.

Lemma 5.5.

𝒟.𝖡𝗎𝗂𝗅𝖽\mathcal{D}.\mathsf{Build} requires O(ni)O(n_{i}) time. Every call to 𝒟.𝖴𝗉𝖽𝖺𝗍𝖾\mathcal{D}.\mathsf{Update} and 𝒟.𝖰𝗎𝖾𝗋𝗒\mathcal{D}.\mathsf{Query} requires O(logni)O(\log n_{i}) time.

Step 3. Preprocessing the library graph G[V1]G[V^{1}]. Given the library subgraph G[V1]G[V^{1}] and one copy of the data structure 𝒟\mathcal{D} for each local graph GiG_{i} of G[V1]G[V^{1}], the preprocessing of the library graph is achieved via the algorithm 𝖯𝗋𝗈𝖼𝖾𝗌𝗌\mathsf{Process}, which is presented in Algorithm 8. At a high level, 𝖯𝗋𝗈𝖼𝖾𝗌𝗌\mathsf{Process} initially builds the data structure 𝒟\mathcal{D} for each local graph GiG_{i} using 𝒟.𝖡𝗎𝗂𝗅𝖽\mathcal{D}.\mathsf{Build}. Afterwards, it iteratively uses 𝒟.𝖰𝗎𝖾𝗋𝗒\mathcal{D}.\mathsf{Query} to test whether there exists a local graph GjG_{j} and two nodes uVjVeu\in V_{j}\cap V_{e}, vVjVxv\in V_{j}\cap V_{x} such that vv is reachable from uu in GjG_{j}. If so, the algorithm iterates over all nodes x,yx,y such that (x,u,αi)E(x,u,\alpha_{i})\in E and (v,y,α¯i)E(v,y,\overline{\alpha}_{i})\in E, and uses a 𝒟.𝖰𝗎𝖾𝗋𝗒\mathcal{D}.\mathsf{Query} operation to test whether yy is reachable from xx in their respective local graph GrG_{r}. If not, then 𝖯𝗋𝗈𝖼𝖾𝗌𝗌\mathsf{Process} uses a 𝒟.𝖴𝗉𝖽𝖺𝗍𝖾\mathcal{D}.\mathsf{Update} operation to insert the edge (x,y)(x,y) in GrG_{r}. Since this new edge might affect the reachability relations among other nodes in VrV_{r}, the graph GrG_{r} is inserted in 𝖯𝗈𝗈𝗅\mathsf{Pool} for further processing. See Algorithm 8 for a formal description. The following two lemmas state the correctness and complexity of 𝖯𝗋𝗈𝖼𝖾𝗌𝗌\mathsf{Process}.
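The worklist structure of 𝖯𝗋𝗈𝖼𝖾𝗌𝗌\mathsf{Process} can be sketched in Python as follows. The tree-decomposition-based data structure of Step 2 is replaced here by a naive DFS-based stand-in (so the per-operation costs differ from Lemma 5.5), and all names and encodings are illustrative:

```python
from collections import deque

class NaiveD:
    """Naive stand-in for the data structure of Step 2: edge sets per local
    graph, with plain DFS queries instead of tree-decomposition lookups."""
    def __init__(self, local_edges, owner):
        self.E = local_edges      # graph id -> set of (u, v) epsilon edges
        self.owner = owner        # node -> id of its local graph
    def query(self, g, s, t):
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            if u == t:
                return True
            for a, b in self.E[g]:
                if a == u and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return False
    def update(self, g, x, y):
        self.E[g].add((x, y))

def process(graphs, Ve, Vx, call_edges, D):
    """Worklist loop of Process: whenever an entry u reaches an exit v of a
    local graph, propagate a summary edge (x, y) to each caller, where
    (x, u) is an alpha_i edge and (v, y) the matching bar-alpha_i edge."""
    pool = deque(graphs)
    while pool:
        g = pool.popleft()
        for u in Ve[g]:
            for v in Vx[g]:
                if not D.query(g, u, v):
                    continue
                for calls, returns in call_edges.values():
                    for x, cu in calls:
                        for rv, y in returns:
                            if cu != u or rv != v:
                                continue
                            r = D.owner[x]  # local graph containing x and y
                            if not D.query(r, x, y):
                                D.update(r, x, y)
                                pool.append(r)  # reprocess the affected graph
```

On a two-method instance in the style of Figure 6 (a caller graph 0 with call edge (x, u1) and return edge (v1, y) into a callee graph 1 containing the edge (u1, v1)), the loop discovers the summary edge (x, y) in graph 0.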

Lemma 5.6.

At the end of 𝖯𝗋𝗈𝖼𝖾𝗌𝗌\mathsf{Process}, for every graph Gi=(Vi,Ei)G_{i}=(V_{i},E_{i}) and pair of nodes u,vViu,v\in V_{i}, we have that vv is Dyck-reachable from uu in G[V1]G[V^{1}] iff 𝒟.𝖰𝗎𝖾𝗋𝗒\mathcal{D}.\mathsf{Query} returns 𝖳𝗋𝗎𝖾\mathsf{True} on input u,vu,v.

Lemma 5.7.

Let n=inin=\sum_{i}n_{i}, and let k1k_{1} be the number of labels appearing in E(V1×V1×Σk)E\cap(V^{1}\times V^{1}\times\Sigma_{k}) (i.e., k1k_{1} is the number of call sites in G[V1]G[V^{1}]). 𝖯𝗋𝗈𝖼𝖾𝗌𝗌\mathsf{Process} requires O(n+k1logn)O(n+k_{1}\cdot\log n) time.

Step 4. Library/Client analysis. We are ready to describe the library summarization for Library/Client Dyck reachability. Let G=(V,E)G=(V,E) be the program-valid graph representing library and client code, and V1,V2V^{1},V^{2} a partitioning of VV into library and client nodes.

  1. (1)

    In the preprocessing phase, the algorithm 𝖯𝗋𝗈𝖼𝖾𝗌𝗌\mathsf{Process} is used to preprocess G[V1]G[V^{1}]. Note that since GG is a program-valid graph, so is G[V1]G[V^{1}], hence 𝖯𝗋𝗈𝖼𝖾𝗌𝗌\mathsf{Process} can execute on G[V1]G[V^{1}]. The summaries created are in the form of 𝒟.𝖴𝗉𝖽𝖺𝗍𝖾\mathcal{D}.\mathsf{Update} operations performed on edges (x,y)(x,y).

  2. (2)

    In the querying phase, the set V2V^{2} is revealed, and thus the whole of GG. Hence now 𝖯𝗋𝗈𝖼𝖾𝗌𝗌\mathsf{Process} processes GG, without using 𝒟.𝖡𝗎𝗂𝗅𝖽\mathcal{D}.\mathsf{Build} on the graphs GiG_{i} that correspond to library methods, as they have already been processed in step 1. Note that the graphs GiG_{i} that correspond to library methods are used for querying and updating.

It follows immediately from Lemma 5.6 that at the end of the second step, for every local graph Gi=(Vi,Ei)G_{i}=(V_{i},E_{i}) of the client graph, for every pair of nodes u,vViu,v\in V_{i}, vv is Dyck-reachable from uu in the program-valid graph GG if and only if 𝒟.𝖰𝗎𝖾𝗋𝗒\mathcal{D}.\mathsf{Query} returns 𝖳𝗋𝗎𝖾\mathsf{True} on input u,vu,v.

Now we turn our attention to complexity. Let n1=|V1|n_{1}=|V^{1}| and n2=|V2|n_{2}=|V^{2}|. By Lemma 5.7, the time spent for the first step is O(n1+k1logn1)O(n_{1}+k_{1}\cdot\log n_{1}), and the time spent for the second step is O(n2+k1logn1+k2logn2)O(n_{2}+k_{1}\cdot\log n_{1}+k_{2}\cdot\log n_{2}).

Constant-time queries. Recall that our task is to support O(1)O(1)-time queries about the Dyck reachability of pairs of nodes on the client subgraph G[V2]G[V^{2}]. As Lemma 5.6 shows, after 𝖯𝗋𝗈𝖼𝖾𝗌𝗌\mathsf{Process} has finished, each such query costs O(logn2)O(\log n_{2}) time. We use existing results for reachability queries on constant-treewidth graphs (Chatterjee et al., 2016b, Theorem 6) which allow us to reduce the query time to O(1)O(1), while spending O(n2)O(n_{2}) time in total to process all the graphs.

Theorem 5.8.

Consider a Σk\Sigma_{k}-labeled program-valid graph G=(V,E)G=(V,E) of constant program-valid treewidth, and the library and client subgraphs G1=(V1,E1)G^{1}=(V^{1},E^{1}) and G2=(V2,E2)G^{2}=(V^{2},E^{2}). For i{1,2}i\in\{1,2\} let ni=|Vi|n_{i}=|V^{i}| be the number of nodes, and kik_{i} be the number of call sites in each graph GiG^{i}, with k1+k2=kk_{1}+k_{2}=k. The algorithm 𝖣𝗒𝗇𝖺𝗆𝗂𝖼𝖣𝗒𝖼𝗄\mathsf{DynamicDyck} requires

  1. (1)

    O(n1+k1logn1)O(n_{1}+k_{1}\cdot\log n_{1}) time and O(n1)O(n_{1}) space in the preprocessing phase, and

  2. (2)

    O(n2+k1logn1+k2logn2)O(n_{2}+k_{1}\cdot\log n_{1}+k_{2}\cdot\log n_{2}) time and O(n1+n2)O(n_{1}+n_{2}) space in the query phase,

after which pair reachability queries are handled in O(1)O(1) time.

6. Experimental Results

In this section we report on experimental results obtained for the problems of (i) alias analysis via points-to analysis on SPGs, and (ii) library/client data-dependence analysis.

6.1. Alias Analysis

Implementation. We have implemented our algorithm 𝖡𝗂𝖽𝗂𝗋𝖾𝖼𝗍𝖾𝖽𝖱𝖾𝖺𝖼𝗁\mathsf{BidirectedReach} in C++ and evaluated its performance in performing Dyck reachability on bidirected graphs. The algorithm is implemented as presented in Section 3, together with the preprocessing step that handles the ϵ\epsilon-labeled edges. Besides common coding practices we have performed no engineering optimizations. We have also implemented (Zhang et al., 2013, Algorithm 2), including the Fast-Doubly-Linked-List (FDLL), which was previously shown to be very efficient in practice.

Experimental setup. In our experimental setup we used the DaCapo-2006-10-MR2 suite (Blackburn, 2006), which contains 11 real-world benchmarks. We used the tool reported in (Yan et al., 2011b) to extract the Symbolic Points-to Graphs (SPGs), which in turn uses Soot (Vallée-Rai et al., 1999) to process input Java programs. Our approach is similar to the one reported in (Xu et al., 2009; Yan et al., 2011b; Zhang et al., 2013). The outputs of the two compared methods were verified to ensure validity of the results. All experiments were run on a Windows-based laptop with an Intel Core i7-5500U 2.402.40 GHz CPU and 1616 GB of memory, without any compiler optimizations.

SPGs and points-to analysis. For the sake of completeness, we outline the construction of SPGs and the reachability relation they define. A more detailed exposition can be found in (Xu et al., 2009; Yan et al., 2011b; Zhang et al., 2013). An SPG is a graph, the node set of which consists of the following three subsets: (i) variable nodes 𝒱\mathcal{V} that represent variables in the program, (ii) allocation nodes 𝒪\mathcal{O} that represent objects constructed with the new expression, and (iii) symbolic nodes 𝒮\mathcal{S} that represent abstract heap objects. Similarly, there are three types of edges, as follows, where 𝖥𝗂𝖾𝗅𝖽𝗌={fi}1ik\mathsf{Fields}=\{f_{i}\}_{1\leq i\leq k} denotes the set of all fields of composite data types.

  1. (1)

    Edges of the form 𝒱×𝒪×{ϵ}\mathcal{V}\times\mathcal{O}\times\{\epsilon\} represent the objects that variables point to.

  2. (2)

    Edges of the form 𝒱×𝒮×{ϵ}\mathcal{V}\times\mathcal{S}\times\{\epsilon\} represent the abstract heap objects that variables point to.

  3. (3)

    Edges of the form (𝒪𝒮)×(𝒪𝒮)×𝖥𝗂𝖾𝗅𝖽𝗌\left(\mathcal{O}\cup\mathcal{S}\right)\times\left(\mathcal{O}\cup\mathcal{S}\right)\times\mathsf{Fields} represent the fields of objects that other objects point to.

We note that since we focus on context-insensitive points-to analysis, we have not included edges that model calling context in the definition of the SPG. Additionally, only the forward edges labeled with fif_{i} are defined explicitly, and the backward edges labeled with f¯i\overline{f}_{i} are implicit, since the SPG is treated as bidirected. Memory aliasing between two objects o1,o2𝒮𝒪o_{1},o_{2}\in\mathcal{S}\cup\mathcal{O} occurs when there is a path o1o2o_{1}\rightsquigarrow o_{2}, such that every opening field access fif_{i} is properly matched by a closing field access f¯i\overline{f}_{i}. Hence the Dyck grammar is given by 𝒮𝒮𝒮|fi𝒮f¯i|ϵ\mathcal{S}\to\mathcal{S}\leavevmode\nobreak\ \mathcal{S}\leavevmode\nobreak\ |\leavevmode\nobreak\ f_{i}\leavevmode\nobreak\ \mathcal{S}\leavevmode\nobreak\ \overline{f}_{i}\leavevmode\nobreak\ |\leavevmode\nobreak\ \epsilon. This allows us to infer the objects that variable nodes can point to via composite paths that go through many field assignments. See Figure 7 for a minimal example.

[Figure: the two-statement program z.f=x; y=z.f, together with its SPG over the variable nodes xx, zz, yy and ff-labeled field edges.]
Figure 7. A minimal program and its (bidirected) SPG. Circles and squares represent variable nodes and object nodes, respectively. Only forward edges are shown.
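The matching condition on field accesses is exactly membership in the Dyck language generated by the grammar above. A minimal Python sketch of this check on the label sequence of a path (the label encoding is ours, for illustration):

```python
def is_dyck(labels):
    """Membership in the Dyck language S -> S S | f_i S bar-f_i | eps.
    labels: sequence of ("open", i) for f_i, ("close", i) for bar-f_i,
    or None for an epsilon-labeled edge."""
    stack = []
    for lab in labels:
        if lab is None:           # epsilon labels do not affect matching
            continue
        kind, i = lab
        if kind == "open":
            stack.append(i)
        elif not stack or stack.pop() != i:
            return False          # unmatched or mismatched closing field
    return not stack              # every opening field access is matched
```

For instance, the label sequence f1ϵf¯1f_{1}\,\epsilon\,\overline{f}_{1} is accepted, whereas f1f¯2f_{1}\,\overline{f}_{2} is rejected.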

Analysis of results. The running times of the compared algorithms are shown in Table 3. We can see that the algorithm proposed in this work is much faster than the existing algorithm of (Zhang et al., 2013) in all benchmarks. The highest speedup is achieved in benchmark luindex, where our algorithm is 13 times faster. We also see that all times are overall small.

Table 3. Comparison between our algorithm and the existing algorithm from (Zhang et al., 2013). The first three columns contain the number of fields (Dyck parentheses), nodes, and edges in the SPG of each benchmark. The last two columns contain the running times, in seconds.
Benchmark Fields Nodes Edges Our Algorithm Existing Algorithm
antlr 172 13708 23547 0.428783 1.34152
bloat 316 43671 103361 17.7888 34.6012
chart 711 53500 91869 8.99378 34.9101
eclipse 439 34594 52011 3.62835 12.7697
fop 1064 101507 178338 42.5447 148.034
hsqldb 43 3048 4134 0.012899 0.073863
jython 338 56336 167040 40.239 55.3311
luindex 167 9931 14671 0.068013 0.636346
lusearch 200 12837 21010 0.163561 1.12788
pmd 357 31648 58025 2.21662 8.92306
xalan 41 2342 2979 0.006626 0.045144

6.2. Library/Client Data-dependence Analysis

Implementation. We have implemented our algorithm 𝖣𝗒𝗇𝖺𝗆𝗂𝖼𝖣𝗒𝖼𝗄 in Java and evaluated its performance on Library/Client data-dependence analysis via Dyck reachability. Our algorithm is built on top of Wala (Wal, 2003), and is implemented as presented in Section 5. Besides common coding practices, we have performed no engineering optimizations. We used the LibTW library (van Dijk et al., 2006) for computing the tree decompositions of the input graphs, under the greedy degree heuristic.
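For intuition on the greedy degree heuristic, here is a minimal Python sketch (our own illustration, not LibTW's implementation): repeatedly eliminate a minimum-degree vertex, turning its neighborhood into a clique; the largest neighborhood encountered upper-bounds the treewidth, since the elimination ordering induces a tree decomposition whose bags are these neighborhoods plus the eliminated vertex.

```python
def greedy_degree_width(adj):
    """Upper-bound the treewidth of an undirected graph via greedy
    min-degree elimination.  adj: dict mapping vertex -> set of neighbors."""
    adj = {v: set(ns) for v, ns in adj.items()}  # local mutable copy
    width = 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))  # minimum-degree vertex
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))            # bag {v} + nbrs has size |nbrs|+1
        for a in nbrs:                           # eliminate v: clique-ify nbrs
            adj[a].discard(v)
            adj[a] |= nbrs - {a}
    return width


# A 4-cycle has treewidth 2; a path has treewidth 1.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
assert greedy_degree_width(cycle) == 2
assert greedy_degree_width(path) == 1
```

The heuristic returns only an upper bound in general, but as Table 4 and Table 5 show, the resulting widths on the per-method subgraphs are small enough for our purposes.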

Experimental setup. We have used the tool of (Tang et al., 2015) for obtaining the data-dependence graphs of Java programs. In turn, that tool uses Wala (Wal, 2003) to build the graphs, and specifies the parts of the graph that correspond to library and client code. Java programs are suitable for Library/Client analysis, since the ubiquitous presence of callback functions makes the library and client code interdependent, so that the two sides cannot be analyzed in isolation. Our algorithm was compared with the TAL reachability and CFL reachability approaches, as already implemented in (Tang et al., 2015). The comparison was performed in terms of running time and memory usage, first for the analysis of library code to produce the summaries, and then for the analysis of the library summaries with the client code. The outputs of all three methods were compared to ensure validity of the results. The measurements for our algorithm include the time and memory used for computing the tree decompositions. All experiments were run on a Windows-based laptop with an Intel Core i7-5500U 2.40 GHz CPU and 16 GB of memory, without any compiler optimizations.

Benchmarks. Our benchmark suite is similar to that of (Tang et al., 2015), consisting of 12 Java programs from SPECjvm2008 (SPE, 2008), together with 4 randomly chosen programs from GitHub (Git, 2008). We note that, as reported in (Tang et al., 2015), they are unable to handle the benchmark serial from SPECjvm2008, due to out-of-memory issues when preprocessing the library (recall that the space bound for TAL reachability is O(n⁴)). In contrast, our algorithm handles serial easily, and it is thus included in the experiments.

Analysis of results. Our experimental comparison is depicted in Table 4 for running time and in Table 5 for memory usage. We briefly discuss our findings.

Treewidth. First, we comment on the treewidth of the obtained data-dependence graphs, which is reported in Table 4 and Table 5. Recall that our interest is not in the treewidth of the whole data-dependence graph, but in the treewidth of its program-valid partitioning, which yields a subgraph for each method of the input program. In each line of the tables we report the maximum treewidth of each benchmark, i.e., the maximum treewidth over the subgraphs of its program-valid partitioning. We see that the treewidth is typically very small (in most cases it is 5 or 6) in both library and client code. One exception is the client of mpegaudio, which has large treewidth. Observe that even this corner case of large treewidth was easily handled by our algorithm.

Time. Table 4 shows the time spent by each algorithm for analyzing library and client code separately. We first focus on total time, taken as the sum of the times spent by each algorithm on the library and client graph of each benchmark. We see that on every benchmark, our algorithm significantly outperforms both TAL and CFL reachability, reaching a 10x speedup compared to TAL (on mpegaudio), and a 5x speedup compared to CFL reachability (on helloworld). Note that the benchmark serial is missing from the figure, as TAL reachability runs out of memory. The benchmark is found in Table 4, where our algorithm achieves a 630x speedup compared to CFL reachability.

We now turn our attention to the trade-off between library preprocessing and client querying times. Here, the advantage of TAL over CFL reachability shows when handling client code. However, even on client code our algorithm is faster than TAL in all cases except one, and reaches up to a 30x speedup over TAL (on sunflow). Finally, observe that in all cases, the total running time of our algorithm on library and client code combined is much smaller than that of each of the other methods on library code alone.

Memory. Table 5 compares the total memory used for analyzing library and client code. We see that our algorithm significantly outperforms both TAL and CFL reachability on all benchmarks. Again, TAL uses more memory than CFL in the preprocessing of libraries, but less memory when analyzing client code. However, our algorithm uses even less memory than TAL on all benchmarks. The best performance gain is achieved on serial, where TAL runs out of memory after having consumed more than 12 GB. On the same benchmark, CFL reachability uses more than 4.3 GB. In contrast, our algorithm uses only 130 MB, thus achieving a 33x improvement over CFL, and at least a 90x improvement over TAL. We stress that for memory usage, these are tremendous gains. Finally, observe that for each benchmark, the maximum memory used by our algorithm for analyzing library and client code is smaller than the minimum memory used, between library and client, by each of the other two methods.

Improvement independent of callbacks. We note that, in contrast to TAL reachability, the improvements of our algorithm are not restricted to the presence of callbacks. Indeed, the algorithms introduced here significantly outperform the CFL approach even in the absence of callbacks. This is evident from Table 4, which shows that our algorithm processes the library graphs much faster than both CFL and TAL reachability.

Table 4. Running time of our algorithm vs. the TAL and CFL approaches for data-dependence analysis with library summarization. Times are in milliseconds. MEM-OUT indicates that the algorithm ran out of memory. The number of nodes and the treewidth reflect the average and the maximum, respectively, among all methods in each benchmark.
Nodes TW Our Algorithm TAL CFL
Benchmark Lib. Cl. Lib. Cl. Lib. Cl. Lib. Cl. Lib. Cl.
helloworld 16003 296 5 3 229 5 1044 31 855 578
check 16604 3347 5 4 228 54 1062 72 821 620
compiler 16190 536 5 3 248 11 995 57 876 572
sample 3941 28 4 1 86 1 258 14 368 113
crypto 20094 3216 5 5 273 66 1451 196 961 776
derby 23407 1106 6 3 389 22 1301 83 1003 1100
mpegaudio 28917 27576 5 24 204 177 5358 253 1864 1586
xml 71474 2312 5 3 489 115 5492 100 1891 2570
mushroom 3858 7 4 1 86 1 230 14 349 124
btree 6710 1103 4 4 144 34 583 111 571 197
startup 19312 621 5 3 279 17 1651 110 1087 946
sunflow 15615 85 5 2 217 1 1073 31 811 549
compress 16157 1483 5 3 240 23 1119 112 783 999
parser 7856 112 4 1 172 3 443 21 572 241
scimark 16270 2027 5 5 220 34 1004 70 805 595
serial 69999 468 8 3 440 9 MEM-OUT MEM-OUT 117147 165958
Table 5. Memory usage of our algorithm vs. the TAL and CFL approaches for data-dependence analysis with library summarization. Memory usage is in megabytes. MEM-OUT indicates that the algorithm ran out of memory. The number of nodes and the treewidth reflect the average and the maximum, respectively, among all methods in each benchmark.
Nodes TW Our Algorithm TAL CFL
Benchmark Lib. Cl. Lib. Cl. Lib. Cl. Lib. Cl. Lib. Cl.
helloworld 16003 296 5 3 31 27 321 44 104 126
check 16604 3347 5 4 34 31 336 89 132 184
compiler 16190 536 5 3 31 28 329 44 108 137
sample 3941 28 4 1 19 16 232 59 59 64
crypto 20094 3216 5 5 45 45 261 61 127 188
derby 23407 1106 6 3 46 41 600 88 204 265
mpegaudio 28917 27576 5 24 96 96 516 219 262 397
xml 71474 2312 5 3 108 108 463 153 373 480
mushroom 3858 7 4 1 19 16 230 59 58 58
btree 6710 1103 4 4 22 19 308 65 72 89
startup 19312 621 5 3 66 66 345 92 178 230
sunflow 15615 85 5 2 30 27 315 43 102 124
compress 16157 1483 5 3 32 29 338 50 105 131
parser 7856 112 4 1 22 19 320 64 73 83
scimark 16270 2027 5 5 32 29 134 49 106 140
serial 69999 468 8 3 130 130 MEM-OUT MEM-OUT 3964 4314

7. Conclusion

In this work we consider Dyck reachability problems for alias and data-dependence analysis. For alias analysis, bidirected graphs are natural; we present improved upper bounds for them, together with matching lower bounds that show our algorithm is optimal. For data-dependence analysis, we exploit the constant-treewidth property to present an almost-linear-time algorithm. We also show that for general graphs, Dyck reachability bounds cannot be improved without a major breakthrough.

Acknowledgements.
The research was partly supported by Austrian Science Fund (FWF) Grant No. P23499-N23, FWF NFN Grant No. S11407-N23 (RiSE/SHiNE), and ERC Start grant (279307: Graph Games).

References

  • Wal (2003) 2003. T. J. Watson Libraries for Analysis (WALA). https://github.com.
  • Git (2008) 2008. GitHub Home. https://github.com.
  • SPE (2008) 2008. SPECjvm2008 Benchmark Suit. http://www.spec.org/jvm2008/.
  • Abboud and Vassilevska Williams (2014) Amir Abboud and Virginia Vassilevska Williams. 2014. Popular Conjectures Imply Strong Lower Bounds for Dynamic Problems. In FOCS. 434–443.
  • Arnborg and Proskurowski (1989) Stefan Arnborg and Andrzej Proskurowski. 1989. Linear time algorithms for NP-hard problems restricted to partial k-trees. Discrete Appl. Math. (1989).
  • Arnold (1996) Robert S. Arnold. 1996. Software Change Impact Analysis. IEEE Computer Society Press, Los Alamitos, CA, USA.
  • Banachowski (1980) Lech Banachowski. 1980. A complement to Tarjan’s result about the lower bound on the complexity of the set union problem. Inform. Process. Lett. 11, 2 (1980), 59 – 65.
  • Bern et al. (1987) M.W Bern, E.L Lawler, and A.L Wong. 1987. Linear-time computation of optimal subgraphs of decomposable graphs. J Algorithm (1987).
  • Bertele and Brioschi (1972) Umberto Bertele and Francesco Brioschi. 1972. Nonserial Dynamic Programming. Academic Press, Inc., Orlando, FL, USA.
  • Blackburn (2006) Stephen M. Blackburn et al. 2006. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In OOPSLA.
  • Bodden (2012) Eric Bodden. 2012. Inter-procedural Data-flow Analysis with IFDS/IDE and Soot. In SOAP. ACM, New York, NY, USA.
  • Bodlaender and Hagerup (1995) Hans L. Bodlaender and Torben Hagerup. 1995. Parallel algorithms with optimal speedup for bounded treewidth. Vol. 27. 1725–1746.
  • Bodlaender (1988) Hans L. Bodlaender. 1988. Dynamic programming on graphs with bounded treewidth. In ICALP. Springer.
  • Bodlaender (1998) Hans L. Bodlaender. 1998. A partial k-arboretum of graphs with bounded treewidth. TCS (1998).
  • Chatterjee et al. (2017) Krishnendu Chatterjee, Bhavya Choudhary, and Andreas Pavlogiannis. 2017. Optimal Dyck Reachability for Data-dependence and Alias Analysis. Technical Report. IST Austria. https://repository.ist.ac.at/id/eprint/870
  • Chatterjee et al. (2016a) Krishnendu Chatterjee, Amir Kafshdar Goharshady, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2016a. Algorithms for algebraic path properties in concurrent systems of constant treewidth components. In POPL. 733–747.
  • Chatterjee et al. (2015b) Krishnendu Chatterjee, Rasmus Ibsen-Jensen, Prateesh Goyal, and Andreas Pavlogiannis. 2015b. Faster Algorithms for Algebraic Path Properties in Recursive State Machines with Constant Treewidth. In POPL.
  • Chatterjee et al. (2015a) Krishnendu Chatterjee, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2015a. Faster Algorithms for Quantitative Verification in Constant Treewidth Graphs. In CAV.
  • Chatterjee et al. (2016b) Krishnendu Chatterjee, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2016b. Optimal Reachability and a Space-Time Tradeoff for Distance Queries in Constant-Treewidth Graphs. In 24th Annual European Symposium on Algorithms, ESA 2016, August 22-24, 2016, Aarhus, Denmark. 28:1–28:17.
  • Chaudhuri (2008) Swarat Chaudhuri. 2008. Subcubic Algorithms for Recursive State Machines. In POPL. ACM, New York, NY, USA.
  • Chaudhuri and Zaroliagis (1995) Shiva Chaudhuri and Christos D. Zaroliagis. 1995. Shortest Paths in Digraphs of Small Treewidth. Part I: Sequential Algorithms. Algorithmica (1995).
  • Choi et al. (1993) Jong-Deok Choi, Michael Burke, and Paul Carini. 1993. Efficient Flow-sensitive Interprocedural Computation of Pointer-induced Aliases and Side Effects. In Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’93). ACM, 232–245.
  • Cormen et al. (2001) T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein. 2001. Introduction To Algorithms. MIT Press.
  • Doyle and Rivest (1976) Jon Doyle and Ronald L. Rivest. 1976. Linear expected time of a simple union-find algorithm. Inform. Process. Lett. 5, 5 (1976), 146 – 148.
  • Gabow and Tarjan (1985) Harold N. Gabow and Robert Endre Tarjan. 1985. A linear-time algorithm for a special case of disjoint set union. J. Comput. System Sci. 30, 2 (1985), 209 – 221.
  • Galil and Italiano (1991) Zvi Galil and Giuseppe F. Italiano. 1991. Data Structures and Algorithms for Disjoint Set Union Problems. ACM Comput. Surv. 23, 3 (1991), 319–344.
  • Gustedt et al. (2002) Jens Gustedt, Ole A. Mæhle, and Jan Arne Telle. 2002. The Treewidth of Java Programs. In Algorithm Engineering and Experiments. Springer.
  • Heintze and McAllester (1997) Nevin Heintze and David McAllester. 1997. On the Cubic Bottleneck in Subtyping and Flow Analysis. In Proceedings of the 12th Annual IEEE Symposium on Logic in Computer Science (LICS ’97). IEEE Computer Society, Washington, DC, USA, 342–. http://dl.acm.org/citation.cfm?id=788019.788876
  • Henzinger et al. (2015) Monika Henzinger, Sebastian Krinninger, Danupon Nanongkai, and Thatchaphol Saranurak. 2015. Unifying and Strengthening Hardness for Dynamic Problems via the Online Matrix-Vector Multiplication Conjecture. In STOC. 21–30.
  • Hind (2001) Michael Hind. 2001. Pointer Analysis: Haven't We Solved This Problem Yet?. In Proceedings of the 2001 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE ’01). ACM, 54–61.
  • Horwitz (1997) Susan Horwitz. 1997. Precise Flow-insensitive May-alias Analysis is NP-hard. ACM Trans. Program. Lang. Syst. 19, 1 (1997), 1–6.
  • Kuck et al. (1981) D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe. 1981. Dependence Graphs and Compiler Optimizations. In Proceedings of the 8th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). 207–218.
  • Landi and Ryder (1992) William Landi and Barbara G. Ryder. 1992. A Safe Approximate Algorithm for Interprocedural Aliasing. In Proceedings of the ACM SIGPLAN 1992 Conference on Programming Language Design and Implementation (PLDI ’92). ACM, 235–248.
  • Le Gall (2014) François Le Gall. 2014. Powers of Tensors and Fast Matrix Multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation (ISSAC). 296–303.
  • Lee (2002) Lillian Lee. 2002. Fast Context-free Grammar Parsing Requires Fast Boolean Matrix Multiplication. J. ACM 49, 1 (2002), 1–15.
  • Lhoták and Hendren (2006) Ondřej Lhoták and Laurie Hendren. 2006. Context-Sensitive Points-to Analysis: Is It Worth It?. In Proceedings of the 15th International Conference on Compiler Construction (CC). 47–64.
  • Palepu et al. (2017) Vijay Krishna Palepu, Guoqing Xu, and James A. Jones. 2017. Dynamic Dependence Summaries. ACM Trans. Softw. Eng. Methodol. 25, 4 (2017), 30:1–30:41.
  • Ramalingam (1994) G. Ramalingam. 1994. The Undecidability of Aliasing. ACM Trans. Program. Lang. Syst. 16, 5 (1994), 1467–1471.
  • Rehof and Fähndrich (2001) Jakob Rehof and Manuel Fähndrich. 2001. Type-base Flow Analysis: From Polymorphic Subtyping to CFL-reachability. In Proceedings of the 28th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). 54–66.
  • Reps (1995) Thomas Reps. 1995. Shape Analysis As a Generalized Path Problem. In Proceedings of the 1995 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation (PEPM ’95). ACM, 1–11.
  • Reps (1997) Thomas Reps. 1997. Program Analysis via Graph Reachability. In Proceedings of the 1997 International Symposium on Logic Programming (ILPS). 5–19.
  • Reps (2000) Thomas Reps. 2000. Undecidability of Context-sensitive Data-dependence Analysis. ACM Trans. Program. Lang. Syst. 22, 1 (2000), 162–186.
  • Reps et al. (1995) Thomas Reps, Susan Horwitz, and Mooly Sagiv. 1995. Precise Interprocedural Dataflow Analysis via Graph Reachability. In Proceedings of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’95). ACM, 49–61.
  • Reps et al. (1994) Thomas Reps, Susan Horwitz, Mooly Sagiv, and Genevieve Rosay. 1994. Speeding Up Slicing. SIGSOFT Softw. Eng. Notes 19, 5 (1994), 11–20.
  • Robertson and Seymour (1984) Neil Robertson and P.D Seymour. 1984. Graph minors. III. Planar tree-width. Journal of Combinatorial Theory, Series B (1984).
  • Shang et al. (2012) Lei Shang, Xinwei Xie, and Jingling Xue. 2012. On-demand Dynamic Summary-based Points-to Analysis. In Proceedings of the Tenth International Symposium on Code Generation and Optimization (CGO ’12). ACM, 264–274.
  • Sridharan and Bodík (2006) Manu Sridharan and Rastislav Bodík. 2006. Refinement-based Context-sensitive Points-to Analysis for Java. SIGPLAN Not. 41, 6 (2006), 387–400.
  • Sridharan et al. (2013) Manu Sridharan, Satish Chandra, Julian Dolby, Stephen J. Fink, and Eran Yahav. 2013. Aliasing in Object-Oriented Programming. Chapter Alias Analysis for Object-oriented Programs, 196–232.
  • Sridharan et al. (2005) Manu Sridharan, Denis Gopan, Lexin Shan, and Rastislav Bodík. 2005. Demand-driven Points-to Analysis for Java. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications (OOPSLA ’05). ACM, 59–76.
  • Strassen (1969) Volker Strassen. 1969. Gaussian Elimination is Not Optimal. Numer. Math. 13, 4 (1969), 354–356.
  • Tang et al. (2015) Hao Tang, Xiaoyin Wang, Lingming Zhang, Bing Xie, Lu Zhang, and Hong Mei. 2015. Summary-Based Context-Sensitive Data-Dependence Analysis in Presence of Callbacks. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). 83–95.
  • Tarjan (1975) Robert Endre Tarjan. 1975. Efficiency of a Good But Not Linear Set Union Algorithm. J. ACM 22, 2 (April 1975), 215–225. https://doi.org/10.1145/321879.321884
  • Tarjan (1979) Robert Endre Tarjan. 1979. A class of algorithms which require nonlinear time to maintain disjoint sets. J. Comput. System Sci. 18, 2 (1979), 110 – 127.
  • Thorup (1998) Mikkel Thorup. 1998. All Structured Programs Have Small Tree Width and Good Register Allocation. Information and Computation (1998).
  • Vallée-Rai et al. (1999) Raja Vallée-Rai, Phong Co, Etienne Gagnon, Laurie Hendren, Patrick Lam, and Vijay Sundaresan. 1999. Soot - a Java bytecode optimization framework. In CASCON ’99. IBM Press.
  • van Dijk et al. (2006) Thomas van Dijk, Jan-Pieter van den Heuvel, and Wouter Slob. 2006. Computing treewidth with LibTW. Technical Report. University of Utrecht.
  • Vassilevska Williams and Williams (2010) Virginia Vassilevska Williams and Ryan Williams. 2010. Subcubic Equivalences between Path, Matrix and Triangle Problems. In FOCS. 645–654.
  • Xu et al. (2010) Guoqing Xu, Nick Mitchell, Matthew Arnold, Atanas Rountev, Edith Schonberg, and Gary Sevitsky. 2010. Finding Low-utility Data Structures. In Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). 174–186.
  • Xu et al. (2009) Guoqing Xu, Atanas Rountev, and Manu Sridharan. 2009. Scaling CFL-Reachability-Based Points-To Analysis Using Context-Sensitive Must-Not-Alias Analysis. Springer Berlin Heidelberg, 98–122.
  • Yan et al. (2011a) Dacong Yan, Guoqing Xu, and Atanas Rountev. 2011a. Demand-driven Context-sensitive Alias Analysis for Java. In Proceedings of the 2011 International Symposium on Software Testing and Analysis (ISSTA ’11). ACM, 155–165.
  • Yan et al. (2011b) Dacong Yan, Guoqing Xu, and Atanas Rountev. 2011b. Demand-driven Context-sensitive Alias Analysis for Java. In Proceedings of the 2011 International Symposium on Software Testing and Analysis (ISSTA). 155–165.
  • Yannakakis (1990) Mihalis Yannakakis. 1990. Graph-theoretic Methods in Database Theory. In Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). 230–242.
  • Yao (1985) Andrew C. Yao. 1985. On the Expected Performance of Path Compression Algorithms. SIAM J. Comput. 14, 1 (1985), 129–133.
  • Yuan and Eugster (2009) Hao Yuan and Patrick Eugster. 2009. An Efficient Algorithm for Solving the Dyck-CFL Reachability Problem on Trees. In Proceedings of the 18th European Symposium on Programming Languages and Systems: Held As Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009 (ESOP). 175–189.
  • Zhang et al. (2013) Qirun Zhang, Michael R. Lyu, Hao Yuan, and Zhendong Su. 2013. Fast Algorithms for Dyck-CFL-reachability with Applications to Alias Analysis (PLDI). ACM.
  • Zhang and Su (2017) Qirun Zhang and Zhendong Su. 2017. Context-sensitive Data-dependence Analysis via Linear Conjunctive Language Reachability. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL). 344–358.
  • Zheng and Rugina (2008) Xin Zheng and Radu Rugina. 2008. Demand-driven Alias Analysis for C. In Proceedings of the 35th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’08). ACM, 197–208.