
Ad Hoc Table Retrieval using Semantic Similarity

Shuo Zhang University of Stavanger shuo.zhang@uis.no  and  Krisztian Balog University of Stavanger krisztian.balog@uis.no
(2018)
Abstract.

We introduce and address the problem of ad hoc table retrieval: answering a keyword query with a ranked list of tables. This task is not only interesting on its own account, but is also being used as a core component in many other table-based information access scenarios, such as table completion or table mining. The main novel contribution of this work is a method for performing semantic matching between queries and tables. Specifically, we (i) represent queries and tables in multiple semantic spaces (both discrete sparse and continuous dense vector representations) and (ii) introduce various similarity measures for matching those semantic representations. We consider all possible combinations of semantic representations and similarity measures and use these as features in a supervised learning model. Using a purpose-built test collection based on Wikipedia tables, we demonstrate significant and substantial improvements over a state-of-the-art baseline.

Table retrieval, table search, semantic matching, semantic representations, semantic similarity
journalyear: 2018; copyright: iw3c2w3; conference: The 2018 Web Conference, April 23–27, 2018, Lyon, France; booktitle: WWW 2018: The 2018 Web Conference, April 23–27, 2018, Lyon, France; doi: 10.1145/3178876.3186067; isbn: 978-1-4503-5639-8/18/04; ccs: Information systems → Similarity measures; ccs: Information systems → Environment-specific retrieval; ccs: Information systems → Learning to rank

1. Introduction

Tables are a powerful, versatile, and easy-to-use tool for organizing and working with data. Because of this, a massive number of tables can be found “out there,” on the Web or in Wikipedia, representing a vast and rich source of structured information. Recently, a growing body of work has begun to tap into the knowledge contained in tables. A wide and diverse range of tasks have been undertaken, including but not limited to (i) searching for tables (in response to a keyword query (Cafarella et al., 2008b; Cafarella et al., 2009; Venetis et al., 2011; Pimplikar and Sarawagi, 2012; Balakrishnan et al., 2015; Nguyen et al., 2015) or a seed table (Das Sarma et al., 2012)), (ii) extracting knowledge from tables (such as RDF triples (Munoz et al., 2014)), and (iii) augmenting tables (with new columns (Das Sarma et al., 2012; Cafarella et al., 2009; Lehmberg et al., 2015; Yakout et al., 2012; Bhagavatula et al., 2013; Zhang and Balog, 2017b), rows (Das Sarma et al., 2012; Yakout et al., 2012; Zhang and Balog, 2017b), cell values (Ahmadov et al., 2015), or links to entities (Bhagavatula et al., 2015)).

Searching for tables is an important problem on its own, in addition to being a core building block in many other table-related tasks. Yet, it has not received due attention, and especially not from an information retrieval perspective. This paper aims to fill that gap. We define the ad hoc table retrieval task as follows: given a keyword query, return a ranked list of tables from a table corpus that are relevant to the query. See Fig. 1 for an illustration.

Figure 1. Ad hoc table retrieval: given a keyword query, the system returns a ranked list of tables.

It should be acknowledged that this task is not entirely new; it has been around for a while in the database community, where it is also known as relation ranking (Cafarella et al., 2008b; Cafarella et al., 2009; Venetis et al., 2011; Bhagavatula et al., 2013). However, public test collections and proper evaluation methodology are lacking, in addition to the need for better ranking techniques.

Tables can be ranked much like documents, by considering the words contained in them (Cafarella et al., 2008b; Cafarella et al., 2009; Pimplikar and Sarawagi, 2012). Ranking may be further improved by incorporating additional signals related to table quality. Intuitively, high quality tables are topically coherent; other indicators may be related to the pages that contain them (e.g., if they are linked by other pages (Bhagavatula et al., 2013)). However, a major limitation of prior approaches is that they only consider lexical matching between the contents of tables and queries. This gives rise to our main research objective: Can we move beyond lexical matching and improve table retrieval performance by incorporating semantic matching?

We consider two main kinds of semantic representations. One is based on concepts, such as entities and categories. Another is based on continuous vector representations of words and of entities (i.e., word and graph embeddings). We introduce a framework that handles matching in different semantic spaces in a uniform way, by modeling both the table and the query as sets of semantic vectors. We propose two general strategies (early and late fusion), yielding four different measures for computing the similarity between queries and tables based on their semantic representations.

As we have mentioned above, another key area where prior work falls short is evaluation. First, there is no publicly available test collection for this task. Second, evaluation has been performed using set-based metrics (counting the number of relevant tables in the top-$k$ results), which is a rudimentary way of measuring retrieval effectiveness. We address this by developing a purpose-built test collection, comprising 1.6M tables from Wikipedia and a set of queries with graded relevance judgments. We establish a learning-to-rank baseline that encompasses a rich set of features from prior work and outperforms the best approaches known in the literature. We show that the semantic matching methods we propose can substantially and significantly improve retrieval performance over this strong baseline.

In summary, this paper makes the following contributions:

  • We introduce and formalize the ad hoc table ranking task, and present both unsupervised and supervised baseline approaches (Sect. 2).

  • We present a set of novel semantic matching methods that go beyond lexical similarity (Sect. 3).

  • We develop a standard test collection for this task (Sect. 4) and demonstrate the effectiveness of our approaches (Sect. 5).

The test collection and the outputs of the reported methods are made available at https://github.com/iai-group/www2018-table.

2. Ad Hoc Table Retrieval

We formalize the ad hoc table retrieval task, explain what information is associated with a table, and introduce baseline methods.

2.1. Problem Statement

Given a keyword query $q$, ad hoc table retrieval is the task of returning a ranked list of tables, $(T_1, \dots, T_k)$, from a collection of tables $C$. Being an ad hoc task, the relevance of each returned table $T_i$ is assessed independently of all other returned tables $T_j$, $i \neq j$. Hence, the ranking of tables boils down to the problem of assigning a score to each table in the corpus: $\mathit{score}(q,T)$. Tables are then sorted in descending order of their scores.
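A minimal sketch of this setup in Python, assuming a scoring function as discussed in the following sections (the function name and signature are illustrative, not the authors' implementation):

```python
# Score each table in the corpus independently against the query and
# return the top-k tables sorted by descending score.
def rank_tables(query, corpus, score, k=20):
    scored = [(table, score(query, table)) for table in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```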

2.2. The Anatomy of a Table

We shall assume that the following information is available for each table in the corpus; the letters refer to Figure 2.

  (a) Page title, where the table was extracted from.

  (b) Section title, i.e., the heading of the particular section where the table is embedded.

  (c) Table caption, providing a brief explanation.

  (d) Table headings, i.e., a list of column heading labels.

  (e) Table body, i.e., all table cells (including column headings).

Figure 2. Table embedded in a Wikipedia page.

2.3. Unsupervised Ranking

A straightforward way to perform the table ranking task is to adopt standard document ranking methods. Cafarella et al. (2009); Cafarella et al. (2008b) utilize web search engines to retrieve relevant documents; tables are then extracted from the highest-ranked documents. Rather than relying on external services, we represent tables as either single- or multi-field documents and apply standard document retrieval techniques.

Table 1. Baseline features for table retrieval.
Query features Source Value
QLEN Number of query terms (Tyree et al., 2011) {1,…,n}
IDF_f Sum of query IDF scores in field f (Qin et al., 2010) [0,∞)
Table features
#rows The number of rows in the table (Cafarella et al., 2008b; Bhagavatula et al., 2013) {1,…,n}
#cols The number of columns in the table (Cafarella et al., 2008b; Bhagavatula et al., 2013) {1,…,n}
#of NULLs in table The number of empty table cells (Cafarella et al., 2008b; Bhagavatula et al., 2013) {0,…,n}
PMI The ACSDb-based schema coherency score (Cafarella et al., 2008b) (−∞,∞)
inLinks Number of in-links to the page embedding the table (Bhagavatula et al., 2013) {0,…,n}
outLinks Number of out-links from the page embedding the table (Bhagavatula et al., 2013) {0,…,n}
pageViews Number of page views (Bhagavatula et al., 2013) {0,…,n}
tableImportance Inverse of number of tables on the page (Bhagavatula et al., 2013) (0,1]
tablePageFraction Ratio of table size to page size (Bhagavatula et al., 2013) (0,1]
Query-table features
#hitsLC Total query term frequency in the leftmost column cells (Cafarella et al., 2008b) {0,…,n}
#hitsSLC Total query term frequency in second-to-leftmost column cells (Cafarella et al., 2008b) {0,…,n}
#hitsB Total query term frequency in the table body (Cafarella et al., 2008b) {0,…,n}
qInPgTitle Ratio of the number of query tokens found in page title to total number of tokens (Bhagavatula et al., 2013) [0,1]
qInTableTitle Ratio of the number of query tokens found in table title to total number of tokens (Bhagavatula et al., 2013) [0,1]
yRank Rank of the table’s Wikipedia page in Web search engine results for the query (Bhagavatula et al., 2013) {1,…,n}
MLM similarity Language modeling score between query and multi-field document repr. of the table (Chen et al., 2016) (−∞,0)

2.3.1. Single-field Document Representation

In the simplest case, all text associated with a given table is used as the table’s representation. This representation is then scored using existing retrieval methods, such as BM25 or language models.

2.3.2. Multi-field Document Representation

Rather than collapsing all textual content into a single-field document, it may be organized into multiple fields, such as table caption, table headers, table body, etc. (cf. Sect. 2.2). For multi-field ranking, Pimplikar and Sarawagi (2012) employ a late fusion strategy (Zhang and Balog, 2017a). That is, each field is scored independently against the query, then a weighted sum of the field-level similarity scores is taken:

(1)   $\mathit{score}(q,T) = \sum_{i} w_i \times \mathit{score}(q, f_i),$

where $f_i$ denotes the $i$th (document) field for table $T$ and $w_i$ is the corresponding field weight (such that $\sum_i w_i = 1$). $\mathit{score}(q, f_i)$ may be computed using any standard retrieval method. We use language models in our experiments.
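A minimal sketch of the scoring in Eq. (1): each field is scored against the query independently and the field-level scores are combined with weights that sum to one. The field names, the example weights, and the `score_field` function (e.g., a query-likelihood language model) are illustrative assumptions.

```python
# Illustrative field weights; in the paper these are trained, not fixed.
FIELD_WEIGHTS = {"page_title": 0.2, "section_title": 0.1,
                 "caption": 0.2, "headings": 0.2, "body": 0.3}

def multi_field_score(query, table_fields, score_field):
    # Weighted sum of per-field retrieval scores (late fusion over fields).
    return sum(w * score_field(query, table_fields[f])
               for f, w in FIELD_WEIGHTS.items())
```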

2.4. Supervised Ranking

The state-of-the-art in document retrieval (and in many other retrieval tasks) is to employ supervised learning (Liu, 2011). Features may be categorized into three groups: (i) document, (ii) query, and (iii) query-document features (Qin et al., 2010). Analogously, we distinguish between three types of features: (i) table, (ii) query, and (iii) query-table features. In Table 1, we summarize the features from previous work on table search (Cafarella et al., 2008b; Bhagavatula et al., 2013). We also include a number of additional features that have been used in other retrieval tasks, such as document and entity ranking; we do not regard these as novel contributions.

2.4.1. Query Features

Query features have been shown to improve retrieval performance for document ranking (Macdonald et al., 2012). We adopt two query features from document retrieval, namely, the number of terms in the query (Tyree et al., 2011) and query IDF (Qin et al., 2010), computed as $\mathit{IDF}_f(q) = \sum_{t \in q} \mathit{IDF}_f(t)$, where $\mathit{IDF}_f(t)$ is the IDF score of term $t$ in field $f$. This feature is computed for the following fields: page title, section title, table caption, table heading, table body, and “catch-all” (the concatenation of all textual content in the table).
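A sketch of the IDF feature for one field, assuming the field's document frequencies and collection size are available from the index; the smoothed IDF variant used here is one common choice, not necessarily the exact formula used in the paper.

```python
import math

def idf_feature(query_terms, doc_freq, num_docs):
    # Sum of per-term IDF scores in a given field; unseen terms get a
    # smoothed (small) contribution via the +1 in the denominator.
    total = 0.0
    for t in query_terms:
        df = doc_freq.get(t, 0)
        total += math.log(num_docs / (df + 1))
    return total
```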

2.4.2. Table Features

Table features depend only on the table itself and aim to reflect the quality of the given table (irrespective of the query). Some features are simple characteristics, like the number of rows, columns, and empty cells (Cafarella et al., 2008b; Bhagavatula et al., 2013). A table’s PMI is computed by calculating the PMI values between all pairs of column headings of that table, and then taking their average. Following (Cafarella et al., 2008b), we compute PMI by obtaining frequency statistics from the Attribute Correlation Statistics Database (ACSDb) (Cafarella et al., 2008a), which contains table heading information derived from millions of tables extracted from a large web crawl.
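A sketch of the PMI feature under the assumption that single and pairwise heading frequencies have already been extracted from an ACSDb-style corpus (the dictionaries below stand in for that data; this is not the ACSDb API itself).

```python
import math
from itertools import combinations

def table_pmi(headings, single_freq, pair_freq, total):
    # Average PMI over all pairs of (distinct) column headings.
    # pair_freq is assumed to be keyed by alphabetically sorted pairs.
    pairs = list(combinations(sorted(set(headings)), 2))
    if not pairs:
        return 0.0
    pmi_sum = 0.0
    for a, b in pairs:
        p_a = single_freq.get(a, 0) / total
        p_b = single_freq.get(b, 0) / total
        p_ab = pair_freq.get((a, b), 0) / total
        if p_a > 0 and p_b > 0 and p_ab > 0:
            pmi_sum += math.log(p_ab / (p_a * p_b))
    return pmi_sum / len(pairs)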

Another group of features has to do with the page that embeds the table, by considering its connectivity (inLinks and outLinks), popularity (pageViews), and the table’s importance within the page (tableImportance and tablePageFraction).

2.4.3. Query-Table Features

Features in the last group express the degree of matching between the query and a given table. This matching may be based on occurrences of query terms in the page title (qInPgTitle) or in the table caption (qInTableTitle). Alternatively, it may be based on specific parts of the table, such as the leftmost column (#hitsLC), second-to-left column (#hitsSLC), or table body (#hitsB). Tables are typically embedded in (web) pages. The rank at which a table’s parent page is retrieved by an external search engine is also used as a feature (yRank). (In our experiments, we use the Wikipedia search API to obtain this ranking.) Furthermore, we take the Mixture of Language Models (MLM) similarity score (Ogilvie and Callan, 2003) as a feature, which is actually the best performing method among the four text-based baseline methods (cf. Sect. 5). Importantly, all these features are based on lexical matching. Our goal in this paper is to also enable semantic matching; this is what we shall discuss in the next section.
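A minimal sketch of the lexical hit-count features (#hitsLC, #hitsSLC, #hitsB), assuming the table body is given as a list of rows of cell strings and using naive whitespace tokenization for illustration.

```python
def hit_features(query, table_rows):
    # Total query term frequency in the leftmost column, the
    # second-to-leftmost column, and the full table body.
    q_terms = query.lower().split()

    def count_hits(cells):
        tokens = " ".join(cells).lower().split()
        return sum(tokens.count(t) for t in q_terms)

    left = [row[0] for row in table_rows if len(row) > 0]
    second = [row[1] for row in table_rows if len(row) > 1]
    body = [cell for row in table_rows for cell in row]
    return {"#hitsLC": count_hits(left),
            "#hitsSLC": count_hits(second),
            "#hitsB": count_hits(body)}
```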

Figure 3. Our methods for computing query-table similarity using semantic representations.

3. Semantic Matching

This section presents our main contribution, which is a set of novel semantic matching methods for table retrieval. The main idea is to go beyond lexical matching by representing both queries and tables in some semantic space, and measuring the similarity of those semantic (vector) representations. Our approach consists of three main steps, which are illustrated in Figure 3. These are as follows (moving from outwards to inwards on the figure):

  (1) The “raw” content of a query/table is represented as a set of terms, where terms can be either words or entities (Sect. 3.1).

  (2) Each of the raw terms is mapped to a semantic vector representation (Sect. 3.2).

  (3) The semantic similarity (matching score) between a query-table pair is computed based on their semantic vector representations (Sect. 3.3).

We compute query-table similarity using all possible combinations of semantic representations and similarity measures, and use the resulting semantic similarity scores as features in a learning-to-rank approach. Table 2 summarizes these features.

3.1. Content Extraction

We represent the “raw” content of the query/table as a set of terms, where terms can be either words (string tokens) or entities (from a knowledge base). We denote these as $\{q_1, \dots, q_n\}$ and $\{t_1, \dots, t_m\}$ for query $q$ and table $T$, respectively.

3.1.1. Word-based

It is a natural choice to simply use word tokens to represent query/table content. That is, $\{q_1, \dots, q_n\}$ comprises the unique words in the query. As for the table, we let $\{t_1, \dots, t_m\}$ contain all unique words from the title, caption, and headings of the table. Note that at this stage we only consider the presence/absence of words; the importance of words is taken into account during query-table similarity matching (Sect. 3.3.1).

3.1.2. Entity-based

Many tables are focused on specific entities (Zhang and Balog, 2017b). Therefore, considering the entities contained in a table amounts to a meaningful representation of its content. We use the DBpedia knowledge base as our entity repository. Since we work with tables extracted from Wikipedia, the entity annotations are readily available (otherwise, entity annotations could be obtained automatically, see, e.g., (Venetis et al., 2011)). Importantly, instead of blindly including all entities mentioned in the table, we wish to focus on salient entities. It has been observed in prior work (Venetis et al., 2011; Bhagavatula et al., 2015) that tables often have a core column, containing mostly entities, while the rest of the columns contain properties of these entities (many of which are entities themselves). We write $E_{cc}$ to denote the set of entities that are contained in the core column of the table, and describe our core column detection method in Sect. 3.1.3. In addition to the entities taken directly from the body part of the table, we also include entities that are related to the page title ($T_{pt}$) and to the table caption ($T_{tc}$). We obtain those by using the page title and the table caption, respectively, to retrieve relevant entities from the knowledge base. We write $R_k(s)$ to denote the set of top-$k$ entities retrieved for the query $s$. We detail the entity ranking method in Sect. 3.1.4. Finally, the table is represented as the union of three sets of entities, originating from the core column, page title, and table caption: $\{t_1, \dots, t_m\} = E_{cc} \cup R_k(T_{pt}) \cup R_k(T_{tc})$.

To obtain an entity-based representation for the query, we issue the query against the knowledge base to retrieve relevant entities, using the same retrieval method as above, i.e., $\{q_1, \dots, q_n\} = R_k(q)$.
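A sketch of assembling these entity-based raw representations. The helpers `retrieve_entities` (the entity ranker of Sect. 3.1.4) and `core_column_entities` (Sect. 3.1.3), as well as the `page_title` and `caption` attributes, are assumed for illustration.

```python
def table_entity_terms(table, retrieve_entities, core_column_entities, k=10):
    # Union of core-column entities and the top-k entities retrieved for
    # the page title and the table caption.
    e_cc = set(core_column_entities(table))
    r_pt = set(retrieve_entities(table.page_title, k))
    r_tc = set(retrieve_entities(table.caption, k))
    return e_cc | r_pt | r_tc

def query_entity_terms(query, retrieve_entities, k=10):
    # Top-k entities retrieved for the query string itself.
    return set(retrieve_entities(query, k))
```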

3.1.3. Core Column Detection

We introduce a simple and effective core column detection method. It is based on the notion of column entity rate, defined as the ratio of cells in a column that contain an entity. We write $\mathit{cer}(T_{c[j]})$ to denote the column entity rate of column $j$ in table $T$. The index of the core column is then $\arg\max_{j=1..T_{|c|}} \mathit{cer}(T_{c[j]})$, where $T_{|c|}$ is the number of columns in $T$.
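A minimal sketch of this heuristic, assuming the table body is a list of rows of cells and `is_entity` is a predicate telling whether a cell contains an entity (e.g., a DBpedia link).

```python
def core_column_index(table_rows, is_entity):
    # Pick the column with the highest column entity rate, i.e., the
    # largest fraction of its cells that contain an entity.
    num_cols = max(len(row) for row in table_rows)
    best_col, best_rate = 0, -1.0
    for j in range(num_cols):
        cells = [row[j] for row in table_rows if len(row) > j]
        rate = sum(1 for c in cells if is_entity(c)) / max(len(cells), 1)
        if rate > best_rate:
            best_col, best_rate = j, rate
    return best_col
```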

3.1.4. Entity Retrieval

We employ a fielded entity representation with five fields (names, categories, attributes, similar entity names, and related entity names) and rank entities using the Mixture of Language Models approach (Ogilvie and Callan, 2003). The field weights are set uniformly. This corresponds to the MLM-all model in (Hasibi et al., 2017b) and is shown to be a solid baseline. We return the top-$k$ entities, where $k$ is set to 10.

Table 2. Semantic similarity features. Each row represents 4 features (one for each similarity matching method, cf. Table 3). All features are in [−1,1].
Features Semantic repr. Raw repr.
Entity_* Bag-of-entities entities
Category_* Bag-of-categories entities
Word_* Word embeddings words
Graph_* Graph embeddings entities

3.2. Semantic Representations

Next, we embed the query/table terms in a semantic space. That is, we map each table term $t_i$ to a vector representation $\vec{t}_i$, where $\vec{t}_i[j]$ refers to the $j$th element of that vector. For queries, the process is analogous. We discuss two main kinds of semantic spaces, bag-of-concepts and embeddings, with two alternatives within each. The former uses sparse and discrete vectors, while the latter employs dense and continuous-valued vectors. A particularly nice property of our semantic matching framework is that it allows us to deal with these two different types of representations in a unified way.

3.2.1. Bag-of-concepts

One alternative for moving from the lexical to the semantic space is to represent tables/queries using specific concepts. In this work, we use entities and categories from a knowledge base. These two semantic spaces have been used in the past for various retrieval tasks, in duet with the traditional bag-of-words content representation. For example, entity-based representations have been used for document retrieval (Xiong et al., 2017; Raviv et al., 2016) and category-based representations have been used for entity retrieval (Balog et al., 2011). One important difference from previous work is that instead of representing the entire query/table using a single semantic vector, we map each individual query/table term to a separate semantic vector, thereby obtaining a richer representation.

We use the entity-based raw representation from the previous section, that is, $t_i$ and $q_j$ are specific entities. Below, we explain how table terms $t_i$ are projected to $\vec{t}_i$, which is a sparse discrete vector in the entity/category space; query terms are handled analogously.

Bag-of-entities:

Each element in $\vec{t}_i$ corresponds to a unique entity. Thus, the dimensionality of $\vec{t}_i$ is the number of entities in the knowledge base (on the order of millions). $\vec{t}_i[j]$ has a value of 1 if entities $i$ and $j$ are related (there exists a link between them in the knowledge base), and 0 otherwise.

Bag-of-categories:

Each element in $\vec{t}_i$ corresponds to a Wikipedia category. Thus, the dimensionality of $\vec{t}_i$ is the number of Wikipedia categories (on the order of hundreds of thousands). The value of $\vec{t}_i[j]$ is 1 if entity $i$ is assigned to Wikipedia category $j$, and 0 otherwise.
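Since these binary vectors are very high-dimensional but sparse, they can be stored as sets of concept identifiers. A sketch under that assumption, with `related_entities` and `entity_categories` standing in for knowledge-base lookup tables; cosine similarity between two binary vectors reduces to normalized set overlap.

```python
def bag_of_entities(entity, related_entities):
    # Sparse binary vector: the set of entities linked to this entity.
    return set(related_entities.get(entity, []))

def bag_of_categories(entity, entity_categories):
    # Sparse binary vector: the set of Wikipedia categories of this entity.
    return set(entity_categories.get(entity, []))

def sparse_cosine(a, b):
    # Cosine of two binary vectors: |a ∩ b| / (sqrt(|a|) * sqrt(|b|)).
    if not a or not b:
        return 0.0
    return len(a & b) / ((len(a) ** 0.5) * (len(b) ** 0.5))
```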

3.2.2. Embeddings

Recently, unsupervised representation learning methods have been proposed for obtaining embeddings that predict a distributional context, i.e., word embeddings (Mikolov et al., 2013; Pennington et al., 2014) or graph embeddings (Perozzi et al., 2014; Tang et al., 2015; Ristoski and Paulheim, 2016). Such vector representations have been utilized successfully in a range of IR tasks, including ad hoc retrieval (Ganguly et al., 2015; Mitra et al., 2016), contextual suggestion (Manotumruksa et al., 2016), cross-lingual IR (Vulić and Moens, 2015), community question answering (Zhou et al., 2015), short text similarity (Kenter and de Rijke, 2015), and sponsored search (Grbovic et al., 2015). We consider both word-based and entity-based raw representations from the previous section and use the corresponding (pre-trained) embeddings as follows.

Word embeddings:

We map each query/table word to a word embedding. Specifically, we use word2vec (Mikolov et al., 2013) with 300 dimensions, trained on Google News data.

Graph embeddings:

We map each query/table entity to a graph embedding. In particular, we use RDF2vec (Ristoski and Paulheim, 2016) with 200 dimensions, trained on DBpedia 2015-10.

3.3. Similarity Measures

The final step is concerned with the computation of the similarity between a query-table pair, based on the semantic vector representations we have obtained for them. We introduce two main strategies, which yield four specific similarity measures. These are summarized in Table 3.

3.3.1. Early Fusion

The first idea is to represent the query and the table each with a single vector. Their similarity can then simply be expressed as the similarity of the corresponding vectors. We let $\vec{C}_q$ be the centroid of the query term vectors ($\vec{C}_q = \sum_{i=1}^{n} \vec{q}_i / n$). Similarly, $\vec{C}_T$ denotes the centroid of the table term vectors. The query-table similarity is then computed by taking the cosine similarity of the centroid vectors. When query/table content is represented in terms of words, we additionally make use of word importance by employing standard TF-IDF term weighting. Note that this only applies to word embeddings (as the other three semantic representations are based on entities). In case of word embeddings, the centroid vectors are calculated as $\vec{C}_T = \sum_{i=1}^{m} \vec{t}_i \times \mathit{TFIDF}(t_i)$. The computation of $\vec{C}_q$ follows analogously.
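A sketch of the early fusion measure, assuming the term vectors are already available as dense arrays; the optional per-term weights stand in for the TF-IDF weighting used with word embeddings.

```python
import numpy as np

def centroid(vectors, weights=None):
    # Unweighted: mean of the term vectors. Weighted: sum of weighted
    # vectors (the scale does not affect cosine similarity).
    V = np.asarray(vectors, dtype=float)
    if weights is None:
        return V.mean(axis=0)
    w = np.asarray(weights, dtype=float)[:, None]
    return (V * w).sum(axis=0)

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def early_fusion(query_vecs, table_vecs, q_weights=None, t_weights=None):
    # Cosine similarity of the query and table centroid vectors.
    return cosine(centroid(query_vecs, q_weights),
                  centroid(table_vecs, t_weights))
```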

Table 3. Similarity measures.
Measure Equation
Early  $\cos(\vec{C}_q, \vec{C}_T)$
Late-max  $\max(\{\cos(\vec{q}_i, \vec{t}_j) : i \in [1..n], j \in [1..m]\})$
Late-sum  $\mathrm{sum}(\{\cos(\vec{q}_i, \vec{t}_j) : i \in [1..n], j \in [1..m]\})$
Late-avg  $\mathrm{avg}(\{\cos(\vec{q}_i, \vec{t}_j) : i \in [1..n], j \in [1..m]\})$

3.3.2. Late Fusion

Instead of combining all semantic vectors $\vec{q}_i$ and $\vec{t}_j$ into a single one, late fusion first computes the pairwise similarity between all query and table vectors, and then aggregates those. We let $S$ be the set of all pairwise cosine similarity scores: $S = \{\cos(\vec{q}_i, \vec{t}_j) : i \in [1..n], j \in [1..m]\}$. The query-table similarity score is then computed as $\mathrm{aggr}(S)$, where $\mathrm{aggr}()$ is an aggregation function. Specifically, we use $\max()$, $\mathrm{sum}()$, and $\mathrm{avg}()$ as aggregators; see the last three rows in Table 3 for the equations.
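A self-contained sketch of the late fusion measures (cf. Table 3), again assuming dense term vectors:

```python
import numpy as np

def late_fusion(query_vecs, table_vecs, aggr="avg"):
    # All pairwise cosine similarities between query and table term
    # vectors, aggregated with max, sum, or avg.
    def cos(u, v):
        denom = np.linalg.norm(u) * np.linalg.norm(v)
        return float(np.dot(u, v) / denom) if denom > 0 else 0.0

    sims = [cos(np.asarray(q, dtype=float), np.asarray(t, dtype=float))
            for q in query_vecs for t in table_vecs]
    if not sims:
        return 0.0
    return {"max": max, "sum": sum,
            "avg": lambda s: sum(s) / len(s)}[aggr](sims)
```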

4. Test Collection

We introduce our test collection, including the table corpus, test and development query sets, and the procedure used for obtaining relevance assessments.

4.1. Table Corpus

We use the WikiTables corpus (Bhagavatula et al., 2015), which comprises 1.6M tables extracted from Wikipedia (dump date: October 2015). The following information is provided for each table: table caption, column headings, table body, (Wikipedia) page title, section title, and table statistics such as the number of heading rows, columns, and data rows. We further replace all links in the table body with entity identifiers from the DBpedia knowledge base (version 2015-10) as follows. For each cell that contains a hyperlink, we check whether it points to an entity that is present in DBpedia. If so, we use the DBpedia identifier of the linked entity as the cell's content; otherwise, we replace the link with its anchor text, i.e., treat it as a string.

4.2. Queries

We sample a total of 60 test queries from two independent sources (30 from each): (1) Query subset 1 (QS-1): Cafarella et al. (2009) collected 51 queries from Web users via crowdsourcing (using Amazon’s Mechanical Turk platform, users were asked to suggest topics or supply URLs for a useful data table). (2) Query subset 2 (QS-2): Venetis et al. (2011) analyzed the query logs from Google Squared (a service in which users search for structured data) and constructed 100 queries, all of which are a combination of an instance class (e.g., “laptops”) and a property (e.g., “cpu”). Following (Bhagavatula et al., 2013), we concatenate the class and property fields into a single query string (e.g., “laptops cpu”). Table 4 lists some examples.

Table 4. Example queries from our query set.
Queries from (Cafarella et al., 2009) Queries from (Venetis et al., 2011)
video games asian coutries currency
us cities laptops cpu
kings of africa food calories
economy gdp guitars manufacturer
fifa world cup winners clothes brand

4.3. Relevance Assessments

We collect graded relevance assessments by employing three independent (trained) judges. For each query, we pool the top 20 results from five baseline methods (cf. Sect. 5.3), using default parameter settings. (We then train the parameters of those methods with the help of the obtained relevance labels.) Each query-table pair is judged on a three-point scale: 0 (non-relevant), 1 (somewhat relevant), and 2 (highly relevant). Annotators were situated in a scenario where they need to create a table on the topic of the query and wish to find relevant tables that can aid them in completing that task. Specifically, they were given the following labeling guidelines: (i) a table is non-relevant if it is unclear what it is about (e.g., it is missing headings or a caption) or it is about a different topic; (ii) a table is relevant if some cells or values could be used from this table; and (iii) a table is highly relevant if large blocks or several values could be used from it when creating a new table on the query topic.

We take the majority vote as the relevance label; if no majority agreement is reached, we take the average of the scores as the final label. To measure inter-annotator agreement, we compute the Kappa statistic on the test annotations, which is 0.47; according to (Fleiss et al., 1971), this is considered moderate agreement. In total, 3120 query-table pairs are annotated as test data. Out of these, 377 are labeled as highly relevant, 474 as relevant, and 2269 as non-relevant.
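A small sketch of this label aggregation rule (majority vote with a fall-back to the average), written for an arbitrary number of judges:

```python
from collections import Counter

def aggregate_label(judgments):  # e.g., [0, 1, 2]
    # Majority vote; if no strict majority exists, return the average.
    label, count = Counter(judgments).most_common(1)[0]
    if count > len(judgments) // 2:
        return label
    return sum(judgments) / len(judgments)
```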

5. Evaluation

Table 5. Table retrieval evaluation results.
Method NDCG@5 NDCG@10 NDCG@15 NDCG@20
Single-field document ranking 0.4315 0.4344 0.4586 0.5254
Multi-field document ranking 0.4770 0.4860 0.5170 0.5473
WebTable (Cafarella et al., 2008b) 0.2831 0.2992 0.3311 0.3726
WikiTable (Bhagavatula et al., 2013) 0.4903 0.4766 0.5062 0.5206
LTR baseline (this paper) 0.5527 0.5456 0.5738 0.6031
STR (this paper) 0.5951 0.6293 0.6590 0.6825

In this section, we list our research questions (Sect. 5.1), discuss our experimental setup (Sect. 5.2), introduce the baselines we compare against (Sect. 5.3), and present our results (Sect. 5.4) followed by further analysis (Sect. 5.5).

5.1. Research Questions

The research questions we seek to answer are as follows.

RQ1:

Can semantic matching improve retrieval performance?

RQ2:

Which of the semantic representations is the most effective?

RQ3:

Which of the similarity measures performs better?

5.2. Experimental Setup

We evaluate table retrieval performance in terms of Normalized Discounted Cumulative Gain (NDCG) at cut-off points 5, 10, 15, and 20. To test significance, we use a two-tailed paired t-test and write †/‡ to denote significance at the 0.05 and 0.005 levels, respectively.
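For reference, a sketch of NDCG@k on graded labels (0/1/2), using the common exponential-gain formulation with a log2 discount; this is a standard definition and is assumed, not taken from the paper, to match the evaluation setup.

```python
import math

def ndcg_at_k(ranked_labels, k, all_labels=None):
    # ranked_labels: graded labels of the returned tables, in ranked order.
    # all_labels: all graded judgments for the query (for the ideal DCG);
    # defaults to the retrieved labels themselves.
    def dcg(labels):
        return sum((2 ** rel - 1) / math.log2(i + 2)
                   for i, rel in enumerate(labels[:k]))
    ideal = dcg(sorted(all_labels or ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0
```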

Our implementations are based on Nordlys (Hasibi et al., 2017a). Many of our features involve external sources, which we explain below. To compute the entity-related features (i.e., features in Table 1 as well as the features based on the bag-of-entities and bag-of-categories representations in Table 2), we use entities from the DBpedia knowledge base that have an abstract (4.6M in total). The table’s Wikipedia rank (yRank) is obtained using Wikipedia’s MediaWiki API. The PMI feature is estimated based on the ACSDb corpus (Cafarella et al., 2008a). For the distributed representations, we take pre-trained embedding vectors, as explained in Sect. 3.2.2.

Table 6. Comparison of semantic features, used in combination with baseline features (from Table 1), in terms of NDCG@20. Relative improvements are shown in parentheses. Statistical significance is tested against the LTR baseline in Table 5.
Sem. Repr. Early Late-max Late-sum Late-avg ALL
Bag-of-entities 0.6754 (+11.99%) 0.6407 (+6.23%) 0.6697 (+11.04%) 0.6733 (+11.64%) 0.6696 (+11.03%)
Bag-of-categories 0.6287 (+4.19%) 0.6245 (+3.55%) 0.6315 (+4.71%) 0.6240 (+3.47%) 0.6149 (+1.96%)
Word embeddings 0.6181 (+2.49%) 0.6328 (+4.92%) 0.6371 (+5.64%) 0.6485 (+7.53%) 0.6588 (+9.24%)
Graph embeddings 0.6326 (+4.89%) 0.6142 (+1.84%) 0.6223 (+3.18%) 0.6316 (+4.73%) 0.6340 (+5.12%)
ALL 0.6736 (+11.69%) 0.6631 (+9.95%) 0.6831 (+13.26%) 0.6809 (+12.90%) 0.6825 (+13.17%)

5.3. Baselines

We implement four baseline methods from the literature.

Single-field document ranking:

In (Cafarella et al., 2009; Cafarella et al., 2008b) tables are represented and ranked as ordinary documents. Specifically, we use Language Models with Dirichlet smoothing, and optimize the smoothing parameter using a parameter sweep.

Multi-field document ranking:

Pimplikar and Sarawagi (2012) represent each table as a fielded document, using five fields: Wikipedia page title, table section title, table caption, table body, and table headings. We use the Mixture of Language Models approach (Ogilvie and Callan, 2003) for ranking. Field weights are optimized using the coordinate ascent algorithm; smoothing parameters are trained for each field individually.

WebTable:

The method by Cafarella et al. (2008b) uses the features in Table 1 with (Cafarella et al., 2008b) as source. Following (Cafarella et al., 2008b), we train a linear regression model with 5-fold cross-validation.

WikiTable:

The approach by Bhagavatula et al. (2013) uses the features in Table 1 with (Bhagavatula et al., 2013) as source. We train a Lasso model using coordinate ascent, with 5-fold cross-validation.

Additionally, we introduce a learning-to-rank baseline:

LTR baseline:

It uses the full set of features listed in Table 1. We employ pointwise regression using the Random Forest algorithm. (We also experimented with Gradient Boosting regression and Support Vector Regression, and observed the same general patterns regarding feature importance; however, their overall performance was lower than that of Random Forests.) We set the number of trees to 1000 and the maximum number of features in each tree to 3. We train the model using 5-fold cross-validation (w.r.t. NDCG@20); reported results are averaged over 5 runs.
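A configuration sketch of this pointwise setup with scikit-learn, assuming a prepared feature matrix X, relevance labels y, and per-instance query ids; splitting folds by query (GroupKFold) is an assumption made here to avoid leakage across folds, not a detail stated in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold

def train_and_score(X, y, query_ids):
    # 5-fold cross-validation: train a Random Forest regressor with 1000
    # trees and at most 3 features per split, then predict relevance
    # scores for the held-out fold; the scores are used to rank tables
    # within each query.
    scores = np.zeros(len(y))
    for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, query_ids):
        model = RandomForestRegressor(n_estimators=1000, max_features=3)
        model.fit(X[train_idx], y[train_idx])
        scores[test_idx] = model.predict(X[test_idx])
    return scores
```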

The baseline results are presented in the top block of Table 5. It can be seen from this table that our LTR baseline (row five) outperforms all existing methods from the literature; the differences are substantial and statistically significant. Therefore, in the remainder of this paper, we shall compare against this strong baseline, using the same learning algorithm (Random Forests) and parameter settings. We note that our emphasis is on the semantic matching features and not on the supervised learning algorithm.

5.4. Experimental Results

The last line of Table 5 shows the results for our semantic table retrieval (STR) method. It combines the baseline set of features (Table 1) with the set of novel semantic matching features (from Table 2, 16 in total). We find that these semantic features bring in substantial and statistically significant improvements over the LTR baseline. Thus, we answer RQ1 positively. The relative improvements range from 7.6% to 15.3%, depending on the rank cut-off.

To answer RQ2 and RQ3, we report on all combinations of semantic representations and similarity measures in Table 6. In the interest of space, we only report NDCG@20; the same trends were observed for other NDCG cut-offs. Cells with a white background show retrieval performance when extending the LTR baseline with a single feature. Cells with a grey background correspond to using a given semantic representation with different similarity measures (rows) or using a given similarity measure with different semantic representations (columns). The first observation is that all features improve over the baseline, albeit not all of these improvements are statistically significant. Concerning the comparison of different semantic representations (RQ2), we find that bag-of-entities and word embeddings achieve significant improvements; see the rightmost column of Table 6. It is worth pointing out that for word embeddings the four similarity measures seem to complement each other, as their combined performance is better than that of any individual measure. This is not the case for bag-of-entities, where only one of the similarity measures (Late-max) is improved by the combination. Overall, in answer to RQ2, we find the bag-of-entities representation to be the most effective one. The fact that this sparse representation outperforms word embeddings is a somewhat surprising finding, given that the latter have been trained on massive amounts of (external) data.

As for the choice of similarity measure (RQ3), it is difficult to name a clear winner when a single semantic representation is used. The relative differences between similarity measures are generally small (below 5%). When all four semantic representations are used (bottom row in Table 6), we find that Late-sum and Late-avg achieve the highest overall improvement. Importantly, when using all semantic representations, all four similarity measures improve significantly and substantially over the baseline. We further note that the combination of all similarity measures does not yield further improvements over Late-sum or Late-avg. In answer to RQ3, we identify the late fusion strategy with sum or avg aggregation (i.e., Late-sum or Late-avg) as the preferred similarity method.

5.5. Analysis

We continue with further analysis of our results.

5.5.1. Features

Figure 4. Normalized feature importance (measured in terms of Gini score).

Figure 4 shows the importance of individual features for the table retrieval task, measured in terms of Gini importance. The novel features are distinguished by color. We observe that 8 out of the top 10 features are semantic features introduced in this paper.

Figure 5. Distribution of query-level differences between the LTR baseline and a given semantic representation: (a) Bag-of-entities, (b) Bag-of-categories, (c) Word embeddings, (d) Graph embeddings.

5.5.2. Semantic Representations

To analyze how the four semantic representations affect retrieval performance on the level of individual queries, we plot the difference between the LTR baseline and each semantic representation in Figure 5. The histograms show the distribution of queries according to the NDCG@20 score difference ($\Delta$): the middle bar represents no change ($\Delta < 0.05$), while the leftmost and rightmost bars represent the number of queries that were hurt and helped substantially, respectively ($\Delta > 0.25$). We observe similar patterns for the bag-of-entities and word embeddings representations; the former has fewer queries that were significantly helped or hurt, while the overall improvement (over all topics) is larger. We further note the similarity of the shapes of the distributions for bag-of-categories and graph embeddings.

5.5.3. Query Subsets

In Figure 6, we plot the results for the LTR baseline and for our STR method on the two query subsets, QS-1 and QS-2, in terms of NDCG@20. Generally, both methods perform better on QS-1 than on QS-2. This is mainly because QS-2 queries are more focused (each targeting a specific type of instance, with a required property), and thus are considered more difficult. Importantly, STR achieves consistent improvements over LTR on both query subsets.

5.5.4. Individual Queries

We plot the difference between the LTR baseline and STR for the two query subsets in Figure 7. Table 7 lists the queries that we discuss below. The leftmost bar in Figure 7(a) corresponds to the query “stocks.” For this broad query, there are two relevant and one highly relevant tables. LTR does not retrieve any highly relevant tables in the top 20, while STR manages to return one highly relevant table in the top 10. The rightmost bar in Figure 7(a) corresponds to the query “ibanez guitars.” For this query, there are two relevant and one highly relevant tables. LTR produces an almost perfect ranking for this query, by returning the highly relevant table at the top rank, and the two relevant tables at ranks 2 and 4. STR returns a non-relevant table at the top rank, thereby pushing the relevant results down in the ranking by a single position, resulting in a decrease of 0.29 in NDCG@20.

The leftmost bar in Figure 7(b) corresponds to the query “board games number of players.” For this query, there are only two relevant tables according to the ground truth. STR managed to place them at rank positions 1 and 3, while LTR returned only one of them, at position 13. The rightmost bar in Figure 7(b) is the query “cereals nutritional value.” Here, there is only one highly relevant result. LTR managed to place it at rank 1, while it is ranked eighth by STR. Another interesting query is “irish counties area” (third bar from the left in Figure 7(b)), with three highly relevant and three relevant results according to the ground truth. LTR returned two highly relevant and one relevant result at ranks 1, 2, and 4. STR, on the other hand, placed the three highly relevant results in the top 3 positions and also returned the three relevant tables at positions 4, 6, and 7.

Figure 6. Table retrieval results, LTR baseline vs. STR, on the two query subsets in terms of NDCG@20.
Figure 7. Query-level differences on the two query subsets, (a) QS-1 and (b) QS-2, between the LTR baseline and STR. Positive values indicate improvements made by the latter.
Table 7. Example queries from our query set. Rel denotes table relevance level. LTR and STR refer to the positions on which the table is returned by the respective method.
Query Rel LTR STR
QS-1-24: stocks
      Stocks for the Long Run / Key Data Findings: annual real returns 2 - 6
      TOPIX / TOPIX New Index Series 1 9 -
      Hang Seng Index / Selection criteria for the HSI constituent stocks 1 - -
QS-1-21: ibanez guitars
      Ibanez / Serial numbers 2 1 2
      Corey Taylor / Equipment 1 2 3
      Fingerboard / Examples 1 4 5
QS-2-27: board games number of players
      List of Japanese board games 1 13 1
      List of licensed Risk game boards / Risk Legacy 1 - 3
QS-2-21: cereals nutritional value
      Sesame / Sesame seed kernels, toasted 2 1 8
QS-2-20: irish counties area
      Counties of Ireland / List of counties 2 2 1
      List of Irish counties by area / See also 2 1 2
      List of flags of Ireland / Counties of Ireland Flags 2 - 3
      Provinces of Ireland / Demographics and politics 1 4 4
      Toponymical list of counties of the United Kingdom / Northern … 1 - 7
      Múscraige / Notes 1 - 6

6. Related Work

There is an increasing amount of work on tables, addressing a wide range of tasks, including table search, table mining, table extension, and table completion. Table search is a fundamental problem on its own, and is also often used as a core component in other tasks.

Table Search

Users are likely to search for tables when they need structured or relational data. Cafarella et al. (2008b) pioneered the table search task by introducing the WebTables system. The basic idea is to fetch the top-ranked results returned by a web search engine in response to the query, and then extract the top-$k$ tables from those pages. Further refinements to the same idea are introduced in (Cafarella et al., 2009). Venetis et al. (2011) leverage a database of class labels and relationships extracted from the Web, which are attached to table columns, for recovering table semantics. This information is then used to enhance table search. Pimplikar and Sarawagi (2012) search for tables using column keywords, and match these keywords against the header, body, and context of tables. Google Web Tables (https://research.google.com/tables) provides an example of a table search system interface; the developers’ experiences are summarized in (Balakrishnan et al., 2015). To enrich the diversity of search results, Nguyen et al. (2015) design a goodness measure for table search and selection. Apart from keyword-based search, tables may also be retrieved using a given “local” table as the query (Ahmadov et al., 2015; Das Sarma et al., 2012; Limaye et al., 2010). We are not aware of any work that performs semantic matching of tables against queries.

Table Extension/Completion

Table extension refers to the task of extending a table with additional elements, which are typically new columns (Das Sarma et al., 2012; Cafarella et al., 2009; Lehmberg et al., 2015; Yakout et al., 2012; Bhagavatula et al., 2013). These methods commonly use table search as the first step (Lehmberg et al., 2015; Bhagavatula et al., 2013; Yakout et al., 2012). Searching related tables is also used for row extension. In (Das Sarma et al., 2012), two tasks of entity complement and schema complement are addressed, to extend entity rows and columns respectively. Zhang and Balog (2017b) populate row and column headings of tables that have an entity focus. Table completion is the task of filling in empty cells within a table. Ahmadov et al. (2015) introduce a method to extract table values from related tables and/or to predict them using machine learning methods.

Table Mining

The abundance of information in tables has raised great interest in table mining research (Cafarella et al., 2011; Cafarella et al., 2008b; Madhavan et al., 2009; Sarawagi and Chakrabarti, 2014; Venetis et al., 2011; Zhang and Chakrabarti, 2013). Munoz et al. (2014) recover table semantics by extracting RDF triples from Wikipedia tables. Similarly, Cafarella et al. (2008b) mine table relations from a huge table corpus extracted from a Google crawl. Tables could also be searched to answer questions or mined to extend knowledge bases. Yin et al. (2016) take tables as a knowledge base to execute queries using deep neural networks. Sekhavat et al. (2014) augment an existing knowledge base (YAGO) with a probabilistic method by making use of table information. Similar work is carried out in (Dong et al., 2014), with tabular information used for knowledge base augmentation. Another line of work concerns table annotation and classification. Zwicklbauer et al. (2013) introduce a method to annotate table headers by mining column content. Crestan and Pantel (2011) introduce a supervised framework for classifying HTML tables into a taxonomy by examining the contents of a large number of tables. Apart from all the mentioned methods above, table mining also includes tasks like table interpretation (Cafarella et al., 2008b; Munoz et al., 2014; Venetis et al., 2011) and table recognition (Crestan and Pantel, 2011; Zwicklbauer et al., 2013). In the problem space of table mining, table search is an essential component.

7. Conclusion

In this paper, we have introduced and addressed the problem of ad hoc table retrieval: answering a keyword query with a ranked list of tables. We have developed a novel semantic matching framework, where queries and tables can be represented using semantic concepts (bag-of-entities and bag-of-categories) as well as continuous dense vectors (word and graph embeddings) in a uniform way. We have introduced multiple similarity measures for matching those semantic representations. For evaluation, we have used a purpose-built test collection based on Wikipedia tables. Finally, we have demonstrated substantial and significant improvements over a strong baseline. In future work, we wish to relax our requirements regarding the focus on Wikipedia tables, and make our methods applicable to other types of tables, like scientific tables (Gao and Callan, 2017) or Web tables.

References

  • Ahmadov et al. (2015) Ahmad Ahmadov, Maik Thiele, Julian Eberius, Wolfgang Lehner, and Robert Wrembel. 2015. Towards a Hybrid Imputation Approach Using Web Tables. In Proc. of BDC ’15. 21–30.
  • Balakrishnan et al. (2015) Sreeram Balakrishnan, Alon Y. Halevy, Boulos Harb, Hongrae Lee, Jayant Madhavan, Afshin Rostamizadeh, Warren Shen, Kenneth Wilder, Fei Wu, and Cong Yu. 2015. Applying WebTables in Practice. In Proc. of CIDR ’15.
  • Balog et al. (2011) Krisztian Balog, Marc Bron, and Maarten De Rijke. 2011. Query modeling for entity search based on terms, categories, and examples. ACM Trans. Inf. Syst. 29, 4, Article 22 (Dec. 2011), 22:1–22:31 pages.
  • Bhagavatula et al. (2013) Chandra Sekhar Bhagavatula, Thanapon Noraset, and Doug Downey. 2013. Methods for Exploring and Mining Tables on Wikipedia. In Proc. of IDEA ’13. 18–26.
  • Bhagavatula et al. (2015) Chandra Sekhar Bhagavatula, Thanapon Noraset, and Doug Downey. 2015. TabEL: Entity Linking in Web Tables. In Proc. of ISWC ’15. 425–441.
  • Cafarella et al. (2009) Michael J. Cafarella, Alon Halevy, and Nodira Khoussainova. 2009. Data Integration for the Relational Web. Proc. of VLDB Endow. 2 (2009), 1090–1101.
  • Cafarella et al. (2011) Michael J. Cafarella, Alon Halevy, and Jayant Madhavan. 2011. Structured Data on the Web. Commun. ACM 54 (2011), 72–79.
  • Cafarella et al. (2008a) Michael J. Cafarella, Alon Halevy, Daisy Zhe Wang, Eugene Wu, and Yang Zhang. 2008a. Uncovering the Relational Web. In Proc. of WebDB ’08.
  • Cafarella et al. (2008b) Michael J. Cafarella, Alon Halevy, Daisy Zhe Wang, Eugene Wu, and Yang Zhang. 2008b. WebTables: Exploring the Power of Tables on the Web. Proc. of VLDB Endow. 1 (2008), 538–549.
  • Chen et al. (2016) Jing Chen, Chenyan Xiong, and Jamie Callan. 2016. An Empirical Study of Learning to Rank for Entity Search. In Proc. of SIGIR ’16. 737–740.
  • Crestan and Pantel (2011) Eric Crestan and Patrick Pantel. 2011. Web-scale Table Census and Classification. In Proc. of WSDM ’11. 545–554.
  • Das Sarma et al. (2012) Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Halevy, Hongrae Lee, Fei Wu, Reynold Xin, and Cong Yu. 2012. Finding Related Tables. In Proc. of SIGMOD ’12. 817–828.
  • Dong et al. (2014) Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge Vault: A Web-scale Approach to Probabilistic Knowledge Fusion. In Proc. of KDD ’14. 601–610.
  • Fleiss et al. (1971) J.L. Fleiss and others. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin 76 (1971), 378–382.
  • Ganguly et al. (2015) Debasis Ganguly, Dwaipayan Roy, Mandar Mitra, and Gareth J.F. Jones. 2015. Word Embedding Based Generalized Language Model for Information Retrieval. In Proc. of SIGIR ’15. 795–798.
  • Gao and Callan (2017) Kyle Yingkai Gao and Jamie Callan. 2017. Scientific Table Search Using Keyword Queries. CoRR abs/1707.03423 (2017).
  • Grbovic et al. (2015) Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, Fabrizio Silvestri, and Narayan Bhamidipati. 2015. Context- and Content-aware Embeddings for Query Rewriting in Sponsored Search. In Proc. of SIGIR ’15. 383–392.
  • Hasibi et al. (2017a) Faegheh Hasibi, Krisztian Balog, Darío Garigliotti, and Shuo Zhang. 2017a. Nordlys: A Toolkit for Entity-Oriented and Semantic Search. In Proceedings of SIGIR ’17. 1289–1292.
  • Hasibi et al. (2017b) Faegheh Hasibi, Fedor Nikolaev, Chenyan Xiong, Krisztian Balog, Svein Erik Bratsberg, Alexander Kotov, and Jamie Callan. 2017b. DBpedia-Entity V2: A Test Collection for Entity Search. In Proc. of SIGIR ’17. 1265–1268.
  • Kenter and de Rijke (2015) Tom Kenter and Maarten de Rijke. 2015. Short Text Similarity with Word Embeddings. In Proc. of CIKM ’15. 1411–1420.
  • Lehmberg et al. (2015) Oliver Lehmberg, Dominique Ritze, Petar Ristoski, Robert Meusel, Heiko Paulheim, and Christian Bizer. 2015. The Mannheim Search Join Engine. Web Semant. 35 (2015), 159–166.
  • Limaye et al. (2010) Girija Limaye, Sunita Sarawagi, and Soumen Chakrabarti. 2010. Annotating and Searching Web Tables Using Entities, Types and Relationships. Proc. of VLDB Endow. 3 (2010), 1338–1347.
  • Liu (2011) Tie-Yan Liu. 2011. Learning to Rank for Information Retrieval. Springer Berlin Heidelberg.
  • Macdonald et al. (2012) Craig Macdonald, Rodrygo L T Santos, and Iadh Ounis. 2012. On the Usefulness of Query Features for Learning to Rank. In Proc. of CIKM ’12. 2559–2562.
  • Madhavan et al. (2009) Jayant Madhavan, Loredana Afanasiev, Lyublena Antova, and Alon Y. Halevy. 2009. Harnessing the Deep Web: Present and Future. CoRR abs/0909.1785 (2009).
  • Manotumruksa et al. (2016) Jarana Manotumruksa, Craig MacDonald, and Iadh Ounis. 2016. Modelling User Preferences using Word Embeddings for Context-Aware Venue Recommendation. CoRR abs/1606.07828 (2016).
  • Mikolov et al. (2013) Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and Their Compositionality. In Proc. of NIPS ’13. 3111–3119.
  • Mitra et al. (2016) Bhaskar Mitra, Eric T. Nalisnick, Nick Craswell, and Rich Caruana. 2016. A Dual Embedding Space Model for Document Ranking. CoRR abs/1602.01137 (2016).
  • Munoz et al. (2014) Emir Munoz, Aidan Hogan, and Alessandra Mileo. 2014. Using Linked Data to Mine RDF from Wikipedia’s Tables. In Proc. of WSDM ’14. 533–542.
  • Nguyen et al. (2015) Thanh Tam Nguyen, Quoc Viet Hung Nguyen, Matthias Weidlich, and Karl Aberer. 2015. Result Selection and Summarization for Web Table Search. In ISDE ’15. 231–242.
  • Ogilvie and Callan (2003) Paul Ogilvie and Jamie Callan. 2003. Combining Document Representations for Known-item Search. In Proc. of SIGIR ’03. 143–150.
  • Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. GloVe: Global Vectors for Word Representation. In Proc. of EMNLP ’14. 1532–1543.
  • Perozzi et al. (2014) Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online Learning of Social Representations. In Proc. of KDD ’14. 701–710.
  • Pimplikar and Sarawagi (2012) Rakesh Pimplikar and Sunita Sarawagi. 2012. Answering Table Queries on the Web Using Column Keywords. Proc. of VLDB Endow. 5 (2012), 908–919.
  • Qin et al. (2010) Tao Qin, Tie-Yan Liu, Jun Xu, and Hang Li. 2010. LETOR: A Benchmark Collection for Research on Learning to Rank for Information Retrieval. Inf. Retr. 13, 4 (Aug 2010), 346–374.
  • Raviv et al. (2016) Hadas Raviv, Oren Kurland, and David Carmel. 2016. Document Retrieval Using Entity-Based Language Models. In Proc. of SIGIR ’16. 65–74.
  • Ristoski and Paulheim (2016) Petar Ristoski and Heiko Paulheim. 2016. RDF2vec: RDF Graph Embeddings for Data Mining. In Proc. of ISWC ’16. 498–514.
  • Sarawagi and Chakrabarti (2014) Sunita Sarawagi and Soumen Chakrabarti. 2014. Open-domain Quantity Queries on Web Tables: Annotation, Response, and Consensus Models. In Proc. of KDD ’14. 711–720.
  • Sekhavat et al. (2014) Yoones A. Sekhavat, Francesco Di Paolo, Denilson Barbosa, and Paolo Merialdo. 2014. Knowledge Base Augmentation using Tabular Data. In Proc. of LDOW ’14.
  • Tang et al. (2015) Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale Information Network Embedding. In Proc. of WWW ’15. 1067–1077.
  • Tyree et al. (2011) Stephen Tyree, Kilian Q Weinberger, Kunal Agrawal, and Jennifer Paykin. 2011. Parallel Boosted Regression Trees for Web Search Ranking. In Proc. of WWW ’11. 387–396.
  • Venetis et al. (2011) Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Fei Wu, Gengxin Miao, and Chung Wu. 2011. Recovering Semantics of Tables on the Web. Proc. of VLDB Endow. 4 (2011), 528–538.
  • Vulić and Moens (2015) Ivan Vulić and Marie-Francine Moens. 2015. Monolingual and Cross-Lingual Information Retrieval Models Based on (Bilingual) Word Embeddings. In Proc. of SIGIR ’15. 363–372.
  • Xiong et al. (2017) Chenyan Xiong, Jamie Callan, and Tie-Yan Liu. 2017. Word-Entity Duet Representations for Document Ranking. In Proc. of SIGIR ’17. 763–772.
  • Yakout et al. (2012) Mohamed Yakout, Kris Ganjam, Kaushik Chakrabarti, and Surajit Chaudhuri. 2012. InfoGather: Entity Augmentation and Attribute Discovery by Holistic Matching with Web Tables. In Proc. of SIGMOD ’12. 97–108.
  • Yin et al. (2016) Pengcheng Yin, Zhengdong Lu, Hang Li, and Ben Kao. 2016. Neural Enquirer: Learning to Query Tables in Natural Language. In Proc. of IJCAI ’16. 2308–2314.
  • Zhang and Chakrabarti (2013) Meihui Zhang and Kaushik Chakrabarti. 2013. InfoGather+: Semantic Matching and Annotation of Numeric and Time-varying Attributes in Web Tables. In Proc. of SIGMOD ’13. 145–156.
  • Zhang and Balog (2017a) Shuo Zhang and Krisztian Balog. 2017a. Design Patterns for Fusion-Based Object Retrieval. In Proc. of ECIR ’17. 684–690.
  • Zhang and Balog (2017b) Shuo Zhang and Krisztian Balog. 2017b. EntiTables: Smart Assistance for Entity-Focused Tables. In Proc. of SIGIR ’17. 255–264.
  • Zhou et al. (2015) Guangyou Zhou, Tingting He, Jun Zhao, and Po Hu. 2015. Learning Continuous Word Embedding with Metadata for Question Retrieval in Community Question Answering. In Proc. of ACL ’15. 250–259.
  • Zwicklbauer et al. (2013) Stefan Zwicklbauer, Christoph Einsiedler, Michael Granitzer, and Christin Seifert. 2013. Towards Disambiguating Web Tables. In Proc. of ISWC-PD ’13. 205–208.