
Exempla Gratis (E.G.): Code Examples for Free

Celeste Barnaby, Facebook Inc., U.S.A. (celestebarnaby@fb.com); Koushik Sen, UC Berkeley, U.S.A. (ksen@berkeley.edu); Tianyi Zhang, Harvard University, U.S.A. (tianyi@seas.harvard.edu); Elena Glassman, Harvard University, U.S.A. (glassman@seas.harvard.edu); and Satish Chandra, Facebook Inc., U.S.A. (schandra@acm.org)
Abstract.

Modern software engineering often involves using many existing APIs, both open source and – in industrial coding environments – proprietary. Programmers reference documentation and code search tools to remind themselves of proper common usage patterns of APIs. However, high-quality API usage examples are computationally expensive to curate and maintain, and API usage examples retrieved from company-wide code search can be tedious to review. We present a tool, EG, that mines codebases and shows common, idiomatic usage examples for API methods. EG was integrated into Facebook’s internal code search tool for the Hack language and evaluated on open-source GitHub projects written in Python. EG was also compared against code search results and hand-written examples from a popular programming website called ProgramCreek. Compared with these two baselines, examples generated by EG are more succinct and representative, with fewer extraneous statements. In addition, a survey with Facebook developers shows that EG examples are preferred in 97% of cases.

Keywords: API examples, big code, software tools
copyright: rights retained. doi: 10.1145/3368089.3417052. journal year: 2020. submission id: fse20ind-p36-p. isbn: 978-1-4503-7043-1/20/11. conference: Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’20), November 8–13, 2020, Virtual Event, USA. ccs: Software and its engineering – Software maintenance tools

1. Introduction

Application programming interfaces (APIs) are becoming a pervasive component of modern software engineering. A core challenge for software engineers in industry is to use existing APIs in idiomatic ways within their organization. In order to do this, developers often search for API documentation and usage examples (Duala-Ekoko and Robillard, 2012; Robillard, 2009; Buse and Weimer, 2012). However, this can be especially challenging in companies where many APIs are proprietary. Because those proprietary APIs are only documented within the company by its engineers, there is no externally crowdsourced documentation or examples posted on sites like StackOverflow.

Usage examples are a key component of API documentation (Maalej and Robillard, 2013). Examples can refresh a programmer’s memory (Brandt et al., 2009), concretize the more abstract components of documentation (Robillard, 2009), and support code improvement and adaptation (Zhang et al., 2019; Luan et al., 2019). However, there is some risk that programmers will generalize from or adapt an example incorrectly, e.g., leaving in irrelevant components or leaving out critical ones. One useful trait of an example is succinctness (Nasehi et al., 2012): having minimal details specific to a particular usage situation and few superficial distractions, leaving just what is common across most or all proper usages of the API. Alternatively, multiple examples showing a variety of usages (Orgill, 2012) may help programmers infer what may be common and uncommon usage patterns and parameter values. Writing usage examples with these helpful properties is labor-intensive, especially when multiple proper usage patterns are in consistent use within an organization.

Regardless of whether available documentation includes one or more usage examples, many programmers instead use company- or project-wide code search to find API usage snippets. However, ranking code search results is difficult, and search engines often default to showing the most recently edited files first. The task of sifting through myriad code search results in an attempt to glean a common usage pattern can be tedious, time-consuming, and unproductive (Starke et al., 2009). If the developer does decide to use code from a code search result, they have no assurance that this code represents a common usage rather than an atypical, niche way of using a method. In fact, prior work has shown that individual code examples may even suffer from API usage violations (Zhang et al., 2018), insecure coding practices (Fischer et al., 2017), and unchecked obsolete usage (Zhou and Walker, 2016). Therefore, without thoroughly inspecting and comparing many examples, developers may leave out critical safety checks or desirable usage scenarios.

Several approaches have been previously proposed to address this challenge of presenting programmers with good API usage examples, whether found through search or automatically generated. A number of approaches cluster and rank similar examples to reduce the cognitive load of reading through individual examples (Kim et al., 2010; Buse and Weimer, 2012; Katirtzis et al., 2018). However, these clustering techniques rely on pre-defined similarity metrics and do not help users understand why some examples are clustered together, e.g., the commonalities and variations among those examples. Buse et al. presented a synthesis technique to generate a single example from multiple similar examples (Buse and Weimer, 2012), but the synthetic example only demonstrates a common skeleton, without showing possible variations. In contrast, Examplore is capable of visualizing an entire distribution over API usage features in a large number of API usage examples (Glassman et al., 2018), but the analysis requires a pre-defined API skeleton, and concrete usage patterns are best revealed through additional interaction with the visualization.

In this paper, we present EG, a tool that mines codebases and shows multiple common, idiomatic usage examples for API methods. EG assumes access to a large repository of Python programs from many projects. Given a query API method, EG first searches for all methods in the repository containing at least one usage of the API method. It then computes the parse tree of each such method and finds the maximal subtree that a) contains the query API, and b) is part of a meaningful proportion of methods. EG then serializes the subtree to create a common idiomatic usage pattern of the API method. EG repeats this process multiple times to find $n$ diverse idiomatic usage patterns.

Once EG has generated several common usage patterns, it displays them to users in an easy-to-use interface. For each usage pattern, EG displays a concise and representative code snippet to serve as an example of that usage pattern. In each example, EG emphasizes the parts that belong to the common usage pattern in bold text, while graying out the uncommon parts – referred to in this paper as "filler". Further, EG displays how many times each usage pattern appears in the repository. This interface allows users to efficiently understand the common usage of an API method, and relieves the cognitive load of manually looking through code results in an attempt to discern a common pattern. In addition, EG relieves the burden of manually curating examples for API methods, and automates the task of keeping API examples up-to-date and relevant as a codebase changes.

EG has several properties that are particularly advantageous for its scalability and generality. First, EG is language agnostic: to generate EG examples for a new programming language, one need only implement a new parser. Second, EG does not require mining coding patterns ahead of time, and can retrieve new and idiomatic usage patterns on the fly. Third, EG is fast enough to use in real time, and can generate examples from a large corpus containing millions of methods within a couple of seconds on a multi-core server machine. On average, EG takes 1.0 seconds to generate examples for a query method on a 24-core CPU.

We have implemented EG in C++ for Hack and Python. We have also integrated EG into Facebook’s internal code search website, where it is used daily by developers. We report our experimental evaluation of EG for Python. We have used EG to index 1,900,911 Python methods obtained from open source GitHub projects. We performed our experiments for Python because it is a language that is widely used at Facebook, as well as in open source projects. We evaluated EG against code search results and examples from ProgramCreek, a website providing code examples of Python methods. We found that developers preferred EG examples to code search results in 97% of cases, and that a majority of developers found the main features of the EG interface useful. We also found that EG examples were shorter, more relevant, and more representative than code search results or ProgramCreek examples.

The rest of this paper is organized as follows. Section 2 motivates the design of EG with insights and lessons learned from deploying another code search and recommendation tool at Facebook. Section 3 describes a usage scenario of learning APIs with EG. Section 4 describes the pattern mining and example generation algorithms in EG. Section 5 describes the evaluation of EG, including a survey with Facebook developers, a quantitative analysis of examples generated by EG and two other tools, and a summary of EG’s usage metrics after its deployment at Facebook. Section 5.4 discusses the challenges we encountered when evaluating EG. Section 6 discusses related work, and Section 7 concludes this paper.

Table 1. EG code examples for a variety of Python methods.
Case A: json.dump

Example 1 – usage pattern found in 29 out of 336 samples (adapted from https://github.com/openai/gym/blob/master/gym/wrappers/monitoring/video_recorder.py#L229, accessed March 2020):

    with open(self.output_path, 'w') as f:
        json.dump(data, f)

Example 2 – usage pattern found in 17 out of 336 samples (adapted from https://github.com/scrapinghub/splash/blob/master/scripts/rst2inspections.py#L77, accessed March 2020):

    with open(out_filename, "w") as f:
        json.dump(info, f, indent=2)

Example 3 – usage pattern found in 17 out of 336 samples (adapted from https://github.com/supernnova/SuperNNova/blob/master/supernnova/utils/experiment_settings.py#L161, accessed March 2020):

    with open(Path(self.rnn_dir) / "cli_args.json", "w") as f:
        json.dump(self.cli_args, f, indent=4, sort_keys=True)

Notes: The first example shows that the following is idiomatic: opening a file before calling json.dump; passing 'w' as the second argument to open; and passing f as the second argument to json.dump. The second and third examples show that it is also idiomatic to pass an integer to the optional parameter indent, and to pass True to the optional parameter sort_keys.
Case B: os.makedirs

Example 1 – usage pattern found in 103 out of 1699 samples (adapted from https://github.com/huggingface/transformers/tree/master/examples/run_multiple_choice.py, accessed March 2020):

    output_dir = os.path.join(args.output_dir,
                        "checkpoint-{}".format(global_step))
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

Example 2 – usage pattern found in 110 out of 1699 samples (adapted from https://github.com/zalandoresearch/fashion-mnist/blob/master/configs.py#L49, accessed March 2020):

    base_dir = os.path.dirname(fname)
    if not os.path.exists(base_dir):
        os.makedirs(base_dir)

Example 3 – usage pattern found in 116 out of 1699 samples (adapted from https://github.com/toddheitmann/PetroPy/blob/master/petropy/download.py#L195, accessed March 2020):

    year_dir = os.path.join(save_dir,
                        url.split('/')[-1].split('.')[0])
    if not os.path.isdir(year_dir):
        os.makedirs(year_dir)

Notes: The first example shows that the following is idiomatic: calling os.path.join and os.path.exists before calling os.makedirs, and calling os.makedirs on the condition that the directory being made does not already exist. The second example shows an alternate idiom where os.path.dirname is called instead of os.path.join, while the third example calls os.path.isdir instead of os.path.exists.
Case C: range

Example 1 – usage pattern found in 213 out of 2000 samples (adapted from https://github.com/TarrySingh/Artificial-Intelligence-Deep-Learning-Machine-Learning-Tutorials/blob/master/deep-learning/1-pixel-attack/networks/capsnet.py#L41):

    for i in range(3):
        img[:,:,i] = (img[:,:,i] - mean[i]) / std[i]

Example 2 – usage pattern found in 150 out of 2000 samples (adapted from https://github.com/scikit-learn-contrib/category_encoders/blob/master/category_encoders/sum_coding.py#L238, accessed March 2020):

    columns=[str(col) + '_%d' % (i, )
        for i in range(len(sum_contrast_matrix.column_suffixes))]

Example 3 – usage pattern found in 123 out of 2000 samples (adapted from https://github.com/waditu/tushare/blob/master/tushare/util/common.py#L40, accessed March 2020):

    for j in range(start, i):

Notes: The first example shows that the following is idiomatic: calling range in the header of a for loop, and naming the loop variable i. The second example shows a common idiom for list comprehension using range, while the third example shows that two arguments may be passed to range.
Case D: csv.writer

Example 1 – usage pattern found in 11 out of 160 samples (adapted from https://github.com/bboczeng/Nyxar/blob/master/api/coinmarketcap.py#L94, accessed March 2020):

    with open(filename, 'a+', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(fieldnames)

Example 2 – usage pattern found in 11 out of 160 samples (adapted from https://github.com/vgpena/next-weekend/blob/master/scraper.py#L82):

    writer = csv.writer(csvfile, delimiter=',',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerow([hike_name, url, trailhead_name, ...])

Example 3 – usage pattern found in 11 out of 160 samples (adapted from https://github.com/baychimo/loto/blob/master/tests/test_loto.py#L134, accessed March 2020):

    with open(ntf.name, "w") as f:
        ntf_writer = csv.writer(f, delimiter=",")

Notes: The first example shows that the following is idiomatic: opening a file before calling csv.writer; passing 'a+' as the second argument to open and '' as the argument to the optional parameter newline; and calling writer.writerow after csv.writer. The second example shows an alternate idiom where a list of items is passed to writer.writerow, while the third example shows an idiom where an argument is provided for the optional parameter delimiter.
Case E: requests.get

Example 1 – usage pattern found in 67 out of 1019 samples (adapted from https://github.com/home-assistant/core/blob/master/homeassistant/components/ohmconnect/sensor.py#L70, accessed March 2020):

    response = requests.get(url, timeout=10)

Example 2 – usage pattern found in 48 out of 1019 samples (adapted from https://github.com/zvtvz/zvt/blob/master/zvt/recorders/exchange/china_index_list_spider.py#L85, accessed March 2020):

    try:
        response = requests.get(url)
    except requests.HTTPError as error:

Example 3 – usage pattern found in 48 out of 1019 samples (adapted from https://github.com/testerSunshine/12306/blob/master/verify/pretreatment.py#L26, accessed March 2020):

    url = 'https://kyfw.12306.cn/otn/passcodeNew/...'
    r = requests.get(url)

Notes: The first example shows that the following is idiomatic: naming the variable assigned to requests.get response, and passing two arguments to requests.get. The second example shows an alternate idiom where the call to requests.get is wrapped in a try/except block, while the third example shows that it is common to initialize a string variable named url before calling requests.get.

2. Motivations from Facebook

Aroma is a code-to-code search and recommendation tool. It was integrated into Facebook’s IDE and internal code search website in December 2018 (Luan et al., 2019). Given a code snippet as input and a large code corpus, Aroma returns a set of idiomatic extensions to the input code, clustered together from similar code snippets in the corpus. Aroma produces code recommendations for Hack, Python, Java, and JavaScript. Here, we summarize how the lessons we learned from Aroma informed the design of EG.

We expected that developers would query Aroma with multi-line code snippets, to get recommendations for how they should modify or improve their code. However, we found that in practice, most Aroma queries were for single API methods. Furthermore, most of these queried APIs were Facebook-specific APIs for which there was little existing documentation and no hand-written examples. We concluded, then, that developers at Facebook were using Aroma to obtain API usage examples.

Since Aroma was not designed for generating examples of API usage, recommendations created from querying a single API method had several shortcomings. First, we found that across many different methods, APIs, and libraries, Aroma recommendations consistently cut out the arguments passed into a function call. For example, in Figure 1, the example generated by Aroma does not include any arguments to the assert_frame_equal method in pandas.testing. This is because Aroma is designed to prune out code that differs among multiple snippets in a cluster, while retaining code that is commonly shared among them. Since different calls to this method tend to contain different arguments, the arguments are pruned out in the recommendation. Second, examples generated by Aroma include many extraneous statements. The example in Figure 1 contains several lines that are not strictly relevant to the assert_frame_equal call, such as the function header and the initialization of the query variable.

Figure 1. Aroma recommendation for assert_frame_equal. (The code snippet is adapted from https://github.com/pydata/pandas-gbq/blob/master/tests/system/test_gbq.py#L130. Accessed in March 2020.)

For API learning, these shortcomings are detrimental. When learning the common usage of an API method, it is helpful to see its common arguments, and usually unhelpful to see a lot of extraneous context. Prior work has shown that conciseness is an important feature of code examples, and that the median length of hand-written examples is five lines (Nasehi et al., 2012). Aroma’s ability to perform fuzzy searches also goes under-utilized when the query is a single method.

For these reasons, we decided that, while Aroma is still a powerful code recommendation engine with other potential uses, it is not suitable as a generator of API usage examples. Thus, we created EG to allow developers to see succinct, idiomatic usage examples for an API method. Figure 2 shows the top example generated by EG for learning assert_frame_equal. This example includes the arguments of the queried method and removes the extraneous statements. The additional code serves only to further illuminate the use of assert_frame_equal, as it shows how to initialize its arguments. Further, EG shows code elements that are common in black text, and code elements that are unique to a single snippet in gray text. This allows users to understand what is common and what is atypical, while still seeing a complete, readable example.

Figure 2. EG’s example for assert_frame_equal. (The code snippet is adapted from https://github.com/pandas-dev/pandas/blob/master/pandas/tests/frame/test_dtypes.py. Accessed in March 2020.)

3. Usage Scenario

This section describes a usage scenario of learning Python APIs with EG. While we find EG to be most useful for proprietary libraries with few hand-written examples, we cannot release such proprietary code in this paper for confidentiality reasons. Thus, for the purposes of this scenario, we assume that hand-written examples for the libraries mentioned are not widely available.

Suppose Harry is a novice Python developer. He needs to write code that creates a directory and then writes some text to a file in that directory. He is aware that there is a makedirs function in the os package, but he is not sure how to use it. He searches for os.makedirs in EG. Figure 3 shows the top example generated by EG. This example shows that among 1699 snippets that call os.makedirs, 103 followed the same API usage pattern. The bolded code in this example shows the idiomatic usage of os.path.exists. Harry finds that it is common to check whether the directory exists before creating it. Further, he finds that it is idiomatic to call os.path.join together with os.makedirs to safely construct a file path across platforms. A link to the file containing the code snippet used in this example is displayed above the code snippet, which Harry can use if he wants to see additional context.

Figure 3. EG’s interface showing an example for os.makedirs. When code search results initially load, the top EG example is presented as the first result.

Harry clicks "Show More Examples" to view additional usage examples of os.makedirs, as shown in Figure 4. He sees that the third example calls os.path.isdir in the if statement instead of os.path.exists. The text above this code example indicates that this is a common usage pattern appearing in 116 out of 1699 snippets, giving Harry confidence that this is another standard check before calling os.makedirs. Harry copies this code from the EG example and replaces year_dir with the name of his directory. Since the examples generated by EG have already summarized the distinct API usages found in hundreds of snippets in the codebase, Harry feels he does not need to look at any additional code search results.

Figure 4. The "Show More Examples" button displays the top three common usage examples.

Harry now needs his code to write text to a file, so he queries write in EG, without including a package name. Figure 5 shows the top example generated by EG. He sees that this common usage pattern is found in 150 out of 2000 snippets, indicating that it is idiomatic to open a file before calling write. He also sees that, in this code snippet, the second argument to open is "w". Further searching shows that "w" means write-only. This is exactly what Harry needs. By stitching this example together with the previous example, Harry successfully writes the desired code.

Figure 5. EG’s example for write. (The code is adapted from https://github.com/pydata/xarray/blob/master/xarray/tests/test_backends_file_manager.py#L197. Accessed in March 2020.)

4. Example generation algorithm

In this section, we describe the notations and definitions used to compute the simplified parse tree of a program. The terminology and notation are similar to those in Aroma (Luan et al., 2019); we reintroduce the definitions to keep the paper self-contained.

4.1. Formal Definitions

Definition 4.1 (Keyword tokens).

This is the set of all tokens in a language whose values are fixed as part of the language. Keyword tokens include keywords such as while and if, and symbols such as {, }, ., +, *. The set of all keyword tokens is finite for a language.

Definition 4.2 (Non-keyword tokens).

This is the set of all tokens that are not keyword tokens. Non-keyword tokens include variable names, method names, field names, and literals.

Examples of non-keyword tokens are i, length, 0, 11, etc. The set of non-keyword tokens is non-finite for most languages.

Definition 4.3 (Simplified Parse Tree).

A simplified parse tree is a data structure to represent a program. It is recursively defined as a non-empty list whose elements could be any of the following:

  • a non-keyword token,

  • a keyword token, or

  • a simplified parse tree.

Moreover, a simplified parse tree cannot be a list containing a single simplified parse tree.

We picked this particular representation of programs instead of a conventional abstract syntax tree because the representation consists only of program tokens, and does not use any special language-specific rule names such as IfStatement, block, etc. As such, the representation can be used uniformly across various programming languages. Moreover, one can perform an in-order traversal of a simplified parse tree and print the token names to obtain the original program. We use this feature of a simplified parse tree to show the common usage examples.
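To make the representation concrete, the following sketch models a simplified parse tree as nested Python lists and reconstructs program text by in-order traversal. This is purely illustrative (EG’s implementation is in C++), and whitespace is only approximated:

    # A simplified parse tree is a non-empty list whose elements are
    # keyword tokens, non-keyword tokens, or nested simplified parse trees.
    # The tree below represents the expression `x > y.f`.
    tree = ["x", ">", ["y", ".", "f"]]

    def to_source(t):
        # In-order traversal: printing the token names recovers the
        # original program, modulo whitespace.
        if isinstance(t, str):
            return t
        return " ".join(to_source(child) for child in t)

    print(to_source(tree))  # prints: x > y . f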

Definition 4.4 (Label of a Simplified Parse Tree).

The label of a simplified parse tree is obtained by concatenating all the elements of the list representing the tree as follows:

  • If an element is a keyword token, the value of the token is used for concatenation.

  • If an element is a non-keyword token or a simplified parse tree, the special symbol # is used for concatenation.

For example, the label of the simplified parse tree ["x", ">", ["y", ".", "f"]] is "#>#".
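The label computation can be sketched in the same nested-list model; the keyword set below is a small stand-in for the full, language-specific set of keyword tokens:

    KEYWORDS = {"{", "}", ".", "+", "*", ">", "(", ")", "if", "while"}

    def label(t):
        # Keyword tokens contribute their own value; non-keyword tokens
        # and nested subtrees each contribute the special symbol '#'.
        return "".join(
            elem if isinstance(elem, str) and elem in KEYWORDS else "#"
            for elem in t
        )

    print(label(["x", ">", ["y", ".", "f"]]))  # prints: #>#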

    with open(self.output_path, 'w') as f:
        json.dump(data, f)

Figure 6. A simplified parse tree of a code snippet. Variable nodes are highlighted in double circles.

Figure 6 shows a code snippet and its simplified parse tree. In the figure, each internal node represents a simplified parse tree and is labeled using the tree’s label as defined above. Since keyword tokens in a simplified parse tree become part of the label of the tree, we do not create leaf nodes for keyword tokens in the tree diagram—we only add leaf nodes for non-keyword tokens. We show the label of each node in the tree, and add a unique index to each label as a subscript to distinguish between nodes with the same label.

To obtain the simplified parse tree of a code snippet, EG relies on a language-specific parser. For example, EG utilizes the lib2to3 Python parser to produce the simplified parse tree for a Python program. Once the simplified parse tree of a code snippet has been created, the rest of EG’s algorithm is language-agnostic.
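As an illustration, one plausible way to derive a simplified parse tree from lib2to3’s concrete parse tree is sketched below; EG’s actual conversion is internal and may differ, and lib2to3 is deprecated in recent Python versions:

    from lib2to3 import pygram, pytree
    from lib2to3.pgen2 import driver

    drv = driver.Driver(pygram.python_grammar_no_print_statement,
                        convert=pytree.convert)

    def simplify(node):
        # Leaves are tokens; their string values become tree elements.
        if isinstance(node, pytree.Leaf):
            return node.value
        children = [simplify(c) for c in node.children]
        # Drop whitespace-only tokens (NEWLINE, INDENT, ENDMARKER).
        children = [c for c in children
                    if not (isinstance(c, str) and c.strip() == "")]
        # Per Definition 4.3, collapse a list containing a single tree.
        return children[0] if len(children) == 1 else children

    code = "with open(path, 'w') as f:\n    json.dump(data, f)\n"
    tree = simplify(drv.parse_string(code))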

We represent a simplified parse tree $t$ using the tuple $(N, L, E)$, where

  • $N$ is the set of nodes of the tree,

  • $L$ is a function that maps a node to the label of the subtree rooted at that node,

  • $E$ is a children function. If $n_2$ is the $i^{\mathrm{th}}$ direct child of the node $n_1$, then $E(n_1, i) = n_2$. If the $i^{\mathrm{th}}$ child of a node $n$ does not exist, then $E(n, i) = \bot$.

For example, with#:#$_1$ and self$_8$ $\in N$ are sample nodes in the tree shown in Figure 6, $L($with#:#$_1) =$ with#:#, and $E($#as#$_2, 2) =$ f$_{12}$.

A subtree of a tree $t$ is a tree rooted at some node in $t$ that contains all the descendants of that node in $t$. Formally, $t' = (N', L, E')$ is a subtree of $t = (N, L, E)$ if the following conditions hold:

  • $N' \subseteq N$,

  • for all $n_1 \in N'$, if there exists $n_2 \in N$ and an $i \in \mathbb{N}$ such that $E(n_1, i) = n_2$, then $n_2 \in N'$ and $E'(n_1, i) = n_2$,

  • for all $n \in N'$, if there exists $i \in \mathbb{N}$ such that $E(n, i) = \bot$, then $E'(n, i) = \bot$.

For example, the subtree rooted at ##$_3$ in Figure 7 is highlighted in red.

A context tree of a tree $t$ is the tree with some of its subtrees removed. Formally, if $t' = (N', L, E')$ is a context tree of $t = (N, L, E)$, then the following conditions hold:

  • $N' \subseteq N$,

  • for all $n_1 \in N'$, if there exists $n_2 \in N$ and an $i \in \mathbb{N}$ such that $E(n_2, i) = n_1$, then $n_2 \in N'$ and $E'(n_2, i) = n_1$.

In Figure 7, we highlight a context of the tree in green.

Figure 7. A simplified parse tree with a subtree highlighted in red and a context highlighted in green.

A context subtree can be obtained from a tree by first picking a subtree of the tree and then picking a context tree of that subtree. We also use the term pattern to refer to a context subtree.
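To make the notion concrete, here is a small, assumed sketch of a pattern-containment check in the nested-list model used earlier, where a removed subtree (a hole) is marked with None:

    HOLE = None  # marks a subtree that the pattern has removed

    def matches_at(node, pat):
        # A hole matches any subtree; tokens must match exactly; lists
        # must match element-wise.
        if pat is HOLE:
            return True
        if isinstance(pat, str) or isinstance(node, str):
            return node == pat
        return (len(node) == len(pat)
                and all(matches_at(n, p) for n, p in zip(node, pat)))

    def contains_pattern(tree, pat):
        # A tree contains a pattern if the pattern matches at some node.
        if matches_at(tree, pat):
            return True
        return (not isinstance(tree, str)
                and any(contains_pattern(child, pat) for child in tree))

    pattern = ["x", ">", HOLE]  # `x > _` with the right operand removed
    print(contains_pattern(["x", ">", ["y", ".", "f"]], pattern))  # True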

4.2. EG Algorithm

We assume that we are given a set of trees $T$ and a query tree $q$. Note that a query – for example, json.dump – is parsed as a code construct as well. EG works in two steps to create a common usage pattern. First, it finds a pattern $t$ such that $q$ is a subtree of $t$ and $t$ is a pattern in each tree in a subset $T'$ of $T$. The pattern denotes a partial code snippet that is common and contains the query snippet. Second, EG finds a completion of the pattern by picking a subtree from a suitable tree in $T'$. The subtree must contain the pattern as a context. The subtree denotes a common usage code snippet of $q$.

Phase 1.

EG starts with the pattern $q$ and grows it iteratively by adding nodes to the pattern, as shown in Figure 9. Let us assume that after some iteration the current pattern is $t_c = (N, L, E)$ and it is present in exactly the trees $T_c \subseteq T$. Then a suitable neighboring node $n_1$ is added to the pattern to obtain a new, bigger pattern, as described in Figure 10. The tuple $(l, i, n_1, n_2, b)$ denotes that a node $n_1$ is added to the tree $t_c$, where $l$ is the label of $n_1$, $n_2$ is the node in $t_c$ connected to $n_1$, and $b$ is a Boolean which, if true, means $E(n_2, i) = n_1$, and $E(n_1, i) = n_2$ if false. The support of a node added to the pattern is the number of trees in $T_c$ that contain the new pattern (see Figure 11). In each iteration, EG adds the node to $t_c$ such that the new pattern has the highest support. At the end of an iteration, EG updates $t_c$ with the new pattern, and the set of all the trees in $T_c$ containing the new pattern becomes the new $T_c$.

EG continues the iterations until the number of nodes in $t_c$ exceeds a configurable threshold $\gamma$ (usually set to 100) or the ratio $|T_c| / |T|$ goes below a configurable threshold $\alpha$ (usually set to 0.05). Threshold $\gamma$ ensures that the generated example is not too long, while threshold $\alpha$ ensures that the generated example is a common snippet.

Figure 8 shows the maximal pattern computed for the query json.dump from two simplified parse trees. The nodes in the pattern are highlighted in green.

    if data:
        with open(self.output_path, 'w') as f:
            json.dump(data, f)

    print("Writing to %s." % json_path)
    with open(json_path, 'w') as f:
        json.dump(scan_json_results, f)

Figure 8. The maximal pattern computed for the query json.dump from two different simplified parse trees. The nodes in the pattern are highlighted in green. The filler code is highlighted in blue.
  Phase 1:
  Input: a set of simplified parse trees $T$
  Input: the query tree $q$
  $T_c \leftarrow \{t \in T \mid t$ contains $q$ as a subtree$\}$
  $t_c \leftarrow q$
  while no. of nodes in $t_c \leq \gamma$ and $|T_c| > |T| \cdot \alpha$ do
      if $\exists (l, i, n_1, b)$ and $n_2$ in nodes of $t_c$ such that Support(Extend($t_c, l, i, n_1, n_2, b$), $T_c$) $\geq$ Support(Extend($t_c, l', i', n_1', n_2', b'$), $T_c$) for all $(l', i', n_1', b')$ and $n_2'$ in nodes of $t_c$ then
          $t_c \leftarrow$ Extend($t_c, l, i, n_1, n_2, b$)
          $T_c \leftarrow \{t \in T_c \mid t$ contains $t_c\}$
      end if
  end while
  return $t_c, T_c$
Figure 9. Phase 1 algorithm.
  Extend($t_c, l, i, n_1, n_2, b$)
  Let $t_c = (N, L, E)$
  $L \leftarrow L \cup \{n_1 \mapsto l\}$
  $N \leftarrow N \cup \{n_1\}$
  if $b$ then
      $E \leftarrow E \cup \{(n_2, i) \mapsto n_1\}$
  else
      $E \leftarrow E \cup \{(n_1, i) \mapsto n_2\}$
  end if
  return $t_c$
Figure 10. Extend($t_c, l, i, n_1, n_2, b$) adds the node $n_1$ with label $l$ to the node $n_2$ in $t_c$, adding the edge $E(n_2, i) = n_1$ if $b$ is true, and $E(n_1, i) = n_2$ otherwise.
  Support($t, T$)
  return $|\{t' \mid t' \in T$ and $t'$ contains $t$ as a pattern$\}|$
Figure 11. Support($t, T$) computes the number of trees in $T$ that contain $t$ as a pattern.

Phase 2.

Once EG has computed a pattern contained in several trees, it tries to complete the pattern by adding the missing subtrees. In EG, our goal is to show a real code snippet instead of a synthetic one, because we have found that programmers feel more confident with real code snippets. This means that we need to pick a minimal subtree from a tree in the final set $T_c$ such that the subtree contains the pattern. We focus on two properties that make the common usage example short yet representative. First, the code snippet should be short. Second, it should have as much as possible in common with the other code snippets in $T_c$.

In order to find the best subtree that extends the pattern found in Phase 1, we assign a score to each subtree obtained from the trees in $T_c$. Let us call a subtree that can fill a missing subtree in the pattern a filler tree, and the location at which a filler subtree is missing in the pattern a hole. A pattern therefore has a fixed, finite number of holes. For a given tree $t$ in $T_c$ and a hole in the pattern, let $f$ be the filler that fills the hole in $t$. The score of $f$ is then the number of trees in $T_c$ where $f$ is the filler of the hole. If the filler has more than $\beta_t$ tokens (usually set to 5) or more than $\beta_c$ characters (usually set to 50), we set the score of $f$ to 0. This ensures that the code snippets are concise. We take the sum of the scores of all the fillers of the pattern in $t$ to compute the score of $t$. We then pick the tree in $T_c$ with the highest score and show its minimal subtree containing the pattern as the common usage example. EG reconstructs the example with an in-order traversal of the subtree. Figure 8 shows the filler code computed for the query json.dump. The nodes in the filler code are highlighted in blue.
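The filler-scoring rule can be sketched as follows; the data structures are assumed for illustration (each candidate tree supplies one filler per hole, and filler_counts tallies fillers across all trees in $T_c$):

    from collections import Counter

    BETA_T, BETA_C = 5, 50   # limits on filler tokens and characters

    def score_tree(tree_fillers, filler_counts):
        # tree_fillers: hole id -> filler (tuple of tokens) in one tree.
        # filler_counts: hole id -> Counter over fillers in all of T_c.
        total = 0
        for hole, filler in tree_fillers.items():
            n_tokens = len(filler)
            n_chars = sum(len(tok) for tok in filler)
            if n_tokens > BETA_T or n_chars > BETA_C:
                continue  # oversized fillers score 0, keeping examples concise
            total += filler_counts[hole][filler]
        return total

    # Example: hole 0 is filled by ("data", ",", "f") in two trees.
    filler_counts = {0: Counter({("data", ",", "f"): 2, ("info",): 1})}
    print(score_tree({0: ("data", ",", "f")}, filler_counts))  # prints: 2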

4.3. Creating Multiple Common Usage Examples

The algorithm above describes the process of generating a single common usage example for an API method. However, in many scenarios a user may be interested in multiple, diverse usage examples. EG generates distinct usage examples for a single query as follows. EG generates the first common usage example using the regular algorithm described above; however, it maintains a set of all the nodes added to the pattern in Phase 1. Let us call this set used_nodes. EG then saves that pattern and begins the example generation again with the same initial set of trees. This time, if at any point the second most common adjacent node is not in used_nodes and has at least half as many occurrences as the most common adjacent node, we add that node to the pattern instead. We then finish the example generation as normal.
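A sketch of this diversification rule, with an assumed helper that receives candidate extension nodes sorted by support:

    def choose_extension(candidates, used_nodes):
        # candidates: list of (node, support), sorted by support descending.
        # Prefer the runner-up when it is new and well supported, so that
        # later patterns diverge from the ones generated earlier.
        best_node, best_support = candidates[0]
        if len(candidates) > 1:
            second_node, second_support = candidates[1]
            if second_node not in used_nodes and second_support >= best_support / 2:
                return second_node
        return best_node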

EG repeats this process $n$ times to create $n$ distinct usage examples. In the EG interface, we display the top three usage examples.

5. Evaluation

We designed the following experiments to evaluate EG. In each experiment, we compared a code example of an API method generated by EG against a randomly selected code snippet containing the method from the code corpus. The random example displays the line of code where the method is called, as well as the two lines of code preceding and following the method call. This random example serves as a reasonable stand-in for an arbitrary code search result, as code search engines typically display 2–4 lines of additional context by default. We use this as a comparison point since code search is the de facto way developers learn APIs in real-world programming workflows—especially for proprietary APIs where no hand-written examples exist (Brandt et al., 2010; Sadowski et al., 2015).

We aim to answer the following research questions:

  RQ1. Do developers prefer EG code examples to code search results?

  RQ2. How does EG perform against comparable tools on several quantitative metrics measuring code example quality?

  RQ3. If EG is made accessible, will developers incorporate EG code examples into their workflows?

5.1. RQ1: Survey with Facebook Developers

We first conducted a survey to measure the quality of code examples generated by EG, compared with code search results. The survey first displayed six common Python libraries and asked participants to select the two libraries they were most familiar with. The survey then showed ten API methods in each of the two selected libraries. For each method, two code examples were listed: the top-ranked example generated by EG (Option A) and a random example from the code search results (Option B). Participants were asked three questions about these two kinds of examples. Table 2 shows the questions in the survey.

We sent out the survey to 21 Facebook developers; 18 completed it (86% response rate). Overall, the examples generated by EG were preferred over the random examples 97% of the time. In addition, 66% of participants agreed that it is helpful to see the number of code examples that follow the same API usage pattern, and 100% of participants agreed that it is helpful to color-code and distinguish code parts that are commonly shared among many examples. When asked to describe what they liked and disliked about the two kinds of examples, participants expressed a markedly positive sentiment towards EG: one said, "the usage count is super useful especially to make sure that the code you are looking at is consistent with the rest of the codebase." Another participant said, "I think that the formatting (color) makes it easier to quickly compare a few examples…and find the most relevant example for your use case."

Table 2. Questions asked in the Facebook survey.
1. Suppose you were learning to use this library. Which code examples would you prefer to see? (Select one: Strongly prefer A / Prefer A / Somewhat prefer A / Somewhat prefer B / Prefer B / Strongly prefer B)
2. To what extent do you agree with this statement: "It is helpful to see the count of methods that contain a common usage pattern (e.g., 'Common usage pattern found in 120 out of 2000 methods')." (Strongly agree / Agree / Somewhat agree / Somewhat disagree / Disagree / Strongly disagree)
3. To what extent do you agree with this statement: "It is helpful for a code example to be formatted so I can see what is common and what is unique to a specific use case (e.g., common part in black, unique part in gray)." (Strongly agree / Agree / Somewhat agree / Somewhat disagree / Disagree / Strongly disagree)

5.2. RQ2: Quantitative Evaluation with Metrics

In addition to the qualitative survey with real developers, we conducted a quantitative analysis of the quality of examples generated by EG. We defined several metrics to measure example quality:

  • Succinctness: How many lines of code are in the example?

  • Relevancy: How relevant is the surrounding code in the example w.r.t. understanding the usage of the queried API?

  • Representativeness: How frequently do other examples in the code corpus follow the same pattern in the example?

Succinctness is measured by counting the number of lines in an example; we did not count empty lines or code comments.

Relevancy is measured as the ratio of relevant lines in an example to total lines, where a relevant line is a line whose meaning and connection to the query method is clear without additional explanation or context. Figure 12 illustrates this metric by showing random code search results and EG examples for two methods, with relevant lines bolded and the query methods highlighted. In the code search example for np.array, the first line does not show how or why reshape is called, so this line is deemed irrelevant. Without additional context, we also do not know what discretize.EntropyMDL does, so lines 4 and 5 are not relevant. In the EG example for np.array, lines 1 and 2 show calls to np.array, while lines 3 and 4 show the returned values of np.array being passed to fit_transform – so all of these lines are relevant. In the code search example for pd.concat, it is not clear what df1 on lines 4 and 5 is used for, and how or if it pertains to pd.concat – so these two lines are irrelevant. In the EG example for pd.concat, lines 1 and 2 show the initialization of variables passed to pd.concat in line 3 – making all 3 lines relevant to understanding how pd.concat is used.

Representativeness is measured by the ranking score that EG assigns to an example. Recall that this score is the sum of the number of occurrences of each filler option in the example; it thus reflects how representative the example is of a common use of the query method. To measure the representativeness of the comparison baseline (i.e., code examples randomly selected from the original search results), we first check whether the method containing the random example is one of the methods containing the EG common usage pattern. If it is, we take that method’s ranking score as the representativeness score; otherwise, we assign the random example a representativeness score of 0.

Note that relevancy is measured by manually assessing the code snippets, while succinctness and representativeness are computed automatically.
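Of these metrics, succinctness is the simplest to automate. A sketch of the line-counting rule described above (blank lines and comment lines are excluded):

    def succinctness(example: str) -> int:
        # Count lines, skipping empty lines and code comments.
        return sum(
            1
            for line in example.splitlines()
            if line.strip() and not line.strip().startswith("#")
        )

    snippet = "with open(p, 'w') as f:\n\n    # write it out\n    json.dump(d, f)\n"
    print(succinctness(snippet))  # prints: 2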

Code search examples:

        ).reshape((100, 1))
    Y = np.array([0] * 25 + [1] * 75)
    table = data.Table.from_numpy(None, X, Y)
    disc = discretize.EntropyMDL()
    dvar = disc(table, table.domain[0])

    Ozone1 = pd.concat([df.Ozone] * K)
    print(Time1.shape, Ozone1.shape,
            Time1.describe(), Ozone1.describe())
    df1 = pd.DataFrame();
    df1['Time'] = Time1.values;

EG examples:

    X = np.array(['a', 'b', 'c'])
    y = np.array([1, 0, 1])
    out = encoders.JamesSteinEncoder(model='binary')
                    .fit_transform(X, y)

    sparse1 = pd.SparseSeries(val1, name='x')
    sparse2 = pd.SparseSeries(val2, name='y')
    res = pd.concat([sparse1, sparse2], axis=1)

Figure 12. Random code search examples and EG examples for several methods, with relevant lines bolded. (These code snippets are adapted from https://github.com/renn0xtek9/Arithmos/blob/799fe071ab3a85ea9a0f86b8099548f11be96841/Arithmos/tests/test_discretize.py#L111 and https://github.com/antoinecarme/pyaf/blob/master/tests/perf/test_ozone_long_series.py#L23. Accessed in March 2020.)

For this experiment, we considered four popular Python libraries: Pandas, os, Numpy, and TensorFlow. For each library, we selected the ten most used methods on GitHub – forty methods in total. We compared the average succinctness, relevancy, and representativeness of the top EG example, a random code search result, and the top example from ProgramCreek (cre, 2020). ProgramCreek is a website where users can query a Python library method and see functions from open source GitHub projects that call that method. Users vote on which functions represent the best example of a method, and the top ProgramCreek example was taken to be the method with the most upvotes. Since each ProgramCreek example is a complete method, relevancy was not a meaningful measurement; however, we were still able to measure length.

Table 3 shows the quality of code examples generated by EG, randomly selected from code search results, and selected from ProgramCreek (cre, 2020). Compared with examples from EG and ProgramCreek, random examples contained many more irrelevant lines of code, as well as long, uninformative identifier names. Meanwhile, ProgramCreek examples were on average over five times longer than EG examples.

Table 3. The Quality of Code Examples for 40 Popular Methods in Python
Type of Example   Length   Relevancy   Representativeness
EG                2.675    .996        59.6
Code Search       3.9      .640        .2
ProgramCreek      13.8     –           –

We also collected the 100 Hack API methods that had been queried most frequently in Facebook’s code search website over a 30-day period. Hack is a programming language created by Facebook as a dialect of PHP (hac, 2020). These 100 Hack methods were queried an average of 8.6 times, ranging from 5 to 26 times. For these 100 methods, we measured the average length and representativeness of EG examples and random code search results. Since these 100 methods are proprietary API methods in Facebook, we were not able to find curated examples on ProgramCreek; as a result, we cannot compare EG with ProgramCreek here. Table 4 shows the quality of code examples generated by EG and randomly selected from code search results. Similar to the results on open-source libraries, examples generated by EG were significantly more concise and representative than examples selected from the original code search results.

Table 4. The Quality of Code Examples for 100 Internal Methods in Facebook
Type of Example   Length   Representativeness
EG                3.5      116.6
Code Search       4.6      2.1

5.3. RQ3: Live Usage in Facebook

We have integrated EG into Facebook’s internal code search website. When users query a method name in Hack or Python, the top EG example is displayed first, before the standard code search results. There is a link to the full contents of the file containing the code snippet used in the example, and a "Show More Examples" button that displays two additional common usage patterns. EG only shows the three most common usage patterns, ensuring that its interface is compact and easy to use. EG’s integration into the code search platform was frictionless: developers began to use EG with no prior announcement or tutorial.

EG is deployed on a dedicated set of servers to respond to queries from developers. Our search server has 24 cores, and on average takes 1.0 seconds end-to-end to generate Python code examples for the queries used in Section 5.2. The median response time is 0.8 seconds and the maximum is 2.3 seconds. EG re-indexes the millions of methods in Facebook’s codebase daily. This indexing process works the same as in Aroma (Luan et al., 2019); on a 24-core server, it takes 20 minutes on average. If EG were deployed on a larger codebase, it would be possible to implement incremental indexing of only the changed files. Since the goal of EG is to provide relevant and up-to-date usage examples, we show examples for only the most recently indexed version of the codebase, and we do not maintain past examples generated from prior versions.

We have been logging the usage of EG in the code search website. We log each time a user copies or selects code from an EG example, clicks the file link, or clicks the "Show More Examples" button. Note that copying and selecting are the only logged events that clearly signal that the user actually reused code from the example.

Over a period of 24 days, from April 20 to May 13, EG was triggered to generate code examples for an average of 1,171 code search queries per day. Facebook developers interacted with EG examples an average of 59 times per day, and copied or selected code from an EG example an average of 30 times per day. While this appears to be a low ratio of interactions to total examples generated, there are several factors to keep in mind. First, developers do not always query method names because they want to see code examples—for instance, a developer may instead be looking for a specific file or class. We have no way to determine a developer’s intentions when they query a method name. Second, we integrated EG into the code search website without any public announcement, so Facebook developers may not even notice it among the other features of the website. The discoverability of EG is a problem orthogonal to its effectiveness, which we will investigate in the future. Finally, because a central feature of EG examples is succinctness, developers may be learning or "mentally copying" from an example without physically interacting with it.

Despite these concerns, these results show that real developers indeed utilize EG in their workflows. A formal A/B test comparing EG examples against code search results remains future work.

5.4. Discussion

Evaluating example generation is an interesting and complicated problem. We initially attempted to design a study wherein developers receive a comprehensive list of API usage questions, and are asked to answer these questions for one API using EG and for another using a realistic baseline of code search. The problem with this approach is that EG’s focus is on providing a short list of idiomatic usage examples. EG makes no claim to offer the best or most informative usage example, only a succinct example representing a common usage pattern. Thus, we needed an evaluation that measured the benefit of seeing such a common usage pattern.

We next attempted to design a human study wherein participants complete short programming tasks using EG. A main challenge was devising a control to measure EG against. An obvious candidate is Facebook’s internal code search website. However, EG is not intended to replace code search altogether, but rather to be a complementary extension integrated into existing code search tools; thus, it did not make sense to restrict the use of code search. However, Facebook employees are conditioned to use code search results as a go-to method for API inquiries, so even when the EG example contained salient, time-saving information, they often still wanted to page through code search results. Untangling what the user gained from EG versus what they gained from code search, or from documentation, proved difficult. In addition, success in solving a short programming task is extremely dependent on what background knowledge a developer has.

A tool like EG also runs the risk of identifying and perpetuating common anti-patterns. By indexing Facebook’s codebase, we ensure that all code has been reviewed by a developer – however, code can still become out of date or deprecated. A potential solution would be to only index code written after a certain threshold date. Another would be to indicate to the user in the UI the date when the code displayed in an example was written. Exploring this issue further is left for future work.

6. Related Work

Developers often search for code in their own codebases or online to fulfill programming needs such as learning new APIs and locating code snippets with desired functionality (Sadowski et al., 2015; Sim et al., 2011; Brandt et al., 2009; Umarji et al., 2008; Montandon et al., 2013). For example, Sim et al. conducted a lab study with 36 graduate students to evaluate the effectiveness of different code retrieval techniques (Sim et al., 2011). In the demographic survey, 50% of participants reported searching for code online frequently and 39% reported searching occasionally. Sadowski et al. analyzed the search logs generated by 27 Google developers over two weeks (Sadowski et al., 2015), finding that developers issued an average of 12 code search queries per weekday.

There is a large body of literature in code search (Holmes and Murphy, 2005; Mandelin et al., 2005; Stylos and Myers, 2006; Sahavechaphan and Claypool, 2006; Thummalapenta and Xie, 2007; Lemos et al., 2007; Kim et al., 2010; Brandt et al., 2010; Lazzarini Lemos et al., 2009; Reiss, 2009; Wang et al., 2010; McMillan et al., 2011, 2012; Kim et al., 2018; Gu et al., 2018; Sirres et al., 2018; Sivaraman et al., 2019; Yan et al., 2020). These techniques focus on 1) enriching search queries and 2) improving search algorithms. For example, beyond simple keyword descriptions, S6 (Reiss, 2009) and CodeGenie (Lemos et al., 2007) allow users to identify relevant code based on test cases. Prospector (Mandelin et al., 2005) supports expressing type constraints such as desired input and output types in a query. Code-to-code search tools such as FaCoY (Kim et al., 2018) take code fragments directly as input and identify other similar code. Wang et al. represented source code as a dependency graph to capture control-flow and data-flow dependencies in a program, and matched search queries against program dependence graphs (Wang et al., 2010). Gu et al. trained a neural network to predict relevant code examples given natural language queries (Gu et al., 2018).

Unlike our work, the aforementioned techniques provide limited support for browsing and assessing code search results. Previous studies have shown that it is cognitively demanding to navigate through code search results (Duala-Ekoko and Robillard, 2012; Starke et al., 2009). As a result, developers often rapidly skim through a handful of search results and make a quick judgement about the quality of these results (Brandt et al., 2009). When browsing search results, they also often backtrack due to irrelevant or uninteresting information in search results (Duala-Ekoko and Robillard, 2012). More specifically, Starke et al. show that developers rarely look beyond five examples when searching for code examples (Starke et al., 2009). These observations indicate that the code exploration process is often limited to a few search results, leaving a large portion of foraged information unexplored.

Several approaches have been proposed to help developers navigate through code search results. To enable users to explore a large number of code examples simultaneously, Examplore constructs a code skeleton with statistical distributions of individual API usage features in those examples (Glassman et al., 2018). ALICE allows users to mark several search results as desired or undesired and then automatically filter the remaining search results, so users do not have to manually go through all of them (Sivaraman et al., 2019). eXoaDocs employs program slicing to remove extraneous statements in a code example and then clusters sliced code examples based on the similarity of semantic characteristics such as invoked API methods in an example (Kim et al., 2010). Buse and Weimer improved eXoaDocs by synthesizing a single concise code example to summarize similar examples in a cluster (Buse and Weimer, 2012).

Our approach differs from these techniques in several respects. While the tools described above rely on the syntax and semantics of the Java language, EG is language agnostic, requiring only a parser for the target language. Examplore requires a pre-defined API usage skeleton to register and align code examples, while EG does not require a pre-defined skeleton. Buse and Weimer’s tool generates usage examples for a target class, while EG generates usage examples for API or library methods. Finally, to our knowledge, EG is the only tool designed to generate common usage patterns of APIs that has been integrated into the code search platform of a large software company and is used by developers daily.

7. Conclusion

We presented EG, a new tool for generating usage examples for API methods. EG works by first indexing a large code corpus. Given a query method, it assembles a list of method bodies in the corpus containing that method, then finds the maximal subtree that contains the query API and is part of a meaningful proportion of methods. EG then reconstructs this subtree into a succinct, relevant, and representative code example.

To evaluate EG, we indexed a code corpus of 1.9 million Python methods, and designed a survey where we showed developers pairs of EG examples and code search results for commonly used methods in popular Python libraries. We observed that developers preferred EG examples to code search results 97% of the time, and that 100% of developers agreed that the color-coding of the common usage pattern in EG examples is helpful. Further, we defined several metrics to measure example quality, and quantitatively compared EG examples against code search results and ProgramCreek examples using these metrics. We found that across all metrics, EG performs better than these alternatives. Finally, we integrated EG into Facebook’s internal code search website. A log of developers’ activities shows that developers indeed interact with EG examples.

References

  • Hack (2020) 2020. Hack: Programming Productivity Without Breaking Things. https://hacklang.org/.
  • Program Creek (2020) 2020. Program Creek. https://www.programcreek.com/.
  • Brandt et al. (2010) Joel Brandt, Mira Dontcheva, Marcos Weskamp, and Scott R Klemmer. 2010. Example-centric programming: integrating web search into the development environment. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 513–522.
  • Brandt et al. (2009) Joel Brandt, Philip J Guo, Joel Lewenstein, Mira Dontcheva, and Scott R Klemmer. 2009. Two studies of opportunistic programming: interleaving web foraging, learning, and writing code. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1589–1598.
  • Buse and Weimer (2012) Raymond PL Buse and Westley Weimer. 2012. Synthesizing API usage examples. In 2012 34th International Conference on Software Engineering (ICSE). IEEE, 782–792.
  • Duala-Ekoko and Robillard (2012) Ekwa Duala-Ekoko and Martin P Robillard. 2012. Asking and answering questions about unfamiliar APIs: An exploratory study. In Proceedings of the 34th International Conference on Software Engineering. IEEE Press, 266–276.
  • Fischer et al. (2017) Felix Fischer, Konstantin Böttinger, Huang Xiao, Christian Stransky, Yasemin Acar, Michael Backes, and Sascha Fahl. 2017. Stack Overflow considered harmful? The impact of copy&paste on Android application security. In 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 121–136.
  • Glassman et al. (2018) Elena L Glassman, Tianyi Zhang, Björn Hartmann, and Miryung Kim. 2018. Visualizing API usage examples at scale. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 1–12.
  • Gu et al. (2018) Xiaodong Gu, Hongyu Zhang, and Sunghun Kim. 2018. Deep code search. In 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 933–944.
  • Holmes and Murphy (2005) Reid Holmes and Gail C. Murphy. 2005. Using structural context to recommend source code examples. In ICSE ’05: Proceedings of the 27th International Conference on Software Engineering (St. Louis, MO, USA). ACM Press, New York, NY, USA, 117–125. https://doi.org/10.1145/1062455.1062491
  • Katirtzis et al. (2018) Nikolaos Katirtzis, Themistoklis Diamantopoulos, and Charles Sutton. 2018. Summarizing Software API Usage Examples Using Clustering Techniques. In Fundamental Approaches to Software Engineering, Alessandra Russo and Andy Schürr (Eds.). Springer International Publishing, Cham, 189–206.
  • Kim et al. (2010) Jinhan Kim, Sanghoon Lee, Seung-won Hwang, and Sunghun Kim. 2010. Towards an intelligent code search engine. In Twenty-Fourth AAAI Conference on Artificial Intelligence.
  • Kim et al. (2018) Kisub Kim, Dongsun Kim, Tegawendé F Bissyandé, Eunjong Choi, Li Li, Jacques Klein, and Yves Le Traon. 2018. FaCoY: a code-to-code search engine. In Proceedings of the 40th International Conference on Software Engineering. ACM, 946–957.
  • Lazzarini Lemos et al. (2009) Otávio Augusto Lazzarini Lemos, Sushil Bajracharya, Joel Ossher, Paulo Cesar Masiero, and Cristina Lopes. 2009. Applying test-driven code search to the reuse of auxiliary functionality. In Proceedings of the 2009 ACM symposium on Applied Computing. ACM, 476–482.
  • Lemos et al. (2007) Otávio Augusto Lazzarini Lemos, Sushil Krishna Bajracharya, Joel Ossher, Ricardo Santos Morla, Paulo Cesar Masiero, Pierre Baldi, and Cristina Videira Lopes. 2007. CodeGenie: using test-cases to search and reuse source code. In Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering. 525–526.
  • Luan et al. (2019) Sifei Luan, Di Yang, Celeste Barnaby, Koushik Sen, and Satish Chandra. 2019. Aroma: Code Recommendation via Structural Code Search. Proc. ACM Program. Lang. 3, OOPSLA, Article 152, 28 pages. https://doi.org/10.1145/3360578
  • Maalej and Robillard (2013) Walid Maalej and Martin P Robillard. 2013. Patterns of knowledge in API reference documentation. IEEE Transactions on Software Engineering 39, 9 (2013), 1264–1282.
  • Mandelin et al. (2005) David Mandelin, Lin Xu, Rastislav Bodík, and Doug Kimelman. 2005. Jungloid mining: helping to navigate the API jungle. In PLDI ’05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation (Chicago, IL, USA). ACM, New York, NY, USA, 48–61. https://doi.org/10.1145/1065010.1065018
  • McMillan et al. (2012) Collin McMillan, Mark Grechanik, Denys Poshyvanyk, Chen Fu, and Qing Xie. 2012. Exemplar: A source code search engine for finding highly relevant applications. IEEE Transactions on Software Engineering 38, 5 (2012), 1069–1087.
  • McMillan et al. (2011) Collin McMillan, Mark Grechanik, Denys Poshyvanyk, Qing Xie, and Chen Fu. 2011. Portfolio: finding relevant functions and their usage. In 2011 33rd International Conference on Software Engineering (ICSE). IEEE, 111–120.
  • Montandon et al. (2013) João Eduardo Montandon, Hudson Borges, Daniel Felix, and Marco Tulio Valente. 2013. Documenting APIs with examples: Lessons learned with the APIMiner platform. In 2013 20th Working Conference on Reverse Engineering (WCRE). IEEE, 401–408.
  • Nasehi et al. (2012) Seyed Mehdi Nasehi, Jonathan Sillito, Frank Maurer, and Chris Burns. 2012. What makes a good code example?: A study of programming Q&A in StackOverflow. In 2012 28th IEEE International Conference on Software Maintenance (ICSM). IEEE, 25–34.
  • Orgill (2012) MaryKay Orgill. 2012. Variation Theory. Springer US, Boston, MA, 3391–3393. https://doi.org/10.1007/978-1-4419-1428-6_272
  • Reiss (2009) Steven P Reiss. 2009. Semantics-based code search. In Proceedings of the 31st International Conference on Software Engineering. IEEE Computer Society, 243–253.
  • Robillard (2009) Martin P Robillard. 2009. What makes APIs hard to learn? Answers from developers. IEEE software 26, 6 (2009), 27–34.
  • Sadowski et al. (2015) Caitlin Sadowski, Kathryn T Stolee, and Sebastian Elbaum. 2015. How developers search for code: a case study. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. 191–201.
  • Sahavechaphan and Claypool (2006) Naiyana Sahavechaphan and Kajal Claypool. 2006. Xsnippet: mining for sample code. ACM Sigplan Notices 41, 10 (2006), 413–430.
  • Sim et al. (2011) Susan Elliott Sim, Medha Umarji, Sukanya Ratanotayanon, and Cristina V Lopes. 2011. How well do search engines support code retrieval on the web? ACM Transactions on Software Engineering and Methodology (TOSEM) 21, 1 (2011), 1–25.
  • Sirres et al. (2018) Raphael Sirres, Tegawendé F Bissyandé, Dongsun Kim, David Lo, Jacques Klein, Kisub Kim, and Yves Le Traon. 2018. Augmenting and structuring user queries to support efficient free-form code search. Empirical Software Engineering 23, 5 (2018), 2622–2654.
  • Sivaraman et al. (2019) Aishwarya Sivaraman, Tianyi Zhang, Guy Van den Broeck, and Miryung Kim. 2019. Active inductive logic programming for code search. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, 292–303.
  • Starke et al. (2009) Jamie Starke, Chris Luce, and Jonathan Sillito. 2009. Working with search results. In Proceedings of the 2009 ICSE Workshop on Search-Driven Development-Users, Infrastructure, Tools and Evaluation. IEEE Computer Society, 53–56.
  • Stylos and Myers (2006) Jeffrey Stylos and Brad A Myers. 2006. Mica: A web-search tool for finding API components and examples. In 2006 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). IEEE, 195–202.
  • Thummalapenta and Xie (2007) Suresh Thummalapenta and Tao Xie. 2007. Parseweb: a programmer assistant for reusing open source code on the web. In Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering. ACM, 204–213.
  • Umarji et al. (2008) Medha Umarji, Susan Elliott Sim, and Crista Lopes. 2008. Archetypal internet-scale source code searching. In IFIP International Conference on Open Source Systems. Springer, 257–263.
  • Wang et al. (2010) Xiaoyin Wang, David Lo, Jiefeng Cheng, Lu Zhang, Hong Mei, and Jeffrey Xu Yu. 2010. Matching dependence-related queries in the system dependence graph. In Proceedings of the IEEE/ACM International Conference on Automated software engineering (Antwerp, Belgium) (ASE ’10). ACM, New York, NY, USA, 457–466. https://doi.org/10.1145/1858996.1859091
  • Yan et al. (2020) Shuhan Yan, Hang Yu, Yuting Chen, Beijun Shen, and Lingxiao Jiang. 2020. Are the Code Snippets What We Are Searching for? A Benchmark and an Empirical Study on Code Search with Natural-Language Queries. In 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 344–354.
  • Zhang et al. (2018) Tianyi Zhang, Ganesha Upadhyaya, Anastasia Reinhardt, Hridesh Rajan, and Miryung Kim. 2018. Are code examples on an online Q&A forum reliable?: a study of API misuse on stack overflow. In 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 886–896.
  • Zhang et al. (2019) Tianyi Zhang, Di Yang, Crista Lopes, and Miryung Kim. 2019. Analyzing and supporting adaptation of online code examples. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, 316–327.
  • Zhou and Walker (2016) Jing Zhou and Robert J Walker. 2016. API deprecation: a retrospective analysis and detection method for code examples on the web. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 266–277.