Generating Examples From CLI Usage: Can Transformers Help?

Roshanak Zilouchian rozilouc@microsoft.com MicrosoftRedmondWashingtonUSA , Spandan Garg^∗ spgarg@microsoft.com MicrosoftRedmondWashingtonUSA , Colin B. Clement^∗ coclemen@microsoft.com MicrosoftRedmondWashingtonUSA , Yevhen Mohylevskyy yemohyle@microsoft.com MicrosoftRedmondWashingtonUSA and Neel Sundaresan neels@microsoft.com MicrosoftRedmondWashingtonUSA

Abstract.

Continuous evolution in modern software often causes documentation, tutorials, and examples to be out of sync with changing interfaces and frameworks. Relying on outdated documentation and examples can lead programs to fail or be less efficient or even less secure. In response, programmers need to regularly turn to other resources on the web such as StackOverflow for examples to guide them in writing software. We recognize that this inconvenient, error-prone, and expensive process can be improved by using machine learning applied to software usage data. In this paper, we present our practical system which uses machine learning on large-scale telemetry data and documentation corpora, generating appropriate and complex examples that can be used to improve documentation. We discuss both feature-based and transformer-based machine learning approaches and demonstrate that our system achieves 100% coverage for the used functionalities in the product, providing up-to-date examples upon every release and reduces the numbers of PRs submitted by software owners writing and editing documentation by $>$ 68%. We also share valuable lessons learnt during the 3 years that our production quality system has been deployed for Azure Cloud Command Line Interface (Azure CLI).

Example Generation, Transformers, Software Documentation

^†^†ccs: Computer systems organization Embedded systems^†^†ccs: Computer systems organization Redundancy^†^†ccs: Computer systems organization Robotics^†^†ccs: Networks Network reliability

1. Introduction

Modern software development involves continuous integration, deployment, and rapid releases. New frameworks, libraries, and APIs are created and the existing ones keep improving and changing. This rapid and constant change often presents a steep learning curve to developers. In many cases, spending time and effort to become proficient in using a library or API is not even productive as it may only be used a few times. Instead the most efficient way to guide developers is by providing code examples demonstrating how to use new APIs or interact with new frameworks (Andrew J. Ko et al., 2004; Forward and Lethbridge, 2002). An extensive survey of software developers has identified use of up-to-date examples as one of the most important factors in useful documentation (Forward and Lethbridge, 2002). However, documentation and code examples are usually added only as an afterthought to comply with regulations, often rendering them out of sync or incomplete (Parnas and Clements, 1986; Robillard, 2009). Even when they exist, the documentation content and code examples are not updated in a timely manner (Lethbridge et al., 2003). Therefore, insufficient quantity and variation (Robillard, 2009) in examples and incorrect examples (Aghajani et al., 2020, 2019) remain to be the major obstacles for developers learning to use an API.

Code examples shared in Blogs, Wikis and Q&A sites have emerged as an alternative to supporting official documentation (Pagano and Maalej, 2011; Mamykina et al., 2011). However, such advice can go out of date in a matter of weeks. Further, when mining an enormous number of blogs and online articles, finding the most current or relevant examples can be difficult (Robillard, 2009). Additionally, blog articles or discussions on Q&A sites are not officially maintained by the software owners and the examples may be of poor quality (Nasehi et al., 2012).

Knowledge discovery tools can address these challenges to some extent. Knowledge discovery tools provide recommendations in the form of code samples or artifacts (Nykaza et al., 2002; McLellan et al., 1998). However, they cannot offer help for uncommon code frameworks or when samples are not present, limiting their use as alternatives for missing documentation. To tackle these challenges, another line of research has emerged to augment documentation with synthesized examples (Kim et al., 2009; Montandon et al., 2013; Mar et al., 2011). Our work extends this line of prior work by generating up-to-date examples from usage data and other external sources of information and automatically inserting them into the official documentation.

Our example generation framework automatically creates and updates examples in software documentation upon every release. The examples generated by our platform have following qualities:

•

Up-to-date examples Our platform utilizes usage telemetry to generate new examples at every release cycle of a product, ensuring the examples are always up-to-date.
•

Representative of actual usage Unlike bare-bones examples usually found in documentations that only cover basic scenarios, our examples are based on usage telemetry and, therefore, represent how current users use the software in practice.
•

Covering all used functionalities Our automatically generated examples cover all used functionalities of the software, in contrast to human written examples which are usually provided for a few important functionalities.

Our example generation framework consists of two steps: (i) Identifying successful scenarios to build example templates based on prior user successes, and (ii) Translating the templates to human readable examples. For the second step, we experimented with a feature-based parameter type prediction model and a transformer-based neural parameter value generation model. We discuss the benefits and challenges of each model in a production environment.

Our example generation system has been deployed for Azure Command Line Interface (Azure CLI), a large scale, open-source cloud developer command line environment. Our comparative study between our generated examples and the human written examples by software owners showed that our examples can help developers by covering all active features with a higher quality than the software owner’s examples. In addition, we found that our example generation pipeline was able to reduce the number of PRs submitted by software owners to write or edit documentation by $>$ 68%.

In this paper we make the following contributions:

(1)

we present a production-quality example generation platform which can generate up-to-date examples that cover all used functionalities,
(2)

discuss the benefits and challenges of a neural model and a feature-based model in a production environment,
(3)

share lessons learned from the deployment of our example generation platform in production.

2. Related Work

Prior work has tackled the problems posed by rapidly changing APIs and frameworks in software development (Robins et al., 2003) in different ways: crowd-sourced documentation, augmenting documentation with examples, and knowledge discovery tools.

2.1. Crowd-Sourced Documentation

As the leading way to learn about new features and APIs, web search enables developers to discover socially-mediated sources of information in addition to official documentation. Blogs, wikis and Q&A sites are commonly used to complement the official documentation. A study of Google search results on jQuery API showed that at least one blog post and StackOverflow question appear on the first page of the search results for 84% of methods in jQuery (Parnin and Treude, 2011).

However, it is not clear whether some of these additional sources will resolve staleness or the lack of examples in official documentations. For example, a study on blogging behaviors of developers has revealed that only 1.8% of relevant blog posts contain source code (Pagano and Maalej, 2011). This means that developers use blogs mainly to communicate and coordinate functional requirements as opposed to documenting code. Similarly, studies of Q&A websites such as StackOverflow have shown some software tools or APIs may not get enough coverage on StackOverflow (Treude and Grammel, 2012). Even for popular software tools, the coverage accumulates very slowly. For instance, for Android API classes the coverage after one year was only 30% (Treude and Grammel, 2012). This coverage is much worse in specialized software tools. Also, even questions posted to StackOverflow for popular software systems are usually answered by a small group of experts; such experts are hard to find for systems with smaller communities. Failure to find experts has been identified as one of the key reasons for unanswered questions on StackOverflow (Asaduzzaman et al., 2013). Our work fills the coverage and staleness gap in documentation by generating up-to-date examples based on usage for all of used commands and APIs.

2.2. Augmenting Documentation with Examples

Prior research has identified examples as a key learning resource in software development (Nykaza et al., 2002; McLellan et al., 1998; Holmes et al., 2009). Kim et al. (2009) proposes a technique to extract code examples and integrate the examples into API documentation. Montandon et al. (2013) describes APIMiner, a platform which extracts code examples from software repositories and instruments the standard Java API documentation with code examples. PorpER-Doc is another tool which accepts queries from API developers and suggests proper code examples for documentation purposes (Mar et al., 2011). Buse and Weimer (2012) presents a technique for automatically synthesizing human-readable API usage examples. Our work extends these works by generating examples from usage data and mining public resources, automatically inserting the examples into official documentation.

2.3. Knowledge Discovery Tools

Knowledge discovery tools can come to the rescue when there are stale examples in API and framework documentation. For instance, eMoose highlights rules or caveats of API calls in the documentation (Dekel and Herbsleb, 2009). XSnippet uses the code context such as types of methods and variables to locate sample code for object instantiation (Sahavechaphan and Claypool, 2006). Similarly, PARSEWeb (Thummalapenta and Xie, 2007) and Prospector (Mandelin et al., 2005) are also designed to provide examples of object instantiation to help developers navigate complex APIs. While clearly filling a niche, these tools have been found to be limited in their scope: they cannot offer help when code samples are not present or certain API calls have not been widely used. Our work ameliorates this limitation by creating high quality examples demonstrating how to use a tool or framework from previously successful usages.

3. Azure CLI

While our example generation platform can be leveraged for any application where usage data is available, for the purpose of this paper, we will specifically target a popular Command Line Interface (CLI) that is used to interact with the Microsoft Azure cloud platform, referred to as Azure CLI in this paper. Figure 1 shows an example of an Azure CLI command which creates a virtual machine.

⬇

az vm create –image UbuntuLTS –admin-username azureuser

–name MyVM –ssh-key-value ~/.ssh/id_rsa.pub

–resource-group MyResourceGroup –location westeurope

Figure 1. An example of an Azure CLI command for creating an Ubuntu virtual machine on Azure. All Azure CLI commands begin with ‘az’ followed by a command name. A command may be followed by a set of parameters and corresponding values, where parameters begin with ‘–’.

Each Azure CLI command consists of a command name (e.g. az vm create}) and a set of parameters names which usually start with a \mintinlinebash– flag (e.g. --image}) and are followed by a parameter value (e.g. \mintinlinebashUbuntuLTS). Overall, Azure CLI has more than 3600 commands. On a monthly basis, users run millions of commands to create, delete, and manage resources on Azure. While many of these commands run successfully, failures are quite common. A command may fail for various reasons like incorrect parameter combinations, errors in parameter names or parameter values, wrong assumptions about the state of a resource, or even service problems. In Azure CLI, user faults which includes wrong parameters combinations and errors in parameter names or values can account for up to 22% of command failures. These errors occur mainly due to lack of documentation and examples covering various parameter combinations. For instance, each Azure CLI command has at most 76 parameters and on average 10 parameters. The average number of parameters specified by users is 4, while the average number of parameters in the examples provided in the official documentation is 1. Therefore, the examples provided in the documentation likely do not fully capture the way Azure CLI is used in practice. This potential gap between official documentation and the actual usage of Azure CLI will only grow larger in time, and can cost both companies and customers a significant amount in wasted time and resources.

4. Example Template Generation

Our example generation framework consists of two steps: (i) identifying successful scenarios to build example templates based on prior user successes, and (ii) translating templates into human readable examples. Figure 2 shows an overview of our pipeline.

Refer to caption — Figure 2. Overview of our example generation framework. We use product usage telemetry to generate example templates. We then collect relevant examples from various web sources and use them to train models that can find or generate the best parameter values for each parameter. Finally, the parameter values are added to the template giving us the resulting examples.

In order to identify successful scenarios, we analyze the usage telemetry of Azure CLI. This telemetry data includes the CLI commands, a list of parameters used with each command, and whether the execution of the command was successful or not. Keeping customer privacy in mind, the usage telemetry data does not include the concrete parameter values, preventing potentially private information like user-name or email addresses from leaking into the machine learning model and possibly into the examples. For each upcoming release of Azure CLI, we collect around 3.20 billion successful commands which were executed for the last three months prior to the release. We then remove the commands corresponding to the old version and all the help calls, which did not result in an actual command execution from the data. This leaves us with $\sim$ 3.19 billion successful command and parameter set pairs. We then sort the unique command and parameter set pairs based on frequency of unique users. Going through the list of all parameter sets for all commands, we then take the top three most frequent parameter sets for each command to build up to three example templates. Since we do not have the values of parameters in the usage telemetry, we use a placeholder value based on the parameter name in the generated templates (e.g. <image>} for a parameter named \mintinlinebash–image). Figure 4 shows an example of a template generated for the virtual machine (VM) creation command with placeholders.

Category	Frequency
String	5228
Enum	713
Integer	273
GUID	246
Folder/File Path	241
Command Specific/Unknown	201
IP-Address	196
URL/E-Mail	166
Build Info	131
Quoted Strings	125
Version	45
Time/Duration	23
Keys/Tokens	14
Int With Specific Format	6
Permission Formats	5

	Precision	Recall	F-1 Score	Support
String	1.00	0.86	0.92	5228
Non-String	0.76	1.00	0.86	2385
Weighted Avg.	0.92	0.90	0.90	7613

-h in the command line (fig. 4). Alternatively, they can view the examples on the online reference documentation (fig.8). To evaluate the effectiveness of our example generation platform in action, we examined the coverage and quality of the live examples.

Generating Examples From CLI Usage: Can Transformers Help?

Abstract.

1. Introduction

2. Related Work

2.1. Crowd-Sourced Documentation

2.2. Augmenting Documentation with Examples

2.3. Knowledge Discovery Tools

3. Azure CLI

4. Example Template Generation

5. Parameter value Generation

5.1. Data Collection

5.2. Feature-based Parameter Type Prediction Model

5.2.1. Feature Embeddings

5.2.2. Classifier

5.2.3. Results

5.2.4. Parameter Value Lookup

5.3. Neural Parameter Value Generation Model

5.3.1. Pretraining

5.3.2. Fine-Tuning

5.3.3. Data Augmentation

6. Experiments

6.1. Experiment 1: Comparing neural approaches

6.2. Experiment 2: Comparing neural and feature-based models

7. Deploying in Production

7.1. Coverage of Examples

7.2. Quality of Examples

7.3. Software Owners’ Workflow

8. Lessons Learned and Ongoing Work

9. Conclusion

References