# Building a Go Doctor Refactoring

22 Apr 2018

In this post, I’ll give an overview of how to create a new refactoring for the Go Doctor, which refactors Go source code. We’ll build a small command-line tool that adds a copyright header to a Go source file.

## Go Doctor

The Go Doctor is a refactoring tool for the Go programming language. It was designed so that new refactorings are easy to add, and it is equally easy to create new tools that modify source code using the Go Doctor infrastructure. The Go Doctor is open source, and GoDoc is available for its API. This post provides starter code that illustrates how to begin building a new refactoring.

To illustrate how a Go Doctor-based tool is built, we’ll create a tool that inserts a copyright header into a Go source file. To start, we’ll check out the source code for the tool, compile it, and run it. After you’re comfortable using the tool, we’ll look at the source code.

### Checking Out the Source Code and Installing the Tool

Make sure your GOPATH is set, then go get the source code and install the goaddcopyright binary:

    $ go get github.com/godoctor/godoctor
    $ go get github.com/joverbey/goaddcopyright
    $ cd $GOPATH/src/github.com/joverbey/goaddcopyright
    $ go install

### Running the Tool

The goaddcopyright tool uses the same command-line driver as the godoctor tool. Here, we’ll show some typical command lines.

#### Show Usage

The -help flag displays usage information.

    $ $GOPATH/bin/goaddcopyright -help
    Add Copyright Header

    Usage: goaddcopyright [<flag> ...]

    Each <flag> must be one of the following:
        -complete Output entire modified source files instead of displaying a diff
        -doc      Output documentation (install, user, man, or vim) and exit
        -file     Filename containing an element to refactor (default: stdin)
        -json     Accept commands in OpenRefactory JSON protocol format
        -list     List all refactorings and exit
        -pos      Position of a syntax element to refactor (default: entire file)
        -scope    Package name(s), or source file containing a program entrypoint
        -v        Verbose: list affected files
        -vv       Very verbose: list individual edits (implies -v)
        -w        Modify source files on disk (write) instead of displaying a diff

#### Refactor “Hello World”

Now, let’s use the tool to refactor a simple Go program. We’ll use one of the tool’s test cases to illustrate how it’s used.

    $ cd $GOPATH/src/github.com/joverbey/goaddcopyright/refactoring/testdata/addcopyright/001-helloworld
    $ cat main.go


You should see that main.go looks like this (ignore the strange comment for now):

    package main

    import "fmt"

    func main() { //<<<<<addcopyright,5,1,5,1,Your Name Here,pass
        fmt.Println("Hello, world!")
    }


A minimal command line to refactor a file looks like this.

    $ goaddcopyright -file main.go "My Name"
    Defaulting to package scope github.com/joverbey/goaddcopyright/refactoring/testdata/addcopyright/001-helloworld for refactoring (provide an explicit scope to change this)
    diff -u main.go main.go
    --- main.go
    +++ main.go
    @@ -1,3 +1,4 @@
    +// Copyright 2018 My Name. All rights reserved.
     package main
     
     import "fmt"

The tool prints informational messages, errors, and warnings on standard error. If it can successfully refactor the file (i.e., without any errors), it will print a patch file (i.e., a unified diff) on standard output and exit with code 0.

In the output above, the first line is a warning, printed to standard error, and the remaining lines are the patch file. (It is safe to ignore the “Defaulting to package scope” warning for a refactoring that only changes a single file, like our Add Copyright refactoring. Go Doctor’s Rename refactoring may change multiple files; this is when it is important to set the scope correctly. See the Go Doctor documentation for more details.)

The command above printed a patch file, which is fine for previewing what the refactoring will do, but it didn’t actually change any source code. To do that, use the following commands:

    $ goaddcopyright -file main.go "My Name" > main.go.patch
    $ patch -p0 -i main.go.patch

Now, if you cat main.go, you’ll see that the copyright header has been added at the top of the file:

    // Copyright 2018 My Name. All rights reserved.
    package main

    import "fmt"

    func main() { //<<<<<addcopyright,5,1,5,1,Your Name Here,pass
        fmt.Println("Hello, world!")
    }

The file we just changed, main.go, is one of the tool’s test cases. We should undo the refactoring so we don’t break its tests! The refactoring can be undone by applying the patch in reverse (using the -R switch):

    $ patch -p0 -R -i main.go.patch
    patching file main.go


If you don’t like using patches, you can run goaddcopyright with the -w switch to directly overwrite files. However, there is no easy way to undo this if something goes wrong, so make sure the project is under version control.

For previewing what the refactoring will do, you can also run goaddcopyright with the -complete switch to output the entire file after it has been refactored. Sometimes, this is easier to read than a patch file.

If the goaddcopyright command line seems cumbersome, realize that most people don’t refactor code using command line tools: They refactor from inside a text editor or IDE. The Go Doctor Vim plug-in allows you to refactor code (and undo refactorings!) from inside the Vim editor with just a few keystrokes. However, for the purposes of this post, we’ll focus exclusively on the command line.

#### When Refactoring Fails

To see what happens when a refactoring cannot be performed, try the following commands:

    $ cd $GOPATH/src/github.com/joverbey/goaddcopyright/refactoring/testdata/addcopyright/003-error1
    $ cat main.go
    // This file already contains a Copyright comment.
    package main

    import "fmt" //<<<<<addcopyright,3,1,3,1,Your Name Here,fail

    func main() {
        fmt.Println("Hello, world!")
    }
    $ goaddcopyright -file main.go "My Name"
    main.go:1:33: Error: An existing copyright was found.
    $ echo $?
    3


Notice that the goaddcopyright tool exited with code 3 and did not output a patch.

## Overview of the Code

Now that we’ve seen what the goaddcopyright tool does, let’s look at its source code. The source tree looks like this:

    $GOPATH/src/github.com/joverbey/goaddcopyright/
    ├── main.go                      Command line driver
    └── refactoring/
        ├── addcopyright.go          Refactoring implementation
        ├── addcopyright_test.go     Unit tests
        └── testdata/
            └── addcopyright/
                ├── 001-helloworld/  Test case
                │   ├── main.go
                │   └── main.golden
                ├── 002-noname/      Test case
                │   ├── main.go
                │   └── main.golden
                ├── 003-error1/      Test case
                │   └── main.go
                └── 004-error2/      Test case
                    └── main.go

There are only three “interesting” files: main.go (the program entrypoint), addcopyright.go (the refactoring itself), and addcopyright_test.go (the unit tests). The rest of the files are in the testdata directory; as you saw above, these are tiny Go programs used by the unit tests.

## The Refactoring: addcopyright.go

All of the interesting work is in addcopyright.go. Skim the entire file (80 lines), then we’ll describe all the pieces.

Let’s start from the top.

On line 15, we declare CurrentYear as a package-scope variable. Our refactoring inserts the current year into the copyright header, which is problematic for unit tests since we don’t want to update our tests every year. In the unit tests (addcopyright_test.go), we set this variable to “YYYY” so all the tests use “YYYY” rather than the current year.

On line 17, we declare a struct for our refactoring, and we embed refactoring.RefactoringBase. RefactoringBase provides most of the functionality we will need to refactor source code; we’ll discuss it more later.

In the Go Doctor infrastructure, every refactoring must implement the Refactoring interface, which declares the Description() and Run() methods. We have defined these methods on *AddCopyright so that it implements the interface. The Description() method returns a Description of the refactoring. The Run() method takes a Config and returns a Result. The GoDoc for these objects contains all of the excruciating details, so we’ll focus on only the most important parts here.
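The overall shape described so far can be sketched in a few lines. This is only an illustration: the Description, Config, and Result types below are simplified stand-ins (the real Go Doctor types have more fields and different field types), and the addCopyright type, the currentYear variable, and the Synopsis and Usage strings are hypothetical names and values chosen for the sketch, not the tool's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// Simplified stand-ins for the Go Doctor types named in the post.
// ASSUMPTION: the real types carry more information; see the GoDoc.
type Description struct {
	Name     string
	Synopsis string
	Usage    string
}

type Config struct {
	Args []string // arguments supplied by the user
}

type Result struct {
	Log   []string // stand-in for the error/warning log
	Edits []string // stand-in for the list of edits
}

// Refactoring mirrors the two methods described in the post.
type Refactoring interface {
	Description() *Description
	Run(*Config) *Result
}

// currentYear is a package-scope variable (not a constant) so that
// tests can overwrite it with a fixed string such as "YYYY".
var currentYear = fmt.Sprintf("%d", time.Now().Year())

// addCopyright is a hypothetical, stripped-down refactoring.
type addCopyright struct{}

func (r *addCopyright) Description() *Description {
	return &Description{
		Name:     "Add Copyright Header",
		Synopsis: "Adds a copyright header to a file", // illustrative text
		Usage:    "<name>",                            // illustrative text
	}
}

func (r *addCopyright) Run(c *Config) *Result {
	res := &Result{}
	if len(c.Args) != 1 {
		res.Log = append(res.Log, "Error: expected exactly one argument")
		return res
	}
	header := fmt.Sprintf("// Copyright %s %s. All rights reserved.\n", currentYear, c.Args[0])
	res.Edits = append(res.Edits, header)
	return res
}

func main() {
	currentYear = "YYYY" // what a unit test would do before running
	var r Refactoring = &addCopyright{}
	res := r.Run(&Config{Args: []string{"My Name"}})
	fmt.Print(res.Edits[0])
}
```

Because the year lives in a variable, a test can pin it once and compare against a fixed expected header, which is the motivation the post gives for CurrentYear.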
In the Description (lines 22-33), the Synopsis and Usage strings should be at most 50 characters long (to display properly in help messages). The comment ----+----1----+----2--… is a reminder of the 50-character boundary. The Usage string should contain a string in angle brackets for each parameter. The Params describe what arguments the refactoring expects. For our Add Copyright refactoring, we expect exactly one argument: the name of the copyright holder (to be inserted into the comment).

The Run method actually performs the refactoring. It receives a Config, which contains two particularly useful fields:

• Args. This refactoring receives exactly one argument (the copyright holder). The text of this argument – supplied by the user – will be in Args[0].
• Selection. The Add Copyright refactoring does not use this, but most refactorings require the user to select a region of text in a file before activating the refactoring. For example, Rename requires the user to select the identifier to rename, and Extract Function requires the user to select a sequence of statements to extract. The Selection field identifies the region of text selected by the user.

The Run method returns a Result, which contains two things:

• Log. If the refactoring needs to provide informational messages, warnings, or errors to the user, this is done by writing them to the Log.
• Edits. Ultimately, a refactoring makes changes to the user’s source code. Edits is, essentially, a description of what changes are to be made.

We will use both of these fields later.

Now, let’s go through the Run method line by line.

First, the Run method invokes Init. This does several things. For example:

• It sets up an error/warning log (r.Log).
• It validates arguments. In our Description object, we included one Param, indicating that our refactoring should receive exactly one argument. If the user supplied no arguments or more than one argument, the Init method will log an error to r.Log.
• It parses the Go source code to be refactored. If the source code cannot be parsed, it will log an error to r.Log.

When the Go source code is parsed, some semantic errors may be detected. For example, the code might reference a package that does not exist, or the user might have mistyped a variable name. Errors like this are logged to r.Log. ChangeInitialErrorsToWarnings changes them into warnings. For our refactoring, these are not a problem; we can safely add a copyright header even though the Go source code might have problems. More complex refactorings will leave them as errors and refuse to refactor the code, since it might be impossible to correctly analyze (and transform) the source code.

The findInComments method is on lines 54-60, and logError is on lines 62-69. The findInComments method searches for the first comment containing the word “Copyright” and returns a *text.Extent describing its position, or nil if it was not found. An Extent is just an offset-length pair, where offset 0 denotes the first character of the file’s source code.

In Go, strings are UTF-8 encoded. The Offset is a byte offset into the UTF-8 encoded source code. For example, consider the string “今日は” (kon’nichiwa, “hello” in Japanese). Each character is three bytes long, so the string is 9 bytes in total. If the first character (今) were at offset 0, then the final character “は” would be described by text.Extent{6, 3}.

Finally, we get to the meat of the refactoring. The addCopyright method (discussed momentarily) adds an edit to r.Edits. The call to FormatFileInEditor formats the resulting source code in the same way as the gofmt tool.

### Changing Source Code: The addCopyright Method

The addCopyright method illustrates how a refactoring actually changes source code:

• Create a text.Extent (offset-length pair) describing a range of text to replace.
• Specify what string to replace it with.
• Add an edit to r.Edits.
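The offset-length model behind these steps can be tried out with plain strings. The sketch below is an assumption-laden illustration: extent, edit, and apply are hypothetical stand-ins written for this post, not Go Doctor's text package, and the real engine collects edits rather than applying them immediately.

```go
package main

import "fmt"

// extent is a hypothetical stand-in for an offset-length pair:
// a byte offset into the source plus a length in bytes.
type extent struct {
	offset, length int
}

// edit pairs an extent with the text that should replace it.
type edit struct {
	where       extent
	replacement string
}

// apply returns src with the bytes covered by e.where replaced by
// e.replacement. A zero-length extent is a pure insertion.
func apply(src string, e edit) string {
	end := e.where.offset + e.where.length
	return src[:e.where.offset] + e.replacement + src[end:]
}

func main() {
	// Offsets count bytes, not characters: each of these three
	// characters occupies three bytes in UTF-8.
	greeting := "今日は"
	last := extent{6, 3}
	fmt.Println(len(greeting))
	fmt.Println(greeting[last.offset : last.offset+last.length])

	// An insertion at the very beginning of a file.
	src := "package main\n"
	header := edit{extent{0, 0}, "// Copyright YYYY My Name. All rights reserved.\n"}
	fmt.Print(apply(src, header))
}
```

Keeping the description of a change (the edit) separate from the act of applying it is what lets a driver choose later between printing a diff and rewriting the file.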
Given the string “abcdef”, an edit with text.Extent{1, 3} would replace the substring “bcd”. To delete text, set the replacement string to the empty string. To insert text, create a text.Extent with a length of 0. For example, given the string “abcdef”, an edit at text.Extent{5, 0} would represent an insertion before the letter f. In our case, the extent to replace is text.Extent{0, 0} – an insertion at the beginning of the file.

Perhaps the most surprising part of this code is that our refactoring does not directly change any source code! Instead, it builds a list of edits that describe what changes it wants to make. The list of edits is part of the Result object that is returned from the Run method. The command line driver decides what to do with this list of edits; it produces a patch file, outputs the modified source code, or overwrites the file on disk, depending on what flags were passed on the command line.

### Logging an Error Message: The logError Method

The logError method (lines 62-69) adds an error message to the log by calling r.Log.Error. The log’s AssociatePos method takes two token.Pos arguments, which determine the file, line, and column to associate with the error message. The first three lines create these arguments from the text.Extent. Don’t worry too much about the details for now; it’s safe to treat that code as boilerplate.

## The Driver: main.go

The command-line driver for our tool is simple. We add an AddCopyright struct to the refactoring engine, then run Go Doctor’s command line interface (CLI) driver. The first argument (“Add Copyright Header”) is the name of our tool (displayed in the -help output).

In the call to AddRefactoring, the first argument (“addcopyright”) is a short name for the refactoring. This isn’t important for our tool, since it only has one refactoring. In contrast, the godoctor tool has five refactorings; their short names are shown in the first column when godoctor -list is run:

    $ godoctor -list
    Refactoring     Description                                          Multifile?
    --------------------------------------------------------------------------------
    rename          Changes the name of an identifier                       true
    extract         Extracts statements to a new function/method            false
    var             Extracts an expression, assigning it to a variable      false
    toggle          Toggles between a var declaration and := statement      false


When the refactoring engine has more than one refactoring, the user must supply this short name on the command line to indicate which refactoring to perform. For example:

    $ echo 'package main' | godoctor -pos 1,9:1,9 rename thisIsMyNewName
    Reading Go source code from standard input...
    Defaulting to file scope for refactoring (provide an explicit scope to change this)
    <stdin>:1:9: Error: The "main" function in the "main" package cannot be renamed: it will eliminate the program entrypoint

## The Unit Tests: addcopyright_test.go

The unit test driver adds our Add Copyright refactoring to the refactoring engine, then transfers control to a TestRefactorings function provided by the Go Doctor. To see what this does, let’s run the unit tests.

    $ cd $GOPATH/src/github.com/joverbey/goaddcopyright/refactoring
    $ go test
    PASS


Remember the structure of our testdata directory?

    $GOPATH/src/github.com/joverbey/goaddcopyright/
    └── refactoring/
        └── testdata/
            └── addcopyright/
                ├── 001-helloworld/
                │   ├── main.go
                │   └── main.golden
                ├── 002-noname/
                │   ├── main.go
                │   └── main.golden
                ├── 003-error1/
                │   └── main.go
                └── 004-error2/
                    └── main.go


We invoked testutil.TestRefactorings("testdata/", t). The short name passed to AddRefactoring (at the start of our TestRefactorings function) was “addcopyright”. So, this function looks for the directory testdata/addcopyright. Each subdirectory of that directory is treated as a test case.

Each test case must contain at least one .go file with a comment like this:

//<<<<<addcopyright,5,1,5,1,Your Name Here,pass

• The first field (“addcopyright”) is the short name of the refactoring to run.
• The next four numbers indicate what range of text to select. “1,2,3,4” would mean, “select line 1, column 2 through line 3, column 4”. Here, the first line of the file is line 1, and the first column is column 1. Our Add Copyright refactoring does not use the selection for anything, so it doesn’t really matter what selection we provide.
• If the refactoring takes arguments, those are next. Our Add Copyright refactoring takes one argument: the name of the copyright owner (to insert into the header comment). In this case, “Your Name Here” will be provided to the refactoring as this argument.
• The last field must be either “pass” or “fail”.
  • If the last field is “fail”, the refactoring is expected to log at least one error. (This is the case where the goaddcopyright command line tool exited with code 3 earlier.)
  • If the last field is “pass”, then the test case directory must contain a .golden file with the same name as the .go file being refactored. After the .go file has been refactored, its text must match the text of the .golden file exactly.

The last point is important: The refactored program must match the .golden file exactly. If there is an extra line at the end of the file, the unit test will fail. If the .golden file contains spaces but the refactoring produces tabs, the unit test will fail.

When creating a new test case, probably the easiest way to create a .golden file is to simply run the refactoring, visually inspect its output, and then save the result as a .golden file.

## What’s Missing: Abstract Syntax Trees and Static Analysis

This post discussed the basic structure of a Go Doctor refactoring. However, this is only the beginning. “Real” refactorings are more complex.

• The RefactoringBase contains a field, File, that provides an abstract syntax tree (AST) for the current Go file (as an ast.File object). Almost every refactoring begins by analyzing this AST. In fact, the majority of the work in most refactorings is in traversing and analyzing ASTs; creating edits is usually the easy part!
• A refactoring begins by checking preconditions. These check that (1) the input to the refactoring is valid, (2) the refactoring will not introduce errors into the refactored code, and (3) when the refactored code executes, it will have exactly the same behavior as the code before refactoring.
• If all preconditions are satisfied, edits are created describing what changes to make to the source code.
• Some refactorings perform a second set of checks, analyzing the code after the edits have been applied. This can detect whether compile errors have been introduced by the refactoring.

Interestingly, the second step – checking preconditions – is almost always the hardest part of developing a refactoring. The goal is to produce a refactoring that will never introduce an error into the user’s source code. The list of preconditions for a production-quality refactoring can be painfully complex (see this example), but for someone interested in static analysis, designing new refactorings can be a source of very challenging problems. The Go Doctor includes control and data flow analysis to handle some of the more complex cases.

So, the Add Copyright refactoring is obviously much simpler than most refactorings. However, its main purpose was to serve as a template – to provide a useful skeleton for developing new Go Doctor refactorings.

## Exercises

1. In main.go’s main function, call engine.AddDefaultRefactorings() just after adding our refactoring. What does this do, and how does it affect how you invoke the goaddcopyright tool from the command line?

2. Change one of the .golden files in the test cases so that it is incorrect. What happens?

3. Choose one of the test cases. In the .go file, find the //<<<<<addcopyright comment and change “addcopyright” to something invalid. What happens?

4. Add two new test cases for the Add Copyright refactoring: one that should pass and another that should fail.

5. Currently, the Add Copyright refactoring raises an error if any comment contains the word “Copyright”. Change it to issue an error only if a comment is found with the exact copyright text that will be produced by the refactoring.

6. Modify the Add Copyright refactoring to insert the copyright comment at the end of the file, rather than the beginning.

7. Modify the Add Copyright refactoring to check if the file being refactored is in a Git repository, and if it is, insert a copyright header of the form “Copyright 2014-2018”, where “2014” is the year of the first commit with that file and “2018” is the current year.