Brvty

March 28, 2014

Within the context of naming conventions, the Go “Style Guide” Code Review Comments document reveals this succinct postulate:

Familiarity admits brevity.

That convenient little sound-bite is at the heart of the philosophy shared by the creators of the Go programming language. As a result, it has been applied very liberally in the naming conventions adopted by the Go community at large.

Familiarity admits brevity.

Of those three words, I like two of them just fine, but one of them concerns me. I like “familiarity”; it’s a nice, comfortable word. “Brevity” is wonderful to contemplate because of how many things would be improved if only this word applied to them. But “admits”... well, that one isn’t so bad, I guess. What I’m uncomfortable with is the way this little word seems to glue the other two together without any qualifications whatsoever. To clarify my discomfort, I offer some pithy variations that use those two “nice” words more accurately.

The common thread here is that familiarity can lead to a kind of selfish or reckless brevity, rather than selfless and deliberate clarity. With that in mind, read the original postulate again:

Familiarity admits brevity.

That statement just feels wrong to me, as if someone needed a good excuse to justify cutting corners and that statement proved clever-sounding enough to do the trick. Is brevity the be-all and end-all when writing a program? Of course not. Aside from correct functionality, the next most important objective is clarity of expression. This may sound strange, but “code” is a misnomer for what we create. After all, what is code really? A series of instructions intended for:

  1. A machine (compiler), which only cares about exact syntax and valid operations.
  2. Other programmers, who instinctively only care about concepts.

So our job as programmers is to satisfy both machines and humans with our code. If we are successful, the machine will execute the instructions to produce the expected results, and humans who come later to read the code will understand why those results are correct (or incorrect). A programmer shouldn’t need to “decipher” the “code” just to understand it. In the spirit of that statement, the word code has been emphasized throughout this article to remind you that we are really referring to well-written instructions, not a ciphertext that requires a key (familiarity) to decrypt.

Well-written instructions will reveal their intent as a result of merely being read. Indeed, brevity is an important factor where readability is concerned, but not in relation to names, which are abstractions over concepts. The closer the name used in code is to the name we use in our minds for the actual concept, the better. If I could rewrite the original statement with an orthogonal (and more important) end in mind, it would go like this:

Familiarity warrants clarity.

Familiarity is gained as a result of writing code in the first place, because you have to know what it does to get the program to work (most of the time). But gaining familiarity with already-written code is possible only in proportion to the clarity with which the author(s) who came before imbued that code.

So, what does this mean in everyday terms? What does this have to do with our code? Let’s start with some concrete guidelines and progress to actual examples.

Ok, you’ve been really patient, so we’ll look at an actual code sample. This method comes from the bufio.Writer struct. It illustrates many of the points I’m trying to make. Your task is to look at the first line after the function signature in order to discover what the variable named nn represents. Ready, Go!

// WriteString writes a string.
// It returns the number of bytes written.
// If the count is less than len(s), it also returns an error explaining
// why the write is short.
func (b *Writer) WriteString(s string) (int, error) {
	nn := 0
	for len(s) > b.Available() && b.err == nil {
		n := copy(b.buf[b.n:], s)
		b.n += n
		nn += n
		s = s[n:]
		b.flush()
	}
	if b.err != nil {
		return nn, b.err
	}
	n := copy(b.buf[b.n:], s)
	b.n += n
	nn += n
	return nn, nil
}

Here’s roughly how it went for me:

Well, it’s declared and initialized with a value of 0 so it’s numeric. Ok, now I’m superficially scanning the method body (which gets long-ish) and I notice that both return statements yield nn as the first return value. Now back to the method signature where I’m reminded that this method is reminiscent of the io.Writer interface and I infer that nn must refer to the number of elements written to an internal buffer. (Hmm, do the elements in this case represent bytes or characters or something else? The answer can only be confirmed by checking the type of b.buf–wait what’s b refer to? Oh, it’s the *Writer itself. Wait, why not just call it w like all the gophers say to do… Oh, maybe it’s b because this is the "bufio" package. Yeah, b for "bufio" 2 . Got it. I wonder why they didn’t just call this package "io/buffered" 3 . Seems like a better place and name for it. Wait, where was I? Oh yeah, figuring out the nn variable…) Oh look, there’s another variable just called n…ooh, and there’s a b.n–lots of ns. Wait, focus! Back to the nn… I better check the godoc comment to see if nn really is just a counter of what was written to the buffer (I hope that godoc comment is up to date…).

Wow, that was quite a tangential reading. I paid a high price just to discover the purpose of a single, local variable. You may argue that I learned a lot along the way so the time wasn’t wasted but it was a disjointed and frustrating experience and my patience ran thin. It’s likely that the experience will be repeated if I ever revisit this code in the future.

Imagine if that variable had simply been named thus:

totalBytesWritten := 0

Problem solved. No other context necessary. No one would miss the meaning. How much time in the future would be saved as a result of that simple and pragmatic decision 4 ? Now that we’re thinking about this, why do you suppose the author chose nn for that variable name in the first place? What good reason other than brevity for the sake of brevity can you come up with? In no particular order, here are my guesses:

  1. The author’s favorite letter is n. Two ns are better than one.
  2. The author was already planning to use n for another local variable so adding another n was the easiest way to differentiate between the two 5 .
  3. nn represents the sum of all n values. I admit that this is clever but not at all intention-revealing at the outset. The cumbersome overhead of deciphering all those n values in those few lines of code far outweighs the cleverness of the name.
  4. This was merely the first draft of the code and was never revised or reviewed thereafter. Unfortunately, there are very few people capable of expressing themselves clearly and elegantly with a first draft. The writing of mere mortals will always be improved by revision 6 , and source code is a form of writing.

None of these reasons justify the use of nn as the best representation of the number of bytes written as a result of calling the method. So why not just call it that? This is just one example of the wild goose chase that we require of anyone reading our code in the future if our familiarity leads us to reckless brevity rather than thoughtful clarity. Above all, think as you type. Ask yourself if this is the kind of code you would want to read six months from now, because you probably will.

Familiarity admits brevity.

Really?


Footnotes:

1 Well-established acronyms from the field of Computer Science and the problem domain of the program are always appropriate. Single-letter counters in tight loops are also permissible (unless there’s a better name for a given situation).
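A small sketch of that exception may help. The function and variable names here (parseHTTPStatusCodes, statusLines and so on) are hypothetical and come from no real package. The established acronym HTTP and the loop index i stay short, while every name that carries a concept is spelled out:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseHTTPStatusCodes is a hypothetical helper written for this illustration.
// It extracts the numeric code from lines such as "HTTP/1.1 200 OK".
func parseHTTPStatusCodes(statusLines []string) ([]int, error) {
	statusCodes := make([]int, 0, len(statusLines))
	for i := range statusLines { // i never leaves this loop, so a single letter is enough
		fields := strings.Fields(statusLines[i])
		if len(fields) < 2 {
			return nil, fmt.Errorf("malformed status line %q", statusLines[i])
		}
		statusCode, err := strconv.Atoi(fields[1])
		if err != nil {
			return nil, fmt.Errorf("malformed status code in %q: %w", statusLines[i], err)
		}
		statusCodes = append(statusCodes, statusCode)
	}
	return statusCodes, nil
}

func main() {
	statusCodes, err := parseHTTPStatusCodes([]string{"HTTP/1.1 200 OK", "HTTP/1.1 404 Not Found"})
	fmt.Println(statusCodes, err)
}
```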

2 The choice to use b as a receiver name throughout the entire file is interesting. There are two structs (with methods) defined: Reader and Writer. Both define their respective methods using b as the receiver name. This is a break from the usual practice of using the first letter of the type as the receiver name. If I am correct in labeling b as a truncation of the package name (bufio) then the author may have been using b as a consistent way to refer to the receiver throughout this file. Sounds like one of the same reasons for which programs in other languages employ names like self or this. But isn’t that a no-no in Go? Now I’m really confused. Does anyone know the actual reason that b was used instead of r and w?

3 Don’t even get me started on the fmt package (and its functions). If I could wave a magic wand to mitigate backwards compatibility issues, the following packages would be renamed as they aren’t actual words or established acronyms:

fmt     -> format
bufio   -> io/buffered
strconv -> strings/convert

4 Here I include my own rendering of the example code. I regret that because of the brevity of the original I’m not 100% certain that my names accurately describe the underlying concepts. For example, it was only through reading several other godoc sections that I discovered a more conclusive name for the n struct field: unflushedBytes. Although this version of the code is much more verbose, I ask you to consider it from the perspective of someone reading the method for the first time. Because my rendering is a clearer set of instructions, a single reading is all that is required to fully understand it. Be aware that I took the liberty of renaming a few struct fields in the process, to unify style. Finally, instead of using b (or the conventional w) for the receiver name I opted for something more readable: writer. I would normally use self or this but thought I would meet in the middle this time.

// WriteString writes a string.
// It returns the number of bytes written.
// If the count is less than len(s), it also returns an error explaining
// why the write is short.
func (writer *Writer) WriteString(input string) (int, error) {
	totalBytesWritten := 0
	
	for len(input) > writer.Available() && writer.err == nil {
		chunkLength := copy(writer.buffer[writer.unflushedBytes:], input)
		writer.unflushedBytes += chunkLength
		totalBytesWritten += chunkLength
		input = input[chunkLength:]
		writer.flush()
	}
	
	if writer.err != nil {
		return totalBytesWritten, writer.err
	}

	finalChunkLength := copy(writer.buffer[writer.unflushedBytes:], input)
	writer.unflushedBytes += finalChunkLength
	totalBytesWritten += finalChunkLength
	return totalBytesWritten, nil
}

Because I can now see the forest for the trees, I’ve noticed that there is duplication of behavior, which begs for a method extraction (a helpful form of brevity):

func (writer *Writer) WriteString(input string) (int, error) {
	totalBytesWritten := 0
	
	for len(input) > writer.Available() && writer.err == nil {
		chunkLength := writer.writeChunk(input)
		totalBytesWritten += chunkLength
		input = input[chunkLength:]
		writer.flush()
	}
	
	if writer.err != nil {
		return totalBytesWritten, writer.err
	}

	totalBytesWritten += writer.writeChunk(input)
	return totalBytesWritten, nil
}

Here’s the very brief definition of that extracted method:

func (writer *Writer) writeChunk(chunk string) int {
	chunkLength := copy(writer.buffer[writer.unflushedBytes:], chunk)
	writer.unflushedBytes += chunkLength
	return chunkLength
}

I could take it a bit further (extracting a method for the complex boolean loop condition, etc…) but you get the idea. Here’s the original again (for easy reference):

// WriteString writes a string.
// It returns the number of bytes written.
// If the count is less than len(s), it also returns an error explaining
// why the write is short.
func (b *Writer) WriteString(s string) (int, error) {
	nn := 0
	for len(s) > b.Available() && b.err == nil {
		n := copy(b.buf[b.n:], s)
		b.n += n
		nn += n
		s = s[n:]
		b.flush()
	}
	if b.err != nil {
		return nn, b.err
	}
	n := copy(b.buf[b.n:], s)
	b.n += n
	nn += n
	return nn, nil
}

5 Sadly, this is a common naming convention across several Go packages: if n is taken, just use nn.

6 Many thanks to my colleagues Jonathan and Michelle, who helped me revise this article.


Afterword:

Software isn’t the only context in which brevity is abused. In my work with SmartyStreets I help create street address verification software. For a human, reading an address and identifying its components is almost always a trivial task. Humans are very good at pattern recognition, and have memorized lots of information that helps us assert the correctness of a proposed pattern. We also absorb spelling mistakes with ease. There are instances, however, where ambiguities introduced by word truncations make it impossible to interpret an address correctly without firsthand geographical knowledge. These ambiguities stem from brevity accepted and standardized by the USPS. Writing an accurate algorithm for a machine is very difficult at best. As an example, consider the following contrived, but well-formed address:

123 MAIN ST HELENA CA 94574

Would you be able to identify the following information about that address?

Primary Number (e.g. house number, PO Box number, etc.)
Street Name (e.g. Broadway, Frontage, etc.)
Street Suffix (e.g. avenue, road, lane, etc.)
City
State
ZIP Code

What you probably came up with (unless you are from a particular area of California) was this:

Primary Number: "123"
Street Name:    "MAIN"
Street Suffix:  "ST" // (i.e. street)
City:           "HELENA"
State:          "CALIFORNIA"
ZIP Code:       "94574"

But here is the correct rendering:

Primary Number: "123"
Street Name:    "MAIN"
Street Suffix:  ""
City:           "ST HELENA" // (ST is short for "SAINT")
State:          "CALIFORNIA"
ZIP Code:       "94574"
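To see how easily a program falls into the same trap, consider a deliberately naive parser. This is a sketch for illustration only and says nothing about how real address verification software works; the Address type, the parseNaively function and the tiny knownSuffixes table are all hypothetical. It treats the first recognized suffix abbreviation after the street name as the end of the street, and it leaves the state abbreviation unexpanded.

```go
package main

import (
	"fmt"
	"strings"
)

// Address holds the components listed above. Hypothetical type for this sketch.
type Address struct {
	PrimaryNumber, StreetName, StreetSuffix, City, State, ZIPCode string
}

// knownSuffixes is a tiny, illustrative subset of street suffix abbreviations.
var knownSuffixes = map[string]bool{"ST": true, "AVE": true, "RD": true, "LN": true}

// parseNaively assumes the layout: number, street..., suffix, city..., state, ZIP.
func parseNaively(line string) (Address, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 {
		return Address{}, fmt.Errorf("too few components in %q", line)
	}
	address := Address{
		PrimaryNumber: fields[0],
		State:         fields[len(fields)-2],
		ZIPCode:       fields[len(fields)-1],
	}
	middle := fields[1 : len(fields)-2] // street name, optional suffix, city
	for i, field := range middle {
		if i > 0 && knownSuffixes[field] {
			address.StreetName = strings.Join(middle[:i], " ")
			address.StreetSuffix = field
			address.City = strings.Join(middle[i+1:], " ")
			return address, nil
		}
	}
	// No suffix found: guess that the last word is the city.
	address.StreetName = strings.Join(middle[:len(middle)-1], " ")
	address.City = middle[len(middle)-1]
	return address, nil
}

func main() {
	address, err := parseNaively("123 MAIN ST HELENA CA 94574")
	fmt.Printf("%+v %v\n", address, err)
}
```

For the sample address this parser reports a street suffix of ST and a city of HELENA, the same wrong answer most humans give. Nothing in the text itself distinguishes ST for "street" from ST for "saint"; the parser would need outside knowledge, such as a list of city names, to do better.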

This is just the tip of the iceberg regarding the problem of street address parsing.