12 February 2015

Go code that stutters

"You keep using that word...I don't think it means what you think it means..."

I first stumbled upon the word "stutter" as a description of code in the Code Review Comments document. With only a single example and no clear definition, I was left wondering what it really meant for code to "stutter".

I assumed that they were referring to the similarity between the return parameter's name and its type. A few additional Google searches and a recent blog post on the subject of package names confirmed my assumption. I will argue here that the conventions promoted by the go team actually cause stuttering in a much more literal sense than what they have loosely described as such.

Stepping back to look at the go project itself, I can understand the go team wanting a consistent style across all core go packages and projects. But they are actively pushing for a much broader acceptance--a battle they don't really need to fight. There are bigger, more important battles in software (Does it work? Can it be extended? Is it under test? Does it perform well? Is it robust? Etc...).

A wonderful example of what I'm talking about came from Jeremy Saenz, responding to criticism of his popular web framework, Martini. He made a great point:

Martini, and its design, is simply not idiomatic Go. This is not to say that Martini is not well designed...This doesn’t make Martini wrong, it is just not going in the direction that the Go community as a whole is going.

The moral of the story here is that not producing idiomatic code is not the end of the world--not by a long shot!

Now, back to code that stutters.

"Ok google, define stutter"

verb 1. talk with continued involuntary repetition of sounds, especially initial consonants.

Field and Variable Names

Given that definition, here's some actual go code that shows how field names and local variable names can stutter:

type Writer struct {
	err error
	buf []byte
	n   int
	wr  io.Writer
}
...
func (b *Writer) WriteString(s string) (int, error) {
	nn := 0
	for len(s) > b.Available() && b.err == nil {
		n := copy(b.buf[b.n:], s)
		b.n += n
		nn += n
		s = s[n:]
		b.flush()
	}
	if b.err != nil {
		return nn, b.err
	}
	n := copy(b.buf[b.n:], s)
	b.n += n
	nn += n
	return nn, nil
}

buf? err? wr? b? b.n? n? nn? That's a lot of 'b's and 'n's. What exactly are we to understand from such a minimalistic scheme? Here are my best guesses:

buf -> buffer
err -> error (<grumble>fine, I'll accept `err` because `error` is already a built-in type.</grumble>)
wr  -> writer
b.n -> ??? (hmm. probably another counter. Guess I'll have to read more code to be sure. #fail)
nn  -> total

These names cause the reader of the code to stutter. The reader is required to expend mental effort just to make it past the initial consonants of the actual concepts that are being used.

Receiver Names

Another form of stutter we are often encouraged to adopt is using the first letter or two of a struct's name as the receiver name in methods defined on that struct. Personally, I prefer self or this because those names help you remember, even when reading the body of the method, that the method is centered around that identifier. When pressed for something else, I would choose a one-word rendering of the struct's name over a truncation of that name to one or two characters.

func (writer *Writer) Write(...) {...}

The apparent repetition of the word "writer" in the function signature is annoying to lots of go programmers. writer is a label for an instance of a struct. Writer is a type (struct) name. writer and Writer are names for different/orthogonal concepts and it's ok that they happen to share the same spelling. This isn't an impediment to readability (as the word "stuttering" implies), it's simply the repetition of a word. Using complete words actually allows us to read the contents of the function more fluidly and expressively.

The example above used b as the receiver name, presumably because the Writer struct was defined in the bufio package [citation needed]. But the many cryptic instances of b.--- could have been presented as writer.--- or just self.---, both of which communicate more accurately than b, which could be any old variable (local, package-wide, or the receiver itself).

Function Names

Here's a grouping of function names that causes mental stutter every time I try to recall them without the documentation (way too many 'f's and 's's and 'ln's all over the place):

fmt.Fprint  
fmt.Fprintf 
fmt.Fprintln
fmt.Fscan   
fmt.Fscanf  
fmt.Fscanln 
fmt.Print   
fmt.Printf  
fmt.Println 
fmt.Scan    
fmt.Scanf   
fmt.Scanln  
fmt.Sprint  
fmt.Sprintf 
fmt.Sprintln
fmt.Sscan   
fmt.Sscanf  
fmt.Sscanln 

Seriously folks, Sscanln has got to be one of the weirdest looking names ever. I realize that these function names are direct descendants of the comparable functions from the C programming language (printf) but I question the need felt by the authors of the fmt package to reuse those old, clunky names. I know this will be counted as heresy but wouldn't something like this be much easier to recall and cause much less keyboard stutter?

fmt.Fprint    ->  format.Write             
fmt.Fprintf   ->  format.WriteFormat       
fmt.Fprintln  ->  format.WriteLine         
fmt.Fscan     ->  format.Read              
fmt.Fscanf    ->  format.ReadFormat        
fmt.Fscanln   ->  format.ReadLine          
fmt.Print     ->  format.Print             
fmt.Printf    ->  format.PrintFormat       
fmt.Println   ->  format.PrintLine         
fmt.Scan      ->  format.Scan              
fmt.Scanf     ->  format.ScanFormat        
fmt.Scanln    ->  format.ScanLine          
fmt.Sprint    ->  format.Compose           
fmt.Sprintf   ->  format.ComposeFormat     
fmt.Sprintln  ->  format.ComposeLine       
fmt.Sscan     ->  format.Decompose         
fmt.Sscanf    ->  format.DecomposeFormat   
fmt.Sscanln   ->  format.DecomposeLine     

Don't want to type format because it's just so much longer than fmt? Well, I think the fmt package should have been built-in, so why not import the format package like it was built-in?

package main

import (
	"fmt"
	. "github.com/mdw-go/format"
)

func main() {

	PrintLine("Which line is easier to grok?")

	fmt.Println("Which line is easier to grok?")

}

You're welcome.

(Mixed-up) Example

Andrew Gerrand gave a talk on the preferred naming conventions for go code in which he presents a function with "bad" names and then shows an altered version that has "good" names. I like to imagine that he just mixed up the titles for those slides, because I much prefer the "bad" code. Here's how I would have titled those slides:

Bad

func RuneCount(b []byte) int {
    i, n := 0, 0
    for i < len(b) {
        if b[i] < RuneSelf {
            i++
        } else {
            _, size := DecodeRune(b[i:])
            i += size
        }
        n++
    }
    return n
}

Good

func RuneCount(buffer []byte) int {
    index, count := 0, 0
    for index < len(buffer) {
        if buffer[index] < RuneSelf {
            index++
        } else {
            _, size := DecodeRune(buffer[index:])
            index += size
        }
        count++
    }
    return count
}

Conclusion

I've just presented some of my naming preferences. We all have our own preferences and reasons for naming things and that's ok. It's silly that a group of language designers have taken it upon themselves to dictate how we all should go about naming things, sweepingly calling one scheme good and all others bad. This is all just a matter of opinion and really should be left up to project leaders.

As a result of the "guidance" from the go team, those that take an alternate approach are openly criticized for having their own opinion, never mind whether their software even works or effectively solves the problems that sparked the creation of the software in the first place.