20 October 2021

Vectors: last vs peek

Why it's important to understand the defining characteristics of data structures.

Advent of Code 2018 day 5 requires detecting when a pair of characters is the same letter but differing type. Such pairs are deemed 'unstable' and are removed:

(defn abs [n]
  (max n (- n)))

(def lower-vs-upper
  (- (int \a)
     (int \A)))

(defn unstable? [a b]
  (= lower-vs-upper
     (abs (- (int (or a 0))
             (int (or b 0))))))

An input string such as dabAcCaCBAcCcaDA will go through the following 'reactions':

    vv
dabAcCaCBAcCcaDA
   vv
dabAaCBAcCcaDA
      vv
dabCBAcCcaDA

dabCBAcaDA (stable)

The important realization for this puzzle is that the solution can be built one character at a time in a single pass of the input. We establish an output 'stack' and add characters to it until we reach an unstable pair. At that point we 'pop' from the stack and skip over the next input character--doing so implements the rule that unstable pairs 'react', destroying one another. So the output stack grows and shrinks character by character until the whole input string has been scanned and every unstable pair 'reacted'.

As is typical for me, my solutions often come out as loop/recur forms, even though I can usually sense there's probably a more 'functional' way.

(defn part1-recur-slow [INPUT]
  (loop [input  (rest INPUT)
         output [(first INPUT)]]
    (if (empty? input)
      (count output)
      (let [i (first input)]
        (if (unstable? (last output) i)
          (recur (rest input) (pop output))
          (recur (rest input) (conj output i)))))))

This solution worked, but clocked in at a disappointing 7.5 seconds! The input string is 50,000 characters long. I didn't see any reason for this kind of solution to take so long but couldn't see how to improve it. So, off to the subreddit. After reading a few of the functional approaches I decided to adapt a reduce-based solution by Reddit user Average-consumer and ended up with the following little beauty:

(defn react [output i]
  (if (unstable? (peek output) i)
    (pop output)
    (conj output i)))

(defn part1-reduce [input]
  (count (reduce react [] input)))

Of course the solution can be thought of in terms of reduce! It really is a perfect fit. This solution still bears some resemblance to my original. It has an (unstable? ...) check and also the same (pop output) and (conj output i) forms. We can even use reductions to visualize the step-by-step creation of the output:

(defn part1 [input]
  (let [steps (reductions react [] input)]
    (pprint/pprint (map #(apply str %) steps))
    (count (last steps))))

Here's what pprint/pprint emits in processing the input string dabAcCaCBAcCcaDA (and I've added complimentary comments showing what's left of the input at each step). You can clearly see 'reaction' points wherever the output string (left side) gets shorter instead of longer.

(""           ; dabAcCaCBAcCcaDA
 "d"          ;  abAcCaCBAcCcaDA
 "da"         ;   bAcCaCBAcCcaDA
 "dab"        ;    AcCaCBAcCcaDA
 "dabA"       ;     cCaCBAcCcaDA
 "dabAc"      ;      CaCBAcCcaDA
 "dabA"       ;       aCBAcCcaDA
 "dab"        ;        CBAcCcaDA
 "dabC"       ;         BAcCcaDA
 "dabCB"      ;          AcCcaDA
 "dabCBA"     ;           cCcaDA
 "dabCBAc"    ;            CcaDA
 "dabCBA"     ;             caDA
 "dabCBAc"    ;              aDA
 "dabCBAca"   ;               DA
 "dabCBAcaD"  ;                A
 "dabCBAcaDA")

Most interestingly, this solution completes almost instantaneously compared to my loop/recur solution! The algorithmic approach they take is exactly the same, so what's the difference? Is it really just that reduce is far more optimized than loop/recur or is there something else? (HINT: It's something else...)

Well, the only other notable difference between the two is the first argument to (unstable? ...). In the slow solution, (last output) is used. In the fast one it's (peek output). The docs for each make the difference perfectly clear (emphasis mine):

(last coll)

Return the last item in coll, in linear time

(peek coll)

For a list or queue, same as first, for a vector, same as, but much more efficient than, last...

The real-input, being 50,000 characters long has real implications for the performance of last vs. peek. As the output string gets longer and longer, the call to last gets longer. Vectors must implement stack-like behavior (keeping a direct reference to the last/top item) which peek is able to take advantage of. This is probably a lesson I should have learned already...

Anyway, let's benchmark the two functions with collections of different sizes to drive home the point:

(def items-100 (vec (range 100)))
(def items-1000 (vec (range 1000)))
(def items-10000 (vec (range 10000)))
(def items-50000 (vec (range 50000)))

(doseq [items [items-100 items-1000 items-10000 items-50000]]
  (perf/benchmark 10 100 (format "last - %d" (count items)) #(last items))
  (perf/benchmark 10 100 (format "peek - %d" (count items)) #(peek items))
  (println))

And the results:

## last - 100 - 10 x 100 ops, average per-op: 4371ns
## peek - 100 - 10 x 100 ops, average per-op: 70ns

## last - 1000 - 10 x 100 ops, average per-op: 40632ns
## peek - 1000 - 10 x 100 ops, average per-op: 36ns

## last - 10000 - 10 x 100 ops, average per-op: 269780ns
## peek - 10000 - 10 x 100 ops, average per-op: 36ns

## last - 50000 - 10 x 100 ops, average per-op: 1327493ns
## peek - 50000 - 10 x 100 ops, average per-op: 34ns

Understanding that distinction makes all the difference in the world for the original algorithm!

(defn part1-recur-fast [INPUT]
  (loop [input  (rest INPUT)
         output [(first INPUT)]]
    (if (empty? input)
      (count output)
      (let [i (first input)]
        (if (unstable? (peek output) i) ; now using peek instead of last here!
          (recur (rest input) (pop output))
          (recur (rest input) (conj output i)))))))

It now performs comparably with the reduce-based solution:

## part1-recur-slow - 1  x 1  ops, average per-op: 7.5 seconds (ouch)
## part1-recur-fast - 10 x 20 ops, average per-op: 10,456,967ns
## part1-reduce     - 10 x 20 ops, average per-op:  8,810,398ns