Counting Frequencies In Clojure

"There's a function for that"

15 October 2021 programming apprenticeship clojure

Ok, in the last post I tantalized you with a silly problem I was trying to solve. I was taking items from a list and using a list comprehension to insert them each into their own map, each with a value of 1.


user=> (for [i [:a :b :c]] {i 1}) ({:a 1} {:b 1} {:c 1})

That list comprehension is actually part of a function:

(defn counts-merge [coll] (->> (for [i coll] {i 1})

   (apply (partial merge-with +))))
Here's what it does:

(should= {\a 1 \b 2 \c 3 \d 4}

     (counts-merge "abbcccdddd"))
Yeah, since teasing you in the last post I've discovered that, as usual, [coding in Clojure is basically cheating](/coding-in-clojure-feels-like-cheating/) (if you know the right functions to reach for):

(should= {\a 1 \b 2 \c 3 \d 4}

     (frequencies "abbcccdddd"))
Yup, [there's a function in `clojure.core`](https://clojuredocs.org/clojure.core/frequencies) that does just what I was trying to do. And, arguably, it has a better name.

I was curious to examine its [implementation](https://github.com/clojure/clojure/blob/clojure-1.10.1/src/clj/clojure/core.clj#L7203):

(persistent! (reduce (fn [counts x]

         (assoc! counts x (inc (get counts x 0))))
       (transient {}) coll))
Ok, so there are a few strange bits ([`persistent!`](https://clojuredocs.org/clojure.core/persistent!) and [`transient`](https://clojuredocs.org/clojure.core/transient)). I presume those are performance-related (more on that in a bit). Here's a simplified implementation:

(defn- counts-inc [counts x] (assoc counts x (inc (get counts x 0))))

(defn counts-reduce [coll] (reduce counts-inc {} coll))

It also works as advertised:

(should= {\a 1 \b 2 \c 3 \d 4}

     (sut/counts-reduce "abbcccdddd"))
Tt's not hard to imagine that this solution is just all-around more performant to begin with compared with my original `count-merge` function. So, [let's benchmark it](/benchmarking-clojure-code/):

And the winner is...
  1. counts-reduce - 10 x 10000 ops, average per-op: 3036ns
  2. frequencies - 10 x 10000 ops, average per-op: 3209ns
  3. counts-merge - 10 x 10000 ops, average per-op: 6764ns

My simplified counts-reduce function??

Ok, I'm not surprised that counts-merge came in last at roughly 2x average duration, but on my machine, the counts-reduce consistently beats out the built-in frequencies. Not by much, but still. Maybe that's something to explore in a future blog post...