15 October 2021

Counting Frequencies In Clojure

"There's a function for that"

Ok, in the last post I tantalized you with a silly problem I was trying to solve. I was taking items from a list and using a list comprehension to insert them each into their own map, each with a value of 1.

```clojure-repl
user=> (for [i [:a :b :c]] {i 1})
({:a 1} {:b 1} {:c 1})

That list comprehension is actually part of a function:

(defn counts-merge [coll]
  (->> (for [i coll] {i 1})
       (apply (partial merge-with +))))

Here's what it does:

(should= {\a 1 \b 2 \c 3 \d 4}
         (counts-merge "abbcccdddd"))

Yeah, since teasing you in the last post I've discovered that, as usual, coding in Clojure is basically cheating (if you know the right functions to reach for):

(should= {\a 1 \b 2 \c 3 \d 4}
         (frequencies "abbcccdddd"))

Yup, there's a function in clojure.core that does just what I was trying to do. And, arguably, it has a better name.

I was curious to examine its implementation:

  (persistent!
   (reduce (fn [counts x]
             (assoc! counts x (inc (get counts x 0))))
           (transient {}) coll))

Ok, so there are a few strange bits (persistent! and transient). I presume those are performance-related (more on that in a bit). Here's a simplified implementation:

(defn- counts-inc [counts x]
  (assoc counts x (inc (get counts x 0))))

(defn counts-reduce [coll]
  (reduce counts-inc {} coll))

It also works as advertised:

(should= {\a 1 \b 2 \c 3 \d 4}
         (sut/counts-reduce "abbcccdddd"))

Tt's not hard to imagine that this solution is just all-around more performant to begin with compared with my original count-merge function. So, let's benchmark it:

And the winner is...

1. counts-reduce - 10 x 10000 ops, average per-op: 3036ns
2. frequencies   - 10 x 10000 ops, average per-op: 3209ns
3. counts-merge  - 10 x 10000 ops, average per-op: 6764ns

My simplified counts-reduce function??

Ok, I'm not surprised that counts-merge came in last at roughly 2x average duration, but on my machine, the counts-reduce consistently beats out the built-in frequencies. Not by much, but still. Maybe that's something to explore in a future blog post...