List vs Vector

Round 1...FIGHT!

August 6, 2021

I'm taking the opportunity to understand the difference between lists and vectors as implemented in clojure.

As an exercise I implemented a few methods common to both in Java, which is closer to languages I've used over the bulk of my time as a programmer.

https://github.com/mdwhatcott/java-containers

Here are the methods implemented for each container:

  1. conj (append for vector, prepend for list)
  2. nth (at)
  3. pop
  4. insert
  5. remove

Vector

The vector is backed by a more primitive array which has to be carefully managed. Upon instantiation an array of specifiable size is allocated.

The append method works by filling in subsequent slots in the array. When filled to capacity, a newer (larger) array is allocated, the existing items are copied to it, then incoming items are put in the larger array. The old array is simply garbage-collected. This operations is fast (O(1)) when there is capacity, but can take more time when the backing array must be expanded. (Pre-allocation can speed up scenarios when the size is known up-front.)

The at method defers to the index operation of the underlying array and simply retrieves the item as the provided index. This operation is O(1).

The pop method removes (and returns) the last item in the array (O(1)).

The insert method allows the user to insert a value at a specified index. This can be as costly as O(n) because many items must be shifted to perserve ordering and make way for the incoming item.

The remove method allows the user to remove a value at a specified index. This can also be as costly as O(n) for the same reasons stated for insert.

(Singly-Linked) List

The list is a singly-linked list, meaning that the first item has a 'link' or reference to the second item, which references the third, etc... The recursive nature of this structure lends itself to recursive implementations of most of the following functions.

Because this structure is uni-directional it is faster/cheaper to prepend items at the front of the list, rather than appending to the back (ie. the vector). So, the prepend method works by making the new item the root of the collection, which refers to the previously held root item (O(1)).

The at method requires a traversal of the list starting with the root item all the way to the desired node to retrieve the corresponding value. This can take up to O(n).

The pop method simply designates the second item as the root and discards/returns what was the root.

The insert method must traverse to the desired index and reassign references to establish the new item's position in the chain (up to O(n)).

The remove method is the inverse of insert (up to O(n)).

Conclusions

Q1. Which is faster for inserting an item into the middle?

A1. VECTOR


The into clojure function repeatedly calls conj to build the specified collection.

Q2. What is the result of (into '() [1 2 3 4 5])?

A2. (5 4 3 2 1)


Q3. Why was the collection reversed?

A3. Because conj adds items to collections in the most efficient way possible. In the case of lists that means prepending the items. This has the effect of reversing the input sequence.


Q4. What is the result of (into [] '(1 2 3 4 5))?

A4. [0 1 2 3 4]


Q5. Why was the order of the input sequence maintained?

A5. Because vectors append new items to the end since that's the most efficient option.

References

-Michael Whatcott