I'm taking the opportunity to understand the difference between lists and vectors as implemented in clojure.
As an exercise I implemented a few methods common to both in Java, which is closer to languages I've used over the bulk of my time as a programmer.
https://github.com/mdwhatcott/java-containers
Here are the methods implemented for each container:
conj
(append
for vector,prepend
for list)nth
(at
)pop
insert
remove
Vector
The vector is backed by a more primitive array which has to be carefully managed. Upon instantiation an array of specifiable size is allocated.
The append
method works by filling in subsequent slots in the array. When filled to capacity, a newer (larger) array is allocated, the existing items are copied to it, then incoming items are put in the larger array. The old array is simply garbage-collected. This operations is fast (O(1)
) when there is capacity, but can take more time when the backing array must be expanded. (Pre-allocation can speed up scenarios when the size is known up-front.)
The at
method defers to the index operation of the underlying array and simply retrieves the item as the provided index. This operation is O(1)
.
The pop
method removes (and returns) the last item in the array (O(1)
).
The insert
method allows the user to insert a value at a specified index. This can be as costly as O(n)
because many items must be shifted to perserve ordering and make way for the incoming item.
The remove
method allows the user to remove a value at a specified index. This can also be as costly as O(n)
for the same reasons stated for insert
.
(Singly-Linked) List
The list is a singly-linked list, meaning that the first item has a 'link' or reference to the second item, which references the third, etc... The recursive nature of this structure lends itself to recursive implementations of most of the following functions.
Because this structure is uni-directional it is faster/cheaper to prepend items at the front of the list, rather than appending to the back (ie. the vector). So, the prepend
method works by making the new item the root of the collection, which refers to the previously held root item (O(1)
).
The at
method requires a traversal of the list starting with the root item all the way to the desired node to retrieve the corresponding value. This can take up to O(n)
.
The pop
method simply designates the second item as the root and discards/returns what was the root.
The insert
method must traverse to the desired index and reassign references to establish the new item's position in the chain (up to O(n)
).
The remove
method is the inverse of insert
(up to O(n)
).
Conclusions
Q1. Which is faster for inserting an item into the middle?
A1. VECTOR
The into
clojure function repeatedly calls conj
to build the specified collection.
Q2. What is the result of (into '() [1 2 3 4 5])
?
A2.
(5 4 3 2 1)
Q3. Why was the collection reversed?
A3. Because
conj
adds items to collections in the most efficient way possible. In the case of lists that means prepending the items. This has the effect of reversing the input sequence.
Q4. What is the result of (into [] '(1 2 3 4 5))
?
A4.
[0 1 2 3 4]
Q5. Why was the order of the input sequence maintained?
A5. Because vectors append new items to the end since that's the most efficient option.
References
- Linked list, Wikipedia
- Clojure Data Structures
- "Performance Considerations, Lists vs. Vectors", by Robert Gu
- "A wrinkle in Clojure", by John Cook
- Clojure FAQ: "Why does conj add to the front of a list, but the back of a vector?"
- Clojure FAQ: "I keep forgetting that after calling sequence functions on vectors/sets, the return value is no longer a vector or a set."
- StackOverflow: "In Clojure, when should I use a vector over a list, and the other way around?"