copied from Pankaj Kumar's Weblog


The trade-offs in using java.util.ArrayList and java.util.LinkedList should be straight-forward, shouldn't it? At least this is what I used to think till today. But then most of my thinking around these datastructures were formed during college days, when C was the hottest language and Java and its Collection classes just didn't exist.

Not surprisingly, it is natural for me to think of arrays as fixed size containers, where elements can be accessed at random location through O(1) operation (ie; in constant time) and insertion/deletion in the middle are O(N) operations (ie; could take time proportional to the size of the array) and hence, are better avoided. In contrast, a linked list can grow in size, access of its head or tail and insertion/deletion in the middle are all O(1) operations (assuming that you have pointer to an adjacent element).

It is possible get around the fixed size limitation of arrays by writing a wrapper which will allocate a new array, copy the elements of the old array into the new one and then discard the old one (BTW, this is what ArrayList does). Still, the basic arrays remain a datastructure for collections of fixed size. In contrast, a linked list consists of nodes with 'pointers' to the next and previous node in the list. So, adding or removing a node is simply a matter of reassigning the pointers. Of course, this implies linear time for traversing upto an indexed node, starting from beginning or end. This simple model is very handy in deciding when to use an array and when to use a linked list.

In Java, the ArrayList and LinkedList classes provide a uniform interface to both these datastructures and hence, destroy this simple conceptual model, so necessary to make judicious implementation decisions, in impressionable young minds of many Java programmers. Let me further elaborate this with my recent own experience.

Today, while going over a graph traversal code, I was somewhat alarmed by the generous use of ArrayLists. This code was written by someone who perhaps had learnt programming with Java. As hinted earlier, both ArrayList and LinkedList implement List interface and support similar operations. An ArrayList can grow dynamically and allows insertion/deletion of elements. A LinkedList also allows access of elements through an index, exactly the same way as an ArrayList. This is all fine. However, the problem is that the apparent similarity in the API hides the widely different memory and time costs of these datastructures for different kinds of operations, luring the unwary to use them in dangerous ways:

  1. An empty ArrayList takes two to three times more memory than an empty LinkedList (because ArrayList would typically preallocate memory). This becomes important if you plan to keep an adjacency list of nodes for each node in the graph and you know beforehand that the nodes will have at most one or two adjacent nodes and the total number of nodes in the graph can be quite large.

  2. The following straight-forward loop to iterate over all elements of a list


      for (int i = 0; i < list.size(); i++)
        doSomething(list.get(i));


    works great for an ArrayList but will cause serious performance problems for a LinkedList. Can you guess why? The right way to iterate over a LinkedList is:


      ListIterator li = list.listIterator(0);
      while (li.hasNext())
        doSomething(li.next());


  3. Although both ArrayList and LinkedList allow insertion/deletion in the middle through similar operations (ie; by invoking add(int index) or remove(index)), these operations do not offer you the advantage of O(1) insertion/deletion for a LinkedList. For that, you must work through a ListIterator.

While researching on this topic, I did find a couple of good articles on the Web:


  • JDC Tech Tip article on Using ArrayList/LinkedList. Good coverage of the topic. Worth reading if you want to know more about performance tradeoffs.
  • joustlog entry titled LinkedList vs. ArrayList performance tests and subsequent clarification. This entry is more focussed in scope, pointing out the fact that addition as the end is faster for ArrayList than for LinkedList. The only thing I would like to add is that addition at the end of a LinkedList is always O(1) whereas addition at the end of an ArrayList is amortized O(1), meaning if you do M at-the-end additions then the total cost will be proportional to M. This is due to the fact that the underlying array may have to be grown (a new one to be allocated, old one to be copied and discarded) when the capacity is reached. However, I can understand that a normal at-the-end addition (ie; not involving resizing of the underlying array) will be faster for ArrayList (compared to LinkedList).

I am not advocating either ArrayList or LinkedList, though it can be justifiably argued that the use of ArrayList is better suited in many more programming scenarios, and I have no contention with that. The point I am making is that the sameness of the API makes it easy for programmers to assume that these can be used interchangeably. Nothing can be farther from truth. They are distinct datastructures, each optimized for certain kinds of operations and domain of applicability. And a good programmer should be aware of the distinction. The API exposed by the above mentioned Java classes blur this distinction. In my opinion, this is one of those areas where implementation hiding behind a common, easy-to-use interface (think of List interface that both ArrayList and LinkedList implement) may not be in the best interest of the primary user of these classes.


只有注册用户登录后才能发表评论。
网站导航: 博客园   IT新闻   BlogJava   博问   Chat2DB   管理