Lower bound for finding second largest element

In a recent discussion, I came across the idea of proving a lower bound for the number of comparisons required to find the largest element in an array. The bound is $n – 1$. This is so because the set of comparisons performed by every such algorithm looks like a tournament tree, which always has $n – 1$ internal nodes.

The obvious question is then, What should be the lower bound for finding the 2nd largest element in an array.

Solutions Collecting From Web of "Lower bound for finding second largest element"

The definitive answer for any $k$ (not just the second largest) is discussed in these notes. In short, if $V(n,k)$ is the number of comparisons needed to determine the kth largest element in a set of size $n$, then
$$ V(n,k) \ge n + R(n,k) – 2\sqrt{R(n,k)} $$
where
$$ R(n,k) = \log \binom{n}{k} – \log (n-k+1) + 3$$

The (tight) lower bound is $n + \lceil \lg n \rceil – 2$ (where $\lg n$ means $\log_2 n$).

I’ll prove tightness first: that this can be achieved (apparently the idea is due to Lewis Carroll!). First find the maximum using a “tennis tournament” structure: first compare the $n$ elements in pairs, then compare the $n/2$ “winners” in pairs, and so on. (Unpaired elements get a bye to the next round.) Since every element except the winner loses exactly once, this takes $n-1$ comparisons. But now note that the second largest element must be one which lost to the winner, as it couldn’t have been defeated by any other element. So you need to find the maximum among all the (up to) $\lceil \lg n \rceil$ elements that were defeated by the winner, and finding this maximum can be done in $\lceil \lg n \rceil – 1$.

We can prove this is a lower bound as well. Let the number of elements that lost to the maximum be $m$.

  • Firstly, you need to find the maximum, since one cannot be sure some element is the second maximum without knowing which element is the maximum. Further, for each element that lost to the maximum ($m$ of them), this comparison was useless in determining whether it was or not the second maximum. In other words, all elements other than the maximum must lose at least once, and all but one ($m-1$) of the elements that directly lost to the maximum must lose at least once more, no matter in which order the comparisons were made. So we need at least $n-1 + m-1 = n + m – 2$ comparisons.
  • To state the same thing differently: $n-2$ of the elements must be found less than the second-largest element — comparisons with the largest element do not help here — plus $m$ of them must lose directly to the maximum by definition, so we need at least $n – 2 + m = n + m – 2$ comparisons.

This proves the $n + \lceil \lg n\rceil – 2$ lower bound if we show that $m \ge \lceil \lg n \rceil$, i.e., an adversary can make sure you always have at least $\lceil \lg n\rceil$ elements that lost to the winner. This is proved here or here; an argument goes as follows: the adversary keeps for each element a list of elements known to be less than or equal to it. Whenever a query’s result is already “known”, the adversary gives that answer, else it declares the one which has a larger list the winner (breaking ties arbitrarily). The maximum is determined when the size of some list grows to n, and with each comparison the size of any list can only double, so the number of comparisons needs to be at least $\lceil \lg n\rceil$.

An observation: it will always require at least n-1. Draw a graph. If we test two nodes, then we draw an arrow from the smaller to the larger. We know a is greater than b if and only if there is a path from a to b along the arrows. We will always need at least n-1 arrows to connect the graph and hence n-1 comparisons to find a lower bound. It is easy enough to see that this isn’t the minimal upper bound though – we will only know the second highest element if our arrows form a line (ie. we can go from a “start” node to an “end” node by passing through each node and each arrow).

Another fact – there are algorithms for finding the kth largest number in guaranteed linear time. That could give you some information, but there could always be a special algorithm for finding the second largest element.

I don’t have a solution for the problem yet. But one thing we can try is breaking the problem into two groups and combining them. Suppose we have two groups, A and B, where we know both the largest and second largest elements, but no knowledge of any relationships between the groups. Then it takes two arrows to find the second highest element. First we compare the highest elements in A and B. Without loss of generality assume A has the highest element. Then we would compare B with the highest element in A. Showing that the groups can’t be combined with a single comparison is trivial. This gives us a recursion relation for the time taken to solve by combining groups which have been solved separately: $f(n)=min f(k)+f(n-k)+2$ where $ 0 < k < n $.

The graph model shows clearly why you need n+m-2 comparisons to determine the second maximum. Consider the graph of comparisons mentioned by Casebash. Now remove the maximum and its m edges connecting with the rest. If the algorithm correctly finds the second maximum, this reduced graph must still be connected. Otherwise, it means that there are two connected components that were connected only through the maximum node, which is useless to help compare the elements of both components. Thus, if connected after removing m edges and a node, the graph must have n+m-2 edges.