2-pivot quicksort
This posting on the Java core library mailing list proposes to replace the current quicksort with a new, 2-pivot quicksort.
Ordinary quicksort in Haskell looks something like this:
quicksort [] = [] quicksort (pivot:rest) = quicksort [x| x ← rest, x ≤ pivot] ++ [pivot] ++ quicksort [x| x ← rest, x > pivot]
Note the similarity of this to the notation for ‘invariants’ used in the email:
[ <= p | >= p ]
Similarly, for the 2-pivot invariants (I have corrected a typo in this):
[ < P1 | P1 <= & <= P2 | > P2 ]
We can derive the code:
quicksort2 [] = [] quicksort2 (only:[]) = [only] quicksort2 (first:second:rest) = quicksort2 [x|x ← rest, x < p1] ++ [p1] ++ quicksort2 [x|x ← rest, p1 ≤ x && x ≤ p2] ++ [p2] ++ quicksort2 [x|x ← rest, p2 < x] where (p1,p2) = if first ≤ second then (first,second) else (second,first)
Which does, in fact, run faster than the ordinary 1-pivot quicksort.
n-pivot quicksort
Of course, this being Haskell we want to take things to their logical conclusions, and allow any number of pivots to be used. My formulation follows:
quicksortN _ [] = [] quicksortN _ (x:[]) = [x] quicksortN n xs | n > length xs = quicksortN ((n `div` 2) `max` 1) xs | otherwise = part True pivots where (takexs,dropxs) = splitAt n xs pivots = quicksortN 1 takexs part False (p2:[]) = quicksortN n [x|x ← dropxs, p2 < x] part False (p1:p2:rest) = quicksortN n [x|x ← dropxs, p1 < x && x ≤ p2] ++ [p2] ++ part False (p2:rest) part True (p1:[]) = quicksortN n [x|x ← dropxs, x ≤ p1] ++ [p1] ++ quicksortN n [x|x ← dropxs, p1 < x] -- remember me? :) part True (p1:p2:rest) = quicksortN n [x|x ← dropxs, x ≤ p1] ++ [p1] ++ part False (p1:p2:rest)
Now there are only 3 things that I can see that are left ‘tweakable’;
- how to reduce ‘n’ when it is greater than length of the list
- dividing by 2 here
- how to sort the pivots
- using a recursive call to the 1-pivot version of itself here
- whether or not to reduce the number of pivots as sorting continues
- ‘no’ chosen here
I have no idea whether these are sensible defaults ![]()
Which n should I choose?
Finally, some code for the timings. You’ll need timeit to run it.
main = do gen ← getStdGen let n = 10^6 let r = take n $ randoms gen :: [Int] print $ sum r -- force list timeIt . putStrLn . ("quicksort = "++) . show . sum $ quicksort r timeIt . putStrLn . ("quicksort2 = "++) . show . sum $ quicksort2 r mapM_ (\x → timeIt . putStrLn . (("quicksortN "++ show x++" = ")++) . show . sum $ quicksortN x r) [1..20]
Sample output:
$ ./sort 858124546 quicksort = 858124546 CPU time: 4.60s quicksort2 = 858124546 CPU time: 3.66s quicksortN 1 = 858124546 CPU time: 5.19s quicksortN 2 = 858124546 CPU time: 3.77s quicksortN 3 = 858124546 CPU time: 3.28s quicksortN 4 = 858124546 CPU time: 3.27s quicksortN 5 = 858124546 CPU time: 3.06s quicksortN 6 = 858124546 CPU time: 3.04s quicksortN 7 = 858124546 CPU time: 3.04s quicksortN 8 = 858124546 CPU time: 3.07s quicksortN 9 = 858124546 CPU time: 3.10s quicksortN 10 = 858124546 CPU time: 3.14s quicksortN 11 = 858124546 CPU time: 3.22s quicksortN 12 = 858124546 CPU time: 3.31s quicksortN 13 = 858124546 CPU time: 3.30s quicksortN 14 = 858124546 CPU time: 3.37s quicksortN 15 = 858124546 CPU time: 3.54s quicksortN 16 = 858124546 CPU time: 3.41s quicksortN 17 = 858124546 CPU time: 3.52s quicksortN 18 = 858124546 CPU time: 3.65s quicksortN 19 = 858124546 CPU time: 3.63s quicksortN 20 = 858124546 CPU time: 3.77s
As expected, generalized quicksort performs slower than the equivalent hard-coded quicksorts for specific values of n, but higher values of n than 1 or 2 can perform better.
Comments 1
I’d also point out that this implementation isn’t particularly fast; note the call to ‘length’, for example.
Posted 26 Sep 2009 at 6:58 pm ¶Post a Comment