Sorted sums of a sorted list

code 25 September 2008 | 3 Comments

This is a post in response to this question over at the newly-spawned StackOverflow. Essentially:

If we have a sorted list of numbers, how can we efficiently get a sorted list of the sums of every pair of numbers in that list?

My initial reply is also there. I decided that, since I have a moment of spare time I should implement this. I’m semi-reprinting it here, but you might want to read it anyway.

Algorithm

The idea is that if you have a sorted list x, you can write out the sums as a matrix M where M_{i,j} = x_{i} + x_{j}:

\begin{tabular}{r|rrrrrr}& 1 & 4     & 5 & 6 & 8 & 9 \\ \hline 1 & 2 & 5 & 6   & 7 & 9 & 10 \\ 4 &   & 8 & 9  & 10 & 12 & 13 \\ 5 &   &   & 10 & 11 & 13 & 14 \\ 6 &   &   &     & 12 & 14 & 15 \\ 8 &   &   &    &      & 16 & 17 \\ 9 &   &   &    &      &     & 18 \end{tabular}

I’ve left out the lower half of the matrix as it is clearly symmetric (M_{i,j} = M_{j,i}).

You can see that clearly M_{i,j} \le M_{(i+1),j} and M_{i,j} \le M_{i,(j+1)} (since if y \le z then x + y \le x + z), so we only have to examine the “top-left corners” of the matrix. As an example, we would first examine M_{0,0}:

\begin{tabular}{r|rrrrrr}& 1 & 4     & 5 & 6 & 8 & 9 \\ \hline 1 & \fbox{2} & 5 & 6   & 7 & 9 & 10 \\ 4 &   & 8 & 9  & 10 & 12 & 13 \\ 5 &   &   & 10 & 11 & 13 & 14 \\ 6 &   &   &     & 12 & 14 & 15 \\ 8 &   &   &    &      & 16 & 17 \\ 9 &   &   &    &      &     & 18 \end{tabular}

Then accept it since it is the only option, and move on to the next top-left corner, which again is a single choice:

\begin{tabular}{r|rrrrrr}& 1 & 4     & 5 & 6 & 8 & 9 \\ \hline 1 &  & \fbox{5} & 6   & 7 & 9 & 10 \\ 4 &   & 8 & 9  & 10 & 12 & 13 \\ 5 &   &   & 10 & 11 & 13 & 14 \\ 6 &   &   &     & 12 & 14 & 15 \\ 8 &   &   &    &      & 16 & 17 \\ 9 &   &   &    &      &     & 18 \end{tabular}

Then we get two options:

\begin{tabular}{r|rrrrrr}& 1 & 4     & 5 & 6 & 8 & 9 \\ \hline 1 &  &   & \fbox{6}   & 7 & 9 & 10 \\ 4 &   & \fbox{8} & 9  & 10 & 12 & 13 \\ 5 &   &   & 10 & 11 & 13 & 14 \\ 6 &   &   &     & 12 & 14 & 15 \\ 8 &   &   &    &      & 16 & 17 \\ 9 &   &   &    &      &     & 18 \end{tabular}

And so on:

\begin{tabular}{r|rrrrrr}& 1 & 4     & 5 & 6 & 8 & 9 \\ \hline 1 &  &   &   & \fbox{7} & 9 & 10 \\ 4 &   & \fbox{8} & 9  & 10 & 12 & 13 \\ 5 &   &   & 10 & 11 & 13 & 14 \\ 6 &   &   &     & 12 & 14 & 15 \\ 8 &   &   &    &      & 16 & 17 \\ 9 &   &   &    &      &     & 18 \end{tabular}
\begin{tabular}{r|rrrrrr}& 1 & 4     & 5 & 6 & 8 & 9 \\ \hline 1 &  &   &   &   & \fbox{9} & 10 \\ 4 &   & \fbox{8} & 9  & 10 & 12 & 13 \\ 5 &   &   & 10 & 11 & 13 & 14 \\ 6 &   &   &     & 12 & 14 & 15 \\ 8 &   &   &    &      & 16 & 17 \\ 9 &   &   &    &      &     & 18 \end{tabular}
\begin{tabular}{r|rrrrrr}& 1 & 4     & 5 & 6 & 8 & 9 \\ \hline 1 &  &   &   &   & \fbox{9} & 10 \\ 4 &   &  & \fbox{9}  & 10 & 12 & 13 \\ 5 &   &   & 10 & 11 & 13 & 14 \\ 6 &   &   &     & 12 & 14 & 15 \\ 8 &   &   &    &      & 16 & 17 \\ 9 &   &   &    &      &     & 18 \end{tabular}

I think you get the picture

One of the points of this post is that I horribly misimplemented this and have spent many wasted hours today trying to figure out why it wasn’t working. Instead of the last sequence of diagrams, my algorithm was doing the following:

\begin{tabular}{r|rrrrrr}& 1 & 4     & 5 & 6 & 8 & 9 \\ \hline 1 &  &   &   & \fbox{7} & 9 & 10 \\ 4 &   & \fbox{8} & 9  & 10 & 12 & 13 \\ 5 &   &   & 10 & 11 & 13 & 14 \\ 6 &   &   &     & 12 & 14 & 15 \\ 8 &   &   &    &      & 16 & 17 \\ 9 &   &   &    &      &     & 18 \end{tabular}
\begin{tabular}{r|rrrrrr}& 1 & 4     & 5 & 6 & 8 & 9 \\ \hline 1 &  &   &   &   & \fbox{9} & 10 \\ 4 &   & \fbox{8} & 9  & \fbox{10} & 12 & 13 \\ 5 &   &   & 10 & 11 & 13 & 14 \\ 6 &   &   &     & 12 & 14 & 15 \\ 8 &   &   &    &      & 16 & 17 \\ 9 &   &   &    &      &     & 18 \end{tabular}

That is, every time a corner was eliminated it was adding both neighbours to the list of corners even if there was a corner further left on the row below it. As such, the following naïve algorithm was beating it by a factor of 4:

sortedSumSortedList xs = sort $ concat [[x+y|y<-drop nx xs]|x<-xs|nx<-[0..]]

Implementation

A preliminary: you’ll need the following imports for the code below.

import Data.Array.Unboxed
import Control.Monad
import qualified Data.IntMap as IntMap
import Test.QuickCheck
import Data.List
import Data.Time.Clock

First of all we need the sums of the list with itself:

sumLists :: [Int] -> [Int] -> UArray (Int,Int) Int
sumLists xs ys = array ((0,0),(length xs-1,length ys-1)) $ sumLists' xs ys 0
	where
	sumLists' _ [] _ = []
	sumLists' xs' (y:ys') ny = [((nx,ny),x+y)|x<-drop ny xs'|nx<-[ny..]] ++ sumLists' xs ys' (ny+1)

This does basically the same thing as the code above (concat [[x+y|y<-drop nx xs]|x<-xs|nx<-[0..]]) except that we want an array rather than a list so that we have constant-time addressing. In Haskell an array is created by specifying the boundaries of the array and then the data for the array. Note that the bottom of the array is left empty as in the diagrams; this is fine, we can leave some data unspecified as long as we never access it.

And here's the workhorse. I've commented it more thoroughly so you should be able to understand what's happening.

orderedSumsOfOrderedList sums corners row
	| IntMap.null corners = []
	| otherwise = replicate (length minCorners) minimumValue ++ orderedSumsOfOrderedList sums corners' row'
	where
	-- minCorners = all corners with the same (lowest) value
	-- allCorners = all other corners currently being considered (as a IntMap)
	(minCorners,allCorners) = IntMap.deleteFindMin corners
	-- minimumValue = the value in the lowest corner(s)
	minimumValue = sums ! (head minCorners)
 
	-- row is incremented if any of the neighbours has moved down a row
	row' = if any (\(_,_,b)-> b) neighbours then row+1 else row
	-- update the corners to reflect all neighbours of the just-removed corners
	corners' = foldr (\(key,value)-> IntMap.insertWith (++) key [value]) allCorners $ map (\(x,y,_) -> (x,y)) neighbours
 
	-- a list of values, their positions in the array, and whether to increment row
	neighbours = concatMap (\p-> goRight p ++ goDown p) minCorners
		where
			-- try to get a neighbour further right, within bounds of array
			goRight (x,y) =	ifM (x < maxX)                       (sums ! (x+1,y), (x+1,y),False)
			-- try to get a neighbour further down, within bounds of array and not on the same row as another one
			-- (if successful we should update row)
			goDown  (x,y) = ifM (y < x && y < maxY && y+1 > row) (sums ! (x,y+1), (x,y+1),True)
			maxX = (fst$snd$bounds$sums)
			maxY = (snd$snd$bounds$sums)

To efficiently choose the lowest number each time, we keep track of them in something like a binary tree. In Haskell the provided implementation is Data.Map (or Data.IntMap which is specialized for integer keys).

The rest of what this code does is basically:

  1. Choose the lowest corner(s) from the tree of available corners.
  2. Find the neighbours of these corner(s), ensuring they are valid (in this case this means checking bounds and checking that no two are on the same row—it suffices to ensure that no downwards-searching neighbours are added which are on a row less than the maximum row reached so far), and insert these into the tree.
  3. Output the lowest corner(s) and recurse with the new tree and row number.

ifM is a nice little function I've had cause to use several times, and it looks like this:

ifM :: (MonadPlus m) => Bool -> a -> m a
ifM pred res = if pred
		then return res
		else mzero

And here's the wrapper function (osool = ‘ordered sums of ordered list’), we take in a list of Ints and return a list of Ints. Note that we do not check the precondition (that the input list is sorted).

osool :: [Int] -> [Int]
osool [] = []
osool xs = orderedSumsOfOrderedList sums corners row 
	where
	sums = sumLists xs xs
	corners = IntMap.singleton (sums ! (0,0)) [(0,0)]
	row = 0

sums is the sums of the list, and corners is the quick-lookup structure (in this case IntMap) which allows us to find the minimum fast. We initialize it with the value of the top-left corner of the matrix, and its coördinates. Finally, row is a value indicating which row we are up to. As shown above, we don't want to add any new corners which are less than this value when eliminating a square and looking for a neighbouring square in the downwards direction.

And finally, we can test this with the wonderful QuickCheck:

checkMe = do
	time "osool" $ quickCheck (\xs -> let sxs = sort xs in length (osool sxs) == num sxs)
	time "checkSumLists" $ quickCheck (\xs -> let sxs = sort xs in length (checkSumLists sxs) == num sxs)
 
	time "osool" $ longCheck (\xs -> let sxs = sort xs in length (osool sxs) == num sxs)
	time "checkSumLists" $ longCheck (\xs -> let sxs = sort xs in length (checkSumLists sxs) == num sxs)
 
	time "areEqual" $ quickCheck (\xs -> let sxs = sort xs in checkSumLists sxs == osool sxs)
	where
	longCheck p = check (defaultConfig { configSize = const 1000, configMaxTest = 500}) p
	checkSumLists xs = sort $ concat [[x+y|y<-drop nx xs]|x<-xs|nx<-[0..]]
	num xs = sum [1..length xs]
	time s d = do
		putStrLn $ s ++ ":"
		t1 <- getCurrentTime
		d
		t2 <- getCurrentTime
		putStrLn $ show $ t2 `diffUTCTime` t1

Results

$ ./test
osool:
OK, passed 100 tests.
0.024591s
checkSumLists:
OK, passed 100 tests.
0.014857s
osool:
OK, passed 500 tests.
15.448749s
checkSumLists:
OK, passed 500 tests.
82.871982s
areEqual:
OK, passed 100 tests.
0.007773s

As you can see, the wins over the naïve version are substantial.

Tagged in , ,

3 Responses on “Sorted sums of a sorted list”

  1. Henry Laxen says:

    It seems to me you left out the definition of row in osool. It would also be nice if you included the imports you need to make this run, namely:

    import Data.Array.Unboxed
    import Control.Monad
    import qualified Data.IntMap as IntMap
    import Test.QuickCheck
    import Data.List
    import Data.Time.Clock

    finally, on ghc6.10.1 I needed to add:

    checkSumLists :: [Int] -> [Int]

    in order to get it to compile.

    Feel free to remove this post after you’ve corrected the errors. This is a neat algorithm, and just what I was looking for. Do you know if it has a name?

    Best wishes,
    Henry Laxen

  2. Henry Laxen says:

    I’ve taken your code and generalized it a little bit. I’m listing it below. Feel free to use it as you see fit, (or not).

    module OrderedCartesianProductOfOrderedListsWithFunction 
      (orderedCartesianProductOfOrderedListsWithFunction) where
     
    import Data.Array
    import qualified Data.Map as Map
     
    cartProdArray f xs ys = 
      array ((0,0),(length xs-1,length ys-1)) $ cartProdwithF
      where
        (ix,iy) = (zip [0..] xs, zip [0..] ys)
        cartProdwithF = [ ((i,j), f xi yj) | (i,xi) <- ix, (j,yj)  b) neighbours then row+1 else row
     -- update the corners to reflect all neighbours of the just-removed corners
     corners' = foldr (\(key,value)-> Map.insertWith (++) key [value]) allCorners $ map (\(x,y,_) -> (x,y)) neighbours
     
     -- a list of values, their positions in the array, and whether to increment row
     neighbours = concatMap (\p-> goRight p ++ goDown p) minCorners
      where
       -- try to get a neighbour further right, within bounds of array
       goRight (x,y) = if (x < maxX)  then [(a ! (x+1,y), (x+1,y),False)] else []
       -- try to get a neighbour further down, within bounds of array and not on the same row as another one
       -- (if successful we should update row)
       goDown  (x,y) = if (y < x && y  row) 
          then [(a ! (x,y+1), (x,y+1),True)] else []
       maxX = (fst$snd$bounds$a)
       maxY = (snd$snd$bounds$a)
     
    orderedCartesianProductOfOrderedListsWithFunction :: 
      forall b b1 e.  (Ord e) =>
      (b -> b1 -> e) -> [b] -> [b1] -> [e]
    -- Note f must be monotonic in each argument  
    orderedCartesianProductOfOrderedListsWithFunction f l1 l2 =
      ocpool a corners 0
     where
      a = cartProdArray f l1 l2
      corners = Map.singleton (a ! (0,0)) [(0,0)]
     
    {-
     
     This can be very useful when you just want the elements of a
     cartesian product of two sets that are less than a given value. 
     For Example, suppose you want all numbers less than 10^50 of the form
     2^i 3^j.  Then the answer is:
     
    t = 
       takeWhile (<10^50) $ orderedCartesianProductOfOrderedListsWithFunction (*)
       (takeWhile (<10^50) [2^i | i<-[1..]])
       (takeWhile (<10^50) [3^i | i<-[1..]])
     
       which is computed almost instantly.
    -}

    Best wishes,
    Henry Laxen

    Porges’ edit: added syntax highlighting

  3. Porges says:

    Hi Henry, thanks for the contributions. I don’t know if this algorithm has been described before (and hence has a name); I came up with it as a solution to the posed question.

    There seems to be something missing in your last post; ‘ocpool’ is referenced but not defined. (I hope this is not due to my edit!)

Leave a Reply