Tutorial :Space leak in list program



Question:

I am solving some problems of Project Euler in Haskell. I wrote a program for a riddle in it and it did not work as I expected.

When I looked in the task manager when running the program I saw that it was using > 1 gigabyte of RAM on ghc. A friend of me wrote a program with the same meaning in Java and succeeded in 7 seconds.

import Data.List    opl = find vw $ map (\x-> fromDigits (x++[0,0,9]) )           $ sequence [[1],re,[2],re,[3],re,[4],re,[5],re,[6],re,[7],re,[8],re]    vw x = hh^2 == x      where hh = (round.sqrt.fromIntegral) x    re = [0..9]    fromDigits x = foldl1 (\n m->10*n+m) x  

I know this program would output the number I want given enough RAM and time, but there has to be a better-performing way.


Solution:1

The main problem here is that sequence has a space leak. It is defined like this:

sequence [] = [[]]  sequence (xs:xss) = [ y:ys | y <- xs, ys <- sequence xss ]  

so the problem is that the list produced by the recursive call sequence xss is re-used for each of the elements of xs, so it can't be discarded until the end. A version without the space leak is

myseq :: [[a]] -> [[a]]  myseq xs = go (reverse xs) []   where    go [] acc = [acc]    go (xs:xss) acc = concat [ go xss (x:acc) | x <- xs ]  

PS. the answer seems to be Just 1229314359627783009

Edit version avoiding the concat:

seqlists :: [[a]] -> [[a]]  seqlists xss = go (reverse xss) [] []   where     go []       acc rest = acc : rest     go (xs:xss) acc rest = foldr (\y r -> go xss (y:acc) r) rest xs  

note that both of these versions generate the results in a different order from the standard sequence, so while they work for this problem we can't use one as a specialised version of sequence.


Solution:2

Following on from the answer given by Simon Marlow, here's a version of sequence that avoids the space leak while otherwise working just like the original, including preserving the order.

It still uses the nice, simple list comprehension of the original sequence - the only difference is that a fake data dependency is introduced that prevents the recursive call from being shared.

sequenceDummy d [] = d `seq` [[]]  sequenceDummy _ (xs:xss) = [ y:ys | y <- xs, ys <- sequenceDummy (Just y) xss ]    sequenceUnshared = sequenceDummy Nothing  

I think this is a better way of avoiding the sharing that leads to the space leak.

I'd blame the excessive sharing on the "full laziness" transformation. Normally this does a great job of creating sharing that avoids recomputions, but sometimes recompution is very much more efficient than storing shared results.

It'd be nice if there was a more direct way to tell the compiler not to share a specific expression - the above dummy Maybe argument works and is efficient, but it's basically a hack that's just complicated enough that ghc can't tell that there's no real dependency. (In a strict language you don't have these issues because you only have sharing where you explicitly bind a variable to a value.)


Solution:3

EDIT: I think I'm wrong here - changing the type signature to :: Maybe Word64 (which would be enough bits for this problem I think) also takes forever / has a space leak, so it couldn't be the old Integer bug.

Your problem seems to be an old GHC bug (that I thought was fixed) with Integer causing a space leak. The below code finishes in about 150 ms when compiled with -O2.

import Data.List  import Data.Word    main = print opl    opl :: Maybe Word32  opl = find vw $ map (\x-> fromDigits (x++[0,0,9]) ) $ sequence [[1],re,[2],re,[3],re,[4],re,[5],re,[6],re,[7],re,[8],re]    vw x = hh^2 == x      where hh = (round.sqrt.fromIntegral) x    re = [0..9]    fromDigits x = foldl1 (\n m->10*n+m) x  


Solution:4

Since you're looking for a nineteen-digit number with those characteristics found in vw, I'd try to simplify the construction in the mapped function just say fromDigits x*1000+9 for starters. Appending to a list is O(length-of-the-left-list), so throwing those last three digits on the end hurts the computation time a bunch.

As an aside (to you both), using the strict version of the fold (foldl1') will also help.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »