Tutorial: Dividing a list of values into three equal subtotals


I have a list of numbers which total 540000. I would like to sort this list into 3 lists which each total 180000. What is the most efficient programming method to do this, assuming that the list of numbers is a flat file with a number per line?


Sounds like a variation of the Knapsack problem. It would be useful to know the size and count of these numbers: are there huge variations in size, or are they all similar in scale? Are there lots of them (>1000) or just a few (<100)?

One quick and dirty method would be to sort them into size order, largest to smallest, then loop over them, putting the first in the first list, the second into the second list, the third into the third list, and then go back and put the fourth into the first list, and so on. This may work quite well for lots of small-ish numbers, but there are other approaches for different types of dataset.
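The round-robin idea described above could be sketched roughly as follows. This is only an illustration of the heuristic, not a solver: the class name `RoundRobinSplit` and the helpers `split` and `sum` are made up for this example, and nothing guarantees that the three subtotals come out equal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, named for this sketch only.
class RoundRobinSplit {

    // Sorts a copy of the numbers largest-first, then deals them into
    // `parts` lists in turn: 1st list, 2nd list, ..., back to the 1st.
    static List<List<Integer>> split(List<Integer> numbers, int parts) {
        List<Integer> sorted = new ArrayList<>(numbers);
        sorted.sort(Collections.reverseOrder());

        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < sorted.size(); i++) {
            result.get(i % parts).add(sorted.get(i));
        }
        return result;
    }

    // Subtotal of one list; long avoids overflow on large inputs.
    static long sum(List<Integer> list) {
        long total = 0;
        for (int n : list) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(9, 8, 7, 6, 5, 4, 3, 2, 1);
        for (List<Integer> part : split(numbers, 3)) {
            System.out.println(part + " -> " + sum(part));
        }
    }
}
```

On the numbers 1 to 9 this gives subtotals of 18, 15 and 12, although an exact split into three lists of 15 exists (for instance 9+5+1, 8+4+3 and 7+6+2). The first list always receives the largest number of each round, so the heuristic drifts unless the numbers are many and small.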


    for i as integer = 1 to 180000
        put data in array 1
    next i

    for i as integer = 180001 to 360000
        put data in array 2
    next i

    for i as integer = 360001 to 540000
        put data in array 3
    next i


I've written some Java code to do most of the work for you.

The smaller of the methods takes a list of numbers and a total to be achieved, and it returns a set of lists of numbers that add up to that total. You could run it with 180000 and your list of numbers.

For each list of numbers returned, you need to make a new list that is missing the numbers already used, and run the method again with 180000 and that new list.

If this second invocation returns one or more lists, you'll know the problem is soluble because the numbers remaining will also add up to 180000.
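The two-pass procedure just described might look something like the sketch below. The names `TwoPassSplit`, `subsetsWithSum`, `splitInThree` and `without` are invented for this example; `subsetsWithSum` is a simplified stand-in for the answer's method, and the sketch assumes positive numbers whose grand total is exactly three times the target.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class, named for this sketch only.
class TwoPassSplit {

    // Stand-in for the answer's method: collects every combination of
    // numbers, taken from position `start` onwards, that sums to `total`.
    // Assumes positive numbers.
    static void subsetsWithSum(List<Integer> numbers, int total, int start,
                               List<Integer> soFar, List<List<Integer>> out) {
        if (total == 0) {
            out.add(new ArrayList<>(soFar));
            return;
        }
        for (int p = start; p < numbers.size(); p++) {
            int n = numbers.get(p);
            if (n > total) continue;           // would overshoot
            soFar.add(n);
            subsetsWithSum(numbers, total - n, p + 1, soFar, out);
            soFar.remove(soFar.size() - 1);    // take the pick back out
        }
    }

    // Copy of `all` with one occurrence of each element of `used` removed.
    static List<Integer> without(List<Integer> all, List<Integer> used) {
        List<Integer> rest = new ArrayList<>(all);
        for (Integer n : used) {
            rest.remove(n);                    // remove(Object): first equal element
        }
        return rest;
    }

    // Returns three lists, or an empty list if no split was found.
    // The third list only sums to `target` if the grand total is 3 * target.
    static List<List<Integer>> splitInThree(List<Integer> numbers, int target) {
        List<List<Integer>> firsts = new ArrayList<>();
        subsetsWithSum(numbers, target, 0, new ArrayList<>(), firsts);
        for (List<Integer> first : firsts) {
            List<Integer> rest = without(numbers, first);
            List<List<Integer>> seconds = new ArrayList<>();
            subsetsWithSum(rest, target, 0, new ArrayList<>(), seconds);
            if (!seconds.isEmpty()) {
                List<Integer> second = seconds.get(0);
                return Arrays.asList(first, second, without(rest, second));
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
        System.out.println(splitInThree(numbers, 15));
    }
}
```

Collecting every first list up front is simple but can use a lot of time and memory, so it is best tried on small inputs first.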

Anyway, here's the code. Yes, it's just recursive brute force. It's very likely that no known method consistently does better. Don't blame me if it runs for a long time; you may want to try it with smaller examples first.

    import java.util.*;

    public class Listen {

        // Returns every distinct combination of numbers that adds up to total.
        private static Set<List<Integer>> makeFrom(int total, List<Integer> numbers) {
            Set<List<Integer>> results = new HashSet<List<Integer>>();
            List<Integer> soFar = new ArrayList<Integer>();
            makeFrom(results, total, soFar, numbers, 0);
            return results;
        }

        // Tries each number from index startingAt onwards as the next pick.
        private static void makeFrom(Set<List<Integer>> results, int total, List<Integer> soFar, List<Integer> numbers, int startingAt) {
            for (int p = startingAt; p < numbers.size(); p++) {
                Integer number = numbers.get(p);
                int newTotal = total - number;
                if (newTotal < 0) continue;   // overshoots (assumes no negative numbers)
                List<Integer> newSoFar = new ArrayList<Integer>(soFar);
                newSoFar.add(number);
                if (newTotal == 0) {
                    Collections.sort(newSoFar);
                    results.add(newSoFar);
                } else {
                    // Only positions after p remain available, so each number is used at most once.
                    makeFrom(results, newTotal, newSoFar, numbers, p + 1);
                }
            }
        }

        public static void main(String[] args) {
            List<Integer> numbers = new ArrayList<Integer>();
            for (int j = 1; j < 11; j++) numbers.add(j);
            for (List<Integer> result : makeFrom(25, numbers)) {
                System.out.println(result);
            }
        }
    }


As ian-witz already remarked, this is probably a problem of the NP-complete sort: This means there's no really good solution for the general case, short of trying all possibilities. Algorithms that do this tend to become spectacularly slow as the amount of data they deal with increases.

That said, here's my algorithm for forming sub-lists having a specified sum from a given list of integers:

1. Set up a place to hold your results. The results will all be lists of numbers, each some sub-set of your original list. We don't know how many such lists will result; possibly none.
2. Put your list of numbers into an array so you can refer to them and access them by index. In the following, I'm assuming array indices starting at 1. Say you have 10 numbers, so you put them into a 10 element array, indexed by positions 1 through 10.
3. For performance reasons, it may help to sort your array in descending order. It's not necessary though.
4. Run a first index, call it i, through this array, i.e. through index values 1 through 10. For each index value:
   - Select the number at index position i, call it n1.
   - Set up a new list of numbers, where we will be assembling a sub-list. Call it sublist.
   - Add n1 to the (so far empty) sublist.
   - If i is already at 10, there's nothing more we can do. Otherwise, run a second index, call it j, through the array, starting at i+1 and going up to 10. For each value of j:
     - Select the number at index position j, call it n2.
     - Add n2 to the sublist containing n1.
     - Calculate the sum of our sublist so far: does it add up to 180000?
     - If the exact total is reached, add the current sublist to our result list.
     - If the total is exceeded, there's nothing we can add to make it better, so skip to the next value of j.
     - If the total is less than 180000, you need to pick a third number. Run a third index, call it k, through the array, starting at j+1 and going up to 10. Skip this if j is already at 10 and there's no place to go. For each value of k:
       - Select the number at k, call it n3.
       - Add n3 to the sublist.
       - Check the sublist total against the expected total.
       - If the exact total is reached, store the sublist as a result.
       - If it's less than the expected total, start a 4th loop, and so on.
5. When you're done with checking a value for a loop, e.g. n4, you need to take your latest n4, n3 or whatever back out of the sublist because you'll be trying a different number next.
6. Whenever you find a combination of numbers with the correct sum, store it in your results set.
7. When you've run all your loop counters into the wall (i.e. i is 10 and there's nowhere left to go), your "results" set will contain all sub-lists of the original list that added up to the desired total. It's possible there will be none; in that case there's no (exact) solution to your problem.
8. If you have 3 or more sub-lists in your results set, check whether any pair of them uses non-overlapping sets of numbers from the original list. If you find such a pair, then there should also be a 3rd sub-list containing exactly the numbers not contained in the first 2 lists, and you have your solution.

My sample code doesn't do a series of loops; instead, it does one loop going from 1 to (say) 10 and looking for 180000. Then, let's say the first number chosen was 2000, the function calls itself again recursively with a hint to start at 2 (= i + 1) and to try to assemble a total of 178000. That call of the function then calls itself again with a starting position of (whatever + 1) and a total of (178000 - whatever), and it keeps calling itself that way with subsets of the original problem until there's no more room for the indexes to go up. If it finds a "good" sub-list on the way, it stores it in the result set.

How to implement this efficiently depends on the language you're doing it in. FORTRAN 77 doesn't have recursion, Lua doesn't implement lists or sets efficiently, Lisp may have trouble efficiently indexing into a list. In Java, I might use a bitset rather than a sublist. I know nothing about P4GL, so: For implementation, you're on your own!
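As a rough illustration of the bitset remark, the sketch below represents each sub-list as the bits of an `int` (a plain bitmask rather than `java.util.BitSet`) and uses it to check sub-lists for overlap. The names `BitmaskSplit`, `split` and `sum` are invented here, and the sketch assumes positive numbers and a short list, since it enumerates all 2^n subsets instead of pruning the way the recursive version does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, named for this sketch only.
class BitmaskSplit {

    // Sum of the numbers whose index bit is set in `mask`.
    static long sum(int[] numbers, int mask) {
        long total = 0;
        for (int i = 0; i < numbers.length; i++) {
            if ((mask & (1 << i)) != 0) {
                total += numbers[i];
            }
        }
        return total;
    }

    // Returns three disjoint masks that together cover every index and
    // each select numbers summing to `target`, or null if none exist.
    static int[] split(int[] numbers, int target) {
        int n = numbers.length;
        if (n > 20) {
            throw new IllegalArgumentException("sketch enumerates 2^n subsets; keep n <= 20");
        }
        int full = (1 << n) - 1;

        // Pass 1: every subset that hits the target.
        List<Integer> hits = new ArrayList<>();
        for (int mask = 1; mask <= full; mask++) {
            if (sum(numbers, mask) == target) {
                hits.add(mask);
            }
        }

        // Pass 2: look for two non-overlapping hits; the third list is
        // whatever is left over.
        for (int i = 0; i < hits.size(); i++) {
            for (int j = i + 1; j < hits.size(); j++) {
                int a = hits.get(i);
                int b = hits.get(j);
                if ((a & b) != 0) continue;      // they share a number
                int c = full & ~(a | b);
                if (sum(numbers, c) == target) {
                    return new int[] {a, b, c};
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        int[] masks = split(numbers, 15);
        if (masks == null) {
            System.out.println("no exact split");
        } else {
            for (int m : masks) {
                System.out.println(Integer.toBinaryString(m) + " -> " + sum(numbers, m));
            }
        }
    }
}
```

The overlap test becomes a single `&`, which is the main attraction of a bit representation; the price in this simple form is the hard cap on list length.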


This has the smell of NP-hardness to me, in which case there is no known 'efficient' way to do it, although you could probably come up with any number of heuristics that could tackle it pretty well.

Having said that, you'll still have problems with lists like [179998, 180001, 180001] :)
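That list totals 540000, yet two of its numbers are already larger than 180000, so no exact split exists. A cheap pre-check along these lines can rule out such inputs before any search starts. This is a minimal sketch assuming positive numbers; `FeasibilityCheck` and `mightBeSplittable` are names made up for the example, and passing the check does not prove that a split exists.

```java
// Hypothetical class, named for this sketch only.
class FeasibilityCheck {

    // Necessary (but not sufficient) conditions for dividing positive
    // numbers into three lists with equal subtotals.
    static boolean mightBeSplittable(long[] numbers) {
        if (numbers.length < 3) {
            return false;                  // not enough numbers for three lists
        }
        long total = 0;
        long max = Long.MIN_VALUE;
        for (long n : numbers) {
            total += n;
            max = Math.max(max, n);
        }
        if (total % 3 != 0) {
            return false;                  // the target would not be a whole number
        }
        return max <= total / 3;           // no single number may exceed the target
    }

    public static void main(String[] args) {
        System.out.println(mightBeSplittable(new long[] {179998, 180001, 180001}));
        System.out.println(mightBeSplittable(new long[] {1, 2, 3, 4, 5, 6, 7, 8, 9}));
    }
}
```

The first call prints `false` and the second prints `true`.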
