Tutorial :How can I round a column in a single SQL request without changing the overall sum?



Question:

I've got a table defined like this :

create table #tbFoo  (bar float)  

And I'm looking for a way to round every value contained in column bar without changing the total sum (which is known to be an integer, or very close to an integer because of float number precision).

Rounding every value to the nearest integer won't work (ex : 1,5;1,5 will be rounded to 1;1 or 2;2)

It's quite easy to do this using several requests (eg storing the original sum, rounding, computing the new sum, and updating as many rows as needed to go back to the original sum), but this is not a very elegant solution.

Is there a way to do this using a single SQL request?


I'm using SQL Server 2008, so solutions taking advantage of this specific vendor are welcome.


Edit : I'm looking for a request minimizing the differences between the old values and the new ones. In other words, a value should never be rounded up if a greater value has been rounded down, and vice-versa


Solution:1

Update:

See this solution explained in more details in the article in my blog:


You need to keep cumulative offset for each value:

1.2   (1 + 0.0)  ~ 1    1   1.2   +0.2  1.2   (1 + 0.2)  ~ 1    2   2.4   +0.4  1.2   (1 + 0.4)  ~ 1    3   3.6   +0.6  1.2   (1 + 0.6)  ~ 2    5   4.8   -0.2  1.2   (1 - 0.2)  ~ 1    6   6.0   0.0  

This is easily done in MySQL, but in SQL Server you will have to write a cursor or use cumulative subselects (which are less efficient).

Update:

The query below selects the difference between the sums of the values and of those rounded down to the nearest smaller integer.

This gives us the number (N) of values we should round up.

Then we order the values by their fractional part (ones that are closer to their ceiling go first) and round the first N up, the others down.

SELECT  value,          FLOOR(value) + CASE WHEN ROW_NUMBER() OVER (ORDER BY value - FLOOR(value) DESC) <= cs THEN 1 ELSE 0 END AS nvalue  FROM    (          SELECT  cs, value          FROM    (                  SELECT  SUM(value) - SUM(FLOOR(value)) AS cs                  FROM    @mytable                  ) c          CROSS JOIN                  @mytable          ) q  

Here's the script for the test data:

SET NOCOUNT ON  GO  SELECT  RAND(0.20090917)  DECLARE @mytable TABLE (value FLOAT NOT NULL)  DECLARE @cnt INT;  SET @cnt = 0;  WHILE @cnt < 100  BEGIN          INSERT          INTO    @mytable          VALUES  (FLOOR(RAND() * 100) / 10)          SET @cnt = @cnt + 1  END    INSERT  INTO    @mytable  SELECT  600 - SUM(value)  FROM    @mytable  


Solution:2

If you have a list of n values whose elements are accurate only to within an integer value (+-0.5), then any sum of those elements will have a cumulative error or +-(n*0.5). If you have 6 elements in your list which should add up to some number, then your worst case scenario is that you're off by 3 if you just add the integer values.

If you find some way of showing 10.2 as 11 in order to make the sum work, you've changed the precision of that element from +-0.5 to +-0.8, which is counterintuitive when looking at integers?

One possible solution to think about is to round your number during display only (using some format string on your output), not already at the retrieval stage. Each number will be as close as possible to the actual value, but the sum will be more correct too.

Example: If you have 3 values of 1/3 each, displayed as whole-numbered percentages, then you should be showing 33, 33 and 33. To do anything else is to create a margin of error greater than +-0.5 for any individual value. Your total should still be displayed as 100%, because that is the best possible value (as opposed to working with sums of already rounded values)

Also, be aware that by using a float, you've already introduced a limitation on your precision because you have no way of accurately representing 0.1. For more on that, read What Every Computer Scientist Should Know About Floating-Point Arithmetic


Solution:3

First get the difference between the rounded sum and the actual sum, and the number of records:

declare @Sum float, @RoundedSum float, @Cnt int    select @Sum = sum(bar), @RoundedSum = sum(round(bar)), @Cnt = count(*)  from #tbFoo  

Then you spread the difference equally on all values before rounding:

declare @Offset float    set @Offset = (@Sum - @RoundedSum) / @Cnt    select bar = round(bar + @Offset)  from #tbFoo  

Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »