Tutorial :HashSet . slow performance in big set



Question:

I have encountered a problem i cannot find a solution. I am using a HashSet to store values. The values I store is of the custom type Cycles where i have overridden the HashCode and equals as following in order to make sure the slow performance is not cuased by the hascode or the equal methods Also i have set the initial capacity of the hashset to 10.000.000

@Override  public int hashCode() {   final int prime = 31;   int result = 1;   result = prime * result + (int) (cycleId ^ (cycleId >>> 32));   return result;  }    @Override  public boolean equals(Object obj) {   if (this == obj)   return true;   if (obj == null)   return false;   if (getClass() != obj.getClass())   return false;   Cycle other = (Cycle) obj;   if (cycleId != other.cycleId)   return false;   return true;  }  

After the first 1.500.000 first values when i try to add a new value (with the add method of the HashSet class) the program is very slow. Eventually i am going to have java out of memory exception (Exception in thread "Thread-0" java.lang.OutOfMemoryError: Java heap space) before the stored values reach the 1.600.000

The IDE i use is Eclipse. So the next step was to increase the JVM heap size from the default value to 1 giga (using the commnads Xmx1000M and Xms1000M) Now the elipse starts with 10 times more memory available (i can see that in the bottom right where the total heap size memory and used memory is shown) but again i have the same "slow" performance and the same out of memory error IN THE SAME VALUES as before (after the 1.500.000 and before 1.600.000) which is very odd.

Does anyone has an idea what it might be the problem?

Thank you in advance


Solution:1

You don't want to increase the JVM heap for Eclipse, you want to set it for your program.

Go to Run > Run Configurations (or Debug Configurations) and set the VM Options there.


Solution:2

Not enough heap memory (increase it via -Xmx, e.g. -Xmx512m). When free memory goes very low, then much, much time is spent by the garbage collector which furiously scans the heap for unreachable objects.

Your hashCode() is fine, extra points for using all bits of the cycleId long.

Edit. Now I saw you did increase the memory, and didn't help. First of all, are you sure you did manage to increase the memory? You could check this by jconsole, connect to your app and see its heap size.

For an alternative explanation to be verified, is there any particular pattern in your cycleId that could make this hashCode() implementation bad? Like, its 32 high order bits are mostly similar to the 32 low order bits. (Yeah, right).

But no. Even if that would be the case, you would be seeing a gradual degradation of performance, not a sharp drop at a specific point (and you do get a OutOfMemoryError and frenzy gc operation). So my best guess is still a memory issue. You either didn't increase the heap size as you thought, or there is some other code grabbing memory at some point. (You could use a tool like VisualVM to profile this, and get a heap dump upon OOME, and see what objects it contains).

Edit2 I made bold the correct part of the above.


Solution:3

A memory size available for the application you start from Eclipse should be configured from the Run menu. Try:

Run -> Run Configurations -> Arguments -> VM Arguments -> -Xmx1000M

The reason why your program is slow is Garbage Collector - it starts each time a memory is going to be out of the limit.


Solution:4

Have you tested your hashCode method implementation? it always returns 31, for any value of circleId. Not strange that your HashMap works slow, it has a linear performance.


Solution:5

If you want to increase the memory your program can use it won't help to increase Eclipse's heap size. You must put the parameter into the launch configuration's vm parameters of your program.


Solution:6

JVM throws 'out of memory' NOT based on available memory. It is thrown when time being spent on the garbage collection is too much. check this. Exact implementation details vary based on JVM and the garbage collector implementation.

Increasing memory would not help in this case. You may have to choose another approach.


Solution:7

Maybe your computer doesn't have enough memory, hence it has to swap to disk.


Solution:8

How are you initializing your HashSet? You need to be aware of its growth pattern. With every add operation, it checks whether it is getting close to capacity. If it reaches a certain point (determined by its 'load factor'), it performs a resizing operation which can be expensive. From the JavaDoc (of HashMap - the collection that backs HashSet):

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.


Solution:9

I'm pretty disappointed at the number of answers telling the OP to increase his heap size in his application. That's not a solution--that's a quick-and-dirty patch, which won't address any underlying problem.

I found this presentation extremely informative: http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf

Mainly the page listing the minimal byte sizes of each when empty--

ArrayList: 40 or 48  LinkedList: 48  HashMap: 56 or 120  HashSet: 72 or 136  

Turns out that a HashSet is practically a HashMap, and (counterintuitively) takes up more more memory despite holding only values instead of key-value pairs.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »