Tutorial: How well does the .NET Dictionary resolve collisions?



Question:

I have a problem with a custom object that needs to be keyed for a table. I need to generate a unique numeric key. I'm having collision problems and I'm wondering if I can leverage a dictionary to help me. Assume I have an object like this:

class Thingy
{
    public string Foo;
    public string Bar;
    public string Others;
}

and so on, with more fields. Let's say Foo and Bar are my key fields: if they're equal between two Thingys, the two objects should be considered equal (one may represent an update to the other, with the Others fields changed). So I have these:

public override bool Equals(object obj)
{
    Thingy thing = obj as Thingy; // type check: null if obj is not a Thingy
    if (thing == null)
        return false;
    return this.Foo == thing.Foo && this.Bar == thing.Bar;
}

public override int GetHashCode()
{
    return (this.Foo + this.Bar).GetHashCode(); // using default string impl
}

So this works for the most part, but on rare occasions two Thingys that are actually different have the same hash code.
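The concatenation in the GetHashCode above is also a systematic (not rare) source of collisions: joining Foo and Bar loses the boundary between them, so the pairs ("ab", "c") and ("a", "bc") hash the exact same string. The sketch below trims Thingy to its key fields and uses a hypothetical ConcatCollisionDemo class purely for illustration; it shows such a pair sharing a hash code while Equals still tells them apart.

```csharp
using System;

// Trimmed copy of the question's Thingy, keeping only the key fields.
class Thingy
{
    public string Foo;
    public string Bar;

    public override bool Equals(object obj)
    {
        var thing = obj as Thingy;
        return thing != null && Foo == thing.Foo && Bar == thing.Bar;
    }

    // Same scheme as the question: hash the concatenated key fields.
    public override int GetHashCode()
    {
        return (Foo + Bar).GetHashCode();
    }
}

// Hypothetical helper, for illustration only.
static class ConcatCollisionDemo
{
    // Returns true when two unequal Thingys share a hash code.
    public static bool Collides(out Thingy a, out Thingy b)
    {
        a = new Thingy { Foo = "ab", Bar = "c" };
        b = new Thingy { Foo = "a", Bar = "bc" };
        // Both concatenate to "abc", so the hash codes are identical,
        // yet Equals compares the fields separately and returns false.
        return a.GetHashCode() == b.GetHashCode() && !a.Equals(b);
    }
}
```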

My question is this: could I use a Dictionary<Thingy, int> where I put in my Thingys, and use a sequential value coming out of the dictionary as my actual key? I'm wondering whether the Dictionary, when it detects a rare hash code collision, will call my Equals method, determine that the objects are actually different, and store them separately. I imagine that when looking one up, it would find the bucket for that hash and search it for the correct Thingy, again using Equals for comparison.

Is this the case with dictionary, or does it only resolve collisions where the hash code is different, but (hash % size) is the same? If this won't work, what might?


Solution:1

Hash collisions only affect performance, not integrity.

A simple test would be to change GetHashCode() so that it simply returns 1. You'll note that the dictionary still behaves correctly, but with any reasonably sized dataset it will perform terribly.
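A minimal sketch of that experiment, assuming a Thingy whose GetHashCode always returns 1 (ConstantHashDemo is a hypothetical name for illustration). Every key lands on the same hash, so the dictionary has to fall back on Equals for every comparison, yet each distinct Thingy still gets its own entry.

```csharp
using System;
using System.Collections.Generic;

// Thingy variant for the experiment: every instance has hash code 1.
class Thingy
{
    public string Foo;
    public string Bar;
    public string Others;

    public override bool Equals(object obj)
    {
        var thing = obj as Thingy;
        return thing != null && Foo == thing.Foo && Bar == thing.Bar;
    }

    // Worst case: all keys collide, so lookups degrade to a linear scan
    // that calls Equals on each stored key.
    public override int GetHashCode() => 1;
}

// Hypothetical helper, for illustration only.
static class ConstantHashDemo
{
    public static Dictionary<Thingy, int> Build()
    {
        var ids = new Dictionary<Thingy, int>();
        ids.Add(new Thingy { Foo = "a", Bar = "x" }, 0);
        ids.Add(new Thingy { Foo = "b", Bar = "y" }, 1);
        ids.Add(new Thingy { Foo = "c", Bar = "z" }, 2);
        return ids;
    }
}
```

Looking up `new Thingy { Foo = "b", Bar = "y", Others = "updated" }` in the result returns 1, because Equals ignores Others. The results are correct; only the cost per lookup grows with the number of colliding keys.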


Solution:2

Hash collisions primarily affect performance, not correctness, as long as Equals() behaves correctly.

Dictionary uses the hash code as a way to organize items into separate "buckets". If too many items share the same hash code, you can run into performance problems. However, as long as Equals() can correctly distinguish between instances, you should get correct results.
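To make the division of labour concrete, here is a deliberately simplified separate-chaining table. It is a hypothetical TinyHashMap written for illustration, not the real Dictionary<TKey,TValue> source: GetHashCode only chooses a bucket, and Equals decides which entry in that bucket is the key. This also covers the question's second case: keys whose hash codes differ but land in the same bucket are told apart the same way. The real Dictionary additionally stores each entry's full hash code and compares it before calling Equals, as a cheap filter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified hash table (illustration only).
class TinyHashMap<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>>[] buckets;
    private readonly IEqualityComparer<TKey> comparer = EqualityComparer<TKey>.Default;

    public TinyHashMap(int bucketCount = 16)
    {
        buckets = new List<KeyValuePair<TKey, TValue>>[bucketCount];
    }

    private List<KeyValuePair<TKey, TValue>> BucketFor(TKey key)
    {
        // Clear the sign bit so the index is non-negative.
        int index = (comparer.GetHashCode(key) & 0x7FFFFFFF) % buckets.Length;
        if (buckets[index] == null)
            buckets[index] = new List<KeyValuePair<TKey, TValue>>();
        return buckets[index];
    }

    public void Set(TKey key, TValue value)
    {
        var bucket = BucketFor(key);
        for (int i = 0; i < bucket.Count; i++)
        {
            // Sharing a bucket is not enough: Equals must confirm the key.
            if (comparer.Equals(bucket[i].Key, key))
            {
                bucket[i] = new KeyValuePair<TKey, TValue>(key, value);
                return;
            }
        }
        bucket.Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    public bool TryGet(TKey key, out TValue value)
    {
        foreach (var entry in BucketFor(key))
        {
            if (comparer.Equals(entry.Key, key))
            {
                value = entry.Value;
                return true;
            }
        }
        value = default(TValue);
        return false;
    }
}
```

With `new TinyHashMap<string, int>(1)` every key shares the single bucket, and Set/TryGet still return the right values, just more slowly.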

Hash codes can cause real problems with mutable objects. If your Thingy class allows Foo or Bar to change for an item that is already in the dictionary, you may fail to find it on a later access attempt, because the hash code it now produces differs from the one that was used to store it.
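A short sketch of that failure mode, using the question's Equals/GetHashCode scheme and a hypothetical MutableKeyDemo class for illustration. After Foo is changed, the entry is still physically in the dictionary (Count stays 1), but a lookup with the very same object almost certainly misses, because the dictionary searches using the new hash code.

```csharp
using System;
using System.Collections.Generic;

class Thingy
{
    public string Foo;
    public string Bar;

    public override bool Equals(object obj)
    {
        var thing = obj as Thingy;
        return thing != null && Foo == thing.Foo && Bar == thing.Bar;
    }

    public override int GetHashCode() => (Foo + Bar).GetHashCode();
}

// Hypothetical helper, for illustration only.
static class MutableKeyDemo
{
    // Mutates a key field after insertion and reports whether the
    // dictionary can still find the same object.
    public static bool FoundAfterMutation()
    {
        var key = new Thingy { Foo = "a", Bar = "x" };
        var ids = new Dictionary<Thingy, int> { [key] = 42 };

        key.Foo = "changed"; // the hash code stored with the entry is now stale

        return ids.ContainsKey(key);
    }
}
```

The usual remedy is to treat the key fields as immutable once an object is used as a key, or to remove the entry, change the fields, and re-add it.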


Solution:3

GetHashCode is designed for use in hash tables, where collisions need to be minimized but not eliminated. If you need to generate a truly unique key, GetHashCode is a reasonable starting point (and not as long as a GUID), but you will need to store the key as part of the object and maintain a list of used keys separately.

While you may be able to retrieve something that looks usable from the internals of Dictionary, it probably won't work reliably - for example if you add more items than the dictionary was initially allocated to handle, the underlying data structure will get rebuilt and individual items could end up in a completely different part of the dictionary.
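One way to combine the question's Dictionary<Thingy, int> idea with the advice above is sketched below, assuming an in-memory mapping is acceptable. The hypothetical KeyRegistry class hands out sequential numbers from its own counter, so the key never depends on hash codes or on where an item sits inside the dictionary. It deliberately avoids GetHashCode as a key source: on .NET Core, string.GetHashCode is randomized per process, so string-based hash codes should not be persisted.

```csharp
using System;
using System.Collections.Generic;

class Thingy
{
    public string Foo;
    public string Bar;
    public string Others;

    public override bool Equals(object obj)
    {
        var thing = obj as Thingy;
        return thing != null && Foo == thing.Foo && Bar == thing.Bar;
    }

    public override int GetHashCode() => (Foo + Bar).GetHashCode();
}

// Hypothetical helper (illustration only): one sequential key per
// distinct (Foo, Bar) pair, taken from a counter rather than from the
// dictionary's internals, so resizing or rehashing cannot change it.
class KeyRegistry
{
    private readonly Dictionary<Thingy, int> ids = new Dictionary<Thingy, int>();
    private int next = 1;

    public int GetOrAssign(Thingy thing)
    {
        int id;
        if (ids.TryGetValue(thing, out id))
            return id; // an update to an existing Thingy keeps its key

        id = next++;
        ids.Add(thing, id);
        return id;
    }
}
```

Two caveats: the registry keeps a reference to the first Thingy it saw for each pair, so that object's Foo and Bar must not be mutated afterwards; and the mapping lives only in memory, so it would have to be stored alongside the table if keys must survive a restart.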

