Tutorial :Selecting a good dictionary key



Question:

I have an object that I want to use to look up other objects. I will be using a Dictionary<TKey, TValue>().

The key object has two strings that uniquely identify it, say KeyObj.Str1 and KeyObj.Str2.

What do you recommend that I use as the key for the dictionary?

1: The concatenation of the strings.

Dictionary<String, TValue>();  Key = KeyObj.Str1:KeyObj.Str2; ("somestring:anotherstring")  

2: A unique integer for each object to identify it?

Dictionary<int, TValue>();  KeyObj.ID = _nextID++;  Key = KeyObj.ID;  

3: A reference to the object.

Dictionary<KeyObj, TValue>();  Key = KeyObj;  

Option 3 would be the easiest, but it seems like it would be inefficient to index a dictionary based on reference values.

If the key object contained a single unique string, the obvious choice would be use that, but having two strings that are only unique in combination makes it more difficult.


Solution:1

Concatenated strings should work best.

IF you know that their combination is unique, then that is what you should choose -- remember that Hash code is usually unique, but not always.


Solution:2

You could use option 3 if you can override GetHashCode() and Equals() appropriately, i.e. something like this:

    public override int GetHashCode()      {          return str1.GetHashCode() ^ str2.GetHashCode();      }        public override bool Equals(object obj)      {          if (!obj is KeyObj)          {              return false;          }            KeyObj key = (KeyObj)obj;          return this.str1.Equals(key.str1) && this.str2.Equals(key.str2);      }  


Solution:3

I would say option 1.


Solution:4

what about using the KeyObj.GetHashCode()?


Solution:5

Any of them are valid, but I'm assuming you'd want to be able to quickly find these objects based on one of the two strings, so using an int as the key would mean you'd still have to scan the values to find the object you wanted.

Are the strings both unique, or only when combined? If they're both unique, and you're willing to trade a bit of space, you could do:

dict.Add(KeyObj.Str1, KeyObj);  dict.Add(KeyObj.Str2, KeyObj);  

and have two references to the object in the dictionary, using each unique string as a key. Or, you could always just combine the strings if they're only unique together, and it'll use the hashcode internally to look them up.


Solution:6

Concatenating them is probably the best idea. You can expose a property in the KeyObj object that does the concatenation so you don't have to perform it each time you're accessing the dictionary value.

Edit:

I apparently misread the question. I think what you really want to do is a mix of 1 and 3, you can override Equals() and GetHashCode() to use the strings that uniquely identify the object (just make sure they are immutable!)

public override Equals(object obj)   {     if (obj == null || !(obj is KeyObj))        return false;     KeyObj other = (KeyObj)obj;     if (this.Key1 == other.Key1 && this.Key2 == other.Key2)       return true;     return false;  }    public override GetHashCode()  {      return (this.Key1 + this.Key2).GetHashCode();  }  

Then you can use the 3rd option you suggested:

Dictionary<KeyObj, ValueObj>...  


Solution:7

You don't need to use a new class as the dictionary key. Using a new struct instead as it will be much more lightweight... And have it consist of those two string values obviously.


Solution:8

If performance is a major consideration, you could consider using a hashvalue of the two strings. But then your 'value' field would have to contain both the keys and the value.

I have a reference to another SO question, I just have to find it.

Is it faster to search for a large string in a DB by its hashcode?

But that question is more DB oriented. And the performance is considered for thousands of iterations.


Solution:9

Remember that a dictionary is a glorified hash table, so the key (no pun intended) is to use a key that will result in very few (if any) collisions with another key. I'd lean toward #3, but that's assuming the KeyObj type has a good hash value generator.


Solution:10

string as key is the best, see my test code:

var tupleKeyDict = new Dictionary, string>();

        for (int i = 0; i < 1000000; i++)          {              tupleKeyDict.Add(new Tuple<int, int>(i,0),i.ToString() );          }            System.Diagnostics.Stopwatch stopWatch = new Stopwatch();          stopWatch.Start();          string e1 = tupleKeyDict[new Tuple<int, int>(0, 0)];          string e2 = tupleKeyDict[new Tuple<int, int>(500000, 0)];          string e3 = tupleKeyDict[new Tuple<int, int>(999999, 0)];          stopWatch.Stop();          Console.WriteLine("Tuplekey cost(tick): " + stopWatch.ElapsedTicks.ToString());          Console.WriteLine("Tuplekey cost(ms): " + stopWatch.ElapsedMilliseconds.ToString());                    var strKeyDict = new Dictionary<string, string>();            for (int i = 0; i < 1000000; i++)          {              strKeyDict.Add(i.ToString() + ":0", i.ToString());          }            System.Diagnostics.Stopwatch stopWatch2 = new Stopwatch();          stopWatch2.Start();          string se1 = strKeyDict["0:0"];          string se2 = strKeyDict["500000:0"];          string se3 = strKeyDict["999999:0"];          stopWatch2.Stop();          Console.WriteLine("strkey cost(tick): " + stopWatch2.ElapsedTicks.ToString());          Console.WriteLine("strkey cost(ms): " + stopWatch2.ElapsedMilliseconds.ToString());                  var intKeyDict = new Dictionary<int, string>();            for (int i = 0; i < 1000000; i++)          {              intKeyDict.Add(i, i.ToString());          }            System.Diagnostics.Stopwatch stopWatch3 = new Stopwatch();          stopWatch3.Start();          string ie1 = intKeyDict[0];          string ie2 = intKeyDict[500000];          string ie3 = intKeyDict[999999];          stopWatch3.Stop();          Console.WriteLine("intkey cost(tick): " + stopWatch3.ElapsedTicks.ToString());          Console.WriteLine("intkey cost(ms): " + stopWatch3.ElapsedMilliseconds.ToString());  

Output: Tuplekey cost(tick): 104 Tuplekey cost(ms): 0 strkey cost(tick): 12 strkey cost(ms): 0 intkey cost(tick): 66 intkey cost(ms): 0


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »