Tutorial :Need to store string as id for objects in some fast data structure



Question:

I'm implementing a session store for a web-server. Keys are string and stored objects are pointers. I tried using map but need something faster. I will look up an object 5-20 times as frequent than insert.

I tried using hash-map but failed. I felt like I got more constraints than more free time.

I'm coding c/c++ under Linux. I don't want to commit to boost, since my web server is going to outlive boost. :)

This is a highly relevant question since the hardware (ssd disk) is changing rapidly. What was the right solution will not be in 2 years.


Solution:1

I was going to suggest a map, but I see you have already ruled this out.

I tried using map but need something faster.

These are the std::map performance bounds courtesy of the Wikipedia page:

  • Searching for an element takes O(log n) time
  • Inserting a new element takes O(log n) time
  • Incrementing/decrementing an iterator takes O(log n) time
  • Iterating through every element of a map takes O(n) time
  • Removing a single map element takes O(log n) time
  • Copying an entire map takes O(n log n) time.

How have you measured and determined that a map is not optimised sufficiently for you? It's quite possible that any bottlenecks you are seeing are in other parts of the code, and a map is perfectly adequate.

The above bounds seem like they would fit within all but the most stringent scalability requirements.


Solution:2

The type of data structure that will be used will be determined by the data you want to access. Some questions you should ask:

  1. How many items will be in the session store? 50? 100000? 10000000000?
  2. How large is each item in the store (byte size)?
  3. What kind of string input is used for the key? ASCII-7? UTF-8? UCS2? ...

Hash tables generally perform very well for look ups. You can optimize them heavily for speed by writing them yourself (and yes, you can resize the table). Suggestions to improve performance with hash tables:

  1. Choose a good hash function! this will have preferably even distribution among your hash table and will not be time intensive to compute (this will depend on the format of the key input).
  2. Make sure that if you are using buckets to not exceed a length of 6. If you do exceed 6 buckets then your hash function probably isn't distributing evenly enough. A bucket length of < 3 is preferable.
  3. Watch out for how you allocate your objects. If at all possible, try to allocate them near each other in memory to take advantage of locality of reference. If you need to, write your own sub-allocator/heap manager. Also keep to aligned boundaries for better access speeds (aligned is processor/bus dependent so you'll have to determine if you want to target a particular processor type).

BTrees are also very good and in general perform well. (Someone can insert info about btrees here).

I'd recommend looking at the data you are storing and making sure that the data is as small as possible. use shorts, unsigned char, bit fields as necessary. There are other additional ways to squeeze out improved performance as well such as allocating your string data at the end of your struct at the same time that you allocate the struct. i.e.

struct foo {    int a;    char my_string[0]; // allocate an instance of foo to be                        // sizeof(int) + sizeof(your string data) etc  }  

You may also find that implementing your own string compare routine can actually boost performance dramatically, however this will depend upon your input data.


Solution:3

It is possible to make your own. But you shouldn't have any problems with boost or std::tr1::unordered_map.

A ternary trie may be faster than a hash map for a smaller number of elements.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »