Tutorial :C# comparing similar strings



Question:

I have a generic with some filenames (LIST1) and another biggeneric with a full list of names (LIST2). I need to match names from LIST1 to similar ones in LIST2. For example

LIST1  - **MAIZE_SLIP_QUANTITY_3_9.1.aif**    LIST 2  1- TUTORIAL_FAILURE_CLINCH_4.1.aif  2- **MAIZE_SLIP_QUANTITY_3_5.1.aif**  3- **MAIZE_SLIP_QUANTITY_3_9.2.aif**  4- TUTORIAL_FAILURE_CLINCH_5.1.aif  5- TUTORIAL_FAILURE_CLINCH_6.1.aif  6- TUTORIAL_FAILURE_CLINCH_7.1.aif  7- TUTORIAL_FAILURE_CLINCH_8.1.aif  8- TUTORIAL_FAILURE_CLINCH_9.1.aif  9- TUTORIAL_FAILURE_PUSH_4.1.aif  

I've read about Levenshtein distance and used an implementation of it in a Framework (SignumFramework Utilities). It returns me distance=1 in lines 2 and 3. But in my case line 3 is a better match than line 2.

Is there another method better to compare similar strings? Something more flexible?


Solution:1

When comparing as strings, "9.2" is not a better match than "5.1" for "9.1". If you want the version numbers to be evaluated numerically, you have to parse the strings so that you can compare the string parts and the numerical parts separately.


Solution:2

There was a simlar question here, maybe some of the answers there will be relevant?


Solution:3

Your similarity criteria could be a combination of several other criteria. One could be the Levenshtein distance, others might e.g be the longest common substring or prefix/suffix.

The longest common substring problem is actually a special case of edit distance, when substitutions are forbidden and only exact character match, insert, and delete are allowable edit operations (see here).

Further metrics for string similarity are described here.


Solution:4

A regular expression could be used to get the items that match the name. The version number could be collected in a regex group in the match and parsed into a .net object (e.g. decimal) that you could use to compare which one was closest.


Solution:5

There's a fairly exhaustive set of answers to this SO question. At the bottom is link I put up to C# implementations for soundex, double metaphone, PHP similarity and levenstein.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »