I'm trying to match names between two databases. Essentially, I need to find every pair of names that is at least an 80% match. Does anyone know how to do this? If not, can you please tell me where I should post this question to get an answer?
I have used the SPEDIS function in SAS to compare two names. The function returns a score based on how similar the two words are (a spelling distance, so lower values mean a closer match), and you can then apply a threshold to filter the name matches. It is not obvious what an 80% match means, so I think it is better to use something like SPEDIS.
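To make the "score plus threshold" idea concrete, here is a minimal sketch in Python rather than SAS. It does not reproduce SPEDIS: it assumes the similarity ratio from the standard library's `difflib.SequenceMatcher` as a stand-in score, and it treats "80% match" as a ratio of at least 0.8, which is only one possible reading. The function names `similarity` and `find_matches` and the sample names are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a similarity ratio in [0, 1] for two names (1.0 = identical).

    Names are lower-cased and stripped first so that case and stray
    whitespace do not count as differences. This is an assumed choice.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def find_matches(names_a, names_b, threshold=0.8):
    """Compare every name in names_a with every name in names_b.

    Returns (name_a, name_b, score) tuples whose score meets the threshold.
    This is a brute-force cross join, fine for small lists only.
    """
    matches = []
    for a in names_a:
        for b in names_b:
            score = similarity(a, b)
            if score >= threshold:
                matches.append((a, b, score))
    return matches


if __name__ == "__main__":
    base_1 = ["Jonathan Smith", "Mary Jones"]
    base_2 = ["Jonathon Smith", "Peter Brown"]
    for a, b, score in find_matches(base_1, base_2):
        print(f"{a!r} ~ {b!r}: {score:.2f}")
```

The threshold is the part that needs thought: a different score (edit distance, SPEDIS, a phonetic code) will rank the same pairs differently, so 0.8 on one scale is not 0.8 on another.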
Have a look at http://support.sas.com/documentation/cdl/en/lrdict/61724/HTML/defau...
String comparators are old news in statistics. Regex-like operators such as Soundex are specialized, but they all do essentially the same thing, each according to your "model". If you have no such model and simply depend on a default function like SPEDIS, that is always computing, not analytics.
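As an illustration of how a comparator like Soundex encodes a "model" (here, that names which sound alike in English should match), a minimal Python sketch of the common American Soundex rules follows. It is an assumed textbook variant, not the SAS SOUNDEX function, and the helper name `soundex` is only for illustration.

```python
# Letter-to-digit table for the common American Soundex variant.
_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def soundex(name):
    """Return the 4-character Soundex code of a name, or "" if it has no letters."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = _CODES.get(first, "")  # the first letter's code can suppress a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels (no code) reset prev, so a later repeat is kept
    return (result + "000")[:4]  # pad with zeros, keep four characters


if __name__ == "__main__":
    for n in ["Robert", "Rupert", "Ashcraft", "Tymczak"]:
        print(n, soundex(n))
```

Two names "match" under this model when their codes are equal, so there is no 80% to tune: the modelling decision is made in the coding table itself.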
"sam" has a nice site on similarity metrics in general, including string comparators and some discussion of their computing cost versus benefit.