Use Levenshtein distance - http://en.wikipedia.org/wiki/Levenshtein_distance
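
For reference, a minimal sketch of the standard dynamic-programming edit distance, assuming plain Python strings and unit cost for insert, delete and substitute. The function name `levenshtein` is just illustrative, not from any particular library:

```python
def levenshtein(a, b):
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]
```

Note that for abbreviations a raw edit distance can be large even for a good match, since the short form drops many characters, so some normalization by length would probably be needed.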

ntoshev · on March 26, 2010

Thanks, my boolean approximation for abbreviations so far is:

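  import re
  def abbr(short, full):
      # True if the characters of short appear in order in full (first character anchored at the start)
      return bool(re.match(''.join(re.escape(c) + '.*' for c in short), full))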
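
A quick sketch of how this behaves, with made-up example strings. Since `re.match` anchors at the start of the string, the first character of the abbreviation has to be the first character of the full name. The calls are wrapped in `bool()` only to make the result explicit:

```python
import re

def abbr(short, full):
    # same idea as above: each character of short, in order, with anything in between
    return re.match(''.join(c + '.*' for c in short), full)

print(bool(abbr('intl', 'international')))  # letters appear in order from the start
print(bool(abbr('mgr', 'manager')))         # m...g...r
print(bool(abbr('ntl', 'international')))   # fails: full does not start with 'n'
```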

Actually, what I'm doing now seems to be working; so far I can't see any pattern in the things my algo can't match by itself.

Also thanks to ramanujan, who deleted his comment for some reason. Besides pointing out orgmode, which I want to check out, his suggestion reminded me that I'm trying to deal with my dataset incrementally, while a batch mode might work better.