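Thanks. My boolean approximation for abbreviation matching so far is:

    import re
    def abbr(short, full):
        # escape each char so e.g. a '.' in `short` is matched literally
        return bool(re.match(''.join(re.escape(c) + '.*' for c in short), full))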
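The regex amounts to an in-order subsequence test: every character of `short` must appear in `full` in the same order, and because `re.match` only matches at the start of the string, the first character must also be the first character of `full`. Here is a minimal sketch of the same check without regexes. It assumes single-line names, because the regex `.` does not cross a newline but this version would.

```python
def abbr(short: str, full: str) -> bool:
    """Subsequence test equivalent to the regex version for single-line names."""
    if not short:
        return True  # an empty pattern matches anything, as with re.match('')
    if not full or short[0] != full[0]:
        return False  # re.match anchors the first character at the start
    rest = iter(full[1:])
    # `c in rest` consumes the iterator up to and including the first hit,
    # so each later character must occur after the previous one
    return all(c in rest for c in short[1:])


print(abbr("dept", "department"))  # True
print(abbr("gm", "manager"))       # False
```

The iterator trick avoids building a pattern per call, which may matter if every abbreviation gets compared against many names.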
Actually, what I'm doing now seems to work: so far I can't see any pattern in the cases my algorithm fails to match on its own.
Thanks also to ramanujan, who deleted his comment for some reason. Besides pointing out orgmode, which I want to look into, his suggestion reminded me that I'm processing my dataset incrementally when a batch mode might work better.
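To make the batch idea concrete, here is one rough way such a pass could look. This is not necessarily what ramanujan had in mind. It computes every abbreviation/full-name candidate up front, then repeatedly locks in abbreviations with exactly one candidate and removes that full name from everyone else's candidates. The function name `match_batch` is hypothetical, and the elimination step assumes each full name expands at most one abbreviation.

```python
import re


def abbr(short: str, full: str) -> bool:
    # chars of `short` appear in order in `full`, first one anchored at the start
    return bool(re.match(''.join(re.escape(c) + '.*' for c in short), full))


def match_batch(shorts, fulls):
    """Hypothetical batch pass: resolve unique matches, then eliminate."""
    candidates = {s: {f for f in fulls if abbr(s, f)} for s in shorts}
    resolved = {}
    changed = True
    while changed:
        changed = False
        for s, cands in candidates.items():
            if s in resolved or len(cands) != 1:
                continue
            (f,) = cands
            resolved[s] = f
            # assumption: a full name expands at most one abbreviation,
            # so drop it from every other abbreviation's candidates
            for other, other_cands in candidates.items():
                if other != s and f in other_cands:
                    other_cands.discard(f)
                    changed = True
    # whatever is left is ambiguous (several candidates) or unmatched (none)
    unresolved = {s: sorted(c) for s, c in candidates.items() if s not in resolved}
    return resolved, unresolved


resolved, unresolved = match_batch(["mgr", "mg"], ["manager", "magazine"])
print(resolved)    # {'mgr': 'manager', 'mg': 'magazine'}
print(unresolved)  # {}
```

Taken one at a time, `mg` is ambiguous because it matches both names. Seeing the whole batch lets `mgr` claim `manager` first, which leaves `mg` with a single candidate.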