Ask HN: Could you recommend language agnostic NLP tools?
3 points by assane101 on July 6, 2018 | 7 comments
I just built a spell-checker for Wolof, my native language, using some basic rules and a dictionary I managed to put together. I need your help finding open source NLP tools that are language agnostic or do not require a lot of heavy lifting to adapt to a new locale.

Thanks for your help.

If you would like to test my spell-checker: https://digibox.info/apps/experiments/wolofix/



Polyglot [0] is a multilingual NLP toolkit for Python. The quality is not great, but it supports a lot of languages.

[0] https://github.com/aboSamoor/polyglot


I'm far from an expert, but I was just discussing this with a former colleague in connection with a specific problem he is considering, and I found this: https://www.r-bloggers.com/natural-language-processing-for-n...


I see Wolof is listed under "Upcoming UD Languages". I know nothing about R, but I'll see what I can contribute and/or get from there. Thanks!


The Lucene API has a lot of language-specific tokenizers and analyzers that help normalize what counts as a term in the index, regardless of language. You can then apply various statistical NLP methods, which tend to be more language-agnostic.
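To make the "normalize first, then count" idea concrete, here is a minimal Python sketch. It is not Lucene and does not use its API: the regex tokenizer is a crude stand-in for a real language-specific analyzer, and the function names (`normalize_terms`, `term_frequencies`, `char_ngrams`) are made up for illustration. It only assumes the text is Unicode and that words are separated by whitespace or punctuation, which holds for Wolof in Latin script but not for every language.

```python
import re
import unicodedata
from collections import Counter

# \w matches letters and digits in any script (plus underscore).
# Assumption: words are delimited by whitespace/punctuation.
TOKEN_RE = re.compile(r"\w+")


def normalize_terms(text):
    """Return NFKC-normalized, case-folded word tokens."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return TOKEN_RE.findall(text)


def term_frequencies(docs):
    """Count normalized terms across an iterable of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(normalize_terms(doc))
    return counts


def char_ngrams(term, n=3):
    """Character n-grams with boundary markers, a common
    language-independent feature for statistical models."""
    padded = f"_{term}_"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]
```

Once terms are normalized this way, the counting and n-gram steps do not care which language the text is in. The language-specific work stays isolated in the tokenizer, which is the part a proper analyzer would replace.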


I work in NLP at a company that actually develops language agnostic solutions, but I'm not aware of any open-source tool that can do this.

Nonetheless, if you can be more specific about what kind of tools you are looking for, maybe I can give you some pointers.


Thanks for your reply. If you don't mind sharing a link to your company's website or products, I would appreciate it.

These are some areas of interest to me:
1. Translation, i.e. French->Wolof
2. Speech understanding and question answering systems
3. Text to speech
... among others. (I will work day and night to build training samples if I have the tools.)


Sure, the company is Babelscape (http://babelscape.com). For the translation tasks, you can find massive parallel datasets covering several language pairs at http://opus.nlpl.eu/. The other two things you mentioned are not really in my area of expertise, so nothing comes to mind at the moment.
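As a rough sketch of how such a parallel dataset might be consumed, the Python below pairs up sentences from two line-aligned plain-text files, where line N of one file is the translation of line N of the other. The file names are hypothetical, and whether a French-Wolof pair is actually available is something you would need to check on the site.

```python
from itertools import islice


def read_parallel(src_path, tgt_path, limit=None):
    """Yield (source, target) sentence pairs from two line-aligned files.

    `limit` caps how many raw line pairs are read (None reads them all).
    Pairs where either side is empty are skipped.
    """
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        for s, t in islice(zip(src, tgt), limit):
            s, t = s.strip(), t.strip()
            if s and t:
                yield s, t


# Hypothetical usage; "corpus.fr" and "corpus.wo" are placeholder names:
# for fr, wo in read_parallel("corpus.fr", "corpus.wo", limit=5):
#     print(fr, "|||", wo)
```

Note that `zip` stops at the shorter file, so a length mismatch between the two files is silently truncated. For real training data it is worth checking up front that both files have the same number of lines.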



