You can maintain the API while overhauling the models underneath. spaCy so far has had almost no API breakages.
For instance, you get sentences as follows:
doc = nlp(u'Hello world. This is a document.')
for sent in doc.sents:
    for word in sent:
        ...
It doesn't matter to users whether, behind the scenes, the sentence boundaries are calculated from character heuristics or from the syntactic parse. It used to be the former; now it's the latter. Similarly, part-of-speech tags are currently predicted in their own processing step. In the future they may be predicted jointly with the parse. The API won't change.
Other libraries ask users to choose between a variety of different statistical models, e.g. they ask you to specify that you want the "neural network dependency parser", or the "probabilistic context-free grammar parser", or whatever. By doing this they tie the API to those models.
spaCy just picks the best one and gives it to you. The benefit is that you don't need to be informed when a new model is implemented, even if the change is quite drastic. The modelling is a transient implementation detail, not exposed in the API.
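To make the idea concrete, here is a toy sketch of the pattern in plain Python. It is not spaCy's actual code: `Pipeline`, `Doc`, and both segmenter functions are hypothetical names invented for illustration, and the "newer" strategy is just a stand-in for something like a parser-driven segmenter. The point is only that the calling code stays identical when the library swaps the strategy underneath.

```python
import re

# All names below are hypothetical; this is an illustration, not spaCy internals.

def _split_on_punctuation(text):
    """Older strategy: character heuristics (split after '.', '!' or '?')."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def _split_on_newlines(text):
    """Stand-in for a newer strategy, e.g. one driven by a syntactic parse."""
    return [s.strip() for s in text.splitlines() if s.strip()]

class Doc:
    def __init__(self, text, segmenter):
        self.text = text
        self._segmenter = segmenter  # private: callers never choose or see this

    @property
    def sents(self):
        # Public API: an iterable of sentences, each a list of words.
        for sent in self._segmenter(self.text):
            yield sent.split()

class Pipeline:
    # The library, not the caller, decides which strategy is current.
    _best_segmenter = staticmethod(_split_on_punctuation)

    def __call__(self, text):
        return Doc(text, self._best_segmenter)

nlp = Pipeline()
doc = nlp("Hello world. This is a document.")
sentences = [sent for sent in doc.sents]

# A later release swaps the implementation; user code above is untouched.
Pipeline._best_segmenter = staticmethod(_split_on_newlines)
doc_after_upgrade = nlp("Hello world.\nThis is a document.")
sentences_after_upgrade = [sent for sent in doc_after_upgrade.sents]
```

A library that instead made you pass the segmenter in yourself would have to keep every old strategy alive, or break your code when it retired one.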