I definitely agree that the NLP library shouldn't have an opinion about your machine learning, shouldn't give you every accuracy measure under the sun, etc. My original vision for spaCy was smaller. But there are two problems.

1) If I ship you a statistical model, and it's late in a pipeline, like a parser, the earlier components in the pipeline are not swappable. The parser was trained on the output of those specific components, so if you change the tokenization, POS tagging, lemmatization, etc., the parser model will give you worse output.

This isn't obvious to people, and the problem can be subtle. For instance, some NER models use POS tag features, others don't.

2) The output format isn't actually that convenient. It sucks that everyone has to write their own tree-processing code, and aligning the tokenized output back to the original string is a pain if you want to calculate mark-up.
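
To make the alignment problem concrete, here's a rough sketch in plain Python of what people end up writing when a tokenizer hands back bare strings with no offsets. The function names `align_tokens` and `mark_up` are made up for illustration, not part of any library, and the sketch assumes each token appears verbatim in the text, which breaks as soon as the tokenizer normalizes anything.

```python
def align_tokens(text, tokens):
    """Recover (start, end) character offsets for each token in `text`.

    Hypothetical helper: assumes tokens appear verbatim and in order.
    """
    offsets = []
    cursor = 0
    for tok in tokens:
        # Search forward from the end of the previous token.
        start = text.find(tok, cursor)
        if start == -1:
            raise ValueError(f"token {tok!r} not found after position {cursor}")
        end = start + len(tok)
        offsets.append((start, end))
        cursor = end
    return offsets


def mark_up(text, spans, tag="b"):
    """Wrap each non-overlapping (start, end) span in <tag>...</tag>."""
    out = []
    prev = 0
    for start, end in sorted(spans):
        out.append(text[prev:start])  # untouched text, whitespace included
        out.append(f"<{tag}>{text[start:end]}</{tag}>")
        prev = end
    out.append(text[prev:])
    return "".join(out)


text = "I  like New York, a lot."  # note the double space
tokens = ["I", "like", "New", "York", ",", "a", "lot", "."]

offsets = align_tokens(text, tokens)
# Treat tokens 2..3 ("New", "York") as one entity span.
entity = (offsets[2][0], offsets[3][1])
marked = mark_up(text, [entity])
print(marked)
```

If the tokens carry their character offsets from the start, none of this guesswork is needed, and the original whitespace survives for free.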