
>The model isn't getting this capability by training on Wikipedia or Reddit

I don't know about the former, but the latter absolutely has sexually explicit material that could make the model more likely to generate erotic stories, flirty chats, etc.



OK, maybe a bad example, but it would be easy to build a classifier to identify that kind of material and omit it from the training data if they wanted to. Now that they are going to be selling this, I'd assume they are explicitly seeking out and/or paying for the creation of training material of this type.



