Hacker News | tjruesch's comments



I haven't done as much testing as I'd like to confidently answer this in general terms. In our own environment we have the benefit of defining the system prompt for translation, so we can introduce the logic of the tags to the LLM explicitly. That said, in our limited general-purpose testing we've seen that the flagship models reliably capture the logic of the tags and their semantic properties without any explanation. I'm currently exploring a general-purpose prompt sanitizer, and potentially even a browser plugin for behind-the-scenes sanitization in ChatGPT and other end-user interfaces.
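To make the idea concrete, here's a minimal sketch of what such a sanitizer could look like. The tag schema and the regex-based "detection" are purely illustrative assumptions (a real sanitizer would use a proper PII/NER model, and our actual tag format may differ):

```python
import re

# Hypothetical PII detectors -- regexes are a stand-in for a real PII model.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def sanitize(text: str):
    """Replace detected PII with numbered XML tags; return text + tag map."""
    tag_map = {}
    counters = {}

    def replacer(kind):
        def _sub(match):
            counters[kind] = counters.get(kind, 0) + 1
            tag = f'<pii type="{kind}" id="{counters[kind]}"/>'
            tag_map[tag] = match.group(0)  # original value stays client-side
            return tag
        return _sub

    for kind, pattern in PII_PATTERNS.items():
        text = pattern.sub(replacer(kind), text)
    return text, tag_map

def desanitize(text: str, tag_map: dict) -> str:
    """Restore the original PII in the LLM's response."""
    for tag, original in tag_map.items():
        text = text.replace(tag, original)
    return text
```

The LLM only ever sees the tagged text; the tag map never leaves the client, and the round trip restores the original values locally.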


That's exactly right. PII stays local (and the PII-Tag-Map is encrypted).
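Roughly, the local encrypted tag map could work like the sketch below. The XOR one-time pad is just a stand-in for a real cipher like AES-GCM so the example stays dependency-free; it is not what we actually use:

```python
import json
import secrets

def encrypt_tag_map(tag_map: dict) -> tuple[bytes, bytes]:
    """Serialize and encrypt the PII tag map for local storage at rest.

    One-time pad for illustration only: the key must be random, as long
    as the data, never reused, and (like the map itself) never leave
    the client. A real implementation would use AES-GCM or similar.
    """
    plaintext = json.dumps(tag_map).encode()
    key = secrets.token_bytes(len(plaintext))  # stays on the client
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return ciphertext, key

def decrypt_tag_map(ciphertext: bytes, key: bytes) -> dict:
    plaintext = bytes(c ^ k for c, k in zip(ciphertext, key))
    return json.loads(plaintext.decode())
```

Only the sanitized prompt crosses the network; the mapping needed to re-identify the response exists solely in this encrypted local store.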


That sounds interesting! I've been thinking about using representative placeholders as well, but while they have their strengths, there are also some downsides. We went with XML tags partly because they clearly mark the anonymized text as anonymized (for humans), so mix-ups don't happen. After reading your comment, I think it would also be really interesting to allow custom metadata on the tags. For example, if you have a username you want to anonymize, but your database holds additional (deterministic) information such as gender, we should add a callback that lets you, as the user, attach that information to the tag.
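Something like the following sketch is what I have in mind. The function names, the callback signature, and the example database are all hypothetical, not an existing API:

```python
from xml.sax.saxutils import quoteattr

def make_tag(kind: str, ident: int, metadata_cb=None, original=None) -> str:
    """Build an anonymization tag, optionally enriched by a user callback.

    The callback can attach deterministic attributes (e.g. gender looked
    up in the caller's own database) so the LLM keeps grammar and
    pronouns correct without ever seeing the real value.
    """
    attrs = {"type": kind, "id": str(ident)}
    if metadata_cb is not None:
        attrs.update(metadata_cb(kind, original))  # user-defined enrichment
    attr_str = " ".join(f"{k}={quoteattr(v)}" for k, v in attrs.items())
    return f"<pii {attr_str}/>"

# Example callback backed by illustrative local data.
USER_DB = {"jdoe": {"gender": "f"}}

def user_metadata(kind, original):
    if kind == "username" and original in USER_DB:
        return USER_DB[original]
    return {}
```

So `make_tag("username", 1, user_metadata, "jdoe")` would yield `<pii type="username" id="1" gender="f"/>`: the model sees the gender attribute, never the username.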

