For de-anonymizing, the idea is to give the encryption service the plaintext and get a matching token. But then that will be more of a hash. If you are encrypting where all the tokens are different, you can't do a join or analysis. You can't, for instance, count how many unique phone numbers you have. And if a user is using your app, how do they see their PI data?
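To make the trade-off concrete, here is a minimal sketch in Python. It is not anyone's actual system: `KEY`, `deterministic_token`, and `random_token` are hypothetical names, HMAC-SHA256 stands in for "more of a hash", and a fresh random value stands in for a scheme where every token is different.

```python
import hashlib
import hmac
import secrets

KEY = b"demo-key-not-for-production"  # hypothetical key, for illustration only


def deterministic_token(plaintext: str) -> str:
    # Keyed hash: the same input always yields the same token,
    # so equality joins and distinct counts still work on tokens.
    return hmac.new(KEY, plaintext.encode(), hashlib.sha256).hexdigest()


def random_token(plaintext: str) -> str:
    # A fresh random token on every call. A real service would store the
    # token -> plaintext mapping somewhere; that part is omitted here,
    # which is why the argument is unused.
    return secrets.token_hex(16)


phones = ["+15550100", "+15550101", "+15550100"]  # first and last are equal

det = [deterministic_token(p) for p in phones]
rnd = [random_token(p) for p in phones]

print(len(set(det)))  # distinct deterministic tokens
print(len(set(rnd)))  # distinct random tokens
```

With deterministic tokens the duplicate phone number collapses to one token, so a distinct count on tokens matches the real count. With random tokens every row looks unique, which is exactly why joins and counts stop working.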
> If you are encrypting where all the tokens are different, you can't do a join or analysis.
That would hopefully be part of the reason for doing it this way.
I once worked on a system where we encrypted most customer data on registration and took it entirely offline once a day (so new data was online in encrypted form for a day, and was then air-gapped permanently).
The fact that marketing etc. had to request that reports be run manually on the air-gapped customer database was an important barrier that made them think about how they could meet their needs without it.
Sometimes, of course, they had genuine needs that required access to the unencrypted data, but it was rare.
I'm a big fan of making it take extra effort to do these things - time and resources seem to be a far stronger barrier than requiring authorization.
You're correct that it does make certain kinds of analysis more difficult.
However, that doesn't mean we can't ever get access to the original data.
Most of our current BI needs can be met using the unencrypted data, but if, for example, we did want to answer your phone number question, we could craft a special-purpose program to perform the analysis without compromising user privacy:
1. Select all phone number tokens
2. Decrypt
3. Produce counts (total unique, etc)
Said program would have to go through normal code review and approvals, and then be deployed into the secure zone (so it could access the encryption service).