Hacker News

If you only need to receive, store, and send text, Unicode is easy enough and you can just treat it as a byte stream. Once you get into things like manipulating text, comparisons and searches, or displaying text, things get hairy, and all kinds of fun algorithms from the various Unicode Technical Reports, Standard Annexes, and Technical Notes make their appearance. Those are the parts that increase complexity.
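To make the comparison point concrete, here is a minimal sketch in Python using the standard `unicodedata` module. The sample strings and the `same_text` helper are just illustrations I made up, and NFC is only one of several normalization forms you might pick depending on the use case.

```python
import unicodedata

# Two strings that render the same but use different code point sequences.
composed = "caf\u00e9"      # 'é' as the single code point U+00E9
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Treated as raw code points or bytes, the two strings differ.
raw_equal = composed == decomposed
byte_lengths = (len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))

# After normalization they compare equal.
normalized_equal = same_text(composed, decomposed)

print(raw_equal, byte_lengths, normalized_equal)
```

Storing and forwarding either string as bytes works fine without any of this. It is only when you compare or search that normalization starts to matter, and real matching often needs more than this (case folding, collation, and so on).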

Also, a major reason Unicode is large and complex is that languages and scripts are large and complex. Unless we all agree on using simple, computer-friendly languages and scripts, that complexity is not going to change, and the need to work with older scripts (e.g. for historians and researchers) would still require something like Unicode. Unicode is the kind of thing that emerges from a messy world, and unsurprisingly it's messy as well.



Unicode is still _way_ less hard than anything else for manipulating text. Global human written language is complicated, and Unicode is a pretty ingeniously designed standard: it has solutions that work pretty darn well for almost any common manipulation you'd want to do. Now, not everything is implemented or easily accessible on every platform, and people don't always understand what to do with it (because global human written language is complicated), but Unicode is a pretty amazing accomplishment, quite successful in various meanings of 'successful'.



