
Oddly enough, I ran into a strange case a few months ago with a zip file that required me to use WinRAR.

On Windows, when you extract a zip file whose file names contain Japanese characters, the names get garbled, which causes problems if you need to preserve the directory and file names. I tried 7-Zip as well, with the same outcome.

I found a fix on Super User [1] that suggested WinRAR, since it has an option to change the encoding used for archived file names. With that I was able to extract the zip, and the files and directories kept their original Japanese names.

[1] - https://superuser.com/questions/554108/extracting-a-zip-file...



Info-ZIP also has problems with Japanese characters. I ended up solving the problem with Python's zipfile module. It is considerably more awkward to use, but it offers a lot of control over the extraction process.

I did not look at it very closely, so I don't know exactly what Info-ZIP was getting wrong in my case, but I did find this interesting bug report from 2012. Apparently a lot of encoders are sloppy about the spec and leave header fields zeroed rather than setting them. If Info-ZIP reads a header that says the zip file was created by DOS (a zero), it believes it and extracts the names using a DOS-compatible encoding.

https://sourceforge.net/p/infozip/support-requests/10/
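
For anyone who wants to try the Python route, here is a rough sketch of the idea, not the exact script I used. It assumes the names were written as Shift-JIS (cp932) without the UTF-8 flag set; the function names and the default encoding are my own choices for illustration. Python's zipfile decodes unflagged names as cp437, so the sketch reverses that and decodes the raw bytes again.

```python
import zipfile
from pathlib import Path

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def decode_member_name(info, encoding="cp932"):
    """Return the member name, re-decoded with `encoding` if it is not flagged UTF-8."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename
    # zipfile decoded the raw name bytes as cp437; undo that and decode again.
    return info.filename.encode("cp437").decode(encoding)


def extract_with_encoding(src, dest, encoding="cp932"):
    """Extract every member of `src` into `dest` and return the file names written."""
    dest = Path(dest).resolve()
    written = []
    with zipfile.ZipFile(src) as zf:
        for info in zf.infolist():
            name = decode_member_name(info, encoding)
            target = (dest / name).resolve()
            # Refuse names that would land outside the destination directory.
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe member name: {name!r}")
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(info))
            written.append(name)
    return written
```

Calling extract_with_encoding("archive.zip", "out") would be the typical use (both paths are placeholders). On Python 3.11 and later, zipfile.ZipFile also accepts a metadata_encoding argument that should cover the same case with less code. ZipInfo.create_system exposes the "created by" field mentioned above, where 0 means MS-DOS.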



