
"NTFS allows any sequence of 16-bit values for name encoding (file names, stream names, index names, etc.) except 0x0000. This means UTF-16 code units are supported, but the file system does not check whether a sequence is valid UTF-16 (it allows any sequence of short values, not restricted to those in the Unicode standard). "

- from wikipedia NTFS page [1]

So if you assume that an NTFS filename is valid UTF-16 and convert it to UTF-8, there might be a problem: an unpaired surrogate, for example, has no valid UTF-8 encoding. Basically, a name can be any sequence of 16-bit values.

  [1] https://en.wikipedia.org/wiki/NTFS
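
To make the failure mode concrete, here is a minimal Python sketch. It assumes the name has already been read as a list of raw 16-bit values; the helper names (`pack_units`, `is_well_formed_utf16`, `to_utf8_lossy`) are made up for illustration and are not part of any NTFS or Windows API.

```python
import struct

def pack_units(units):
    """Pack a list of 16-bit values into little-endian bytes."""
    return struct.pack("<%dH" % len(units), *units)

def is_well_formed_utf16(units):
    """True if the 16-bit values form valid UTF-16 (no unpaired surrogates)."""
    try:
        pack_units(units).decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False

def to_utf8_lossy(units):
    """Convert to UTF-8, replacing ill-formed units with U+FFFD."""
    text = pack_units(units).decode("utf-16-le", errors="replace")
    return text.encode("utf-8")

# 'a', a lone high surrogate (0xD800), then 'b': allowed as a name by the
# quoted description of NTFS, but not valid UTF-16.
name = [0x0061, 0xD800, 0x0062]
print(is_well_formed_utf16(name))
print(to_utf8_lossy(name))
```

The lossy conversion is one possible choice, not the only one: it never raises, but two different ill-formed names can map to the same UTF-8 string, so it is not safe to use the result to look the file up again.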


There was a time when some of our customers had lots of problems with gigantic files on their drives that were impossible to delete with Windows Explorer. I would go to their homes and help them delete the files from the command line, using filename*.ext to catch them. My guess was that the filenames contained some protected characters that Windows Explorer didn't allow. I don't remember how they ended up with the files, but most likely it was some download program and someone having a laugh :-)


Doesn't it (or Windows) also disallow the path component separator character(s) ('/' and '\')?

Unix and the like disallow NUL and '/', for obvious reasons.


There are a number of characters, like path separators, that cannot be part of a file name on Windows. However, I am not sure whether this is enforced by the OS APIs or by NTFS itself. It is entirely possible that NTFS could allow something that higher layers don't.
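
As a rough illustration of the higher-layer rule, the sketch below checks a name against the characters that Microsoft's file naming documentation lists as reserved for Win32 (`< > : " / \ | ? *` plus control codes 0 through 31). The function name is hypothetical, and the check says nothing about what NTFS itself would accept on disk.

```python
# Characters reserved by the Win32 naming rules; control codes 0-31 are
# handled separately below.
RESERVED = set('<>:"/\\|?*')

def win32_reserved_chars(name):
    """Return the characters in `name` that the Win32 naming rules reserve."""
    return [c for c in name if c in RESERVED or ord(c) < 32]

print(win32_reserved_chars("a/b\\c"))
print(win32_reserved_chars("report.txt"))
```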


If the kernel (and SMB, and...) imposes these constraints, it's fine for the filesystem to not also impose the same constraints on file naming.


You could break NTFS into accepting this. Fun things happen, for a specific definition of "fun".



