Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Neovim: input encoding and you (aktau.be)
64 points by bpierre on Nov 5, 2014 | hide | past | favorite | 30 comments


If this is the general quality of new code in the Neovim code base, I would be fairly worried.

In the first piece of code, which is apparently new to Neovim, there's a bug waiting to happen: casting size_t pointers to int pointers will fail on big-endian platforms where sizeof(size_t) != sizeof(int), and in general, the number of casts sprinkled everywhere is indicative of sloppy coding.

The whole area of code also seems to be poorly reimplementing iconv, which is actually supported in the code, but never enabled in Neovim for some reason, according to the footnote - why have two code paths to do the same thing?


The problem isn't big-endian, its just platforms where sizeof(size_t) != sizeof(int).

I believe the problem is the legacy vim API takes an (int ), so that code will also need to be corrected:

  char_u *string_convert_ext __ARGS((vimconv_T *vcp, char_u *ptr, int *lenp, int *unconvlenp));


Article author here.

You're correct. It's new code that's calling into old, unrefactored code. Hence the casts. convert_input() is new, string_convert_ext() is old. As is mentioned in the article, I'm working on a refactoring of string_convert_ext() that will obviate the casting (and incidentally also some allocating).


"The problem isn't big-endian, its just platforms where sizeof(size_t) != sizeof(int)."

To be honest, my memory is failing me, so I would appreciate it if someone with a fresher understanding of the standards or BE compiler behavior would clarify.

Does a cast from (pointer to int32_t) to (pointer to int64_t) change the memory location referred to? My assumption would be not. If it does not, then there is a problem with this code on big-endian systems that does not exist on little-endian systems:

On a little-endian system, you are only reading/writing the low bits of the lengths. If anything won't fit in an int, that can be a problem. It seems plausible that other things (buffer sizes?) guarantee that can never occur, though it would still be better code not to rely on it.

On a big-endian system, you are only reading/writing the high bits of the lengths as if they were the low bits. That will almost always be wrong.


Neovim is a fork of the original Vim codebase. Much of the code hasn't been touched yet, including the code quoted in the article.


From the article:

> The function is short enough to paste below. It is pretty unique to Neovim, as it was wrought for its brand new event-oriented nature:


> In the first piece of code, which is apparently new to Neovim, there's a bug waiting to happen: casting size_t pointers to int pointers will fail on big-endian platforms where sizeof(size_t) != sizeof(int), and in general, the number of casts sprinkled everywhere is indicative of sloppy coding.

At first I thought you were referring to casting `size_t` to `int` (and the converse) in general, but now I notice you were referring to pointers to `int` to pointers to `size_t`. You are right, of course, and we should be more careful about that. Nevertheless, this specific case will soon dissapear.

To answer the last part of your question:

> The whole area of code also seems to be poorly reimplementing iconv, which is actually supported in the code, but never enabled in Neovim for some reason, according to the footnote - why have two code paths to do the same thing?

I alluded to it in the article itself, and I may only guess as to what Bram's goals where, but I reckon:

1) Speed, for an important conversion like latin-1 to UTF-8, Bram might've wanted some extra juice. This is not uncommon in the Vim codebase. Possibly, one can't (or couldn't when Vim was made) rely on iconv being well-engineered. Also note that Vim was made to run on some pretty bare bones systems. 2) Compatibility: as it stands, (Neo)vim can work just fine without iconv and still do the most useful transforms. This is actually good because iconv is not available everywhere or buggy (switches for disabling it have been requested in vanilla Vim).

I haven't grokked the other encoding sides of vim (file and output), but it's possible I'll discover some things that might preclude an iconv-only approach.


In general they appear to have a problem with mixing lengths between being ints and being size_ts (Decently common problem). In that particular instance they could run into a problem if they compile for 64-bit (64-bit size_t, 32-bit int) and string_convert_ext gives a negative length, since int is signed.


We compile for 64-bit and 32-bit on every commit with travis CI and for many files activate extra warnings such as `-Wconversion`. We are acutely aware of the problem. However, we can't refactor the entirety of the (quite large) Vim codebase in one go. Whenever we add new code or replace old code that calls into more old code (often), we have to decide whether we will also refactor said old code to be more modern.

In fact, we have sections of the projects wiki devoted to this type issue: https://github.com/neovim/neovim/wiki (the specific section: https://github.com/neovim/neovim/wiki/Integer-types-refactor...).

It's not perfect, but you'll see we've put at least some thought into it and we welcome new contributions.

It's a problem of manpower and/or time, not necessarily of misunderstanding C and making rookie mistakes about types as the grandparent seems to imply.

There, had to get that off my chest. (I'm not seeing Neovim's code, old or new, is perfect, but it's probably better than what the tone of the comment implied).


aktau, your HN account appears to be dead. I'm guessing it's because this comment of yours was posted multiple times.


Thanks for the heads up. I went back and pressed submit several times because in my incognito window my comment didn't show up. (I was trying to post while noprocrast was on, didn't expect something I wrote to hit HN). I wonder now what I should do to not become dead anymore? I deleted the superfluous comments, hopefully that's enough. I'll look up what it means to be dead on HN.


Deleting those posts seems to have worked. I can now see your previously elided comments with my "showdead" set to on.


Your account is fine; it isn't and was never "dead". The system just kills duplicate comments.


Just posting to agree with the "decently common problem" comment. I am sure that back in the day I could write;

  int len = strlen( some_c_string );
Without getting a warning! I don't really need to be reminded that the random little strings I create in my code need to be less than 2 billion characters long!


What does endianness have to do with it?


Casting smaller doesn't hurt if the numbers are small and you're little endian - because the ptr will still point at the meat of the scalar (instead of the high bytes with zeros in them as it would if big-endian)


I'm not sure you understand big-endian casting—it's still value based, so you'll still get the least significant bits in the casted value.


The issue was, casting an untyped pointer which does NOT get the conversion you mention.


As a user, what benefit do I have switching from vim to neovim?


The most popular neovim feature (based on reddit's reactions) seems to be job control, although not enough people are using it. You can start an asynchronous job via `jobstart('name', 'prog', ['arg'...])` which will return a job id. You can send data to the job via `jobsend(id, 'text'). Every time the program has some output available, it will trigger an autocmd to let you process it. (This happens synchronously.)

We have refactored some of the code base using modern coding standards, which makes it more hackable. Some of these systems have become more efficient by, for example, using pipes over temp files.

We accept upstream patches in order to stay on top of vim, so I hope that no one will ever not use neovim because of a feature lacking (aside from a GUI--see below).

neovim creates a socket for each instance that external programs can communicate with, allowing one to write a plugin or extension in any language, providing it has a msgpack implementation. These plugins may eventually control things like the GUIs, spell checkers, syntax analysers, allowing nvim to become more focused, like a micro-kernel.

As of right now, many people have switched without having any real problems.


Any plans for nuking vimscript?



No, Vim plugins wouldn't work anymore. Vimscript will be transpiled to Lua under the hood, though, and other languages will be fully supported for writing plugins.


They'll work just fine, by being "transpiled".


Nothing yet. It's not ready. Eventually, better plugins and GUIs without losing what makes vim so productive.


Cool, thanks.


My understanding is that Neovim will make it much easier for plugins to execute tasks in parallel and in the background.

I'd use that for running ctags on save without blocking the editor, for example.


The reason I'm following neovim is that I'm hoping for a future where linting doesn't freeze my editor on save. Sadly,

I do like my gui, so I'll have to wait a bit more


like this new Floobits plugin [0]

[0] https://news.ycombinator.com/item?id=8559146


A plugin like eclim would benefit greatly from the new multi-process model.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: