I responded to the OP, but I'll add here too for the conversation. Where are the...

jerrysievert · on March 25, 2022

> The performance gains would have to either come from a faster (JIT'd) protocol parser, or from eliminating additional FFI calls (that it isn't obvious why they'd exist).

or, as I noted above, and will try to simplify here:

let's suppose you have a buffer which contains 5 key/value pairs, and you want to convert that into an array of 5 javascript objects in v8.

1. obtain a pointer to the isolate 2. create an array object, and check 3. create 5 objects, and check each one 4. create 5 key objects, and check each one 5. create 5 value objects, and check each one 6. convert each key into a v8 typed object, and check each one 7. convert each value into a v8 typed object, and check each one 8. attach each key to each object, and check each one 9. attach each value to each object, and check each one 10. attach each object to the array, and check each one 11. check that array one more time 12. done

vs.

1. obtain a pointer to the isolate 2. create a buffer, and check 3. parse in javascript

inside of v8, creating a javascript object inside of javascript is much faster than doing the same via c++, because of the additional checks that are needed each time you cross that barrier.

    Local<v8::Value>::New(isolate, String::NewFromUtf8(isolate, t).ToLocalChecked());

is an example of simply creating a string, not assigning it into an object.

w1nk · on March 25, 2022

Hey Jerry!

I'll respond here so we don't have split threads :). I'm also quite familiar with at least the older versions of the node internals (I too maintained a popular db binding for a number of years) and I'm very confused by the way you're positioning the operation of the v8 vm in these 2 scenarios. Sure, the c++ is going to require you to do some sanitizing as you force your data into v8, but as we noted that's inevitable no matter how you slice it. After that though, even once in javascript land, you're still crossing these barriers constantly to allocate data and objects and memory, etc. You don't just end at 'parse in javascript', the virtual machine is going nuts calling into this same c++ codebase. Now maybe in some cases the v8 internals offer some advantages the generic c++ api can't access, but this argument isn't convincing of their existence so far.

My memories of the redis client is different than yours so I'd be quite interested to see those conversations / benchmarks. From what I recall those early advantages in the js redis client were similar to the ones we're seeing here, ie: better pipelining of commands.

As a simple thought experiment, in the scenario you're describing we should see a javascript implementation of a JSON parser to beat the pants off the v8 engine implementation, but this doesn't seem to the case.

jerrysievert · on March 25, 2022

> Sure, the c++ is going to require you to do some sanitizing as you force your data into v8

it's not just sanitizing, there's a lot more to the object creation inside v8 itself. but, even if it were just sanitizing, that mechanism has become a lot more complicated than it ever was in v8 3.1 (timeframe around node 0.4) or 3.6 (timeframe around node 0.6). when interacting with c++, v8 makes no assumptions, whereas when interacting with javascript, a large number of assumptions can be made (e.g. which context and isolate is it being executed in, etc).

> but as we noted that's inevitable no matter how you slice it.

yes, from c++ to javascript and back, but when you need to make that trip multiple times, instead of once, that interchange adds up to quite a bit of extra code executed, values transformed, values checked, etc. sure, banging your head against a wall might not hurt once, but do it 40 times in a row and you're bound to be bloodied.

> Now maybe in some cases the v8 internals offer some advantages the generic c++ api can't access

by a fairly large margin, as it turns out, especially as v8 has evolved from the early 3.1 days to the current 9.8: 11 years. there has been significant speedup to javascript dealing with javascript objects compared to c++ dealing with javascript objects. see below.

> My memories of the redis client is different than yours so I'd be quite interested to see those conversations / benchmarks.

super easy to find, all of that was done in public: https://github.com/redis/node-redis/pull/242 - there are multiple benchmarks done by multiple people, and the initial findings were 15-20% speedup, but were improved upon. the speedup was from the decoding of the binary packet, which was passed as a single buffer, as opposed to parsing it externally and passing in each object through the membrane.

> As a simple thought experiment, in the scenario you're describing we should see a javascript implementation of a JSON parser to beat the pants off the v8 engine implementation, but this doesn't seem to the case.

that's a bit of a straw man argument. especially given that JSON.parse() is a single call and does not require any additional tooling/isolates/contexts to execute, it's just straight c++ code with very fast access into the v8 core:

    Local<v8::Value> result = Local<v8::Value>::New(isolate, JSON.Parse(jsonString));

but, let's take your straw man a little further. let's suppose that all of the actual parsing is done for you already, and all you're doing is iterating through the data structure, creating objects through the c++ api, and calling it good. that should be faster than calling the c++ JSON.parse(), shouldn't it? since we don't have to actually parse anything, right? no, it's actually much slower. you can see this in action at https://github.com/plv8/plv8/blob/r3.1/plv8_type.cc#L173-L60...

again, we're not talking about whether javascript in an interpreter is faster than c++, we're talking about whether v8's api causes enough slowdown that some workloads that require a lot of data between c++ and javascript are slower than the same workload that requires very little data between c++ and javascript ... because passing through v8's c++/javascript membrane is slow.

eyelidlessness · on March 26, 2022

I can’t link to any projects, implementing network protocols per se isn’t what I focused on researching this. My focus was on optimizing postMessage between threads. They’re not that different but the important distinction is that the user-facing API is in JS, not a network boundary.

My hypothesis was that:

1. Converting to binary data

2. Using facilities for shared memory

Would yield better performance in a language well suited to de/serialize binary data, with a message encoding designed for performance. The latter does perform better! If you need to spend a lot of CPU time wrangling binary data, it’s a clear win. If you have a workload with many small messages, they invariably slow down compared to structuredClone. If your workload can avoid crossing the boundary that’s a clear win still. But the moment JS VM values need to get shuttled around, structuredClone is specifically optimized for structuredClone-able values, and optimizing the JS/native boundary for data which benefits from that is an extremely narrow edge case. The only way I’ve found to win is not to play.

That said I don’t have the cleverness a lot of performance geeks have and maybe there’s some technique I’ve missed! But I honestly can’t imagine how I’d optimize JS/native interop better than the JIT without finding myself getting surprising new career opportunities on a VM team.

porsager · on March 25, 2022

I think you're right. Here's a project proving the exact opposite of native bindings being slow - https://github.com/uNetworking/uWebSockets.js

jerrysievert · on March 25, 2022

I think you might find that a large part of their speedup is by not using any of the openssl trappings of node, and using boringssl instead.