Both Herb and Miguel seem to be writing on a flawed assumption: that a JIT will only ever apply optimizations that can be proven safe. As I understand it, this is true for CLR implementations. And if it were true generally, there would be no significant difference between the set of optimizations available to AOT compilers and the set available to JITs, and a JIT would never be "as fast as C".
But this disregards one of the major selling points of profiling + optimizing JIT: you can apply optimizations that cannot be proven safe and you can get away with it. This means you can not only be "as fast as C", you can implement a JIT that will mop the floor with C.
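To make that concrete, here is a minimal sketch (in Python, with entirely hypothetical names) of how a JIT "gets away with" an unprovable optimization: it speculates based on profiling, emits a specialized fast path, and guards it with a cheap dynamic check that falls back (deoptimizes) to the general path if the speculation turns out wrong.

```python
# Sketch of speculative specialization with a deoptimization guard.
# All names are illustrative; real JITs do this in machine code.

def generic_add(a, b):
    # Fully general path: handles ints, floats, strings, lists, ...
    return a + b

def make_specialized_add(observed_type):
    # Speculate: "the arguments will keep being this type."
    def specialized(a, b):
        if type(a) is observed_type and type(b) is observed_type:
            return a + b          # fast path: the guard passed
        return generic_add(a, b)  # guard failed: deoptimize to the general path
    return specialized

add = make_specialized_add(int)   # the profiler only ever saw ints
assert add(2, 3) == 5             # fast path
assert add("a", "b") == "ab"      # speculation wrong, but still correct
```

The specialized version is not provably safe ahead of time, but the guard makes each individual execution safe.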
Good JITs are already taking advantage of this, and the very first comment on Miguel's article points this out. This (four-year-old!) article by Charles Nutter explains it a bit more in the context of the JVM:
I keep on hearing that it's possible to write a JIT-based vm that's "faster than C". I have yet to actually see one where that is true for anything other than one or two very specific micro-benchmarks.
Edit: the old C2 wiki actually has something useful to say about this.
It's actually not that difficult to find programs where a JIT is better.
Things that benefit significantly from profile-directed feedback often run faster under a JIT than under static compilers, because JITs do this dynamically, whereas static compilers require training runs.
For example performance critical code that has a lot of conditional branches, but only a few are taken at runtime, will often work faster in a JIT.
(Of course, the typical solution is not to write performance-critical code with a large number of branches; I'm just giving you examples.)
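A toy illustration of the point above (hypothetical names, profiling done by hand): once runtime counts show that one branch dominates, the "recompiled" version tests the hot case first instead of walking the whole chain.

```python
# Profile-directed branch reordering, sketched in miniature.
from collections import Counter

profile = Counter()

def dispatch(tag):
    profile[tag] += 1          # instrumented first-pass version
    if tag == "open":
        return 1
    elif tag == "close":
        return 2
    elif tag == "read":        # at runtime, almost always this one
        return 3
    return 0

for _ in range(1000):
    dispatch("read")
dispatch("open")

hot = profile.most_common(1)[0][0]   # profiling says "read" dominates

def dispatch_optimized(tag):
    if tag == "read":          # hot case hoisted to the front
        return 3
    return dispatch(tag)       # cold cases fall back to the full chain

assert hot == "read"
assert dispatch_optimized("read") == 3
assert dispatch_optimized("close") == 2
```

A static compiler can do the same reordering, but only after a separate training run produces the counts.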
Sure, but even in those cases the saving has to win back the overall cost of the runtime profiling and dynamic compilation. The crossover point seems to be high enough in the graph that it's not reached very often in practice - especially since, as you say, programmers tend to learn early on that hoisting conditions out of inner loops Makes Code Faster.
> The crossover point seems to be high enough in the graph that it's not reached very often in practice
I remember a study posted here a few months back that supported the notion that most web apps have a certain "steady state" that they reach after they start up, where almost all the types of variables are known, and most of the funkier "dynamic" changes don't happen. If one can get a JIT to remember the tracing information from session to session, it should be able to run as fast as or faster than C.
True, but let's say the implementation language in question has optional types. Consider the unfilled type information to be technical debt. If the tracing information can be exploited to automatically fill in the type info, then this is technical debt that magically repays itself. That's something startups should be interested in.
"Winning back" the overhead of the profiler + JIT is not terribly hard right now for long-lived processes. And unlike the assertions in Herb's original article this is not something that requires a mythical "sufficiently smart compiler" to make even better -- it just requires people to implement and fine-tune things we already know.
Yes, you can perform unsafe optimizations, but optimizations take time, so if you are wrong enough, you lose.
Anyway, as I said below, there is nothing that stops you from implementing a JIT for C; it's just not cost-effective from a "where do I put money to get better performance" perspective. If you did this, these JITs would do just as well as the ones you think will "mop the floor" with C.
Something that seems to be missed here is that the whole point of compilers (and JITs) having intermediate IR is to abstract away the language to a point that you can optimize and codegen stuff.
So talking about a "statically compiled" or "dynamically compiled" language is, well, silly. Most popular languages could be compiled either way[1]; it just happens that we don't, because it's not "better enough" in the real world.
If you want a real world example, LLVM will happily JIT C/C++ code for you, for example. It doesn't care that it's C/C++. It will even run it interpreted if you like.
[1] Some languages would still require a support library with an interpreter/jit/compiler if you statically compiled them, in order to handle dynamic class loading.
Polymorphic inline caching can be proven safe; it's just proven safe through dynamic checks of the cache, rather than statically. Miguel is right here, he's not working with a flawed assumption.
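For readers unfamiliar with the technique: here is a minimal sketch (illustrative names only) of a polymorphic inline cache. The cached lookup is the "optimization", and the dynamic type check on every hit is exactly the runtime proof of safety described above.

```python
# Sketch of a polymorphic inline cache at a single call site.

class Cat:
    def speak(self):
        return "meow"

class Dog:
    def speak(self):
        return "woof"

cache = {}  # call-site cache: receiver type -> looked-up method

def call_speak(obj):
    entry = cache.get(type(obj))
    if entry is not None:        # cache hit: the type check IS the safety proof
        return entry(obj)
    method = type(obj).speak     # slow path: full lookup, then fill the cache
    cache[type(obj)] = method
    return method(obj)

assert call_speak(Cat()) == "meow"   # miss, then cached
assert call_speak(Cat()) == "meow"   # hit
assert call_speak(Dog()) == "woof"   # second type: cache grows, stays correct
```

No static proof is ever constructed; the check in front of the fast path carries the entire correctness burden.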
If you bundle JIT with garbage collection, bounds checking, and all the usual things that come with the JVM/CLR and execute on current hardware, probably not.
If you were to JIT a language like C it would probably be faster as you could optimize to specific hardware. OpenCL uses JIT for example and it's many times faster than static compilation (because it can take advantage of extra hardware at runtime).
The issue to contend with is the reality that about 30 years of industry effort have gone into making C fast, everything from compilers to the way CPUs are designed. There's likely no inherent reason any style of compilation/execution couldn't beat static compilation; we just need the same level of resources and hardware support for those execution models.
For instance with garbage collection and bounds checking you no longer need an MMU.
Right, that is the point that both Herb and myself agree on: current managed languages have chosen paths where fewer errors, safety and productivity are more important than raw performance.
That said, although bounds checking eliminates some of the reasons for MMUs, it does not really solve many other features that we take for granted today that depend on MMUs, like on-demand-loading.
Garbage collectors and VMs typically use the MMU to improve the performance of their own operations: to cause execution to stop, to set up red zones on the stack (avoiding a check at every function entry point for how much stack space is left), and to map their metadata into the address space on demand without having to load it from a file up front.
> That said, although bounds checking eliminates some of the reasons for MMUs
Microsoft had a research OS called Singularity that was written mostly in C# and could be run in real memory mode because they didn't need the barriers.
Inferno (developed pretty much by the same team that recently created Go at Google, and before that created Plan 9 and Unix) doesn't need an MMU either. This is a great advantage when running on hardware without an MMU (there is a pretty good port to the Nintendo DS), and it also makes the system easier to port, even to platforms that do have an MMU (a single guy ported it to the PS2 in his spare time); MMUs are one of the main sources of platform-specific complexity.
Is that "If you were to JIT a language like C it would probably be faster" true? I think a JIT C compiler would run into trouble with alias detection. Say it removes an if statement because the variable a in the "if(a)" that starts it is always false. In C, code like "x->y = 1;" could affect the value of a, as could _any_ pointer dereference. If the JIT cannot rule that out, it would have to insert a check "does this modify a?" for every such statement. I think there is quite a bit of code out there where even a very smart JIT could not conclusively prove that such aliasing does not happen.
Also, I do not accept that OpenCL argument. IMO, OpenCL is more similar to static compilation than to a JIT. It is as if you recompiled your kernels whenever you changed your video card (perhaps also when you change its configuration); it does not do things that a JIT would do, such as "hm, most of the pixels are black; let's optimize for that case".
We (compiler optimization folks) have gotten very good at making pointer and alias analysis fast, particularly the simple pointer analyses you might find in a JIT. For a JIT, you would likely write out conservative but correct static results in whatever the equivalent of "javac" is for your C JIT, and then refine them in the JIT. You'd invalidate them if accesses you couldn't account for came up later on.
Even if you didn't want to do that, on demand CFL reachability formulations of pointer analysis can calculate reasonable pointer results for individual pointers fast enough if it became important.
Realistically however, no JIT is going to do advanced pointer analysis, unless you have CPU to burn.
As for "conclusively prove that aliasing does not happen", you don't actually have to, because if it is truly going to improve performance, you can insert runtime checks.
if (&a == &b)
    <do super fast thing>
else
    <slower fallback code>
If your C program has aliased pointers to different types, you are already running in undefined behavior land. The JIT is just as free to optimize away that test as the static compilers that optimize it away today.
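The same guard idea in runnable form (a Python sketch; the `is` identity test plays the role of the `&a == &b` address comparison, and the function names are made up for illustration):

```python
# Runtime aliasing guard: if the destination and source are distinct,
# a bulk/reordered fast path is valid; if they alias, fall back to a
# careful element-wise update.

def scale_into(dst, src, k):
    if dst is not src:                  # guard: no aliasing
        dst[:] = [x * k for x in src]   # fast path: free to read src in bulk
    else:
        for i in range(len(src)):       # aliased: conservative slow path
            dst[i] = src[i] * k

a = [1, 2, 3]
b = [0, 0, 0]
scale_into(b, a, 10)       # distinct: fast path
assert b == [10, 20, 30]
scale_into(a, a, 10)       # aliased: slow path, still correct
assert a == [10, 20, 30]
```

Both paths compute the same result; the check just decides which one is legal to run.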
OpenCL is only vaguely JIT like in the common sense. If I recompiled Photoshop so each image transform were completely unrolled and optimized for the image's dimensions, that'd be a closer approximation. It's more like online static compilation. You could kind of get the same effect using libtcc. [Now I'm not sure if I'm agreeing or disagreeing with you. I can't quite articulate the difference between JIT and "online static compilation", but I feel there is one.]
The very first comment misunderstands reality.
It talks about how you can make java programs that outrun C++ ones using hotspot.
Ignoring, of course, the fact that nothing really stops you from making a good C++ JIT except engineering time, and it would do just as well in those cases.
Nobody does it because the cases in which it would help are small, and most of those folks are willing to do profiling/training runs, and get about as good results, even if it requires more pain.
As Herb said, the languages were made for different tradeoffs.
Speaking as a compiler guy, yes, you can make almost any of them as fast as each other given enough time and effort.
But putting time and effort into a JIT may not be as effective as choosing a different language.
> Speaking as a compiler guy, yes, you can make almost any of them as fast as each other given enough time and effort.
> But putting time and effort into a JIT may not be as effective as choosing a different language.
So, the most "bang for your buck" may be using a JIT for a language like Python. Sure, it won't be as fast (at least, not until a huge amount of work has been done on it), but it can get faster. Trying to beat C + a good C compiler is ... ambitious.
Yes, Fortran is actually faster than C for some things. OK, C++ with the right templates, or a later C standard, can theoretically do what Fortran does (remind me again - pass by reference, no recursion, and bounded arrays rather than pointers?). And Intel's compiler tends to be faster than gcc.
There's no way you can make broad statements (like "nothing will beat C" or "Fortran is actually faster than C for numerical stuff") without having a few exceptions. Actually, PyPy is faster than C in extremely specific circumstances (i.e. ones which were set up by the PyPy people to show off; see "PyPy faster than C on a carefully crafted example"... though they point out there are things you could do in C to make it win).
The most bang for your buck in JITs is for dynamic languages, because while stuff like type inference can explode exponentially, in many cases you'll only really have one type (i.e. a float), and the JIT can pick that up. And getting Python within an order of magnitude of C would be a massive win.
If you redefine C++ so that it looks like Java / C# (i.e. runs on a virtual machine, with a JIT and probably some requirements for memory safety), then yes, you can do these things with C++.
But the people who advocate C++ wouldn't recognize that language as a C++, because all the overhead of a JIT runtime violates the "only pay for what you need" dictum.
(E.g. consider the uptake of Managed C++, C++/CLI, etc.; which effectively are what you describe.)
I think you may be confused.
You don't have to redefine anything at a language level to make this work. It will just work.
The features Java redefined were mainly for security and pointer safety, not because they made it possible to run with a JIT.
As a proof, you can JIT C/C++ with LLVM right now if you wanted to.
It's not a great JIT because it has no adaptive compilation heuristics, etc, but it works just fine.
C/C++ very specifically leave most of the execution environment undefined. They do not require anything that makes JIT'ing need language-level changes. Your JIT may be unsafe (i.e. crash as easily as a regular C/C++ program), or otherwise require basically native access (i.e. not really viable for sandboxing in a browser), but it would work fine.
Things like Managed C++/etc required language changes because they wanted it to interoperate with a common intermediate language and runtime.
There are things that you can do in C++ that you can't do in the CLR (again, because the CLR chose safety etc. over other considerations), and vice versa (e.g. garbage collection at a language level), so Managed C++ required language changes to support this.
I'm not confused (thank you very much for your diagnosis though - that's a grand way to gratuitously insult people). I didn't say redefining things at the language level; I meant redefining at the programming environment level.
You can compile standard C++ with C++/CLI just fine, and have it run on a fine managed platform. "IJW" was the slogan: It Just Works. Still didn't go anywhere.
(The CLR did not choose safety over other considerations. It has a safe subset, but it is not like the JVM; it has unsafe operations too. When you write IL for it, you are not even required to use GC memory; you can manipulate things at the level of untyped blobs of data if needs be.)
One thing I would like to see is speculative execution. I have an 8 core machine. When the JIT compiler runs there is no reason why it can't generate 7 alternatives to any particular code sequence, run them all in parallel and keep whichever was the fastest. Or run both sides of an if statement simultaneously.
This helps solve the problem that compiler writers often have multiple alternatives they can generate, but have to make hard decisions as to which to pick.
I'm not saying that implementing this will be easy, but then again all of the easy things have already been done.
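In miniature, the idea above can be sketched like this (hypothetical functions; real speculative execution would race compiled code variants, not Python callables): submit equivalent alternatives to a pool and keep whichever finishes first.

```python
# Race two equivalent implementations and keep the first to finish.
import concurrent.futures

def sum_naive(n):
    total = 0
    for i in range(n):      # deliberately slow alternative
        total += i
    return total

def sum_closed_form(n):
    return n * (n - 1) // 2  # equivalent result, usually much faster

with concurrent.futures.ThreadPoolExecutor() as pool:
    futures = [pool.submit(f, 10**6) for f in (sum_naive, sum_closed_form)]
    first_done = next(concurrent.futures.as_completed(futures))
    result = first_done.result()   # whichever alternative won the race

assert result == 10**6 * (10**6 - 1) // 2
```

Because every alternative computes the same answer, correctness doesn't depend on which one wins; only the wasted work on the losing cores is at stake, which is exactly the cost/benefit question raised in the replies below.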
Describing this as not easy is a bit of an understatement; there has been a lot of research into speculative threading and communication overhead between cores almost always kills it.
The main hurdle is getting a block of execution large enough to make it worthwhile trying alternatives and absorbing the cost of setup and finish. I suspect some form of STM would help a lot since you could then run multiple threads of code and only "commit" the winner. That allows a larger execution block.
I also think software people still have an innate fear of "wasting" CPU/memory/disk which is hard to overcome.
One example of where this is used in practice is the FFTW library (the Fastest Fourier Transform in the West). At runtime you can run a trial of FFT algorithms for your data size and architecture, and it will pick the fastest.
It is a pain with FFTW since they expose all this in the API, letting you save/load the details etc. A better example is MongoDB: when you make a query, it tries multiple query plans concurrently, notes which is fastest, and uses that for future queries, with continued performance monitoring. If performance diverges from expectations, it runs the contest again.
Compare with other databases that put an inordinate amount of effort and tuning into coming up with the one true query plan per query.
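The FFTW/MongoDB approach boils down to something like the following sketch (the plan names are made up; re-running the contest on divergence is omitted for brevity): time each candidate once on representative input, then keep using the winner.

```python
# Empirical plan selection: measure the candidates, cache the winner.
import bisect
import time

def plan_linear(data, key):
    return key in data                 # linear scan

def plan_binary(data, key):
    i = bisect.bisect_left(data, key)  # binary search (data must be sorted)
    return i < len(data) and data[i] == key

def pick_plan(candidates, data, key):
    timings = {}
    for plan in candidates:
        start = time.perf_counter()
        plan(data, key)                # one trial run per candidate
        timings[plan] = time.perf_counter() - start
    return min(timings, key=timings.get)

data = list(range(100000))
best = pick_plan([plan_linear, plan_binary], data, 99999)
assert best(data, 99999) is True       # winner still answers correctly
assert best(data, -1) is False
```

This is tuning by measurement rather than by modeling, which is the contrast being drawn with databases that compute one "true" plan analytically.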
> Java HotSpot takes a fascinating approach: they do a quick compilation on the first pass, but if the VM detects that a piece of code is being used a lot, the VM recompiles the code with all the optimization turned on and then they hot-swap the code.
Wait, what? Hotspot does a "quick compile" on the first pass? When did this happen?
Some time ago. It has a "pasting" JIT as a first optimisation stage, which just pastes together machine code for each opcode to remove the overhead of interpreting.
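A toy model of that "pasting" stage (illustrative only; HotSpot pastes machine-code templates, not Python closures): instead of re-dispatching on each opcode every time through the interpreter loop, resolve each opcode to its handler once and then just run the pre-bound sequence.

```python
# Interpreter vs. a "pasted" first-stage compile for a tiny stack machine.

def interpret(program, stack):
    for op, arg in program:            # dispatch cost paid on every run
        if op == "push":
            stack.append(arg)
        elif op == "add":
            stack.append(stack.pop() + stack.pop())

def compile_program(program):
    def make_push(arg):
        return lambda stack: stack.append(arg)
    def add(stack):
        stack.append(stack.pop() + stack.pop())
    handlers = []                      # handlers "pasted" together, in order
    for op, arg in program:
        handlers.append(make_push(arg) if op == "push" else add)
    def run(stack):
        for h in handlers:             # no opcode test left at run time
            h(stack)
    return run

prog = [("push", 2), ("push", 3), ("add", None)]
s1, s2 = [], []
interpret(prog, s1)
compile_program(prog)(s2)
assert s1 == s2 == [5]
```

The pasted version does exactly what the interpreter does, minus the per-opcode dispatch, which is the whole point of a quick first-tier compile.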
One of the performance advantages of C is the sheer amount of detail that's left to the implementation. By contrast, languages like C# and Java are typically much more tightly specified, which prevents optimizations that are not provably correct. This is not a matter of JIT vs non-JIT.
The JVM is quite happy to speculatively apply optimizations that are not provably correct, and has done so for years. Have a look at the link in this comment:
http://headius.blogspot.com/2008/05/power-of-jvm.html