*some benchmarks* from Wed, 10 Oct 2012 jruby-1.7.0.RC2 -Xcompile.invokedynamic=...

ksec · on Oct 22, 2012

So with about the same CPU time, JRuby will use up to 100x memory in worst case?

Anyway, Ruby 2.0 is supposed to solve the "Ruby is slow" problem, although I have yet to see any evidence that this will be the case. Sigh.

bascule · on Oct 22, 2012

> So with about the same CPU time, JRuby will use up to 100x memory in worst case?

The JVM often trades increased memory usage for increased performance. That said "100x memory" is total hyperbole, even in the context of a microbenchmark. If you want the JVM to use less RAM, you can turn down the maximum heap size. Otherwise the heap will grow until it reaches close to full before the GC does a stop-the-world style collection of the old generation.

> Anyway Ruby 2.0 is supposed to solve the Ruby is Slow problem

Absolutely not. There are presently no plans to add a JIT compiler to the de facto Ruby interpreter.

The JVM on the other hand has the most advanced JIT compiler in the world. InvokeDynamic on Java 7 allows JRuby to do method inlining, and on Java 8 it will be able to take advantage of the full suite of HotSpot optimizations, including escape analysis and lock elision when compiling Ruby code into machine code.

No other Ruby VM will have that level of optimization.

mbell · on Oct 23, 2012

> Otherwise the heap will grow until it reaches close to full before the GC does a stop-the-world style collection of the old generation.

Most real applications will be using ConcMarkSweepGC (or G1) and ParNewGC. With these collectors, the JVM won't just burn through all the memory and halt the world unless you throw a workload at it that the GC can't handle with its normal GC cycles.

The GC makes small sweeps regularly but the duration of those sweeps is limited to ensure relatively consistent response time. If you were creating and throwing away so many objects which were promoted to the tenured generation that these small, time limited, GC sweeps couldn't keep up, then you would see the heap grow to its maximum and a 'stop the world' GC would take place. In practice that doesn't happen in the vast majority of workloads.

bascule · on Oct 23, 2012

No, you're wrong: an extremely scant number of applications can completely avoid allocating memory in the old generation. Your best bet would probably be to use the Disruptor approach of preallocating everything and then avoiding any subsequent allocations, if that's really what you want to accomplish. However, if you honestly believe "most real world applications" can avoid allocating memory in the old generation, I'd sure love to see some graphs of heap profiles from real-world usage of these applications!

CMS does a stop the world collection of the old generation, however it will do an incremental collection of the eden generation. If you've ever looked at a graph of the JVM heap observing CMS running under VisualVM or YourKit, you'll notice a distinct sawtooth pattern to its collections. The diagonal parts of the sawtooth represent incremental collections in eden space. The vertical parts of the sawtooth represent stop-the-world collections of the old generation.

The "small regular sweeps" you're talking about affect the eden generation, not the old generation. Collecting the old generation is a stop the world operation in every JVM GC option except Azul's C4 collector. C4 resorts to all sorts of clever tricks to avoid stopping the world, most notably read barriers and eager pointer updates. On the Azul hardware, these tricks involved hardware transactional memory. On x86 they don't have HTM (at least until Transactional Synchronization Extensions ship on Haswell). What they do, however, is quickly set up GC invariants and a trap on a given page. Whatever thread accesses a page that contains an object which needs to be relocated relocates the object on the GC's behalf and gets the new pointer. All that said, C4 is only available on the Azul JVM, which is commercial.

CMS can't do this. It stops the world at a JVM safe point whenever it needs to collect the old generation.

mbell · on Oct 24, 2012

The CMS collector has two different types of collections it will do on the old generation. Both are "stop the world", but for drastically different amounts of time. During its 'normal' operation, most of the sweep is done concurrently, with two short "stop the world" points for each GC cycle. If it can't keep up, then it will do a full-blown "stop the world for real" collection.

igouy · on Oct 22, 2012

>> If you want the JVM to use less RAM, you can turn down the maximum heap size. <<

That's how the programs used to be run with JRuby 1.6.7 and with those same limits the n-body program failed to complete within 1 hour using JRuby 1.7 -- so the brittle RAM limits were abandoned.

spitfire · on Oct 22, 2012

Possibly not. Those benchmarks (mandelbrot, n-body, and pi-digits) have a very low memory footprint. So it might be a difference of going from 20K to 30 MB of constant memory usage.

The other benchmarks, by contrast, seem to show single-digit multiples of memory usage. So seemingly, real-world apps will use more memory, but not 100x in practice.

bascule · on Oct 22, 2012

Also note that for Fixnum dependent benchmarks, JRuby has to allocate memory due to boxed integers, whereas on the de facto interpreter it's able to use tagged pointers which allow Fixnums to be treated as simple numeric values.
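To make the tagged-pointer point concrete, here is a minimal sketch of the encoding in plain Ruby. It assumes the common one-bit scheme in which the lowest bit of a machine word marks an immediate integer; `tag_fixnum`, `fixnum_word?`, and `untag_fixnum` are hypothetical helper names for illustration, not interpreter APIs.

```ruby
# Sketch of one-bit integer tagging (helper names are illustrative only).
# A word whose lowest bit is 1 is treated as an immediate integer, so no
# heap object has to be allocated for it.

# Encode an integer as a tagged word: shift left and set the low bit.
def tag_fixnum(n)
  (n << 1) | 1
end

# A word is an immediate integer if its low bit is set.
def fixnum_word?(word)
  (word & 1) == 1
end

# Decode a tagged word; the arithmetic shift keeps negative values intact.
def untag_fixnum(word)
  word >> 1
end

[0, 1, 42, -7].each do |n|
  word = tag_fixnum(n)
  puts "#{n} -> word #{word}, immediate? #{fixnum_word?(word)}, back to #{untag_fixnum(word)}"
end
```

On the de facto interpreter, small integers have long reported `n.object_id == 2 * n + 1`, which is this encoding showing through. Treat that as an implementation detail rather than a guarantee.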

This may be going away in JDK9 as they claim they're eliminating primitive types in favor of an everything-is-an-object model.

benmmurphy · on Oct 22, 2012

These benchmarks also include the time it takes the VM to start up and may not reflect full JIT performance. Locally, jruby pidigits.rb 2000 takes 4.132s; running it twice takes 4.965s. JRuby is quite annoying for running small scripts because of the startup time, but it can be quite nice for full applications.

igouy · on Oct 22, 2012

>> jruby pidigits.rb 2000 takes 4.132s <<

Why did you measure a workload 1/5th of the workload shown on the benchmarks game webpages?

Each program page shows measurements for 3 workloads, in the case of pi-digits 2,000 6,000 and 10,000.

http://shootout.alioth.debian.org/u32/program.php?test=pidig...

bascule · on Oct 22, 2012

Yes, each Ruby method needs to get executed 50 times to JIT to JVM bytecode, and from there it takes an additional 10,000 calls for HotSpot to JIT that to machine code.

Any hotspots therefore need to be hit at least 10,050 times in order to be fully warmed up.
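As a rough sketch of that arithmetic and of what warming up looks like in practice, the snippet below adds the two thresholds quoted above and calls a method that many times before timing it. The threshold constants are taken from this comment rather than read from the runtime, and `warmup_calls` and `hot_method` are hypothetical names, with `hot_method` standing in for real work.

```ruby
require 'benchmark'

# Thresholds as quoted in the comment above (assumed, not queried from JRuby).
JRUBY_JIT_CALLS   = 50
HOTSPOT_JIT_CALLS = 10_000

# Total calls before a method counts as fully warmed up under this model.
def warmup_calls(jruby_calls = JRUBY_JIT_CALLS, hotspot_calls = HOTSPOT_JIT_CALLS)
  jruby_calls + hotspot_calls
end

# Hypothetical stand-in for the code being measured: sum of squares 1..n.
def hot_method(n)
  (1..n).reduce(0) { |sum, i| sum + i * i }
end

# Warm up first so the timed run is not dominated by interpretation
# and compilation.
warmup_calls.times { hot_method(100) }

elapsed = Benchmark.realtime { 1_000.times { hot_method(100) } }
puts "warm-up calls: #{warmup_calls}, timed run: #{elapsed.round(4)}s"
```

On an interpreter without a JIT the warm-up loop changes little; it is only there to show where it would sit in a benchmark script.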

mbell · on Oct 23, 2012

Just start the JVM with -XX:CompileThreshold=<some_number>

It only defaults to 10,000 when the JVM is running in server mode; it's 1,500 in client mode. Which mode a default install runs in is a bit wonky, so I always set it explicitly (with -server as an argument to the JVM).
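A small sketch of how such a run could be put together from a Ruby script. It assumes the `jruby` launcher is on the PATH and relies on its `-J` prefix for forwarding options to the JVM; `jruby_command` is a hypothetical helper and the threshold value is just an example. The command is only built and printed here, not executed.

```ruby
# Build (but do not run) a JRuby command line with an explicit JIT threshold.
# Options prefixed with -J are handed through to the underlying JVM.
def jruby_command(script, compile_threshold:, server: true)
  cmd = ['jruby']
  cmd << '-J-server' if server
  cmd << "-J-XX:CompileThreshold=#{compile_threshold}"
  cmd << script
  cmd
end

cmd = jruby_command('pidigits.rb', compile_threshold: 1_500)
puts cmd.join(' ')
# To actually launch it: system(*cmd)
```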

ZitchDog · on Oct 22, 2012

I guess I should have specified, I was thinking of the benchmarks vs non-invokedynamic JRuby. No intention of starting a benchmark war. Ruby 1.9 is fast too.

igouy · on Oct 22, 2012

OK. Back to August 29th, JRuby 1.7.0 preview2, invoke-dynamic=true on the right, the last number is elapsed time in seconds (so 451.046 vs 258.643 with invoke-dynamic=true) --

http://anonscm.debian.org/viewvc/shootout/shootout/website/w...

igouy · on Oct 23, 2012

I see that the ViewVC link seems to have problems showing the diff at the moment, so here are the elapsed seconds (without invokedynamic and with invokedynamic):

jruby 1.7.0 (1.9.3p203) 2012-10-22 ff1ebbe on Java HotSpot(TM) Server VM 1.7.0_09-b05 [linux-i386]

    invoke-dynamic=        false     true
    binarytrees #1        290.08   252.55
    binarytrees #2        290.34   252.92
    binarytreesredux #2   289.64   262.04
    chameneosredux 31      39.85    32.73
    fannkuchredux #2     1720.70  1481.23
    fasta #6              372.59   418.07
    fasta #5              294.54   270.02
    fasta #1              278.98   260.14
    knucleotide #2        428.75   377.28
    knucleotide #1        470.40   461.46
    mandelbrot #3        1404.58  1149.08
    mandelbrot #1        2301.81  2167.50
    meteor #1              17.59    17.42
    meteor #2              13.19    13.37
    nbody #1              678.62   568.72
    pidigits #2             3.10     3.22
    pidigits #3            25.89    25.73
    pidigits #1            50.80    51.03
    regexdna #1            78.76    78.60
    regexdna #3            44.29    45.33
    revcomp #2             24.05    24.49
    spectralnorm #2       462.09   387.54
    spectralnorm #1       496.94   438.55

YMMV
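For a quick read of the table, a short sketch that turns a few of its rows into a percentage change. The three rows are copied from the figures above; `percent_change` is a hypothetical helper, and a negative result means the run with invokedynamic was faster.

```ruby
# Elapsed seconds copied from a few rows of the table above:
# [without invokedynamic, with invokedynamic].
TIMES = {
  'nbody #1'      => [678.62, 568.72],
  'mandelbrot #3' => [1404.58, 1149.08],
  'fasta #6'      => [372.59, 418.07]
}

# Relative change in elapsed time, in percent, rounded to one decimal place.
def percent_change(without, with)
  ((with - without) / without * 100).round(1)
end

TIMES.each do |name, (without, with)|
  puts format('%-14s %+6.1f%%', name, percent_change(without, with))
end
```

On these rows nbody and mandelbrot #3 come out roughly 16% and 18% faster, while fasta #6 is about 12% slower, so the effect is not uniform across programs.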

nahname · on Oct 23, 2012

Ruby 1.9.? There is a significant difference between 1.9.1 and 1.9.3

igouy · on Oct 23, 2012

http://shootout.alioth.debian.org/u32/ruby.php#about

or

http://shootout.alioth.debian.org/u32/program.php?test=nbody...

or ...