The stack is not managed by the GC in the ordinary sense, so benchmarks that only allocate on the stack literally don't matter when comparing GCs.
And by “what a binary tree is doing” you mean, like, garbage collecting objects that are no longer used? Why is it hard to believe that the runtime that perhaps the majority of serious, huge web services run on (Twitter, Apple's web services, and Google are all huge Java shops), including ones that handle 325,000 transactions per second (Alibaba), has had a tremendous amount of engineering put into it and is definitely the queen of the GC category?
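For anyone unfamiliar with it, here is a rough sketch of the kind of work a binary-trees-style benchmark does. It is not the actual benchmark source; the class and method names (`BinaryTreesSketch`, `build`, `checksum`) and the depth/iteration counts are made up for illustration. Each node is a separate heap object that escapes into the tree, and every tree becomes unreachable right after it is counted, so most of the time goes into allocating and collecting short-lived objects rather than into stack-only work.

```java
// Hypothetical sketch of a binary-trees-style workload (not the real benchmark code).
public class BinaryTreesSketch {
    // A tree node; each one is allocated on the heap via `new`.
    static final class Node {
        final Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    // Builds a perfect binary tree of the given depth (2^(depth+1) - 1 nodes).
    static Node build(int depth) {
        if (depth == 0) return new Node(null, null);
        return new Node(build(depth - 1), build(depth - 1));
    }

    // Counts nodes by walking the tree.
    static int checksum(Node n) {
        if (n.left == null) return 1;
        return 1 + checksum(n.left) + checksum(n.right);
    }

    public static void main(String[] args) {
        int depth = 16;      // arbitrary illustrative depth
        long total = 0;
        // Repeatedly build and discard trees: each tree is garbage as soon as
        // checksum() returns, which is exactly the pattern a GC has to handle.
        for (int i = 0; i < 100; i++) {
            total += checksum(build(depth));
        }
        System.out.println(total);
    }
}
```

A generational collector is well suited to this pattern, since almost every object dies young.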
The JVM is probably heavily optimized for what a binary tree is doing, but that does not mean the JVM is better overall for all use cases.