The only deterministic thing about reference counting is call sites.
There is nothing deterministic about how long those calls take, or about their stack requirements, especially if they trigger cascade deletions or are moved to a background thread to avoid such scenarios.
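To make the cascade case concrete, here is a minimal Swift/ARC sketch (the list length is arbitrary and only there to illustrate the point): dropping one reference can trigger an arbitrarily long chain of releases, with a cost and a stack depth that depend on the shape of the data rather than on the call site.

```swift
final class Node {
    var next: Node?
    init(next: Node?) { self.next = next }
}

// Build a long singly linked list.
var head: Node? = nil
for _ in 0..<1_000_000 {
    head = Node(next: head)
}

// This single assignment releases the entire chain: each node's deallocation
// releases the next node from inside the previous one's teardown, so the time
// spent at this call site grows with the list length, and the nested releases
// can get deep enough to overflow the stack for sufficiently long chains.
head = nil
```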
Reference counting optimizations slowly turn into a tracing GC.
It's possible, but that's an exaggeration of a degenerate case. RC doesn't mean all data must be tangled into an unknowably large randomly connected web of objects.
The behavior is deterministic enough to be profiled and identified if it actually becomes an issue.
Identifying causes of pressure on a mark-and-sweep style GC is much more difficult, and depends on specialized GC instrumentation, not just a regular profiler.
In practice, you have predictable deallocation patterns for the vast majority of objects. The things that are "young generation" in a GC are the things that get deallocated right away in RC.
Time required to deallocate is straightforwardly proportional to the dataset being freed. That can be a predictable, bounded size if you're able to control what references what. If you can't control that, you can't use alternatives with more constant-time behavior, like memory pools, either, because those surprise references would become use-after-frees.
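As a small illustration of the "deallocated right away" pattern, here is a Swift/ARC sketch (the RequestScratch type and the sizes are invented for the example): the short-lived object is freed in line with the work that created it, with no collector involved.

```swift
final class RequestScratch {
    let buffer: [UInt8]
    init(size: Int) { buffer = [UInt8](repeating: 0, count: size) }
    deinit { print("scratch freed") }   // runs at a predictable point, not during a later pause
}

func handleRequest() {
    let scratch = RequestScratch(size: 64 * 1024)
    print("using \(scratch.buffer.count) bytes")
    // ARC releases `scratch` no later than the end of this scope (in practice
    // right after its last use), so the deallocation cost is paid here, as part
    // of the request, instead of at some later collection.
}

handleRequest()
print("after handleRequest")   // "scratch freed" has already been printed by now
```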
I was there when Apple moved Cocoa from GC to ARC, and the UI stutters disappeared. It's much more palatable to users to have the RC cost happen in line with the work the application is doing than to have it deferred, causing jank at unexpected times, seemingly for no reason.
Generational GC designs do not "deallocate" objects. In fact, most GCs don't. It's an understandable but unfortunate misconception that sometimes causes developers to write more GC-unfriendly code than necessary.
When a collection in a generational GC occurs, the corresponding generation's heap is scanned for live objects, which are then usually relocated to an older generation. In the most common scenario under the generational hypothesis, only a few objects survive, and most die in the young/nursery/ephemeral generation (.NET, the JVM and other ecosystems have different names for the same thing).
This means that once object relocation finishes, the memory region/segment/heap that was previously used can immediately be made available to subsequent allocations. Sometimes it is also zeroed as part of the process, but the cost of doing so on modern hardware is minuscule.
As a result, the more accurate intuition for most generational GC implementations is that pause time / CPU cost scales with the live object count and with inter-generational traffic. There is no per-object cost for "deallocation". This process is vastly more efficient than reference counting. The concern about overly high allocation traffic remains, which is why allocation and collection throughput are defining characteristics of GC implementations, alongside average pause duration, pause frequency, costs imposed elsewhere by specific design choices, etc.
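To illustrate why the cost tracks live objects rather than dead ones, here is a deliberately toy Swift sketch of a copying-style minor collection (the types, the index-based heap model, and the function name are all invented for the example; no real collector is this simple):

```swift
struct ToyObject {
    var payload: Int
    var refs: [Int]   // nursery indices of the objects this one references
}

// Evacuate the objects reachable from `roots` out of `nursery` into `oldGen`.
func minorCollection(nursery: [ToyObject], roots: [Int], oldGen: inout [ToyObject]) {
    var forwarded = [Int: Int]()   // nursery index -> new old-gen index
    var worklist = roots

    while let index = worklist.popLast() {
        guard forwarded[index] == nil else { continue }   // already evacuated
        let object = nursery[index]
        oldGen.append(object)                             // copy ("promote") the live object
        forwarded[index] = oldGen.count - 1
        worklist.append(contentsOf: object.refs)          // trace only what is reachable
    }

    // A real collector would now also rewrite references using `forwarded`.
    // The point of the sketch: the work done is proportional to the live
    // objects copied above; dead objects were never touched, and the whole
    // nursery is reused by resetting its allocation pointer, with no
    // per-object free step.
    print("evacuated \(forwarded.count) live objects; rest of the nursery reclaimed wholesale")
}

var oldGen: [ToyObject] = []
let nursery = [
    ToyObject(payload: 1, refs: [1]),   // index 0: live (a root), references index 1
    ToyObject(payload: 2, refs: []),    // index 1: live, reached from index 0
    ToyObject(payload: 3, refs: []),    // index 2: dead, never visited below
]
minorCollection(nursery: nursery, roots: [0], oldGen: &oldGen)
```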
Allocating and deallocating a complex graph of reference-counted objects costs more than 10x as much as doing so with a modern GC implementation. I don't know which implementation was used back in the day in Cocoa, but I bet it was nowhere near as advanced as what you'd see today in JVMs or .NET.
A few entry points into the documentation describe the many ifs and buts of making use of Objective-C GC without having things go wrong.
All of this contributed to Objective-C GC not being a sound implementation, with lots of Radar issues and forum discussions; as you might expect, making existing projects, or any random Objective-C or C library, work under Objective-C GC semantics and its required changes wasn't that easy.
Naturally, having the compiler automate retain/release calls, similar to what VC++ does with _com_ptr_t for COM (an approach that ended up being superseded by other mechanisms), was a much better solution, without requiring a "rewrite the world" approach.
Automate a pattern developers were already expected to follow manually, and leave everything else as it is, without ifs and buts regarding coding best practices, programming patterns, RC / GC interoperability issues with C semantics and so on.
The existing retain/release calls would no longer be written by hand; everything else stays the same.
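Roughly what "automate the pattern" means, sketched here in Swift since it uses the same ARC mechanism (the Document type is made up, and exactly where the compiler places the inserted calls, and which redundant pairs it elides, is up to the optimizer):

```swift
final class Document {
    var title = "untitled"
}

var currentDocument: Document? = nil

func openDocument(_ doc: Document) {
    // The programmer writes a plain assignment; the compiler inserts the same
    // reference-count traffic that used to be written by hand under manual
    // retain/release: retain the new value, release the old value of
    // `currentDocument`.
    currentDocument = doc
}

func closeDocument() {
    // Compiler inserts: release the previous value of `currentDocument`.
    currentDocument = nil
}

openDocument(Document())
closeDocument()
```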
Naturally, Apple being Apple, they had to sell this at WWDC as some kind of great achievement, about how RC is much better than GC, which in a sense is correct, but only from the point of view of the underlying C semantics and the mess that Objective-C GC turned out to be, not of tracing GC algorithms in general.
Were you also there when the real reason the GC failed was Objective-C's underlying C semantics, producing random crashes, especially in mixed code bases, rather than whatever the "why RC?" marketing material claimed?
There is a reason why RC is considered the baby algorithm among automatic memory management algorithms.
Anything that people point out as RC optimizations, as well as the profiling tools, also exists for the better algorithms.
Most languages with automatic memory management also offer primitives for deterministic call sites (e.g. using blocks or try-with-resources), if one so desires.
Finally, it isn't as if Apple is a genius that managed to revolutionize memory management algorithms; doing in Cocoa what Microsoft was already doing with COM was the natural way out, given Objective-C GC's unsound implementation.
Swift's requirement to stay compatible with Objective-C memory management naturally called for the same approach; the alternative would have been something like .NET's CCW/RCW COM interop, a path they understandably didn't want to go down, given the previous history.
My point was that RC in practice is pretty deterministic in the vast majority of cases, and I think equating its rare unpredictable cases with the inherent unpredictability of a GC is scaremongering. The differences between their non-deterministic behaviors are significant enough that they can't simply be equated with a sweeping generalization.
This is true regardless of which approach has higher-throughput implementations or lower overhead overall, and it is completely unrelated to Apple's marketing or to Microsoft being first at something.