Wouldn't the reference count on each element require atomic operations that could race with those used to update the data structure itself, so that you'd lose all your concurrency guarantees?
Generally speaking, this shouldn't be an issue if you perform batch collections.
Either way, any new mechanism for releasing memory would need to take both the generation and the reference count into account, and the atomic release should be performed in the same fashion as GCAS.
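
To make that last point concrete, here is a minimal sketch of one way to do it, and not the actual GCAS procedure: pack the generation and the reference count into a single 64-bit word so that one compare-and-swap covers both. The names `Header`, `Release`, `acquire` and `release`, and the 32/32 bit split, are all hypothetical and chosen only for illustration. The only thing borrowed from GCAS is that the CAS is refused when the generation is not the one the caller expects.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical per-node header (an assumption, not part of any real library):
/// high 32 bits hold the generation, low 32 bits hold the reference count.
pub struct Header(AtomicU64);

fn pack(generation: u32, refs: u32) -> u64 {
    ((generation as u64) << 32) | refs as u64
}

fn unpack(word: u64) -> (u32, u32) {
    ((word >> 32) as u32, word as u32)
}

/// Outcome of a release attempt.
#[derive(Debug, PartialEq)]
pub enum Release {
    /// The count was decremented and other references remain.
    StillShared,
    /// The count reached zero; the caller may now reclaim the node.
    LastReference,
    /// The node belongs to a different generation; nothing was changed.
    StaleGeneration,
    /// The count was already zero; nothing was changed.
    NothingToRelease,
}

impl Header {
    pub fn new(generation: u32, refs: u32) -> Self {
        Header(AtomicU64::new(pack(generation, refs)))
    }

    /// Take a reference, but only if the node is still live (count > 0)
    /// and still in the generation the caller expects.
    pub fn acquire(&self, expected_generation: u32) -> bool {
        let mut current = self.0.load(Ordering::Acquire);
        loop {
            let (generation, refs) = unpack(current);
            if generation != expected_generation || refs == 0 || refs == u32::MAX {
                return false;
            }
            let next = pack(generation, refs + 1);
            match self.0.compare_exchange_weak(
                current,
                next,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return true,
                // Another thread changed the word (or the weak CAS failed
                // spuriously): re-check both fields against the fresh value.
                Err(actual) => current = actual,
            }
        }
    }

    /// Drop a reference. Generation and count are checked and updated by the
    /// same CAS, so a concurrent generation change cannot slip in between.
    pub fn release(&self, expected_generation: u32) -> Release {
        let mut current = self.0.load(Ordering::Acquire);
        loop {
            let (generation, refs) = unpack(current);
            if generation != expected_generation {
                return Release::StaleGeneration;
            }
            if refs == 0 {
                return Release::NothingToRelease;
            }
            let next = pack(generation, refs - 1);
            match self.0.compare_exchange_weak(
                current,
                next,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) if refs == 1 => return Release::LastReference,
                Ok(_) => return Release::StillShared,
                Err(actual) => current = actual,
            }
        }
    }
}
```

Because both fields live in one word, a release can never observe a count from one generation and a generation from another. The cost is that every acquire and release now contends on the same word as any generation change, which is exactly the interference the original question worries about, so this would only make sense if releases are rare or batched.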