If your language supports good strong/weak references and containers thereof, cleaning up dead interned strings isn't hard. I'm not aware of any language that provides this out-of-the-box, unfortunately.
Why do so many languages make weak references such second-class citizens? Why do all containers suck so much that you have to implement your own (and that's hoping the language is actually efficient enough to let you?)
> string interning package to its standard library
TFA literally says interning isn't there in Go yet.
While the unique package is useful, Make is admittedly not quite like Intern for strings, since the Handle[T] is required to keep a string from being deleted from the internal map. This means you need to modify your code to retain handles as well as strings.
> I'm not aware of any language that provides this out-of-the-box, unfortunately
The currently most prominent example would be Rust. Rc<T> is a simple generic container that implements reference counting, and any instance of it can be downgraded to a Weak<T>. Or Arc<T> and std::sync::Weak<T> if it needs to be thread safe.
I've done it in C++, so Rust is probably capable of it if you add enough layers of rc refcell and whatever else it requires to fit into its restricted worldview.
Does Rust actually have container implementations that do all of the following:
* When walking the container (either iterating or looking up, even through a non-mutable reference), call a user-provided predicate (not just builtin "weak" like many languages have via weakset/weakkeymap/weakvaluemap) to detect if a node should be considered "dead", and if so transparently remove the node. [In my experience this is relatively easy to add when you're implementing the container algorithms yourself, though I've never done it for bulk algorithms yet.]
* When looking up a key (which may have different type or identity), the lookup returns the actual key the container had. [This may be impossible for container implementations that split the key.]
Mutating a container through a shared reference means the container either has to be single-threaded (not marked as Sync/Send), or be thread-safe.
The single-threaded ones are easy to make, but Rust will prevent you from sending them to another thread, which is probably something you want.
For thread-safe things, look into the crossbeam crate, it has really good collections.
One I worked with was the dashmap, which has a .retain() method [1] that works over a shared map reference, but runs a closure which gets mutable access to each key and value, and decides whether to keep the pair or not.
Its .get() [2] uses equality (so you can use a different object), but returns a reference to the original key-value pair. The .get_mut() will return it as mutable, but inside a guard that keeps the item locked until it goes out of scope.
`.retain()` unfortunately isn't what I'm talking about, since it's at least O(n) per call. It might be better if you can arrange to call it just once after a batch of operations, but that isn't always the case.
"Delete as you go" usually† adds no space/time complexity to the operations you're already doing (though it does affect the constant factor). It does mean you're giving up on predictable destructor calls, but generally the "heavy" destructor was called by whoever made it expire (e.g. the death of the last remaining shared owner if that's what expiry is based on).
† If many expirations happen at once, this can easily drop to merely amortized time. It's also possible for a container to offer, say, O(1) iteration via dedicated pointers but have a delete operation that's more complicated.
To the first question, not really, and if it did it would be pretty fragile because of mutability requirements. It's fragile in C++ too because of iterator invalidation, Rust mostly turns that into a compiler error.
I didn't have any problem with iterator/pointer invalidation (easy to merge into the same thing), since holding an active iterator inhibited expiration. It's possible to imagine an expiration system that doesn't automatically guarantee this but I didn't have one; the only non-weak-based expiry I had was based on timers, and it's sanest to only count time as elapsing at the heart of the event loop.
Why do so many languages make weak references such second-class citizens? Why do all containers suck so much that you have to implement your own (and that's hoping the language is actually efficient enough to let you?)