Isn't this what the European Cyber Resilience Act (CRA) is about?
See https://orcwg.org/cra/ and the work of the Open Regulatory Compliance Working Group in general.
More or less, though the CRA is pretty minimal: it has a few basic requirements, and hobby/unpaid open source software is not covered. A company integrating open source software is essentially responsible for covering those requirements itself.
Making the company responsible for the open source components it integrates should solve the biggest dependency problems, though. From a security perspective, it doesn't really matter whether a company fixes the bugs itself or pays someone to fix them.
FWIW, LibreOffice automatically inserts an actual Unicode NO-BREAK SPACE when I type ":" at the end of a word (if the language is set to French, of course). If I insert an actual SPACE and then hit ":", it even replaces the SPACE with a NO-BREAK SPACE.
I'd be surprised if MS Word doesn't do the same. No need for a "true" typesetting solution.
But of course, nobody uses the standard library since it came late to the game, and instead there are at least half a dozen HTTP libraries in common use (Apache, Jetty, Spring, OkHttp, Google, etc.). It's not too different from the logging clusterf*ck in Java, where there are now 9 ways to do it, and none of them are compatible with each other.
A few years back we used slf4j for compatibility between Java logging libraries. Have they managed to break it? Or are you after compatibility at a lower level?
SLF4J is definitely the way to go, but it is still VERY hard for typical enterprise devs to get it working correctly, where "correctly" means that both your own code and all the dependent libraries are on the same page with regard to what level and format the logs are printed at, and where they're going.
The problem is that even if your own code uses SLF4J correctly as a logging API, then wires up the correct logging implementation (e.g. Logback, Log4j2, etc.), and finally includes the necessary SLF4J "bridge" libraries, you still need to ensure that the Commons Logging, Log4j, etc. libraries dragged in by 3rd-party transitive dependencies are explicitly excluded.
It's a mess, and almost none of the projects that I've worked on over the years got it 100% right the first time.
That being said, I see the exact same problem in a lot of the modern Python and Node code that's out there. It just becomes too complicated because there are too many logging options, and when a modern project - in any language - relies on dozens of 3rd-party libraries, it's likely you'll end up with 5 different incompatible logging implementations in your final runtime, all doing their own thing.
It is a perfect example of the famous XKCD "multiple competing standards."
> Transactional memory is currently one of the hottest topics in computer-science research, having attracted the focus of researchers in programming languages, computer architecture, and parallel programming, as well as the attention of development groups at major software and hardware companies. The fundamental source of the excitement is the belief that by replacing locks and condition variables with transactions we can make it easier to write correct and efficient shared-memory parallel programs.
Has this research come to fruition? What languages/environments use such an approach today?
Well, I'd say it's a bit muddled. Pure STM systems have seen limited success, mostly on the JVM, particularly Clojure. But as a technology it's not spread widely in the industry, despite the fundamental advantage of composability. I personally would attribute this to a sort of gulf created by two common approaches:
1. It's become very common to structure software as a stateless service cluster backed by a stateful database technology. Most of the databases used in this approach would not be described as STM, but the net effect to the application is similar, even if involving more layers and complexity.
2. People have gotten pretty comfortable with simple mutex patterns connected by queues of various sorts. This is a sort of worse-is-better situation: a mutex-protected whatever is simple, and in a toy test or microbenchmark it is far more efficient than STM. However, a system composed of many of these mutex-protected islands proves a whole different beast indeed (see the sketch below). STM has largely been criticized from the perspective of the former rather than the latter.
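To make that composability gap concrete, here's a minimal sketch using Haskell's stm package (the counters and the move operation are invented purely for illustration, not taken from any real system). The mutex version forces every caller to agree on a lock order or risk deadlock once operations start nesting; the STM version lets independently written actions combine into one atomic step.

    -- Minimal sketch: two mutex-protected "islands" vs. one STM transaction.
    -- Requires only the base and stm packages (GHC).
    import Control.Concurrent.MVar
    import Control.Concurrent.STM

    -- Mutex style: each counter is its own island. Moving a unit means
    -- nesting takes; two threads doing this in opposite directions can
    -- deadlock unless everyone agrees on a global lock order.
    moveMVar :: MVar Int -> MVar Int -> IO ()
    moveMVar from to =
      modifyMVar_ from $ \a ->
        modifyMVar_ to (\b -> pure (b + 1)) >> pure (a - 1)

    -- STM style: the same operation is one composable atomic action.
    -- No lock order to agree on; conflicting transactions simply retry.
    moveTVar :: TVar Int -> TVar Int -> STM ()
    moveTVar from to = do
      modifyTVar' from (subtract 1)
      modifyTVar' to   (+ 1)

    main :: IO ()
    main = do
      m1 <- newMVar (10 :: Int)
      m2 <- newMVar 0
      moveMVar m1 m2
      print =<< readMVar m1                        -- 9
      x <- newTVarIO (10 :: Int)
      y <- newTVarIO 0
      atomically (moveTVar x y)                    -- one unit moves, atomically
      atomically (moveTVar x y >> moveTVar y x)    -- larger actions compose
      print =<< readTVarIO x                       -- 9
      print =<< readTVarIO y                       -- 1

The point isn't that the STM version wins a microbenchmark (it usually doesn't); it's that moveTVar stays safe to compose with whatever other transactions the rest of the system wants to run.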
There are many people who have made the observation that transactional concurrency control, state-of-the-art garbage collection, and even file systems have been converging on similar features. This is being driven by common constraints on both sides: what the hardware can do and what humans expect. In particular with persistent memory, I think you'll see all three of these unify into a single design, because systems that attempt to solve these problems separately will be a much poorer match for the hardware.
I think trying to scale out compute + mutable state across multiple CPUs on a single box has somewhat fallen by the wayside outside of specialized applications like databases and distributed query engines, as a local maximum avoided in favour of scalability via more boxes.
There are several forces behind this. Applications are more likely to be web-based or services - i.e. inherently distributed - and less likely to be desktop or traditional client/server, where almost all compute happened on a single server. As distributed services, maximizing statelessness and avoiding mutable shared memory is key to solving a lot of problems: scaling out (no shared memory), redundancy (keep a second copy of the application logic running somewhere, no sync required), and recovery and idempotence (if something fails, try again until it succeeds - repeated attempts are safe).
Reliable persistent queues are part of that. They let you bring services up and down and switch over without downtime, or restart after a failure and resume where they left off.
The problems of shared mutable state are best kept in specialized applications: databases, queuing systems, distributed caches, consistent key-value stores. Keeping state consistent in a distributed system is a genuinely hard problem, and STM isn't much help, except perhaps as an implementation detail in some of those specialized applications.
For what it's worth, scaling mutable shared state across multiple CPUs on a single box has fallen by the wayside for databases too. Thread-per-core style software architecture has become idiomatic, in part because it efficiently scales up on machines with a very large number of cores. Nothing about this precludes scale-out, it just allows you to serve 10-100x the load on a single box as the more typical naive design.
Web applications aren't special in this regard. We've just normalized being incredibly wasteful of computing resources when designing them, spinning up huge clusters to serve a workload that could be easily satisfied with a single machine (or a few machines for availability).
Premature scale-out, because we've forgotten how to scale-up, is the root of much evil.
This has been my experience as well. I briefly shifted from embedded systems programming to the web services world. I found that the scale-up approach has been so maligned that it was considered OK to spin up several hundred EC2 instances running a (poorly written) Java application instead of a single multi-core instance running Erlang that performed much better. Some frameworks even ran web services in PHP and spent all their effort on keeping requests from ever reaching the PHP code, since it was doing something like 10 reqs/sec.
Basically, the poor STM performance in .NET comes down to too much mutation and the logging that all that mutation forces the STM to do. This is why Clojure's STM is much more widely used.
I have particularly deep experience in various Haskell projects, and I must insist that no (1) relatively big Haskell project (2) with a hint of concurrency and (3) developed by several people can go without the use of STM.
I can use MVars myself for problem-space discovery. When there are at least two of us, I will use TVar and insist others should too.
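As a rough illustration of why TVar is the safer default once more than one person touches the state, here's a minimal sketch (the job-queue framing is made up for illustration, and it assumes GHC's stm package): a blocking pop written with retry is still an ordinary STM action, so anyone else's transaction can compose with it, whereas the equivalent MVar protocol has to be hand-rolled and agreed on by everyone.

    -- Minimal sketch: a blocking pop over a shared list of jobs.
    -- retry suspends the transaction until a TVar it read changes,
    -- and the resulting action still composes with other transactions.
    import Control.Concurrent (forkIO)
    import Control.Concurrent.STM

    type Jobs = TVar [String]

    popJob :: Jobs -> STM String
    popJob jobs = do
      js <- readTVar jobs
      case js of
        []         -> retry                        -- wait for a push
        (j : rest) -> writeTVar jobs rest >> pure j

    pushJob :: Jobs -> String -> STM ()
    pushJob jobs j = modifyTVar' jobs (++ [j])

    main :: IO ()
    main = do
      jobs <- newTVarIO []
      _ <- forkIO (atomically (pushJob jobs "build"))
      putStrLn =<< atomically (popJob jobs)        -- blocks until the push lands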
I am not free to name the product, but I worked on a product that has done billions of dollars in revenue that designed a bunch of state management checkpointing using a custom STM implementation for C (yes, C). It made so, so many things straightforward in terms of corner cases; we also extended transactionality to sync over the wire in some cases.
I also think STM-adjacent designs are gaining some traction - for example, FoundationDB looks an awful lot like an STM system from the point of view of applications, much more than it looks like a traditional database.
It’s widely regarded as a dead end. Conceptually, it’s quite appealing so who knows, it might find new life someday. Garbage collection was a niche technology for decades, as was JIT compilation.
Joe Duffy’s write-up on how STM failed at Microsoft with .NET is illustrative:
Right... it's not hard to invent some scheme that scales to a 2-core computer, but it is not crazy to have 64 or more cores these days, and even if you have only 2% serial overhead from communication, Amdahl's law says you are not going to scale past 50x.
The problems of making a system really scale as opposed to just scale are bad enough that the best we know how to do is give people simpler primitives that work -- each layer you add to the system is another opportunity to add another limitation to it.
Modern garbage collectors are fairly scalable (e.g. they don't break outright as the heap size increases), and manual memory management carries its own risk of scaling problems, to the point where you might say that garbage collection is more "scalable" when the thing being scaled is the kind of arbitrary complexity that appears in business applications.
What people complain about with garbage collection is: (1) a fixed constant overhead they think is too high and will probably always think is too high, and (2) long pauses that are manageable in practice unless you are timing events with millisecond precision on a microcontroller.
Thus one of these things is mainstream and the other isn't.
> What people complain about with garbage collection is: (1) a fixed constant overhead they think is too high and will probably always think is too high, and (2) long pauses that are manageable in practice unless you are timing events with millisecond precision on a microcontroller.
I keep hearing this about "modern" GC, and it keeps being very much not true in practice. I'm running IntelliJ IDEA (Java) on a laptop with 8GB of RAM, and it keeps freezing every so often. I suspect GC is causing paging. This is something that is obviously avoided with reference counting or manual memory management.
It's not like tracing GC is without memory overhead...
Slower is a question of priorities. I'd definitely trade slower overall performance (I don't really need my IDE to be that performant... it's not like it's using 100% CPU, or even 10%!) for a huge improvement in latency (i.e. no catastrophic slowdowns/freezes).
There are very few ways of having zero memory overhead, though.
Also, most of the flaws of Java's memory management seem to be concentrated in the part of the language that says that almost everything has to be an object with a header and on the heap. Remove that and your graph of objects to trace shrinks very substantially.
Interestingly, I was running out of memory, so I gave it more RAM (like 20GB) and it got even worse. Apparently it takes a long time to GC that much memory. I’d also rather amortize that pause. The loss of productivity due to a sudden pause is real.
Yeah. As is pointed out down-thread of the linked post, even when it was confined to a niche, garbage collection did what it claimed it would (eliminate the need for error-prone manual memory management) at an acceptable cost for many systems & applications. JITs were similar.
So far, STM and HTM have followed a path more like VLIW instruction sets: Conceptually appealing for their promise of simplification, but in practice only demonstrating success in specialized uses, with the tradeoffs proving unattractive seemingly everywhere else they've been tried.
I think it is a false equivalence to say that STM and HTM are like VLIW (which has been a resounding success for DSP workloads). I am not defending VLIW.
Transactions are extremely powerful conceptually, but being composable without causing flapping requires a higher level of coordination.
It kind of came and went in Clojure; not much code uses it today.
In your second link, the linked followup post is also interesting as it captures some misunderstandings one might have when starting out with STM code.
Hardware support is only very recent: Intel added support at various times and then had to remove it due to security and correctness issues, but I think it is now finally available:
> Intel's Transactional Synchronization Extensions (TSX) is available in some of the Skylake processors. It was earlier implemented in Haswell and Broadwell processors as well, but the implementations turned out both times to be defective and support for TSX was disabled. The TSX specification describes the transactional memory API for use by software developers, but withholds details on technical implementation.[11] ARM architecture has a similar extension.[12]
I believe that even where it is available, OSes turn it off because of all the other Intel vulnerabilities, and because TSX makes them even more expensive to work around.