Analyzing Every Clojure Project on GitHub

lukev · on Sept 7, 2022

> Average mutable reference usage per repository: 1.94

> Repositories with no mutable reference usages: 7,245 (63%)

> Truly bananas. Clojure libraries really do have less state. It probably isn't surprising if you've been programming in clojure for any length of time, but it's pretty wild to see the data back this up. I don't think I would have believed this 10 years ago.

Truly interesting and impressive, although I'll note that there are plenty of ways (most of them via Java interop) to introduce mutability of some kind into a Clojure program besides the first-class reference types (atoms/refs/agents/volatiles.)

I'd be interested in an analysis of what percentage of Clojure functions are truly referentially transparent, but that's difficult (if not impossible) to determine statically.

phronmophobic · on Sept 8, 2022

> although I'll note that there are plenty of ways (most of them via Java interop) to introduce mutability of some kind into a Clojure program besides the first-class reference types (atoms/refs/agents/volatiles.)

Fair point, I tried to word that statistic accurately without being too verbose.

> I'd be interested in an analysis of what percentage of Clojure functions are truly referentially transparent, but that's difficult (if not impossible) to determine statically.

Absolutely, I'm not sure it's possible or even practical to measure functional purity in an absolute sense, but there's definitely room for improvement:

- measure Java Interop

- measure usage of mutation functions (eg. swap!, set!, binding, send, vswap!, alter, etc)

- measure (def ^:dynamic *my-var*) usages

My intuition is that the numbers wouldn't change that much, but I'd rather have the data than guess.
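A rough sketch of what the second bullet could look like, assuming nothing about how the original analysis was built: read a source string as Clojure forms, walk every nested form, and tally the symbols named in the list above. The names `mutation-fns` and `mutation-counts` are made up for illustration, and plain `read-string` will not cope with reader conditionals or unknown namespace aliases, so a real measurement would need a proper analyzer.

```clojure
;; Symbols taken from the list above; extend as needed.
(def mutation-fns '#{swap! set! binding send vswap! alter})

(defn mutation-counts
  "Returns a map of mutation symbol -> number of occurrences in src.
  Purely syntactic: it counts symbols, it does not resolve them."
  [src]
  (binding [*read-eval* false]                 ; never evaluate #= forms
    (->> (read-string (str "[" src "\n]"))     ; wrap so all top-level forms are read
         (tree-seq coll? seq)                  ; walk every nested form
         (filter mutation-fns)                 ; keep only the symbols of interest
         frequencies)))
```

Calling `(mutation-counts (slurp "src/my/ns.clj"))` (path hypothetical) gives a per-file tally that could be summed per repository.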

dustingetz · on Sept 7, 2022

lazy seqs are not RT so ~zero

dwohnitmok · on Sept 8, 2022

If your lazy seqs contain side effects that are evaluated as the seq is evaluated then definitely not RT (and this is a pretty big no-no in Clojure to begin with). If they don't, lazy seqs seem RT in the sense usually used for programming languages; otherwise a language like Haskell wouldn't have anything considered RT.
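A small illustration of the side-effect case, as a sketch only. The `log` atom and the squaring function are hypothetical; the point is that the effect happens when the seq is realized, not when it is built.

```clojure
(def log (atom []))

;; Building the lazy seq runs nothing yet.
(def squares
  (map (fn [x]
         (swap! log conj x)   ; side effect hidden inside the seq
         (* x x))
       [1 2 3]))

(def log-before @log)          ; still empty: nothing has been realized

(def result (doall squares))   ; forcing the seq triggers the effects

(def log-after @log)           ; now records every element that was visited
```

Here `log-before` is `[]` and `log-after` is `[1 2 3]`, so when the effect happens depends on who consumes the seq and when. Without the `swap!`, the same `map` call would behave identically no matter when it was realized.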

dustingetz · on Sept 8, 2022

also exceptions and dynamic scope
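The dynamic scope half of this can be sketched as follows (the var `*factor*` and the function `scale` are invented for the example). A lazy seq created inside `binding` but realized outside it sees the root value, so the same expression yields different results depending on when it is forced.

```clojure
(def ^:dynamic *factor* 1)

(defn scale [xs]
  (map #(* *factor* %) xs))     ; lazy: *factor* is read at realization time

;; The seq escapes the binding unrealized, so it later sees the root value.
(def lazy-result
  (binding [*factor* 10]
    (scale [1 2 3])))

;; Forcing inside the binding captures the bound value.
(def eager-result
  (binding [*factor* 10]
    (doall (scale [1 2 3]))))
```

With these definitions `(vec lazy-result)` is `[1 2 3]` while `(vec eager-result)` is `[10 20 30]`.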

User23 · on Sept 8, 2022

Learning Clojure is probably the easiest way to get paid to write Lisp. Personally I prefer Common Lisp, but I know I’m a grognard. And I really do like Clojure too. Both have a very pragmatic philosophy. It’s kind of fun that Steele’s quip that Java was about dragging C programmers to Lisp has come more true than maybe even he imagined.

stcredzero · on Sept 7, 2022

I'm generally interested in tools like cljdoc that work at the ecosystem level.

Isn't the meta-lesson of the history of programming languages from the past 60 years, basically that great language design should take into account the ecosystem level?

The success or failure of a language has really depended on the health of its interaction with its community and ecosystems, much more so than narrow technical merit of the language.

Name a language, and its history bears this out. (Clojure as well.)

bm3719 · on Sept 7, 2022

Nice to see the data confirm the general consensus (by my measure, at least) regarding the Clojure STM options.

Like others, I've been telling Clojure newbies something like: When you need Clojure STM, default to using an atom unless you really know you need to use something else.
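For readers who haven't seen it, the "default to an atom" advice looks roughly like this. It is a minimal sketch with hypothetical names (`hits`, `record-hit!`): several threads update one map held in an atom, and `swap!` retries on conflict so no update is lost.

```clojure
(def hits (atom {}))

(defn record-hit!
  "Atomically increments the counter for page."
  [page]
  (swap! hits update page (fnil inc 0)))

;; Eight threads, 1000 increments each, all on the same key.
(let [workers (doall (for [_ (range 8)]
                       (future (dotimes [_ 1000]
                                 (record-hit! :home)))))]
  (run! deref workers))          ; wait for every worker to finish

(def total (:home @hits))        ; 8000: every increment was applied
```

In a standalone script, call `(shutdown-agents)` at the end so the future thread pool doesn't keep the JVM alive.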

Syzygies · on Sept 7, 2022

Ok, newbie here!

In Haskell I can add a handful of lines to parallelize an "embarrassingly parallel" computation. For example, there are 66,960,965,307 atomic lattices on 6 atoms. A 20 core Mac Studio can figure this out by reverse search in just over four minutes; divvy the work up into piles, and have everyone count the work in their pile. For the problems I care about, everyone's looking for needles in a haystack; they can report what they found with no concern for what anyone else is doing.

So what's the dumbest "try this first" approach in Clojure?

nimih · on Sept 8, 2022

Within the standard library, `clojure.core/pmap` is good for simple problems[0], whereas the various functions in the clojure.core.reducers[1] namespace are a bit more sophisticated and would probably solve the sort of problem you're describing pretty well. There are also a number of good clojure parallelism libraries floating around if you don't mind incurring dependencies: I've used claypoole[2] and tesser[3] in the past and been pretty happy with them.

All of these options are, IME, relatively easy to drop in to some extant data processing pipeline to parallelize it, and probably require a similar level of finagling to what you're used to in Haskell.

[0] (->> some-lazy-seq (pmap ...) (reduce ...)) goes pretty far, but nesting/composing pmaps or doing i/o doesn't always work particularly well since the JVM [currently] uses OS threads rather than something like the lightweight threads GHC provides.

[1] https://clojure.org/reference/reducers

[2] https://github.com/clj-commons/claypoole

[3] https://github.com/aphyr/tesser
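A minimal sketch of the "piles" approach with only the standard library, under made-up assumptions: the haystack is the integers below one million and `needle?` is a stand-in predicate. The first version hands each pile to `pmap`; the second lets `clojure.core.reducers/fold` split a vector itself.

```clojure
(require '[clojure.core.reducers :as r])

(defn needle? [n]                       ; hypothetical stand-in for the real test
  (zero? (mod n 9973)))

(defn count-needles
  "Counts needles in the half-open range [lo, hi)."
  [[lo hi]]
  (count (filter needle? (range lo hi))))

;; Ten piles of 100,000 candidates each.
(def piles
  (for [lo (range 0 1000000 100000)]
    [lo (+ lo 100000)]))

;; pmap: one task per pile, then sum the per-pile counts.
(def total-pmap
  (reduce + (pmap count-needles piles)))

;; fold: the vector is split and reduced in parallel; + both reduces and combines.
(def total-fold
  (r/fold + (r/map #(if (needle? %) 1 0) (vec (range 1000000)))))
```

`pmap` pays off when each pile is a sizeable chunk of work, since every element becomes its own task. `pmap` runs on the same thread pool as futures, so a standalone script should end with `(shutdown-agents)`.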

dwohnitmok · on Sept 8, 2022

I find this to be an unfortunate state of affairs. If Clojure's STM was more developer-ergonomic, it could be used in a wide variety of places.

For example right now in Clojure, when you have concurrent access to a map, you're forced to choose between either atomic, but entirely serial writes (wrap the map in an atom) or per-key concurrency, but no inter-key atomicity (either use nested atoms or use ConcurrentHashMap).

But this false dilemma has all the hallmarks of complection. It's an all-or-nothing choice brought on by an overly coarse idea of atomicity. You could instead use a map structure built on top of STM to get exactly the amount of atomicity you need. If you need two keys to be modified in the same transaction then they get modified atomically. If you need another key to be modified in parallel, then that can happen. The amount of atomicity you need is specified dynamically and on the fly, instead of bound inextricably to a predetermined choice.

I find myself wishing for this kind of tool whenever I have concurrent write contention on a map. Yes I usually bite the bullet and just accept forced serial writes to an atom, but I do so begrudgingly.

Other ecosystems with more ergonomic STM systems use this to great effect (e.g. Haskell's stm-containers library: https://hackage.haskell.org/package/stm-containers).

bcrosby95 · on Sept 8, 2022

> or per-key concurrency, but no inter-key atomicity (either use nested atoms or use ConcurrentHashMap).

I guess it depends upon what you mean by "per key", but if your map is a ref, and the values are refs, you can get per-value concurrency with inter-value atomicity.

You couldn't concurrently add new keys though, since that's changing the map's ref.
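A sketch of the ref-of-refs layout described here, with hypothetical account data. Transactions that touch different value refs can commit in parallel, two values can change atomically together, and only adding or removing a key writes to the outer ref.

```clojure
;; Outer ref holds the key set; each value is its own ref.
(def accounts
  (ref {:alice (ref 100)
        :bob   (ref 50)
        :carol (ref 0)}))

(defn transfer!
  "Moves amount between two existing keys in one transaction."
  [from to amount]
  (dosync
    (let [m @accounts]                 ; read the outer map, write only inner refs
      (alter (m from) - amount)
      (alter (m to) + amount))))

(defn add-account!
  "Adding a key changes the outer ref, so these writes serialize."
  [k balance]
  (dosync
    (alter accounts assoc k (ref balance))))

(defn snapshot
  "Consistent view of all balances, taken inside one transaction."
  []
  (dosync
    (into {} (map (fn [[k v]] [k @v])) @accounts)))
```

For example, `(transfer! :alice :bob 30)` followed by `(snapshot)` yields `{:alice 70, :bob 80, :carol 0}`.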

nlitened · on Sept 8, 2022

I’ve seen a Clojure library that does proper “STM inside a map” concurrency (couldn’t easily google it right now, but it’s there).

I think a single atom with serial writes is still better. Firstly, I think it’s still more performant: serial updates on an atom are fast. Secondly, such code is much easier to reason about, test, and debug.

capableweb · on Sept 7, 2022

I feel like sometimes even `atom` gets over-used. When explaining to newbies, maybe we can say:

> When you need Clojure STM, you usually don't, there is usually another way. But if you really, really do need it, default to using an atom unless you really know you need to use something else.
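One common shape of "there is usually another way", sketched with made-up function names: an atom used as a local accumulator can often be replaced by a plain `reduce` that threads the state through as a value.

```clojure
;; Accumulating into an atom: works, but the mutation buys nothing here.
(defn word-counts-atom [words]
  (let [acc (atom {})]
    (doseq [w words]
      (swap! acc update w (fnil inc 0)))
    @acc))

;; Same result with no reference type at all.
(defn word-counts-pure [words]
  (reduce (fn [acc w] (update acc w (fnil inc 0)))
          {}
          words))
```

Both return the same map as the built-in `frequencies`, which is the even shorter answer for this particular case.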

robto · on Sept 7, 2022

There's a handy chart in "The Joy of Clojure" that I refer to whenever I'm trying to tackle concurrency problems:

    |              | ref | agent | atom | var |
    | coordinated  | x   |       |      |     |
    | asynchronous |     | x     |      |     |
    | retriable    | x   |       | x    |     |
    | thread-local |     |       |      | x   |

Turns out I have never needed coordinated, synchronous stuff. I have dabbled with agents, but just for an Advent of Code problem.
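For reference, a minimal sketch of one update per column of the chart. All names are placeholders and the values are arbitrary.

```clojure
;; ref: coordinated, synchronous; both changes commit together or not at all.
(def from (ref 10))
(def to   (ref 0))
(dosync
  (alter from dec)
  (alter to inc))

;; agent: asynchronous; send returns immediately, await blocks until applied.
(def counter-agent (agent 0))
(send counter-agent inc)
(await counter-agent)

;; atom: uncoordinated, synchronous; swap! retries on conflict.
(def counter-atom (atom 0))
(swap! counter-atom inc)

;; var: thread-local rebinding of a dynamic var; root value is untouched.
(def ^:dynamic *level* 0)
(def level-inside (binding [*level* 1] *level*))
```

Agents use a thread pool, so a standalone script should call `(shutdown-agents)` before exiting.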

I do like that 63% (!!) of clojure repos have no mutable references at all - that tracks very strongly with my experience. And that the average number of mutable references is less than 2! Immutability can carry you a long ways, and I love that I can trust that contract. On the other hand, it's nice that I can opt in to mutation really easily if I need it.

hospitalJail · on Sept 7, 2022

Just mentioning, these are public projects on GitHub. Both my big corp job and I personally use GitHub privately.

I wonder how that would skew things.

mms__ · on Sept 8, 2022

I also work a big corp Clojure job. It would be awesome to see some of the libs go open source.

stanislavb · on Sept 7, 2022

If you are interested, you can discover the most popular clojure projects (by number of mentions) on LibHunt https://www.libhunt.com/l/clojure. As most of us could expect, Logseq is amongst the most popular and most trending projects.