Is the compiler, or a library you're using, somehow adding code that takes a different strategy at runtime when the linked list gets bigger than the performance-fall-off size?
Good question, but I don't think so. Two mergesort variants show it, one does not. A C transliteration of one of the variants that show it, also shows performance drops. The Go compiler for MacOS on M2 silicon probably generates very different code than the X86_64 on Linux, yet they both have the same performance drops.
The code only uses standard Go packages. I wrote the linked list struct, I'm not using a package's struct or code. I don't even use standard package code during sorting, only in timing and list preparation.
What you suggest may be true, but it seems very unlikely.