> Doing any styling or scripting inline should be frowned upon as hard as table-based layouts.
I strongly disagree: inlining your entire CSS and JS is absurdly good for performance, up to a surprisingly large size. If you have less than 100KB of JS and CSS (which almost every content site should be able to manage, most trivially, and almost all should aim for), there’s simply no question about it: I would recommend deploying with only inline styles and scripts. The threshold where it becomes more subjective is, for most target audiences, possibly over half a megabyte by now.
Seriously, it’s ridiculous just how good inlining everything is for performance, whether for first or subsequent page load; especially when you have hundreds of milliseconds of latency to the server, but even when you’re nearby. Local caches can be bafflingly slow, and letting the browser just execute it all in one go without even needing to look for a file has huge benefits.
It’s also a lot more robust. Fetching external resources is much more fragile than people tend to imagine.
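For what it’s worth, the deployment side can be as simple as a build step that splices the files into the page. A rough sketch in Node (file names and placeholders are hypothetical, not a recommendation of any particular tool):

// inline.js: splice CSS and JS into an HTML template at build time (sketch).
import { readFileSync, writeFileSync } from "node:fs";

const css = readFileSync("style.css", "utf8");      // hypothetical file names
const js = readFileSync("main.js", "utf8");
const html = readFileSync("template.html", "utf8")
  // function replacer avoids `$`-pattern surprises if the CSS/JS contains "$&" etc.
  .replace("<!--CSS-->", () => `<style>${css}</style>`)
  .replace("<!--JS-->", () => `<script>${js}</script>`);
writeFileSync("index.html", html);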
It's called Content Security Policy, not Content Performance Policy. My thoughts:
1. Inlining everything burns bandwidth, even if it's 100KB each. (I hope your cloud hosting bills are small.) External resources can be cached across multiple pageloads.
2. Best practice is to load CSS files as early as possible in the <head>, and to load (and defer) all scripts at the end of the page. The browser can request the CSS before it finishes loading the page. If you're inlining scripts, you can't defer them.
3. If you're using HTTP/2+ (it's 2025, why aren't you?[0]), the connection stays open long enough for the browser to parse the DOM to request external resources, cutting down on RTT. If you have only one script and CSS, and they're both loaded from the same server as the HTML, the hit is small.
4. As allan_s mentioned, you can use nonce values, but those feel like a workaround to me, and the values should change on each page load.
> Local caches can be bafflingly slow, and letting the browser just execute it all in one go without even needing to look for a file has huge benefits.
Source? I'd really like to know how and when slow caches can happen, and possibly how to prevent them.
[0] Use something like nginx, HAProxy, or Cloudflare in front of your server if needed.
> Source? I'd really like to know how and when slow caches can happen.
I don't have a source I can link to or share, but cache outliers are a real thing. If you aggregate Resource Timing results, you'll find some surprising outliers in that dataset where transferSize=0 (i.e. a cached load in Chrome). You'll have users with a slow or contended disk even though they have a fast network link, and you'll also have the reverse: users with a fast cache and a slow network link (high latency, low bandwidth, or both).
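If you want to look at your own distribution, here's a rough sketch of the kind of aggregation I mean, using the standard Resource Timing API in the browser (the 100ms threshold is arbitrary):

// Separate cached loads from network loads and flag slow cache hits (sketch).
const entries = performance.getEntriesByType("resource");
// transferSize === 0 marks a cached load in Chrome; cross-origin resources without
// Timing-Allow-Origin also report 0, so decodedBodySize > 0 filters most of those out.
const cached = entries.filter(e => e.transferSize === 0 && e.decodedBodySize > 0);
const network = entries.filter(e => e.transferSize > 0);
const slowCacheHits = cached.filter(e => e.duration > 100); // arbitrary 100ms cutoff
console.log({ cached: cached.length, network: network.length, slow: slowCacheHits.length });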
There's no universal answer here, and I feel like the above poster tries to oversimplify a complex problem into one-size-fits-all answers. You'll have different users making up your distribution, and you'll have to decide how you weight optimizations. That could very much depend on your product and its expectations: whether your users are power users running a complex SaaS frontend, or you're a news site supporting a range of mobile devices.
A few years ago I traced this and noticed that Chrome has pseudo-O(n^2) behavior when pulling a bunch of sequential resources from its cache. I reported it, but I'm not sure if it got fixed.
> It's called Content Security Policy, not Content Performance Policy
As is often the case with security, the downsides of locking something down may not be worth the increased security.
Another reason not to prohibit inline scripts and stylesheets is if you need to dynamically generate them (although I think strict-dynamic would allow that).
> External resources can be cached across multiple pageloads.
That only matters if the resource is actually shared across multiple pages
> If you have less than 100KB of JS and CSS (which almost every content site should be able to, most trivially, and almost all should aim to), there’s simply no question about it
Do you have data to back this up? What are you basing this statement on?
My intuition agrees with you for the reasons you state, but when we tested this in production at my workplace, we found the break-even point to be, surprisingly, at around 1KB. Unfortunately we never shared the experiment and data publicly.
I would expect it to be closer to 1KB, as well. 100KB is (at time of writing) about 5× the size of this webpage, and this doesn't load instantly for me.
Note that for inline style/script, as long as you're not using `style=''` or `onclick=''`, you can use `nonce=` (or a hash), and to my understanding newly added inline scripts will not be tolerated, allowing you to have the best of both worlds.
It does seem like CSP nonces do not play well with caching (since they must have a different value on each page load), which would make them a detriment to performance.
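For concreteness, the mechanism is roughly this (a minimal sketch assuming Express; the per-response nonce is exactly what makes the resulting HTML uncacheable):

// Sketch: allow a specific inline script via a CSP nonce that changes on every response.
import crypto from "node:crypto";
import express from "express";

const app = express();
app.get("/", (req, res) => {
  const nonce = crypto.randomBytes(16).toString("base64");
  res.setHeader("Content-Security-Policy", `script-src 'nonce-${nonce}'`);
  // The inline script runs only because its nonce attribute matches the header value.
  res.send(`<!doctype html><script nonce="${nonce}">console.log("allowed inline")</script>`);
});
app.listen(8080);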
I think that's a limitation of our implementations. In principle, it's just bytes that we're shoving down the pipe to the browser, so it shouldn't matter for performance whether those bytes are 'inline' or in 'external resources'.
In principle, you could imagine the server packing all the external resources that the browser will definitely ask for together, and just sending them together with the original website. But I'm not sure how much re-engineering that would be.
Simple models are still useful: understanding exactly how and why they fail is instructive. There's a reason spherical cows in a vacuum come up again and again.
Firstly, upgrade from HDD to SSD. For random access, these are commonly 100–500× as fast, and even for block I/O 10–30×, and that will concretely speed up startup by a large fraction of that ratio, quite apart from speeding up other things later.
Once you get used to modern SSDs, as almost everyone on this site will be, I think you lose track of just how bad HDDs are, to run the OS from. My wife’s ten-year-old work laptop takes well over five minutes to boot up, log in, start a browser, load something like Gmail, and settle down so the disk is idle and it’s running as smoothly as it ever will; and sure, the aging i5-4300M CPU doesn’t help¹; but I suspect spending less than a thousand rupees replacing its HDD with even the cheapest and smallest SSD (acceptable capacity, in this case) might cut that to a minute, and spending a few thousand for a faster one would speed it up to below a minute.
(One fun thing about SSDs is that, overall, bigger is faster. At some points in history, for some makes, it’s been almost as simple as “twice as large, twice as fast”. This is, of course, a gross simplification, but I think not too far off.)
Secondly, if you have less than 8GB of RAM, get more. Beyond that it varies depending on what you’re using it for, but up to at least that point, it’s just an unconditional improvement.
—⁂—
¹ PassMark lists single/multi scores for the Intel Core i5-4300M of around 1,700/3,000. Some units in recent generations from approximately the same segment: the Intel Core i5-1334U scoring 3,350/13,400, and the Intel Core Ultra 5 125H scoring 3,450/21,500. This basically means an absolute minimum of 2× speedup on any workload, and for most it’s more like 3–4×. There’s a lot of difference in ten years of CPU.
I find it funny calling Arch “less stable”, because I’m inclined to find it more stable, for my purposes, skills and attitudes.
I’ve administered at least one each of: Ubuntu server (set up by another; the rest were by me), Ubuntu desktop at least ten years ago, Arch desktop, Arch server.
The Arch machines get very occasional breakages, generally either very obvious or signposted well. I did have real trouble once, but that was connected with cutting corners while updating a laptop that had been switched off for two years. (I’ve updated across gaps of more than a year at least two other times, with no problems beyond having to update the keyring package manually before doing the rest. The specific corners I cut this one time led to the post-upgrade hooks not running, and I simply forgot to trigger them manually to redo the initcpio image, because I was in a hurry. Due to boot process changes (maybe it was zstd stuff, I can’t remember), it wouldn’t boot until I fixed it by booting from a USB drive, chrooting in, and running the hooks.)
Now Ubuntu… within a distro release it’s no trouble, except that you’re more likely to need to add external package sources, which will cause trouble later. I feel like Ubuntu release upgrades have caused a lot more pain than Arch ever did. Partly that may be due to differences in the sorts of packages that are installed on the machines, and partly it may be due to having used third-party repositories and/or PPAs, but there were reasons why those things had to be added, whether because software or OS were too old or too new, and none of them would have been needed under Arch (maybe a few AUR packages, but ones where there would have been no trouble). You could say that I saw more trouble from Ubuntu because I was using it wrong, but… it wouldn’t have been suitable without so “using it wrong”.
I’m curious: does this fundamentally need to contain an actual model, or would it be okay if it generated a synthetic model itself, full of random weights? I’m picturing downloading just, say, a 20MB file instead of the multi-gigabyte one, and…
> The llamafile executable size is increased from 30mb to 200mb by this release. This is caused by https://github.com/ggml-org/llama.cpp/issues/7156. We're already employing some workarounds to minimize the impact of upstream development contributions on binary size, and we're aiming to find more in the near future.
Ah, of course, CUDA. Honestly I might be more surprised that it’s only this big. That monstrosity will happily consume a dozen gigabytes of disk space.
llamafile-0.9.0 was still 231MiB, then llamafile-0.9.1 was 391MiB, now llamafile-0.9.2 is 293MiB. Fluctuating all over the place, but growing a lot. And localscore-0.9.2 is 363MiB. Why 70MiB extra on top of llamafile-0.9.2? I’m curious, but not curious enough to investigate concretely.
Well, this became a grumble about bloat, but I’d still like to know whether it would be feasible to ship a smaller localscore that would synthesise a suitable model, according to the size required, at runtime.
—⁂—
¹ Eww, GitHub is using the “MB” suffix for its file sizes, but they’re actually mebibytes (2²⁰ bytes, 1048576 bytes, MiB). I thought we’d basically settled on returning the M/mega- prefix to SI with its traditional 10⁶ definition, at least for file sizes, ten or fifteen years ago.
Llamafile could certainly be released without the GPU binaries included by default and it would slim down the size tremendously.
The extra 70MiB is because the CUDA binaries for LocalScore are built with cuBLAS and for more generations of NVIDIA architectures (sm60->sm120), whereas Llamafile is built with TinyBLAS and for just a few generations in particular.
I think it's possible to randomize weights with a standard set of layers, and that may be a possibility for the future.
let mut iterator = data.chunks_mut(stride);
while let Some(line) = if !flipped { iterator.next() } else { iterator.next_back() } {
…
}
I sometimes wonder idly whether the language would have been better off without a `for` loop (if `while let` had happened much earlier). `while let` is more verbose, mainly because iterator construction has to be an explicit separate line, but it’s also more amenable to alteration to make it more fit for purpose. (You can go down to `loop` and `break`, but I think that sacrifices too much in ergonomics and comprehensibility in the usual case. `while let` also compromises, but I wonder if it’s closer to the sweet spot than commonly imagined, and whether `for` actually gets you so much.)
I can reproduce it in 137 stable on Android and 138 Nightly on Linux from 2025-03-10 (I’m not normally so far out of date, there was a specific reason this time), but it requires the uBlock Origin extension to be enabled.
> Confetti source text consists of zero or more Unicode scalar values. For compatibility with source code editing tools that add end-of-file markers, if the last character of the source text is a Control-Z character (U+001A), implementations may delete this character.
I’ve heard of this once, when researching ASCII control codes and related ancient history, but never once seen it in real life. If you’re insisting on valid Unicode, it sounds to me like you’re several decades past that happening.
And then given that you forbid control characters in the next section… make up your mind. You’re saying both that implementations MAY delete this character, and that source MUST NOT use it. This needs clarification. In the interests of robustness, you need to specify what parsers MUST/SHOULD/MAY do in case of content MUST violations, whether it be reject the entire document, ignore the line, replace with U+FFFD, &c. (I would also recommend recapitalising the RFC 2119 terms. Decapitalising them doesn’t help readability because they’re often slightly awkward linguistically without the reminder of the specific technical meaning; rather it reduces their meaning and impact.)
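To illustrate the kind of divergence I mean, here are three behaviours a conforming parser could plausibly pick today (a sketch, not spec text; the forbidden character class and `realParse` are stand-ins of mine, not the spec’s):

// Three plausible reactions to a forbidden control character in the source (sketch).
const FORBIDDEN = /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/;   // stand-in character class
const parse = (src, mode) => {
  if (FORBIDDEN.test(src)) {
    if (mode === "reject") throw new Error("forbidden control character");
    if (mode === "replace") src = src.replace(new RegExp(FORBIDDEN.source, "g"), "\uFFFD");
    if (mode === "drop-line") src = src.split("\n").filter(l => !FORBIDDEN.test(l)).join("\n");
  }
  return realParse(src);   // realParse: hypothetical actual Confetti parser
};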
> For compatibility with Windows operating systems, implementations may treat the sequence Carriage Return (U+000D) followed by Line Feed (U+000A) as a single, indivisible new line character sequence.
This is inviting unnecessary incompatibility. I recommend that you either mandate CRLF merging, or mandate CR stripping, or disallow special CRLF handling. Otherwise you can cause different implementations to parse differently, which has a long history of causing security problems, things like HTTP request smuggling.
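To be concrete, either mandate costs an implementation about one line, so there’s little to be gained by leaving the behaviour optional; something in this spirit (a sketch, not spec text):

// Option 1: merge CRLF into LF before any line-based processing (lone CR left alone).
const mergeCrlf = (source) => source.replace(/\r\n/g, "\n");
// Option 2: strip CR entirely.
const stripCr = (source) => source.replace(/\r/g, "");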
I acknowledge this is intended as the base for a family of formats, rather than a strict single spec, but I still think allowing such variation for no good reason is a bad idea. (I’m not all that eager about the annexes, either.)
A plain ASCII document is a valid UTF-8 document, but I agree that special support for ^Z is pointless for a file format invented 20+ years after the demise of MS-DOS. Handling ^Z would probably be MS-DOS’ job anyway.
Using the event dispatch mechanism is flat-out bigger, anyway. Here’s the interface of the original script (that is, global pub/sub functions taking a name), except that the receiver site no longer needs to look at the .detail property so it’s better:
let t={};
sub=(e,c)=>((e=t[e]??=new Set).add(c),()=>e.delete(c));
pub=(n,d)=>t[n]?.forEach(f=>f(d))
The original was 149 bytes; this is 97.
(The nullish coalescing assignment operator ??= has been supported across the board for 4½ years. Avoiding it will cost six more bytes.)
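(For instance, this ??=-free variant of `sub` comes out six bytes longer; that’s my guess at the counting, not necessarily the one intended:)

// hypothetical ??=-free variant; works because t[e] is only ever undefined or a Set
sub=(e,c)=>((e=t[e]||(t[e]=new Set)).add(c),()=>e.delete(c));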
This isn't the same, though. With EventTarget, if one of the callbacks throws, the later callbacks still get called; with yours, the later callbacks don't get called.
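If that semantic matters, it can be recovered in the same golfed style by isolating each callback and re-throwing asynchronously (roughly what EventTarget does, at the cost of a few dozen bytes); a sketch:

pub=(n,d)=>t[n]?.forEach(f=>{try{f(d)}catch(e){setTimeout(()=>{throw e})}})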
This has been a popular technique at times, but it tends to increase compressed sizes: gzip and similar are better at common string deduplication, having lower overhead. Such shenanigans are also bad for performance, especially in hot paths due to making it harder for the browser to optimise it.
> It's not like people just one-shot a whole module of code, why would LLMs?
For conversions between languages or libraries, you often do just one-shot it, writing or modifying code from start to end in order.
I remember 15 years ago taking a 10,000 line Java code base and porting it to JavaScript mostly like this, with only a few areas requiring a bit more involved and non-sequential editing.
I think this shows how the approach LLMs take is wrong. For us it's easy because we simply iterate over every function with a simple translation prompt, while being careful to take notes of whatever may be relevant for a higher-level change if one is necessary.
Maybe the mistake is treating LLMs as capable people instead of as a simple but optimised neuron soup tuned for text.
One of the nifty things about the target being JavaScript was that I didn’t have to finish it before I could run it—it was the sort of big library where typical code wouldn’t use most of the functionality. It was audio stuff, so there were a couple of core files that needed more careful porting (from whatever it was in Java to Mozilla’s Audio Data API, which was a fairly good match), and then the rest was fairly routine and could be done gradually, as I needed it or just when I didn’t have anything better to focus on. Honestly, one of the biggest problems was forgetting to prefix instance properties with `this.`