When I've talked to people running this kind of AI scraping/agent workflow, the costs of the AI parts dwarf those of the web browser parts, which makes the computational cost of the browser irrelevant. I'm curious what situation you got yourself into where optimizing the browser yields meaningful savings. I'd also like to be in that place!
I think your RAM usage benchmark is deceptive. I'd expect a minimal browser to have much lower peak memory usage than Chrome on a minimal website, but it should even out or get worse as the websites get richer. The nature of web scraping is that the worst sites take up the vast majority of your CPU cycles. I don't think lowering the RAM usage of the browser process will have much real-world impact.
The cost of the browser part is still a problem. At our previous startup, we were scraping >20 million webpages per day, with thousands of instances of headless Chrome in parallel.
Regarding the RAM usage, it's still ~10x better than Chrome :) It seems to come mostly from V8; I suspect we could do better with a lightweight JS engine alternative.
Yes but WebKit is not a browser per se, it's a rendering engine.
It's less resource-intensive than Chrome, but here we are talking orders of magnitude between Lightpanda and Chrome. If you are ~10x faster while using ~10x less RAM, you are using ~100x less resources.
Careful: as you implement missing features, your RAM usage might grow too. It has happened to many projects: lean at the beginning, just as slow once they deal with real-world messiness.
Yeah, it could be nice to let the user select the ECMAScript engine that fits their use case / performance requirements (balancing the resources available).
Generally, for consumer use cases, it's best to A) do it locally, preserving some of the original web contract B) run JS to get actual content C) post-process to reduce inference cost D) get latency as low as possible
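Steps B and C can be as small as this sketch (assuming Puppeteer as the local headless browser; the URL handling and whitespace cleanup here are illustrative, not a definitive pipeline):

    import puppeteer from "puppeteer";

    async function fetchForInference(url: string): Promise<string> {
      const browser = await puppeteer.launch({ headless: true });
      try {
        const page = await browser.newPage();
        // B) run JS so client-rendered content actually exists
        await page.goto(url, { waitUntil: "networkidle2" });
        // C) post-process: keep visible text, drop markup and scripts,
        //    so the model sees far fewer tokens
        const text = await page.evaluate(() => document.body.innerText);
        return text.replace(/\s+/g, " ").trim();
      } finally {
        await browser.close();
      }
    }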
Then, as the article points out, the Big Guns making the LLMs are a big use case for this because they get a 10x speedup and can begin contemplating running JS.
It sounds like the people you've talked to are in a messy middle: no incentive to improve efficiency of loading pages, simply because there's something else in the system that has a fixed cost to it.
I'm not sure why that would rule out improving anything else; it doesn't seem like they should be stuck doing nothing but flailing around for cheaper LLM inference.
> I think your RAM usage benchmark is deceptive. I'd expect a minimal browser to have much lower peak memory usage than Chrome on a minimal website.
I'm a bit lost: the RAM usage benchmark says it's ~10x less, and you feel it's deceptive because you'd expect RAM usage to be less? Steelmanning: 10% of Chrome's usage is still too high?
The benchmark shows lower RAM usage on a very simple demo website. I expect that if the benchmark ran on a random set of real websites, RAM usage would not be meaningfully lower than Chrome's. Happy to be impressed and wrong if it remains lower.
Seems like Hyundai owns 33% of Kia, rather than it just being a brand under the same company like Lexus/Toyota. They share some things and compete on others.
It's a lot more complicated than that. They also have some common owners, and Kia owns parts of some Hyundai subsidiaries. Chaebols are complicated beasts.
I think Hacker News might appreciate some of the behind the scenes of this post.
Getting this page to load quickly was not trivial. The initial dataset of books' starting sentences was over 20 megabytes. By only sending the unique prefix of each book, I was able to get that much smaller. Using a custom format, sorting the prefixes, and gzipping got the size down to 114 KB, about 3 bytes per book. The full first sentences are downloaded on demand as the books are filtered down.
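The unique-prefix trick looks roughly like this (a toy TypeScript sketch of the idea, not my actual format):

    // In a sorted list, a string's shortest unique prefix only has to
    // differ from its two immediate neighbors.
    function lcp(a: string, b: string): number {
      let n = 0;
      while (n < a.length && n < b.length && a[n] === b[n]) n++;
      return n;
    }

    function uniquePrefixes(sentences: string[]): string[] {
      const sorted = [...sentences].sort();
      return sorted.map((s, i) => {
        const prev = i > 0 ? sorted[i - 1] : "";
        const next = i + 1 < sorted.length ? sorted[i + 1] : "";
        // keep one char past the longest prefix shared with a neighbor
        const keep = Math.max(lcp(s, prev), lcp(s, next)) + 1;
        return s.slice(0, Math.min(keep, s.length));
      });
    }

Sorting helps twice: neighbors are all you need to check for uniqueness, and adjacent prefixes share leading bytes, which gzip compresses well.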
Rendering the books requires 5 million triangles. I used WebGL 2's drawArraysInstanced method. This allows me to define the book geometry only once; each book is then just defined by its rotation/position/color. Then it's just a matter of keeping the fragment shader simple.
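The instancing setup has roughly this shape (an illustrative TypeScript sketch; the attribute locations and per-instance layout are made up, not my actual code):

    // Per-instance data: vec3 position, float rotation, vec3 color = 7 floats.
    function drawBooks(gl: WebGL2RenderingContext, instanceBuf: WebGLBuffer,
                       vertsPerBook: number, bookCount: number): void {
      gl.bindBuffer(gl.ARRAY_BUFFER, instanceBuf);
      const stride = 7 * 4; // 7 floats, 4 bytes each
      gl.enableVertexAttribArray(1);
      gl.vertexAttribPointer(1, 3, gl.FLOAT, false, stride, 0);  // position
      gl.vertexAttribDivisor(1, 1); // advance per instance, not per vertex
      gl.enableVertexAttribArray(2);
      gl.vertexAttribPointer(2, 1, gl.FLOAT, false, stride, 12); // rotation
      gl.vertexAttribDivisor(2, 1);
      gl.enableVertexAttribArray(3);
      gl.vertexAttribPointer(3, 3, gl.FLOAT, false, stride, 16); // color
      gl.vertexAttribDivisor(3, 1);
      // Geometry (attribute 0) is uploaded once; the GPU repeats it per book.
      gl.drawArraysInstanced(gl.TRIANGLES, 0, vertsPerBook, bookCount);
    }

The vertexAttribDivisor calls are what make it instanced: those attributes step once per book instead of once per vertex.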
Going into this project, I wasn't sure if it was possible. But I came away really impressed with how capable the web is these days if you're willing to push a bit.
It would make competitive chess even more drawish. It is much easier to see when you've accidentally got into a losing position than when you've missed a winning idea, so the takeback would be used defensively.
On the full set of 1000 questions, the language models are getting 30-35% correct. With patience, humans can do 40-50%.
The language models were prompted with the text + each candidate answer, and the one with the lowest perplexity was picked. I tried to avoid instruction tuned models wherever possible to avoid the "voice" problem.
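The selection rule is simple once you have per-token log probabilities (a sketch; tokenLogProbs is a hypothetical stand-in for whatever your model API exposes):

    // Hypothetical: returns the log probability of each token in `text`.
    declare function tokenLogProbs(text: string): number[];

    function pickAnswer(prompt: string, candidates: string[]): string {
      let best = candidates[0];
      let bestPpl = Infinity;
      for (const c of candidates) {
        const lp = tokenLogProbs(prompt + c);
        const avgNll = -lp.reduce((a, b) => a + b, 0) / lp.length;
        const ppl = Math.exp(avgNll); // perplexity of the full string
        if (ppl < bestPpl) { bestPpl = ppl; best = c; }
      }
      return best;
    }

Averaging over all tokens of the full string keeps the comparison fair across answers that tokenize into different lengths.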
i'm curious, how did you arrive at "40-50%" possible human performance?
the task of "predicting the next word" can be understood as either "correctly choosing the next word in the hidden context", or "predicting the likelihood of each possible word".
the quiz is evaluating against the former, but humans are still far from being able to express a percentile likelihood for each possibility.
i only consciously arrive at a vague feeling of confidence, rather than being able to weigh the prediction of each word with fractional precision.
one might say that LLMs have above human introspective ability in that regard.
Are you smarter than a language model?
There are a lot of benchmarks that try to see how good language models are at human tasks. But how good are you at the quintessential language model task of predicting the next word?
And then a list of questions.
How am I supposed to know it has anything to do with HN?
Temperature doesn't play a role here, because the LLM is not being sampled (other than to generate the candidate answers). Instead, the answer the LLM picks is decided by computing the perplexity of the full prompt + answer string.
The language model generating the candidate answers generates tokens until a full word is produced. The language models picking their answer choose the completion that results in the lowest perplexity independent of the tokenization.
I'd say the test is still not quite valid, and more of an in-between of the original "valid" task and "guess what the LLM would say" as suggested in another comment here. The reason: it might be easier for LLMs to choose the completion out of their own generated variants (1) than out of the real token distribution.
1. perhaps even out of variants generated by other LLMs