Hacker News | DanyWin's comments

Yes, we are working on that! We are preparing to release an opt-in telemetry feature so people can contribute to a decentralized, open dataset for training and evaluating models that generate Selenium code.


Very interesting indeed!

We are thinking of developing an extension that would connect the browser to LaVague, so that actions can be sent to the extension and executed locally, thus bypassing those barriers.


You are exactly right! Since I wanted a solution that works with many LLMs out of the box, I focused on chain-of-thought prompting and few-shot learning.

Several papers show that fine-tuning mostly helps with steerability and form (https://arxiv.org/abs/2402.05119), so I thought it would be sufficient to provide just the right examples, and it did work!

We do intend to create a decentralized dataset to further train models and perhaps get a 2B or 7B model working well.


What kind of problems are you seeing that you think can be improved with a fine tune?


Thank you for linking that paper!


Thanks! Funny thing: we did not use vision models, only text, working from the HTML of the current page. However, we intend to add vision to boost performance.


Interesting that it’s not vision-based. I suspect you will get much better performance once vision is incorporated, using e.g. LLaVA-style models.


Thanks a lot! Love the support <3


This is just the beginning, but it is indeed on the roadmap!

Once we solve browser automation, we intend to support other integrations to further facilitate automation of workflows


It could indeed have an impact on jobs, just as productivity gains have always displaced some jobs.

However, the net gains, in my humble opinion, could be phenomenal. Imagine all the time, mental energy and money spent navigating the legacy of today's society: from legacy legal systems that are super complex, to legacy websites, I believe there is much time to be saved so we can dedicate resources to what truly matters, whether intellectual pursuits or quality time with friends and family.


> However, the net gains, in my humble opinion, could be phenomenal.

Doesn't seem like a very humble opinion. Every time people lose work they need to find income somewhere else, or they end up working more anyway. Productivity gains equalling more free time has only really ever worked for people who ended up, or already were, unemployed or self-employed; otherwise it's propaganda spread by people who stand to gain. Even in cases where someone's job became less manual, it's not like they suddenly got the rest of the day off to spend with their family. They just ended up operating the machine all day anyway, often for less pay, to the point where eventually families and friends as a concept started becoming more rare.


> However, the net gains, in my humble opinion, could be phenomenal.

And historically, have always been phenomenal.

If, 100 years ago, you had told people that only 1.5% of people in the USA/Canada would work in agriculture, politicians would have been horrified, fearing mass unemployment. They would have been similarly horrified to hear that virtually nobody would work in textile manufacturing in the Western world.

But in reality, jobs in the former are considered so dismal that they are heavily staffed by desperate people with no other legal work options and by migrant workers from poor countries, and jobs in the latter pay so poorly globally that you would be better off running a lemonade stand in a Western country.

We are far better off for the combine harvester freeing us from harvesting wheat by hand. We are far better off for the sewing machine.


> We are far better off for the combine harvester freeing us from harvesting wheat by hand. We are far better off for the sewing machine.

Who's "we"? It's not like the people who aren't working with a scythe have moved up to be un-employed computer programmers, they're just picking fruit now.

People who sewed by hand professionally don't generally get the afternoon off now to chill with their homies; they just use the sewing machine all damn day.

The only "we" who are better off are consumers and business operators, because they pay less or nothing for that labour. Nobody is talking about the comfy lives of fast-fashion makers or the people who assemble our $7000 MacBook Pros.


> Imagine all the time, mental energy and money spent on navigating through the legacy of today's society?

I can see the business perspective for sure. But I really don’t think humanity has the luxury of consuming even more energy to run billions of GPUs to do what a programming team could do, while in the meantime having an excuse not to fix its legacy.

That sounds either totally cyberpunk or like very late-stage capitalism.

We need to reduce global energy consumption and fix society as much as we can, not go full throttle in the current direction.


Here we just provide natural language instructions, and the LLM generates the appropriate code at a given time. If the site changes, we can regenerate the code using the same instruction, so unless the site changes a lot, it is quite robust.
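A minimal sketch of that regenerate-on-failure idea, assuming a hypothetical `generate_code` stand-in for the LLM call (this is not LaVague's actual API):

```python
# Sketch only: cache generated code per instruction and regenerate
# from the same instruction when execution fails, e.g. after a site change.

def generate_code(instruction: str) -> str:
    """Hypothetical stand-in for the LLM that turns a natural-language
    instruction into Selenium code."""
    return f"# selenium code for: {instruction!r}"

class InstructionRunner:
    def __init__(self, generator=generate_code):
        self._generator = generator
        self._cache: dict[str, str] = {}

    def run(self, instruction: str, execute) -> bool:
        """execute(code) -> bool reports whether the code still works."""
        code = self._cache.get(instruction)
        if code is None:
            code = self._generator(instruction)
            self._cache[instruction] = code
        if execute(code):
            return True
        # The site likely changed: regenerate once from the same instruction.
        code = self._generator(instruction)
        self._cache[instruction] = code
        return execute(code)
```

The instruction, not the generated code, is the stable artifact here, which is why a change in the page only costs one extra LLM call.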


Right, so in general I can see this being used by development teams themselves, because we don't want to sit there and manually write tests.

I'd love to tell it to just log in to my own website, click on certain pieces of functionality and repeat that. Especially with more casual day to day tasks.

Heck, we could even auto-generate tests from a bug report (where the steps to reproduce are written in plain English by non-technical testers).

That means less time for a dev to actually reproduce those steps, right?


Exactly! In the future, testers could just write tests in natural language.

Every time we detect, for instance with a vision model, that the interface has changed, we ask the Large Action Model to recompute the appropriate code and execute it.

Regarding generating tests from bug reports: totally possible! For now we focus on a good mapping from low-level instructions ("click on X") to code, but once we solve that, we can have another AI turn bug reports into low-level instructions and feed the previously trained LLM!
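That two-stage pipeline could be sketched as follows, where both functions are hypothetical stand-ins for the two model calls described above:

```python
# Hypothetical sketch of the two-stage pipeline: a first model decomposes
# a plain-English bug report into low-level instructions, then the
# instruction -> code model handles each instruction.

def report_to_instructions(report: str) -> list[str]:
    """Stand-in for an LLM that splits a bug report into steps."""
    return [line.strip("- ").strip() for line in report.splitlines() if line.strip()]

def instruction_to_code(instruction: str) -> str:
    """Stand-in for the Large Action Model's instruction -> code mapping."""
    return f"perform({instruction!r})"

def report_to_test(report: str) -> list[str]:
    """Chain the two stages: bug report -> list of executable steps."""
    return [instruction_to_code(step) for step in report_to_instructions(report)]
```

The benefit of the split is that the instruction-to-code model can be trained and evaluated independently of the messier report-decomposition stage.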

Really like your use case and would love to chat more about it if you are open. Could you come on our Discord and ping me? https://discord.gg/SDxn9KpqX9


I don't use discord much but joined to provide any additional thoughts.


There is still a design decision to be made on whether we go for TPMs for integrity only, or for more recent solutions like Confidential GPUs with H100s, which have both confidentiality and integrity. The trust chain is also different, which is why we are not committing yet.

Training therefore happens on GPUs that can be ordinary if we go for TPMs only (traceability alone), or on Confidential GPUs if we want more.

We will make the whole source code open source, including the base software image and the code that creates the proofs, using the secure hardware keys to sign that the hash of a specific model comes from a specific training procedure.
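As a rough illustration of that signing step (an HMAC key stands in for the hardware-protected key here; the real scheme would use TPM-backed asymmetric signatures, and the key names are made up):

```python
import hashlib
import hmac

# Stand-in for a key that would live inside the secure hardware.
HARDWARE_KEY = b"tpm-protected-signing-key"

def attest(weights: bytes, training_procedure: str) -> tuple[str, str]:
    """Bind a model hash to a description of how it was trained."""
    model_hash = hashlib.sha256(weights).hexdigest()
    statement = f"{model_hash}:{training_procedure}".encode()
    signature = hmac.new(HARDWARE_KEY, statement, hashlib.sha256).hexdigest()
    return model_hash, signature

def verify(weights: bytes, training_procedure: str,
           model_hash: str, signature: str) -> bool:
    """Recompute the attestation and compare in constant time."""
    expected_hash, expected_sig = attest(weights, training_procedure)
    return (hmac.compare_digest(expected_hash, model_hash)
            and hmac.compare_digest(expected_sig, signature))
```

The key point is that the signature covers both the weights and the training procedure, so neither can be swapped after the fact without detection.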

Of course it is not a silver bullet. But just like signed and audited closed source, we can have parties / software assess the trustworthiness of a piece of code and, if it passes, sign that it meets some security requirements.

We intend to do the same thing. It is not up to us to do this check, but we will let the ecosystem do it.

Here we focus more on providing tools that actually link the weights to a specific training run / audit. This does not exist today, and as long as it does not, any claim that a model is traceable and transparent is unscientific, as it cannot be backed by falsifiability.


What's the point of any of this TPM stuff? Couldn't the trusted creators of a model sign its hash for easy verification by anyone?


I think the point is to get a signed attestation that an output came from a given model, not merely sign the model.


Why does this matter at all?


You go to a jewelry store to buy gold. The salesperson tells you that the piece you want is 18karat gold, and charges you accordingly.

How can you confirm the legitimacy of the 18k claim? Both 18k and 9k look just as shiny and golden to your untrained eye. You need a tool and the expertise to be able to tell, so you bring your jeweler friend along to vouch for it. No jeweler friend? Maybe the salesperson can convince you by showing you a certificate of authenticity from a source you recognize.

Now replace the gold with a LLM.


You go to school and learn US History. The teacher tells you a lot of facts and you memorize them accordingly.

How can you confirm the legitimacy of what you have been taught?

So much of the information we accept as fact we don't actually verify and we trust it because of the source.


In a way, students trust the aggregate of "authority checking" that the school and the professors go through in order to develop the curriculum. The school acts as the jeweller friend that vouches for the stories you're told. What happens when a school is known to tell tall tales? One might assume that the reputation of the school would take a hit. If you simply don't trust the school, then there's no reason to attend it.


A big part of this is what the possible negative outcomes of trusting a source of information are.

An LLM being used for sentencing in criminal cases could go sideways quickly. An LLM used to generate video subtitles if the subtitles aren't provided by someone else would have more limited negative impacts.


If my reading of it is correct, this is similar to something like a trusted boot chain, where every step is cryptographically verified against the chain and its components.

In plain English: the final model you load, and all the components used to generate it, can be cryptographically verified back to whoever trained it, and if any part of that chain can't be verified, alarm bells go off, things fail, etc.
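That verified-chain reading can be sketched as a simple hash chain (illustrative only, not the project's actual format):

```python
import hashlib

def chain_digest(components: list[bytes]) -> str:
    """Fold each component (dataset, training code, weights, ...) into a
    running hash, so changing any link changes the final digest."""
    digest = b""
    for component in components:
        digest = hashlib.sha256(digest + component).digest()
    return digest.hex()

def verify_chain(components: list[bytes], expected: str) -> bool:
    """Ring the alarm bells if any component in the chain changed."""
    if chain_digest(components) != expected:
        raise ValueError("chain verification failed: some component changed")
    return True
```

As with a boot chain, verification only tells you the artifacts match what was signed; trusting the signer is a separate question.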

Someone please correct me if my understanding is off.

Edit: typo


How does this differ from the challenges around distributing executable binaries? Wouldn't a signed checksum of the weights suffice?


I think this is more a "how did the sausage get made" situation, rather than an "is it the same sausage that left the factory" one.


Sausage is a good analogy. With chains of trust, both the manufacturer and the buyer benefit, but at different layers of abstraction.

Think of the sausage (ML model) as made up of constituent parts (weights, datasets, etc.) put through various processes (training, tuning). At the end of the day, all you the consumer care about is that, at a bare minimum, the product won't kill you (it isn't giving you dodgy outputs). In the US there is the USDA (TPM), which quite literally stations someone (this software, assuming I am grokking it right) from the ranch to the sausage factory (parts and processes) at every step of the way to watch (hash) for any hijinks (someone poisons the well) or just genuine human error (it gets trained on old weights due to a bug), and stops to correct the error, find the cause, and give you traceability.

The consumer enjoys the benefit of the process because they simply have to trust the USDA, the USDA can verify by having someone trusted checking at each stage of the process.

Ironically, that system exists in the US because meatpacking plants did all manner of dodgy things, like adding adulterants, so the US Congress forced them to be inspected.


Except there’s a quantifiable difference between 18k and 9k gold.

Differences in interpretations of historical and cultural events are far more nuanced.

We’ll likely end up in a place with many trusted sources of attestation, each with their own bias toward particular notions of the truth.

Like schools and media outlets, there will be many LLMs to choose from that will tell you, confidently and authoritatively, what you want to hear.


Why should we trust your certificate more than it looking shiny? What exactly are you certifying and why should we believe you about it?


You shouldn't trust any old certificate more than it looking shiny. But if a third party that you recognise and trust happens to recognise the jewelry or the jeweler themselves, and goes so far as to issue a certificate attesting to that, that becomes another piece of evidence to consider in your decision to purchase.


Art and antiquities are the better analogy.

Anything without an iron-clad chain of provenance should be assumed to be stolen or forged.

Because the end product is unprovably authentic in all cases, unless a forger made a detectable error.


Exactly! It's not sufficient, but it is at least necessary. Today we have no proof whatsoever of what code and data were used, even if everything were open sourced, as there are reproducibility issues.

There are ways with secure hardware to get at least traceability, though not transparency. This would at least let us know what was used to create a model, and it can be inspected a priori / a posteriori.

