For student assignment cheating, removing the em dashes only takes care of the em dashes; the other tells would still be in the output. There are specific words and turns of phrase, specific constructions (e.g., 'it's not just x, but y'), and commonly used word choices. Really it's just a prim and proper corporate press release voice -- not a usual university student's writing voice. I'm actually quite sure you'd be able to easily pick out a first-pass AI-generated student assignment, em dashes removed, from a set of legitimate assignments, especially if you are a native English speaker. You may not be able to explain it systematically, but native-speaker intuition can do it surprisingly well.
What AI detectors have largely done is try to formalize that intuition. They work pretty well on simple adversaries (basically, the laziest students), but a more sophisticated user will do first, second, and third passes to change the voice.
Because of the way regurgitation works. "You're absolutely right" primes the next tokens to treat whatever preceded that as gospel truth, leaving no room for critical approaches.
No. No one is looking for em-dashes, except for some bozos on the internet. The "default voice" of all mainstream LLMs can be easily detected by looking at the statistical distribution of word / token sequences. AI detector tools work and have very low false negatives. They have some small percentage of false positives because a small percentage of humans pick up the same writing habits, but that's not relevant here.
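As a toy illustration of that statistical angle (real detectors score full token distributions under a reference language model; this hard-coded phrase list is just a stand-in for the example):

    # Toy sketch only: real detectors model token-sequence statistics;
    # this hard-coded telltale list is an assumption for illustration.
    TELLTALES = [
        "it's not just",
        "delve into",
        "plays a crucial role",
        "you're absolutely right",
    ]

    def telltale_score(text: str) -> float:
        """Rough 'default LLM voice' score: telltale hits per word."""
        t = text.lower()
        hits = sum(t.count(phrase) for phrase in TELLTALES)
        return hits / max(len(t.split()), 1)

    print(telltale_score("You're absolutely right -- it's not just style, but substance."))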
The "humanizer" filters will typically just use an LLM prompted to rewrite the text in another voice (which can be as simple as "you're a person in <profession X> from <region Y> who prefers to write tersely"), or specifically flag the problematic word sequences and ask an LLM to rephrase.
They most certainly don't improve the "correctness" and don't verify references, though.
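For what it's worth, the rewrite pass can really be that thin. A minimal sketch, assuming the OpenAI chat-completions client; the model name and prompt wording are made up for illustration:

    # Minimal sketch of the persona-rewrite approach described above.
    # Model name and prompt are assumptions, not any specific product.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def humanize(text: str, profession: str, region: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # hypothetical model choice
            messages=[
                {"role": "system",
                 "content": f"You're a person in {profession} from {region} "
                            "who prefers to write tersely. Rewrite the user's "
                            "text in your own voice. Do not add facts."},
                {"role": "user", "content": text},
            ],
        )
        return resp.choices[0].message.content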
It appears ChromeOS is being killed and much of its feature set is being ported into Android. The result may still be marketed as "ChromeOS", with identical functionality, and consumers will be none the wiser.
My Moto Edge 2024 has "Ready For", which is basically this, still today. I plug in the USB-C cable normally connected to my work MacBook and instantly get a full desktop experience: mouse, keyboard, and sound included.
It's how I play Minecraft with my kids when they get the itch. Sometimes if I know I'm only gonna be zoning out on YouTube at night I'll use it to save a few watts too.
It can do 1440p at 120 Hz, all on a really affordable phone. It's nice.
Phones were way less powerful 15 years ago and native software was much more important. A modern phone CPU running a browser on a larger screen takes care of a lot of what you need these days.
I've only used it when I'm in a pinch, but it's handy. Blowing up mobile apps to a larger screen and multitasking certainly isn't ideal, but I've been able to handle "email job" type activities while out of pocket. That said, I've never heard of anyone else who's actually used it.
Internet censorship is more of a reality and a problem than it felt at the dawn of the age of cheap wireless broadband. I can certainly see the value in local Wikipedia copies if internet blocks, age gates, etc. need to be contended with.
I guess the first question I have is whether these problems solved by LLMs are just low-hanging fruit that human researchers either didn't get around to or never showed much interest in - or whether there's real substance to the idea that LLMs can independently conduct original research and solve hard problems.
That's the first warning from the wiki: "Erdős problems vary widely in difficulty (by several orders of magnitude), with a core of very interesting, but extremely difficult problems at one end of the spectrum, and a 'long tail' of under-explored problems at the other, many of which are 'low hanging fruit' that are very suitable for being attacked by current AI tools." https://github.com/teorth/erdosproblems/wiki/AI-contribution...
There is still value in letting these LLMs loose on the periphery to knock out all the low-hanging fruit humanity hasn’t had time to get around to. Also, I don’t know this, but if a problem is on the Erdős list I presume people have tried to solve it at least a little before it made it there.
Is there, though? If they are "solved" (as in, the tickbox marks them as such through a validation process, e.g. another model confirming, a formal proof passing, etc.) but no human actually learns from them, what's the benefit? Completing a list?
I believe the ones that are NOT studied are unstudied precisely because they are seen as uninteresting. Even if they were solved in an interesting way, if nobody reads the proofs because there are just too many of them and they are, again, not considered valuable, then I don't see what is gained.
Some problems are ‘uninteresting’ in that they show results that aren’t immediately seen as useful. However, solutions may end up having ‘interesting’ connections or ideas or mathematical tools that are used elsewhere.
More broadly, I think there’s a perspective that literally just building out thousands more true statements in Lean is going to keep cementing math’s broadening knowledge framework. This is not building a giant castle à la Wiles, it’s laying bricks in the outhouse, but someday those bricks might be useful.
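For a sense of scale, a single such brick can be a one-line Lean theorem. A made-up, trivially small example (not related to any Erdős problem):

    -- A trivially small "brick": a one-line true statement checked by Lean 4.
    -- Made-up example for illustration only.
    theorem add_one_pos (n : Nat) : 0 < n + 1 :=
      Nat.succ_pos n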
Phind was the first AI search I used as well. But they seemed to be quickly outfoxed by Perplexity. I started using Perplexity after it was recommended to me as having fewer hallucinations - now it can integrate its tools with SOTA models like Opus.
Would love to know the thought process and rationale of whoever underwrote that policy. My experience suggests you should never trust unsupervised LLMs for anything life or mission critical.
Having to prime it with more context and more guardrails seems to imply they're getting worse. That's context and guardrails it can no longer infer/intuit on its own.
No, they are not getting worse. Again, look at METR task times.
The peak capability is very obviously, and objectively, increasing.
The scaffolding you need to elicit top performance changes each generation. I feel it takes less scaffolding now to get good results. (Lots of the “scaffolding” these days is less “contrived AI prompt engineering” and more “well understood software engineering best practices”.)
Why the downvotes? This comment makes sense. If you need to write more guardrails, that does increase the work, and at some point the amount of guardrails needed to make these things work in every case becomes impractical. I personally don't want my codebase filled with babysitting instructions for code agents.
* grep to remove em dashes and emojis

* re-run through another LLM with a prompt to remove excessive sycophancy and invalid URL citations (rough sketch of both passes below)
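A minimal sketch of that two-pass cleanup, in Python; the emoji ranges and prompt wording are assumptions, and call_llm stands in for whatever chat API is in use:

    import re

    # Pass 1: the "grep" step, as regex substitutions.
    def strip_tells(text: str) -> str:
        text = re.sub(r"\s*\u2014\s*", " - ", text)  # em dashes -> plain dash
        text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)  # common emoji blocks
        return text

    # Pass 2: a narrow cleanup prompt for a second model.
    CLEANUP_PROMPT = (
        "Rewrite this text. Remove flattery and filler. "
        "Delete any URL or citation you cannot verify. Change nothing else."
    )

    def second_pass(text: str, call_llm) -> str:
        # call_llm(system=..., user=...) is a stand-in for any chat API
        return call_llm(system=CLEANUP_PROMPT, user=text)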