Gonna be completely honest: if you want to draw in people with anything more than a casual interest in literature, your examples are an immediate turn-off. I suggest you spend time on the subreddits of major genres, on BookTok, and on what's trending in apps like Fable if you want insight into what books people outside the Silicon Valley/techbro bubble actually consume and enjoy.
lmarena/lmsys is beyond useless once you compare its prior model rankings against formal benchmarks, or against testing for accuracy and correctness on batches of real-world data. It's a bit like using a poll of Fox News viewers to discern the opinions of every American: the voting audience is consistently found wanting. That's before getting into how easily a bad actor with means and motivation (in this "hypothetical" instance, wanting to show that a certain model is capable of running the entire US government) can manipulate votes, which has been brought up in the past. (Yes, I'm aware of the lmsys publication on how they defend against attacks using Cloudflare and reCAPTCHA; there are ways around that.)
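To make the manipulation angle concrete, here's a toy simulation I threw together (my own sketch, not lmarena's actual rating pipeline; the K-factor, vote counts, and 5% adversarial share are all made up) of two genuinely equal models under Elo-style updates, where a bad actor contributes a small slice of the votes:

```python
# Toy model: Elo-style updates over pairwise votes between two EQUAL models.
# Honest voters flip a coin; a bad actor always votes for model A.
import random

K = 4  # assumed small K-factor for incremental leaderboard updates

def expected(ra, rb):
    # Standard Elo expected score for a player rated `ra` vs `rb`.
    return 1 / (1 + 10 ** ((rb - ra) / 400))

def simulate(n_votes=20000, adversarial_share=0.05, seed=0):
    random.seed(seed)
    ra = rb = 1000.0
    for _ in range(n_votes):
        if random.random() < adversarial_share:
            sa = 1.0  # coordinated vote: always model A
        else:
            sa = 1.0 if random.random() < 0.5 else 0.0  # honest coin flip
        ea = expected(ra, rb)
        ra += K * (sa - ea)
        rb -= K * (sa - ea)
    return ra, rb

print(simulate())  # A ends up ~15-20 points ahead despite equal quality
```

Under those assumptions the favored model settles roughly 15-20 Elo points ahead of an identical rival, which on a tightly packed leaderboard is the difference between a headline and an also-ran.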
So you're saying that either A: users interacting with models can't objectively rate which responses seem better to humans, B: xAI as a newcomer has somehow managed to game the leaderboard better than all those other companies, or C: all those other companies aren't doing it. By those standards every test ever devised for anything is beyond useless. But simply not having the model creator run the evaluation already goes a long way.
No, I'm saying that some companies are doing it (OpenAI at the very least), that the company in question has motive and capability to game the system (kudos to them for pushing the boundaries there), AND that the userbase's rankings have historically been statistically misaligned with data from evals (flawed as those are), especially when it comes to testing for accuracy and precision on real-world data (outside their known or presumed training set). Take a look at how well Qwen or DeepSeek actually performed against the counterparts that were out at the same time, then compare that with their corresponding rankings.
In the nicest way possible, I'm saying this form of preference testing is ultimately useless: primarily due to a base of dilettantes with more free time than knowledge parading around as subject matter experts, and secondarily due to presumed malfeasance. The latter is becoming apparent to more of the masses (those that don't blindly believe any leaderboard they see) now that access to the model itself is more widespread and people are seeing the performance doesn't match the "revolution" promised [0]. If you're still confused why selecting a model based on a glorified Hot or Not application is flawed, perhaps ask yourself why other evals exist in the first place (hint: some tests are harder than others).
At work, we developed our own suite of benchmarks. Every company with a serious investment in AI-powered platforms needs to do the same. Comparing our results to the Arena turns up some pleasant surprises, like DBRX punching way above its weight for some reason.
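For anyone wanting to start down that road, the skeleton really is tiny. A minimal sketch (not our actual suite; `query_model` and the JSONL case format are hypothetical stand-ins for whatever your stack uses):

```python
# Minimal internal eval harness: score models on held-out, real-world
# cases they can't have memorized, instead of trusting public leaderboards.
import json

def query_model(model: str, prompt: str) -> str:
    raise NotImplementedError("wire up your provider's API here")

def run_suite(model: str, path: str = "internal_cases.jsonl") -> float:
    passed = total = 0
    with open(path) as f:
        for line in f:
            case = json.loads(line)  # {"prompt": ..., "expected": ...}
            answer = query_model(model, case["prompt"])
            passed += case["expected"].strip().lower() in answer.lower()
            total += 1
    return passed / total

# scores = {m: run_suite(m) for m in ["model-a", "model-b", "dbrx"]}
```

Substring matching is deliberately crude; the point is that even a crude harness on your own data beats a public preference poll on someone else's.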
You say no, but then go on to explain why you believe a combination of options A and B. That's fine, I guess; I just don't consider it particularly likely given the currently available information.
The count of NAION cases observed after Ozempic usage in Denmark is 150 (up from a baseline of 60-75) out of 424,152 patients, for a rare ailment that already disproportionately affects patients with diabetes. Sorry to say, those taking it as a "shortcut," in your words, are even less susceptible.
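Back-of-envelope on those figures (the numbers are from the Danish cohort above; taking the midpoint of the baseline range is my simplification):

```python
# Absolute risk in the Danish cohort: elevated, but still tiny.
patients = 424_152
baseline_cases = (60 + 75) / 2   # midpoint of the 60-75 historical range
observed_cases = 150

baseline_rate = baseline_cases / patients
observed_rate = observed_cases / patients
print(f"baseline: {baseline_rate:.4%}")     # ~0.0159%
print(f"observed: {observed_rate:.4%}")     # ~0.0354%
print(f"absolute increase: {observed_rate - baseline_rate:.4%}")  # ~0.02pp
```

So even roughly doubled, the absolute risk moves by about two hundredths of a percentage point.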
As someone who's been fortunate enough to be fit and able to work out their entire life, I'm not sure how there are people like you who shun and shame those trying to gain a semblance of control over their weight, in a world where weight has a real impact on whether they get serious medical attention or not. Your likely skewed thoughts on vanity be damned, bigger people are treated worse across the board, and GLP-1 is a genuine salve.
Any part of "saving" Intel should include a mechanism barring them from putting any more money that should be spent on R&D towards stock buybacks ($152B since 1990, as of September). That said, quoting the former Intel CEO (who still owns 3,245,986 shares) as "[one of the] expert[s] who says breaking up Intel won't do any good" seems like journalistic malpractice--and makes me all the more certain it should be subsumed by a company with executives hungry to actually win again.
Buybacks are just dividends with better tax implications that somehow make people angry.
Companies historically are expected to pay dividends, at least when their business is doing well. Business at Intel was doing well for most of 1990-2017. There was some time after the Pentium 4 stopped scaling before the Pentium M offered a recovery, and the Itanium mess; but overall pretty good until 2017.
Buybacks aren't exactly like dividends, because they directly affect the pricing of the stock by interfering with supply and demand. That said, I think people are mostly angry about the conflict of interest where CEOs who hold current and future shares are making decisions based on what will maximize their personal returns rather than what's best for the company and shareholders.
When a company with growth prospects does well, it should invest those dollars into things like R&D and expansion. Companies that pay their profit out as dividends are generally not expected to grow as much, and their valuations (P/E) tend to reflect that.
That said, the taxation aspect is perhaps a problem and should be addressed if it's not working as intended.
Ok, so I prefer dividends, as I think they encourage better behavior on the part of both investors and companies.
I think that buybacks definitely create a massive conflict of interest for C-levels remunerated based on share price or EPS, and I dislike that I must sell to realize any gains. But perhaps this is a niche position.
> that should be spent on R&D towards stock buybacks ($152B since 1990, as of September).
1990 seems like a weird starting point, because it includes much of Intel's heyday, when their profits were arguably well deserved. Is the implication that no business should keep profits, and that every cent should be plowed back into R&D?
I used that period (rather than saying they spent $110B between 2005 and 2021) to establish that this is a known, expected pattern of behavior regardless of Intel's performance, roadmap, or market conditions, and to lead the reader to recognize that, if bailed out, they'll likely continue it in the near future instead of utilizing that money for its intended purpose.
Instead of assuming my comment is a generalized view on how businesses as a whole should operate (rather than on the subject of the piece), perhaps take a moment to consider how the magnitude of the buybacks--in the face of stiff competitors that have now leapfrogged them--is directly correlated with the mismanagement and dysfunction within Intel that leaves them unable to rise to the challenge the country demands.
Most companies do stock buybacks as a way to pay out bonuses to employees and execs at a lower tax rate. Since RSUs are taxed lower (I think), companies pay employees with those. But those grants are issued by creating new shares, so to avoid diluting existing shares, the company needs to keep buying shares back.
RSUs are treated as cash income at vest and taxed at the same rates.
Stock buybacks benefit general shareholders (i.e. beyond employees) since they push up the stock's value without causing a taxable event. The alternative is dividends, which are taxed immediately.
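Rough numbers to illustrate (the rates are assumptions and the model is deliberately simplified; none of this is tax advice):

```python
# Same $1B returned to taxable shareholders, two ways.
payout = 1_000_000_000
dividend_tax = 0.15    # assumed qualified-dividend rate
cap_gains_tax = 0.15   # assumed long-term capital-gains rate

# Dividend: the whole payout is taxed in the year it's received.
dividend_after_tax = payout * (1 - dividend_tax)

# Buyback: value shows up as share-price appreciation; tax is owed only on
# the gain, and only when (and if) the holder chooses to sell.
# (Simplified: treats the entire payout as an eventual taxable gain.)
buyback_after_tax_if_sold = payout * (1 - cap_gains_tax)

print(f"dividend, taxed now:     ${dividend_after_tax:,.0f}")
print(f"buyback, tax deferrable: ${buyback_after_tax_if_sold:,.0f}")
```

Same rates, but the buyback lets the holder pick the timing of the taxable event, which is where the "better tax implications" upthread come from.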
The conclusions reached in the paper and the headline differ significantly. I'm not sure why you took a line from the abstract when, further down, the paper notes that only some elements of "truthfulness" are encoded and that "truth" as a concept is multifaceted. It further notes that LLMs can encode the correct answer yet consistently output an incorrect one, with strategies mentioned in the text to potentially reconcile the two, but as of yet no real concrete solution.
If something that sells 100 million+ devices isn't "super popular", I don't know what is. And that's not even counting the millions of TVs that have it built in (Hisense, TCL, Samsung); the brand is pretty ubiquitous.
I was being generous when I said "not even counting," but no: despite the internal name change, most still carry the "Chromecast built-in" designation on their branding and sites, which takes a mere second to Google and see.
The phones were released in October, while Gemini Nano's announcement happened in December. I, like other developers and consumers reaching for the smaller version, might've bought the device for the ability to run the ML features advertised in their keynote, or (in the case of the former) based on the research they released the week prior to it.
At Gemini's initial release, the language surrounding Nano was that it would only come to the Pro at first, and I was happy to wait. The complete inability to run it, when the new Samsung phones can (including the model with 8GB, as reported above), feels not only like a bait-and-switch/false advertising, but like a constraint imposed solely to drive sales. It demands a clear explanation.
I care less about another potential Pixel class action, and more that I have to get another phone to test on, and deploy my apps to a smaller audience.
The two phones might have similar or even identical memory chips, but that doesn't mean the carving of the address space is the same. A more meaningful comparison would be looking at how much of those 8GB is pinned by the various components (kernel, graphics buffers, sound, camera, telephony, radios, etc.), how much is left to user and system apps, and how the system is tuned for active/background processes. 8GB is just a single data point, too simplistic to draw even remotely plausible conclusions from.
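If you want to go past the headline number yourself, the comparison is scriptable; a quick sketch (the serials are placeholders, and it assumes adb is installed with both phones connected and authorized):

```python
# Compare what the kernel actually sees and keeps available on each device.
# MemTotal already excludes carve-outs pinned by firmware/modem/graphics,
# so two "8GB" phones can legitimately report different numbers here.
import subprocess

def meminfo(serial: str) -> dict:
    out = subprocess.check_output(
        ["adb", "-s", serial, "shell", "cat", "/proc/meminfo"], text=True)
    fields = {}
    for line in out.splitlines():
        key, _, rest = line.partition(":")
        fields[key] = int(rest.split()[0])  # values are reported in kB
    return fields

for serial in ["PIXEL8_SERIAL", "GALAXY_S24_SERIAL"]:  # placeholder serials
    m = meminfo(serial)
    print(serial,
          f"MemTotal={m['MemTotal'] / 1024**2:.2f} GiB",
          f"MemAvailable={m['MemAvailable'] / 1024**2:.2f} GiB")
```

That still doesn't capture low-memory-killer tuning or per-app limits, but it's a far better starting point than the number on the spec sheet.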
I'm not surprised this was flagged, despite the civil discussion and how relevant it is to the late-stage limbo social networks currently occupy. To be frank, it runs counter to the narrative of the dedicated cohort of "free speech" absolutists here, who don't want an example of why there absolutely need to be limits on what can be posted and disseminated.
It's not fine when this "magic" is being advertised as on-device.
After reading (and attempting to quickly implement the model ensembles within) both the RealFill[0] and Break-A-Scene[1] papers published by Google researchers just prior to the Pixel 8 launch, I was expecting either a leap in the G3's tensor cores akin to the 2013 Moto X's NLP + contextual-awareness cores[2] (which provided better implementations of Active Display, gesture recognition, and voice recognition in loud environments than 95% of current mobile devices), or something like the Coral[3], the edge TPU they developed that achieved shockingly good inference performance (though hardware production was handed off to ASUS in 2022, thanks to the chip shortage, the generally arbitrary nature of the company, and their wholesale divestment from IoT). In short, I expected more.
All that to say this: your assumptions about inference performance on >$1000 hardware are fundamentally flawed (the fact that you reach for the buzzy "generative" prefix suggests they're erroneously informed by Twitter influencers and by attempts to deploy current LLMs).
Custom hardware tailored to the task at hand can be, and has been, developed for mobile devices in the past. If they failed to meet performance, power-draw, or processing-time requirements, they should've reframed their pitch instead of exposing themselves to what is likely going to be yet another class action suit focusing on their hardware.
> It's not fine when this "magic" is being advertised as on-device.
I can't help but notice that you included a lot of references, but none for this claim.
Neither of the features mentioned in the article is claimed to be on-device in the official Pixel 8 Pro announcement blog[0]. The only feature that the blog post claims is on-device is the Best Take feature, which the article does not say requires an internet connection.
But of course that's just one bit of marketing material, and I'm sure you've seen these features advertised as happening on-device. Maybe you could post a link?
> Custom hardware tailored to the task at hand can be, and has been, developed for mobile devices in the past.
You think Google doesn't know that? Are you aware of what's inside Google's phones?
I'm not sure what performance benefit you expect out of custom hardware. How many orders of magnitude? You're probably going to need at least a few to make generative AI work well in the palm of your hand (see the napkin math below).
Oh, and if you've figured that out, Apple, Google, OpenAI, and other AI companies would like a word.
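For what it's worth, the napkin math isn't hard to sketch (every number below is an assumption, not a measurement): token generation is largely memory-bandwidth-bound, so bandwidth alone caps throughput.

```python
# Why "runs on a phone" and "runs well on a phone" are different claims.
model_params = 3e9        # assume a "small" 3B-parameter model
bytes_per_param = 0.5     # assume 4-bit quantization
weights_bytes = model_params * bytes_per_param   # ~1.5 GB of weights

phone_bw = 50e9           # ~50 GB/s: plausible LPDDR5 phone bandwidth
dc_bw = 1000e9            # ~1 TB/s: a datacenter accelerator

# Each generated token must stream (roughly) all weights through memory.
print(f"phone:      ~{phone_bw / weights_bytes:.0f} tokens/s")
print(f"datacenter: ~{dc_bw / weights_bytes:.0f} tokens/s")
```

That's a ~20x gap on bandwidth alone, before compute, thermals, or battery enter the picture, and it only grows with model size.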
I'm aware that what's in Google's phones isn't capable of doing the on-device ML inference they claim. You might want to actually read what both I and the article are addressing in particular, beyond the broad "generative AI" umbrella under which you and other philistines new to the field imagine nothing can be performed on-device.
> But of course, on-device generative AI is really complex: 150 times more complex than the most complex model on Pixel 7 just a year ago. Tensor G3 is up for the task, with its efficient architecture co-designed with Google Research.
This is a direct quote from an official press release [0]. They claimed Tensor G3 is "up for the task" of "on-device generative AI".
I'd say "if you can't do it, simply don't promise it", but the fact is, this is the third time Tensor has been outright incapable of what was promised. People pointing that out are more than justified.
Prior to the launch of the Pixel 6, with their first generation of Tensor SoC, they made big promises concerning HDR video performance [1], heavily implying or outright stating (depending on how generous you want to be) that they'd finally be on par with Apple. They weren't, by a lot. Pixel 6 video performance was neither on par with Apple's nor did it exceed the Pixel 5's on an upper-mid-range SD765G. Still, it was first-gen, and a bit of overhyping happens to the best of us.
During the Pixel 7 launch [2], they claimed Tensor G2 enabled users to finally get computational photography for high-quality videos. Spoiler alert: it didn't. Fool me once...
Now, on the Pixel 8 with their third generation of Tensor, they finally have a solution that gets their nighttime video processing competitive with the current iPhone, in the form of Video Boost. Instead of doing that processing on their amazing Tensor SoC, though, they offload it to the cloud [3]. At least they didn't promise on-device processing improvements to video with the G3, only a ton of GenAI capabilities...
I have followed Tensor extensively, and I am happy to see that they are at least utilizing their control over the silicon to provide a longer update cycle. But few of their local-processing promises have held water, and fewer still appear to be impossible on contemporary SoCs from competitors such as Qualcomm (who are by no means angels and need all the competition the market can provide).
If the Pixel team were more honest about their SoC's capabilities and proactively transparent about what they run locally vs. what they offload to datacenters, that'd be appreciated. With Video Boost they did just that, though I fear that was mainly because of the upload times...