Of course he does. Heck, most of us did the same thing in the early stages of LLMs. Th...

tdb7893 · 2025-01-19T18:24:12 1737311052

They dropped the ball on cloud and had to catch up, and now it's AI. It's kinda interesting how being ahead in data center infrastructure and AI research didn't lead to them being ahead on those products.

sitkack · 2025-01-19T18:33:10 1737311590

Google is a playground funded by Ads, and Ads make so much damn money that nothing can compete, even internally. If I were an activist investor, I'd make Ads its own company. If I were the FTC, I'd make Ads its own company.

ripped_britches · 2025-01-20T05:38:42 1737351522

And what are the other companies? Just GCP? Why separate those?

whiplash451 · 2025-01-19T18:37:25 1737311845

Ads fund Waymo.

lvl155 · 2025-01-19T18:30:45 1737311445

To be fair, they did have the lead as late as 2018. It’s just they treated it like it was their PhD thesis. Didn’t protect their IP at all and let all their talent leave.

xbmcuser · 2025-01-19T18:42:31 1737312151

In my opinion, the AI and "absorb all knowledge" part of Google was Larry Page. After his health scare, his focus and priorities shifted toward actually living his life rather than Google. I think he had also realized what was happening with Google and so wanted Alphabet as an umbrella organisation, but in the end he gave it up and let it be run as a normal company.

llm_trw · 2025-01-19T18:21:27 1737310887

And the only reason they had the data is that they scanned every book ever for Google Books.

pk-protect-ai · 2025-01-19T18:44:45 1737312285

and every e-mail, and every document in Google Docs, and every video on YouTube ...

yencabulator · 2025-01-19T18:44:50 1737312290

How was the data Google already had access to any less protected by copyright?

The data Google had was book scans, search engine indexing of arbitrary third-party content, and private email and documents they hosted.

whiplash451 · 2025-01-19T18:36:37 1737311797

Google dropping the ball on AI… given their achievements with Waymo, Gemini and Gemma (just to name a few)… does not sound like a fair statement.

bn-l · 2025-01-20T02:33:07 1737340387

Those models are absolutely garbage. Terrible code understanding. Ridiculous hallucinations.

thebytefairy · 2025-01-21T13:57:04 1737467824

Have you actually used them recently? Gemini is at the top of Chatbot Arena, and Gemma is one of the best open models at its size.

bn-l · 2025-01-26T02:16:10 1737857770

And that makes me extremely suspicious of that ranking. I use it at least a few times a week when I have a problem that’s unusual for me (to see whether it’s just terrible in my domain but not in others). It has a 9/10 fail rate.

It is the best at OCR though. Not many people are talking about that. It’s a very nice thing to know.

logifail · 2025-01-19T18:20:49 1737310849

Perhaps the more interesting question would be exactly how did they obtain their copy/copies of Libgen?

janice1999 · 2025-01-19T22:23:33 1737325413

It's hinted at in the article. If they torrented one large dataset, it's likely they did the same for Libgen.

> “I think torrenting from a corporate laptop doesn’t feel right,” wrote one engineer in April 2023, adding a smiley face emoji. (A later email acknowledged that the “SciMag” data had indeed been torrented.)

perfmode · 2025-01-19T18:23:06 1737310986

Are you asking for a way to obtain a copy?

logifail · 2025-01-19T18:28:16 1737311296

Nope, I have no need for any <whisper>further</whisper> copies.

I'm more interested in how a for-profit corp decides to obtain a copy for development of a commercial product, and how they execute that ... whether they still have the data, and whether legal know about it :)

It's not exactly the kind of thing you can say you "found on a USB stick lying around in the car park"...

qingcharles · 2025-01-19T19:25:40 1737314740

There are torrents of it. I remember one AI company saying somewhere that they just grabbed the big 7z torrent of it for their training.

thfuran · 2025-01-19T18:32:29 1737311549

You should've seen the size of it. More of a USB baton really.

selimthegrim · 2025-01-19T18:35:23 1737311723

If Ryobi and DeWalt can make Bluetooth speakers, ASP can get into USB drives.

logifail · 2025-01-19T18:35:37 1737311737

> You should've seen the size of it. More of a USB baton really.

<glances at shelf with many, many external USB drives hooked up to a Pi 400>

Oh, really? :)