benbreen · 2025-04-16T17:38:26 1744825106

Author of the original Appendix article here (the one about Darwin's kids) - I think it got on HN today because I linked to it while discussing Onfim here: https://resobscura.substack.com/p/onfims-world-medieval-chil...

dang · 2025-04-16T21:33:54 1744839234

Hi Ben! I'll email you a repost invite for the Onfim article (https://news.ycombinator.com/item?id=43705174) - if you wait a week or so and then use it, the repost will go in the second-chance pool.

The reason for waiting is to give the hivemind cache time to clear. Normally we'd re-up the existing post, but we don't want two overly similar threads on the frontpage within a short time period.

srean · 2025-04-17T19:11:40 1744917100

That's one of the most endearing articles I have read in a long time. Thanks for the joy.

dr_dshiv · 2025-04-16T22:15:18 1744841718

2014! Amazing.

benbreen · 2025-04-02T22:30:50 1743633050

Gwern and others who have dug into it this far might be interested by this footnote in the Crespo article: "I have tried to lay my hands on the original version of the conversation, as I am sure Simon did, too. I contacted Gabriel Zadunaisky, who, as the article explains, participated in the meeting. He is a professional translator. I asked him for the original version, and he replied on WhatsApp: 'Mr. Crespo: I am very sick. Unfortunately, I am unable to provide you with the information requested.' My hypothesis is that Zadunaisky translated the conversation directly from the recorded version and that this original version has been lost."

My read is that most likely, it was recorded on an old school reel-to-reel tape recorder. It's entirely possible that the tapes are still sitting on a shelf somewhere in Argentina, though the chances of actually tracking them down are pretty low. I worked with some reel-to-reel tapes that Allen Ginsberg made (now held at Stanford) in the mid-60s (including one where he is talking to Bob Dylan!) and they held up pretty well. Had to use audio editing software to remove tape hiss, but they were not as badly preserved as I expected.

gwern · 2025-04-03T13:49:29 1743688169

That's unfortunate about Mr Zadunaisky, but it does suggest there's no hope of a recording unless someone should stumble across it in the Borges papers (although given how his estate has been abused, little hope of that being useful, one way or another) or the Argentine library archives...

Anyway, I uploaded the Simon book chapter at https://gwern.net/doc/borges/1996-simon-2.pdf

benbreen · 2025-04-03T17:58:27 1743703107

That's awesome, thank you for uploading.

benbreen · 2025-04-02T21:05:49 1743627949

I've been developing a more elaborate variation on the "chat with a pdf" idea for my own use as a researcher. It's mostly designed for a historian's workflow but it works pretty well for science and engineering papers too. Currently Flash 2.0 is the default but you can select other models to use to analyze pdfs and other text through various "lenses" ranging from a simple summary to text highlighting to extracting organized data as a .csv file:

https://source-lens.vercel.app

(Note: this is not at all a production ready app, it's just something I've been making for myself, though I'm also now sharing it with my students to see how they use it. If anyone reads this and is interested in collaborating, let me know).

boodleboodle · 2025-04-03T02:36:15 1743647775

Wow I just tried this and it's great!

I regularly paste papers into LLM interfaces but they all spit out generic non-helpful answers. Your app is the only one I've seen that actually helps me understand.

I am using Gemini 2.0 pro

harshPaliwal · 2025-04-03T07:55:46 1743666946

Awesome tool, Ben. I tried talking to the author, although instead of answering it keeps referring me to the paper. Is that the intent?

benbreen · 2025-01-27T17:25:44 1737998744

I agree, I probably should've gone into more detail on the actual case studies and implications. I may write this up as a more academic article at some point so I have space to do that.

To your point about OCR: I think you'll find that the existing OCR tools will not know where to begin with the 18th century Mexican medical text in the second case study. If you can find one that is able to transcribe that lettering, please do let me know because it would be incredibly useful.

Speaking entirely for myself here, a pretty significant part of what professional historians do is to take a ton of photos of hard-to-read archival documents, then slowly puzzle them out after the fact - not by using any OCR tool (because none of them that I'm aware of are good enough to deal with difficult paleography) but the old fashioned way, by printing them out, finding individual letters or words that are readable, and then going from there. It's tedious work and it requires at least a few days of training to get the hang of.

If anyone wants to get a sense of what this paleography actually looks like, this is something I wrote about back in 2013 when I was in grad school - https://resobscura.blogspot.com/2013/07/why-does-s-look-like...

For those looking for a specific example of an intermediate-difficulty level manuscript in English, that post shows a manuscript of the John Donne poem "A Triple Fool" which gives a sense of a typical 17th century paleography challenge that GPT-4o is able to transcribe (and which, as far as I know, OCR tools can't handle - though please correct me if I'm wrong). The "Sea surgeon" manuscript below it is what I would consider advanced-intermediate and is around the point where GPT-4o, and probably most PhD students in history, gets completely lost.

re: basically perfect, the errors I see are entirely typos which don't change the meaning (descritto instead of descritta, and the like). So yes, not perfect, but not anything which would impact a historical researcher. In terms of existing tools for translation, the state of the art that I was aware of before LLMs is Google Translate, and I think anyone who tries both on the same text can see which works better there.

re: "irrelevant books," there's really no way to make an objective statement about what's relevant and what's not until you actually read something rather than an AI summary. For that reason, in my own work, this is very much about augmenting rather than replacing human labor. The main work begins after this sort of LLM-augmented research. It isn't replaced by it in any way.

eviks · 2025-01-28T16:20:04 1738081204

> To your point about OCR: I think you'll find that the existing OCR tools will not know where to begin with the 18th century Mexican medical text in the second case study. If you can find one that is able to transcribe that lettering, please do let me know because it would be incredibly useful.

My point about OCR is that you haven't done any comparison and are now making the same mistake of claiming without any evidence. The most basic one from Google Translate does "know where to begin", it doesn't even make the "physical" mistake, though it makes others. It also does know where to begin with the image from your second post, although it seems worse. And it's not the state of the art, and I don't know what that is for Spanish either, but again, that wasn't my point. You do not have a care-free option: to be able to understand that "physical" mistake you'd still need to read the source, which means you still need those days/weeks of training.

> none of them that I'm aware of are good enough to deal with difficult paleography

And you haven't demonstrated anything re. difficult paleography for the LLMs in your article either!

> entirely typos which don't change the meaning

First, you'd need to actually demonstrate that, and that would require the full accounting which you haven't done (and no, I don't plan to do that either). This could be a typo in a name or a year, which is bound to have some impact on a historical researcher? He'd try searching for a misspelled name and find nothing while there could've been an interesting connection in some other text?

>translation, the state of the art that I was aware of before LLMs is Google Translate, and I think anyone who tries both on the same text can see which works better there.

Yes, do try it, for example, in DeepL, to see that it's not any worse.

> no way to make an objective statement about what's relevant and what's not until you actually read something rather than an AI summary

Sure, but presumably you've done that before making the claim of relevance "on reflection"? So how is it relevant to demand this "human labor" of the students?

benbreen · 2025-01-29T20:26:06 1738182366

Such a confusing comment, because when I enter the text from case study #1 into Deepl, it's very clearly much worse than what Claude or GPT4o can come up with (the first few lines from Deepl are: "With his expositions to all the Tables, particularly of the quality of the countries, et of the most notable things, to be found in them. Which Tables, can be, and t are taught to reduce' together" and so on).

Likewise with using Google Translate on both case studies #1 and #2 - the results are self-evidently far worse. In both cases there were multiple errors in each line and in case study #2 it was entirely unable to transcribe or translate the title line. If you see this, please email me at bebreen [at] ucsc dot edu to share the better results you are seeing because I genuinely am interested and open to using alternative tools - I just am not seeing what you are seeing, apparently.

In terms of typos not changing the meaning, yes naturally a real human needs to double check absolutely everything if it's being used in research. We agree on that - the point is simply that this significantly speeds up the initial research process, not that it replaces the expertise necessary to, for instance, double check that a name or year is transcribed correctly. A huge amount of historical research is simply about skimming through documents looking for relevant info to zero in on - this is where LLMs can really help.

benbreen · 2025-01-26T23:50:12 1737935412

Thank you! Have been a big fan of your writing on LLMs over the past couple years. One thing I have been encouraged by over this period is that there are some interesting interdisciplinary conversations starting to happen. Ethan Mollick has been doing a good job as a bridge between people working in different academic fields, IMO.

benbreen · 2025-01-26T23:46:49 1737935209

Author here, I agree that my read may not be correct either. It’s tough to make out. Although keep in mind that “ph” is used in Latin and Greek (or at least transliterations of Greek into the Roman alphabet) so in an early modern medical context (i.e. one in which it is assumed the reader knows Latin, regardless of the language being used) “ph” is still a plausible start to a word. Early modern spelling in general is famously variable - common to see an author spell the same word two different ways in the same text.

jolmg · 2025-01-26T23:50:32 1737935432

> So, if you try to make sense of it in such a way that you assume a nonsense word is you misreading

> I agree that my read may not be correct either

Just in case, by "you", I meant from the POV of the AI, not you the author.

That's interesting to know about "ph". I didn't know it was present in Latin, and I wonder if that's also the case with Spanish.

schoen · 2025-01-27T00:01:31 1737936091

I just looked in the Corpus Diacrónico del Español

https://corpus.rae.es/cordenet.html

and it found 33 hits for "phisica" and 99 for "phisico", mostly from the 1490s. Now some of these can be deceptive, like a few are from a bilingual Spanish-Latin book and occur in the Latin portions rather than the Spanish portions, but it seems like some authors in the 1400s wrote "ph" in some Spanish words, at least when they knew the Latin or Greek etymologies.

I don't know when the Iberian languages first got their more phonetic orthographies, especially suppressing that h (that was originally in Latin digraphs used to transliterate Greek letters θ, φ, χ).

Edit: There are also about two dozen hits for physico/physica, interestingly more from the 1700s rather than 1400s.

jolmg · 2025-01-27T00:08:06 1737936486

> but it seems like some authors in the 1400s wrote "ph" in some Spanish words, at least when they knew the Latin or Greek etymologies.

You know, that might be analogous to Spanish speakers familiar with English writing "tweet" in Spanish text, while being ignorant that RAE added "tuit"[1] to the language, which is more in-line with general language rules. IDK if any Spanish speaker has ever written "tuit" in real life.

[1] https://dle.rae.es/tuit?m=form

benbreen · 2024-10-09T22:10:24 1728511824

Author here, I had the same question and looked into it. The author of that comment seems to be onto something because hygrine is indeed found in nightshades as well as in coca. Interesting.

lolinder · 2024-10-09T23:25:30 1728516330

Where are you getting that hygrine is found in nightshades? The authors of the paper specifically say it's only found in Erythroxylum and I'm not finding any references in my own research to that specific chemical being present in nightshades.

DisgracePlacard · 2024-10-10T05:02:58 1728536578

I'm no chemist, but according to Wikipedia, cuscohygrine is found in belladonna plants and it metabolizes into hygrine. So that could be what he's referring to?

stef25 · 2024-10-10T17:31:47 1728581507

I read the same Wikipedia page and that statement is confusing if not incorrect. Hygrine is not a metabolite of cuscohygrine, it's in fact the other way round: hygrine is the precursor and cuscohygrine is the metabolite.

The first reference on that page is "The role of hygrine in the biosynthesis of cuscohygrine and hyoscyamine"

stef25 · 2024-10-10T17:29:45 1728581385

No, it does not appear to be true that hygrine is found in nightshades.

Cuscohygrine does occur in those plants yet its precursor hygrine does not. How it then gets there without us being able to detect hygrine could be because it only occurs in very small concentrations or is produced and then quickly and wholly converted into its cusco metabolite, or that it's produced through a different biosynthetic pathway.

benbreen · 2024-06-16T16:58:12 1718557092

Reading this now and really enjoying it. Here's a brief review from Stewart Brand: https://twitter.com/stewartbrand/status/1800941614287946003

neonate · 2024-06-16T17:35:27 1718559327

https://threadreaderapp.com/thread/1800941614287946003.html

benbreen · 2024-06-11T04:36:28 1718080588

Same. I emailed him about whether he'd ever met Margaret Mead, John C. Lilly, or Gregory Bateson in the 1960s while researching my book. I got this reply within hours:

"Afraid I never met any of those you mention, though I’ve followed their work for many years.

I’ve never been close to intellectual elite circles, including people I very much admire."

The time stamp for my email was Tuesday, Nov 26, 2019 at 2:19 PM. It was answered by chomsky@mit.edu at Tuesday, Nov 26, 2019 at 9:29 PM. Pretty remarkable.

bn-l · 2024-06-11T04:54:58 1718081698

> I’ve never been close to intellectual elite circles

That is very humble

exe34 · 2024-06-11T05:03:23 1718082203

is this sarcasm? I read it as "I'm not welcome to parties because I don't toe the line"

Scarblac · 2024-06-11T07:05:43 1718089543

I don't understand how you can get any information from that line about the reasons why he wasn't close to them.

bn-l · 2024-06-11T06:17:19 1718086639

No, no sarcasm and that's how I read it too.

vasco · 2024-06-11T07:01:15 1718089275

So how is it humble? He is just saying he didn't hobnob with known intellectuals - isn't that just a statement of fact?

dotancohen · 2024-06-11T07:04:41 1718089481

Chomsky has very controversial opinions on some subjects and I suspect that precludes him being invited to many parties.

DSingularity · 2024-06-11T21:17:18 1718140638

Probably precludes him from accepting the invitations.

You can't be anti-imperialist and accept a dinner invitation where you will inevitably be forced to smile at and rub elbows with the same men you critique as war-criminals. The man is principled.

I hope he recovers. He would be sorely missed.

dotancohen · 2024-06-12T15:32:18 1718206338

Why not? He's supposedly anti-imperialist yet had no problem living in their society.

DSingularity · 2024-06-14T15:59:41 1718380781

Where would you have him live? In an apartheid nation that rejects him so thoroughly that he might get assassinated by some right-wing hardliners for speaking out against the oppression?

Seems that by your logic one who disagrees with an evil he sees in his society should leave immediately or commit to silence.

exe34 · 2024-06-12T07:58:36 1718179116

> you will inevitably be forced to smile at and rub elbows with the same men you critique as war-criminals

I'm autistic, so I would smile and tell them exactly what I think of them. but fortunately I don't get invited to parties.

exe34 · 2024-06-11T18:48:55 1718131735

i think it's the opposite of humble, it's saying that he knows something that these intellectuals don't.

benbreen · on Feb 9, 2024

Author here - I love Dune and mostly just used the phrase for the title. But I did write this about Herbert and the drug/spice trade which might be of interest: https://lareviewofbooks.org/article/pharmacology/