Hi Ben! I'll email you a repost invite for the Onfim article (https://news.ycombinator.com/item?id=43705174) - if you wait a week or so and then use it, the repost will go in the second-chance pool.
The reason for waiting is to give the hivemind cache time to clear. Normally we'd re-up the existing post, but we don't want two overly similar threads on the frontpage within a short time period.
Gwern and others who have dug into it this far might be interested in this footnote in the Crespo article: "I have tried to lay my hands on the original version of the conversation, as I am sure Simon did, too. I contacted Gabriel Zadunaisky, who, as the article explains, participated in the meeting. He is a professional translator. I asked him for the original version, and he replied on WhatsApp: 'Mr. Crespo: I am very sick. Unfortunately, I am unable to provide you with the information requested.' My hypothesis is that Zadunaisky translated the conversation directly from the recorded version and that this original version has been lost."
My read is that it was most likely recorded on an old-school reel-to-reel tape recorder. It's entirely possible that the tapes are still sitting on a shelf somewhere in Argentina, though the chances of actually tracking them down are pretty low. I worked with some reel-to-reel tapes that Allen Ginsberg made in the mid-60s (now held at Stanford), including one where he is talking to Bob Dylan, and they held up pretty well. I had to use audio editing software to remove tape hiss, but they were not as badly preserved as I expected.
That's unfortunate about Mr Zadunaisky, but it does suggest there's no hope of a recording unless someone should stumble across it in the Borges papers (although given how his estate has been abused, little hope of that being useful, one way or another) or the Argentine library archives...
I've been developing a more elaborate variation on the "chat with a pdf" idea for my own use as a researcher. It's mostly designed for a historian's workflow but it works pretty well for science and engineering papers too. Currently Flash 2.0 is the default but you can select other models to use to analyze pdfs and other text through various "lenses" ranging from a simple summary to text highlighting to extracting organized data as a .csv file:
(Note: this is not at all a production-ready app, it's just something I've been making for myself, though I'm also now sharing it with my students to see how they use it. If anyone reads this and is interested in collaborating, let me know.)
I regularly paste papers into LLM interfaces but they all spit out generic, unhelpful answers. Your app is the only one I've seen that actually helps me understand.
I agree, I probably should've gone into more detail on the actual case studies and implications. I may write this up as a more academic article at some point so I have space to do that.
To your point about OCR: I think you'll find that the existing OCR tools will not know where to begin with the 18th century Mexican medical text in the second case study. If you can find one that is able to transcribe that lettering, please do let me know because it would be incredibly useful.
Speaking entirely for myself here, a pretty significant part of what professional historians do is to take a ton of photos of hard-to-read archival documents, then slowly puzzle them out after the fact - not by using any OCR tool (because none of them that I'm aware of are good enough to deal with difficult paleography) but the old fashioned way, by printing them out, finding individual letters or words that are readable, and then going from there. It's tedious work and it requires at least a few days of training to get the hang of.
For those looking for a specific example of an intermediate-difficulty manuscript in English, that post shows a manuscript of the John Donne poem "The Triple Fool" which gives a sense of a typical 17th-century paleography challenge that GPT-4o is able to transcribe (and which, as far as I know, OCR tools can't handle - though please correct me if I'm wrong). The "Sea surgeon" manuscript below it is what I would consider advanced-intermediate and is around the point where GPT-4o, and probably most PhD students in history, gets completely lost.
re: basically perfect, the errors I see are entirely typos which don't change the meaning (descritto instead of descritta, and the like). So yes, not perfect, but not anything which would impact a historical researcher. In terms of existing tools for translation, the state of the art that I was aware of before LLMs is Google Translate, and I think anyone who tries both on the same text can see which works better there.
re: "irrelevant books," there's really no way to make an objective statement about what's relevant and what's not until you actually read something rather than an AI summary. For that reason, in my own work, this is very much about augmenting rather than replacing human labor. The main work begins after this sort of LLM-augmented research. It isn't replaced by it in any way.
> To your point about OCR: I think you'll find that the existing OCR tools will not know where to begin with the 18th century Mexican medical text in the second case study. If you can find one that is able to transcribe that lettering, please do let me know because it would be incredibly useful.
My point about OCR is that you haven't done any comparison and are now making the same mistake of making claims without any evidence. The most basic one, from Google Translate, does "know where to begin"; it doesn't even make the "physical" mistake, though it makes others. It also knows where to begin with the image from your second post, although it seems worse.
And it's not the state of the art, and I don't know what that is for Spanish either, but again, that wasn't my point. You do not have a care-free option: to be able to understand that "physical" mistake you'd still need to read the source, which means you still need those days/weeks of training.
> none of them that I'm aware of are good enough to deal with difficult paleography
And you haven't demonstrated anything re. difficult paleography for the LLMs in your article either!
> entirely typos which don't change the meaning
First, you'd need to actually demonstrate that, and that would require the full accounting which you haven't done (and no, I don't plan to do that either).
This could be a typo in a name or a year, which is bound to have some impact on a historical researcher? They'd try searching for a misspelled name and find nothing while there could've been an interesting connection in some other text?
> translation, the state of the art that I was aware of before LLMs is Google Translate, and I think anyone who tries both on the same text can see which works better there.
Yes, do try it, for example, in DeepL, to see that it's not any worse.
> no way to make an objective statement about what's relevant and what's not until you actually read something rather than an AI summary
Sure, but presumably you've done that before making the claim of relevance "on reflection"? So how is it relevant to demand this "human labor" of the students?
Such a confusing comment, because when I enter the text from case study #1 into DeepL, it's very clearly much worse than what Claude or GPT-4o can come up with (the first few lines from DeepL are: "With his expositions to all the Tables, particularly of the quality of the countries, et of the most notable things, to be found in them. Which Tables, can be, and t are taught to reduce' together" and so on).
Likewise with using Google Translate on both case studies #1 and #2 - the results are self-evidently far worse. In both cases there were multiple errors in each line and in case study #2 it was entirely unable to transcribe or translate the title line. If you see this, please email me at bebreen [at] ucsc dot edu to share the better results you are seeing because I genuinely am interested and open to using alternative tools - I just am not seeing what you are seeing, apparently.
In terms of typos not changing the meaning, yes, naturally a real human needs to double check absolutely everything if it's being used in research. We agree on that - the point is simply that this significantly speeds up the initial research process, not that it replaces the expertise necessary to, for instance, double check that a name or year is transcribed correctly. A huge amount of historical research is simply about skimming through documents looking for relevant info to zero in on - this is where LLMs can really help.
Thank you! Have been a big fan of your writing on LLMs over the past couple years. One thing I have been encouraged by over this period is that there are some interesting interdisciplinary conversations starting to happen. Ethan Mollick has been doing a good job as a bridge between people working in different academic fields, IMO.
Author here, I agree that my read may not be correct either. It’s tough to make out. Although keep in mind that “ph” is used in Latin and Greek (or at least transliterations of Greek into the Roman alphabet) so in an early modern medical context (i.e. one in which it is assumed the reader knows Latin, regardless of the language being used) “ph” is still a plausible start to a word. Early modern spelling in general is famously variable - it's common to see an author spell the same word two different ways in the same text.
and it found 33 hits for "phisica" and 99 for "phisico", mostly from the 1490s. Now some of these can be deceptive, like a few are from a bilingual Spanish-Latin book and occur in the Latin portions rather than the Spanish portions, but it seems like some authors in the 1400s wrote "ph" in some Spanish words, at least when they knew the Latin or Greek etymologies.
I don't know when the Iberian languages first got their more phonetic orthographies, especially suppressing that h (that was originally in Latin digraphs used to transliterate Greek letters θ, φ, χ).
Edit: There are also about two dozen hits for physico/physica, interestingly more from the 1700s rather than 1400s.
> but it seems like some authors in the 1400s wrote "ph" in some Spanish words, at least when they knew the Latin or Greek etymologies.
You know, that might be analogous to Spanish speakers familiar with English writing "tweet" in Spanish text, while being unaware that the RAE added "tuit"[1] to the language, which is more in line with general language rules. IDK if any Spanish speaker has ever written "tuit" in real life.
Author here, I had the same question and looked into it. The author of that comment seems to be onto something because hygrine is indeed found in nightshades as well as in coca. Interesting.
Where are you getting that hygrine is found in nightshades? The authors of the paper specifically say it's only found in Erythroxylum and I'm not finding any references in my own research to that specific chemical being present in nightshades.
I'm no chemist, but according to Wikipedia, cuscohygrine is found in belladonna plants and it metabolizes into hygrine. So that could be what he's referring to?
I read the same Wikipedia page and that statement is confusing if not incorrect. Hygrine is not a metabolite of cuscohygrine; it's in fact the other way round: hygrine is the precursor and cuscohygrine is the metabolite.
The first reference on that page is "The role of hygrine in the biosynthesis of cuscohygrine and hyoscyamine"
No, it does not appear to be true that hygrine is found in nightshades.
Cuscohygrine does occur in those plants, yet its precursor hygrine does not. How cuscohygrine then gets there without us being able to detect hygrine could be because hygrine only occurs in very small concentrations, or is produced and then quickly and wholly converted into its cusco metabolite, or because cuscohygrine is produced through a different biosynthetic pathway.
Same. I emailed him about whether he'd ever met Margaret Mead, John C. Lilly, or Gregory Bateson in the 1960s while researching my book. I got this reply within hours:
"Afraid I never met any of those you mention, though I’ve followed their work for many years.
I’ve never been close to intellectual elite circles, including people I very much admire."
The timestamp for my email was Tuesday, Nov 26, 2019 at 2:19 PM. It was answered by chomsky@mit.edu on Tuesday, Nov 26, 2019 at 9:29 PM. Pretty remarkable.
Probably precludes him from accepting the invitations.
You can't be anti-imperialist and accept a dinner invitation where you will inevitably be forced to smile at and rub elbows with the same men you critique as war criminals. The man is principled.
Where would you have him live? In an apartheid nation that rejects him so thoroughly that he might get assassinated by some right-wing hardliners for speaking out against the oppression?
Seems that by your logic one who disagrees with an evil he sees in his society should leave immediately or commit to silence.
Author here - I love Dune and mostly just used the phrase for the title. But I did write this about Herbert and the drug/spice trade which might be of interest: https://lareviewofbooks.org/article/pharmacology/