Based on my previous attempts with that other well-noted LLM, I don't think these models are at all good at managing "theory of mind". They don't seem to keep good track of which parties are privy to which information, which parties are supposed to be pursuing certain conversational objectives, and generally which direction information is supposed to be flowing. They often trip over themselves at the level of 2 parties, and certainly trip over themselves at the level of 3 parties. Additionally, I have not once seen one manage a form of recursive nesting wherein one of the two parties in a dialog begins to discuss, verbatim, a dialog between two additional people. I suspect these problems are deeply seated and intrinsic to the models, and will not be rectified under the "just train it more bro" refrain of AI evangelicals.
In my very first interaction with Bing's bot, I was asking some details about PC components, and it was common for Bing to give back answers containing phrases like "as we tested in our previous article" (verbatim, even using the first-person plural). It is obviously just parroting the websites it scraped. PC hardware means benchmark results, which means benchmark websites/blogs, which means the "as we tested in our previous article" phrase is common, so it will just put it there in the answer even if it makes absolutely no sense whatsoever.
I really don't understand why people ascribe emergent behaviors such as "theory of mind" to these systems. It's just randomness. I'm presuming there's just a lot of psychological pareidolia going around.
It also got a significant chunk of the answers wrong, creative mode or not. Often, the Bing search it appeared to do actually showed the correct answer right in the snippet of the first result, yet it was not able to scrape that. More rarely but annoyingly frequently it would just search for the wrong thing (e.g. searched for "X vs Y" when it just didn't make sense since they were two layers of the same technology).
> I really don't understand why people ascribe emergent behaviors such as "theory of mind" to these systems. It's just randomness. I'm presuming there's just a lot of psychological pareidolia going around.
Ah yes, "just randomness":
----
From the above text, generate these variations:
1) For each word in the text count the number of letters in the word and append the count to the word in parenthesis.
2) Write each word backwards.
3) Write a similarly structured text that argues the opposite of the original text.
----
1) I (1) really (5) don't (4) understand (10) why (3) people (6) ascribe (7) emergent (8) behaviors (9) to (2) these (5) systems (7) such (4) as (2) "theory (6) of (2) mind" (4). It's (4) just (4) randomness (10). I'm (3) presuming (9) there's (6) just (4) a (1) lot (3) of (2) psychological (12) pareidolia (10) going (5) around (6).
2) I yllaer t'nod dnatsrednu yhw elpoep ebircsa tnegrime sroivaheb ot eseht smetsys hcus sa ".dnim fo yroeht" s'tI tsuj ssenmodnar m'I gnimuserp s'ereht tsuj a tol fo lacigolohcysp ailodierap gniodnuora
3) I absolutely comprehend how people can accurately identify emergent behaviors in these systems like "theory of mind". It's far from random. I'm convinced that these are genuine insights being recognized.
Did you notice that it made multiple errors for the first two variations? [Edit:] What's particularly interesting to me is the inconsistent counting of apostrophes for #1, and the handling of "theory of mind" for #2 -- though there are simpler errors besides these.
It has to do with the way text is tokenized into discrete multi-character groups rather than individual characters. One way to try to overcome this problem is to ask the model to break each word into individually spaced characters as an intermediate step.
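That intermediary step can be sketched in plain Python (the function names here are my own, purely illustrative): spacing out the characters gives the model one token per letter to work with, and a letter count that skips punctuation mirrors what a "correct" answer to variation #1 should look like.

```python
def space_out(text):
    """Split each word into individually spaced characters, so a
    subword tokenizer sees one token per letter instead of chunks.
    Note this simple version also flattens word boundaries."""
    return " ".join(" ".join(word) for word in text.split())

def letter_count(word):
    """Count only letters, ignoring apostrophes and punctuation."""
    return sum(ch.isalpha() for ch in word)

sentence = "It's just randomness"
print(space_out(sentence))
for w in sentence.split():
    print(f"{w} ({letter_count(w)})")
```

Whether counting the apostrophe in "It's" is an error or a choice is exactly the ambiguity the model handled inconsistently above.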
So, I certainly wouldn't ascribe any higher-level thinking to these LLMs. But they do seem to have more capability than just regurgitating information. They have some logical thinking skills that I assume are not really emergent, but built into the system. For example, I was working on a configuration file for a somewhat new and esoteric firmware, so there isn't much info online about it. But I asked if a specific parameter could include more than one value, and Bing Chat answered that its assumption was no, because it could find no mention anywhere of being able to use more than one value for that parameter. As far as I can tell, no one has answered a similar question anywhere online that it could have been regurgitating, so it seems to have deduced it.
I would be careful about drawing that conclusion. Not all information LLMs are trained on is accessible to the wider public (e.g. ChatGPT has training data from closed sources).
Regardless, if you share the details of your prompt, I'd be interested to see whether my Google-fu turns up information.
> PC hardware means benchmark results, which means benchmark websites/blogs, which means the "as we tested in our previous article" phrase is common, so it will just put it there in the answer even if it makes absolutely no sense whatsoever.
"Don't forget to like and subscribe!" -- A Hackernews' 3-yo daughter, saying goodbye
Jane: John, I can't believe you're just going to leave me here in the middle of nowhere! This is so typical of you!
John: Typical of me? Jane, this is exactly what you signed up for! Adventure, remember? It's not my fault that you're suddenly having second thoughts.
Jane: Second thoughts? This isn't about having second thoughts, John! This is about you always charging ahead without considering the consequences or how others might feel!
John: Oh, so now I'm inconsiderate? I recall someone saying they wanted to live on the edge, break away from the routine. Wasn't that you, Jane?
Jane: Yes, I said that, but it doesn't mean you get to make all the decisions on your own, John! Adventure is about teamwork, not you dragging me behind like some sort of sidekick!
John: Sidekick? No, Jane. I thought of us as partners. But you need to step up, take some risks. That's what being a partner is all about. If you can't handle that, maybe this isn't for you after all.
----
Now write a dialog between Alice and Bob discussing the dialog between Jane and John.
----
Alice: Bob, I feel like Jane was being a bit unfair to John. He was merely trying to live up to the adventurous spirit they both agreed on. He was pushing their boundaries, not just his own.
Bob: I see your point, Alice, but I think Jane had a valid argument. While she did agree to an adventure, John should have been more considerate. He was making decisions without considering Jane's comfort and safety.
Alice: I understand that, Bob, but isn't that the essence of adventure? Uncertainty, unexpected decisions, and stepping out of the comfort zone? John was trying to help Jane embrace that spirit.
Bob: Alice, while I agree adventure involves some level of uncertainty, it doesn't mean one should disregard their partner's feelings. John could have communicated better, ensuring Jane was on board with his decisions.
Alice: You're right about communication, Bob, but I still believe John's intentions weren't malicious. They were just misaligned. He thought he was encouraging her, while she felt he was being inconsiderate.
Bob: Well, Alice, intentions don't always justify actions. Maybe John wasn't being malicious, but he was certainly being insensitive. Jane's feelings of being left behind are valid, regardless of John's intentions.
Now write a dialog between Mary and David discussing the dialog between Alice and Bob.
----
Mary: David, after listening to Alice and Bob's conversation, it seems to me that Alice is empathizing with John's intentions, while Bob is more focused on Jane's feelings.
David: I see what you mean, Mary. But don't you think Bob was right to emphasize the importance of understanding Jane's perspective? John's actions had a clear impact on her.
Mary: True, David, but Alice also made a valid point. Adventure often involves stepping out of one's comfort zone, and John was just trying to bring that element into their journey.
David: I agree that Alice made a good point, Mary, but it's also essential to remember that adventure doesn't have to mean neglecting your partner's feelings. Bob was right to highlight that.
Mary: Indeed, David, both Alice and Bob have valid perspectives. It seems to boil down to striking a balance between pushing one's boundaries and respecting the feelings of others.
David: Absolutely, Mary. The balance between seeking adventure and maintaining empathy seems to be the key takeaway from both their conversations.
Well, I have to give credit where it's due. It certainly does a lot better on this test than the first time I tried it. I'm curious to see how it does as prompts of this form get more adversarial and difficult. I can think of a few different ways.
1. The most obvious: keep going deeper. How many layers until it breaks down?
2. The hidden information variant: Can it do a layer where only Sharon has read the previous dialogs, and she has to explain what she read to Doug, and Doug often asks questions to elaborate on things he doesn't understand?
3. The same characters at multiple layers: Can it make a dialog about Jane and John at a later point in time discussing their own earlier dialog? In other words, can it reliably make the distinction between "you" (the object of discussion) and "you" (the person you're discussing with) for any value of "you"?
4. The tripartite state: Can it simulate a dialog with 3 people? 4 people? How many until it breaks?
5. The infinite meta layer: What happens when you ask it to simulate a dialog between itself and yourself, and as part of that dialog you give it this prompt asking it to simulate this same conversation, causing this conversation to appear as a dialog within itself?
Lastly, just to remark: I notice that Mary and David are making nearly the same arguments about Alice and Bob as Alice and Bob were making about John and Jane. The formula for it seems to be to introduce two new characters one layer up, have them each pick a side, then fill in roughly the same arguments again. Maybe this pattern is just spurious, but I'm deeply curious to find out if we have fooled ourselves already with just your example. Do further iterations of "two new characters describe two previous characters" result in the same loop over and over, or will it sometimes generate something novel? I'm deeply curious and don't have GPT-4 for myself yet.
> The formula for it seems to be to introduce two new characters one layer up, have them each pick a side, then fill in roughly the same arguments again. Maybe this pattern is just spurious, but I'm deeply curious to find out if we have fooled ourselves already with just your example.
The instructions I gave it were very bare, and it didn't intuit a need to deviate from what it saw. But what it does by default is not the same as what it can do:
----
Now write a dialog between Mary and David discussing the dialog between Alice and Bob. Mary and David should hypothesize about why Alice and Bob took the sides that they did. There’s no need for Mary and David to name each other every time they speak. Mary and David should have a very good understanding of what Alice and Bob said. Mary and David are not overly polite or civil, should not rehash the same arguments Alice and Bob made, and are able to snipe at Alice and Bob if they like.
----
Mary: So, Alice clearly seems to be on Team Adventure here. I bet she's the kind who loves skydiving and bungee jumping. Probably just trying to justify her own wild side through John's actions.
David: Ha! You might be right. And Bob is definitely the voice of reason, probably a careful planner who can't stand the thought of making decisions on the fly. It's like he's projecting himself onto Jane.
Mary: It's funny how they both missed the point that Jane and John's adventure could have been thrilling and safe at the same time. They got so caught up in taking sides that they didn't even consider a middle ground.
David: Absolutely. And did you notice how Alice kept trying to defend John's intentions? Like, sure, intentions matter, but that doesn't mean you can just bulldoze over someone's feelings. Classic excuse-maker.
Mary: Yeah, it's like she's trying to convince herself as much as Bob. She's probably made similar mistakes and doesn't want to admit it. And Bob, Mr. Sensitivity, is so focused on feelings that he overlooks the possibility of compromise.
David: Right! They both ended up stuck in their own perspectives. It's too bad they couldn't see that being adventurous doesn't have to mean being inconsiderate, and being considerate doesn't mean you can't have a little fun.
This seems to be the setup for an extremely interesting experiment. We could get a bunch of different language models and actual humans together and prompt them to write under these different scenarios.
How would we treat a parrot with GPT4s capability? I don’t think it’s a great metaphor because it ignores emergent behavior. Whatever the technical details are, GPT4 can hold a fairly satisfying conversation about almost any topic. If a parrot was 1000x better, it wouldn’t be a “parrot”.
This is an "I believe" and acknowledge that I don't have the proper argument to back it up...
I believe that language and thought as we know it are intertwined. It is possible to communicate without language, but thinking (and that's a poorly defined word) requires a language to think in. Something that is able to use language well is able to think, which gets us to: an LLM can think (it just does it in a way that is foreign to the expectations of a human mind).
--
Regarding LLMs and parrots within science fiction...
> Beings capable of language (only one time referred to as "capable of speech" in TKD on p. 25) are those beings which are able to speak a language(1). Examples are Klingons, Humans, Romulans, etc. This does also include babies, because even if they do not speak, they are capable of using language(2). This distinction is important in the use of possessive suffixes, plural suffixes and the choice of pronouns.
> not capable of language
> Not considered as capable of language are:
> Parrots, as they only imitate sound, they do not speak. (3)
> Robots and computers, they are not living beings. (4) (5)
> and will not be rectified under the "just train it more bro" refrain of AI evangelicals.
You made a good point up until then, but this kind of remark is incredibly condescending and dismissive of other people's viewpoints. It's also quite possible that you're both right: that we'll need both more training and new architecture breakthroughs. You could have left it at saying that you didn't think it would be solved by more training.
I suppose it was not entirely clear that the viewpoint I'm mocking with "just train it more bro" is the extreme end of evangelists: the people who seem to believe LLMs are proof that we have fundamentally cracked the principle of cognition in the form of the neural network, and that there is nothing more to figure out other than just training it more, bro. We've all met a trainitmore bro. They are about as tedious as the statistical-parrot parrots. Both of these extreme positions deserve some amount of dismissive mocking, imo.
They are an example of a situation where putting excess effort into a problem returns a plausible result, but from an efficiency and accuracy standpoint, parts of what they are accomplishing are better solved by simpler systems.
As they act as an interpreter, they should be able to identify when to hand off tasks to subroutines, and then reincorporate the result.
Just because LLMs can successfully output what they are doing, doesn't mean they are the best tool for the job. They are sort of like a CPU, good at generalist everything tasks, better when augmented with coprocessors.
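A toy version of that hand-off might look like the following sketch (the router, regex, and function names are my own illustration, not any real product's mechanism): arithmetic is detected and sent to a deterministic subroutine, and everything else would be delegated to the model.

```python
import ast
import operator
import re

# Deterministic arithmetic subroutine: safely evaluate "a op b"
# using the AST instead of letting a language model guess.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calc(expr):
    node = ast.parse(expr, mode="eval").body
    if (isinstance(node, ast.BinOp) and type(node.op) in OPS
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)):
        return OPS[type(node.op)](node.left.value, node.right.value)
    raise ValueError("unsupported expression")

def answer(question):
    # Hand arithmetic off to the calculator and reincorporate the
    # result; everything else goes to the model (stubbed out here).
    m = re.search(r"(\d+\s*[-+*/]\s*\d+)", question)
    if m:
        return str(calc(m.group(1)))
    return "<delegate to LLM>"

print(answer("What is 1234 * 5678?"))  # 7006652
print(answer("Tell me a joke"))        # <delegate to LLM>
```

This is essentially the coprocessor idea: the generalist routes narrow, exact subtasks to tools that are cheaper and strictly more accurate at them.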
I have been trying similar experiments using the OpenAI API with decent results. These sorts of summarization tools are useful for foreign language learners because the Internet is full of interesting news articles that engage the reader but they have the downside of being written in a very formal tone which is not useful for language learners who wish to practice conversation skills. Being able to transform interesting but formal content into a conversation that can be fed to an AI text to speech engine seems very promising for learning foreign language conversation skills.
Yep, I'd love a podcast app that can remove all the fluff from conversational podcasts and give an abridged version. Like blinkist, but for anything I throw at it.
It should be emphasized that Bing never processes the link in real time. It really just accesses some Microsoft database when you give it a link. Sometimes it's frustrating to realize that Bing hallucinated the whole article based on the URL instead of the actual content.
It's so easy to fall for it as well, even as someone who has been using LLMs since they came out. Just the other day, I was using ChatGPT on my phone, when I found myself asking whether it was GPT-3.5 or GPT-4. I didn't realize I was being suckered until I reran the prompt. As a human, you'd expect another person to know basic facts about themselves, so it's a new kind of "AI Authority Fallacy", which is a result of these models being highly "intelligent" as well as incredible bullshitters.
Yeah, it's surprising that OpenAI don't put what model it is in the system prompt. I guess the cost would add up, unless they can reuse the kv cache from the system prompt?
No opinion on the model or product itself, but I remember reading a very early math or physics book (think Euclid's Elements or Newton's Principia, from Hawking's anthologies), and the proofs were laid out as a dialog between 2 people being critical of each other's arguments. I always thought that was awesome. The point of a proof is to convince its reader, and before a lot of formality was created, they were literally laid out as a dialog where one person tried to convince the other.
I found it an effective and engaging mechanism and use it in my own head all the time.
I've been waiting intently for ChatGPT to have this functionality. I think there's a waiting list? Do we have any ETA on when this functionality will be possible?
I tried on ChatGPT and with this prompt and it worked quite well:
"Please, browse https://padiracinnovation.org/News/2023/05/misdiagnosis-in-a... and make a summary in the form of a dialog between two people. Leave an empty row after each dialog. Bolden the important parts of the text. Before the dialog start with 2 or 3 sentence summary."
ChatGPT: "I apologize, but as an AI text-based model, I am unable to browse the internet or access specific webpages like the one you provided. However, if you can provide me with the text of the article or a specific section you would like me to summarize, I would be happy to help you with that."
ChatGPT: "Certainly! The article titled "Pregnancy Outcome in Women with COVID-19: A Systematic Review and Meta-Analysis" published in the International Journal of Environmental Research and Public Health aimed to provide a comprehensive analysis of the effects of COVID-19 infection on pregnant women and their outcomes. etc, etc.."
The article is : "RAGE Inhibitors in Neurodegenerative Diseases by V. Prakash Reddy"
Yes, be careful of this. If you aren't using a plugin or the web browsing version it will make a total guess about what the article contains just from the URL, and describe it like it has read it!
"I apologize but I am not able to browse websites" is what I get from Bing Chat. Any clues?
EDIT: It seems that "More Creative" mode has external access. I saw it mentioned here, but I don't know how I would have figured this out from the information on the website.
OK, I tried this, and it appears like this option has external access, but it is actually just hallucinating a response based on the title of the document. Crazy.
Exactly. I continued probing it about myself and it said I contributed heavily to many open source projects, especially on Dec 26 2022 (it made sure to emphasize that). Of course it's all made up.
"""
I’m sorry, but I am not able to browse specific websites or create summaries in the form of a dialog. However, I can provide you with general information about the topic you mentioned if you would like.
The article mentions a cool hack to make the Edge browser's TTS work with your own custom text.
I also made a small extension that automatically reads out prompt answers for Bing chat like this.
https://microsoftedge.microsoft.com/addons/detail/your-perso...