
Giving AI access to OS APIs is trivial for Apple and Google. The problem has been that AI assistants were always too dumb to understand user requests.

LLMs changed everything.



First, it's not as trivial to give access to OS APIs as you think.

Second, giving access to APIs... gives you access to APIs, and that's it. Even the most advanced LLMs are dumb as shit when it comes to anything non-trivial in programming. What makes you think that a modern LLM will manage to translate "move tab X to a different window" into a dozen or more API calls with proper structures filled in, and in the correct order?


It's trivial. For example, most macOS apps can already get these permissions. Apple can grant special permissions for Siri quite easily.

I don't sense that GPT-4 is "dumb as shit". I sense that it's extremely capable and very close to changing everything if, for example, macOS completely integrates GPT-4.


> It's trivial. For example, most macOS apps can already get these permissions. Apple can grant special permissions for Siri quite easily.

Of course it's not trivial if you think about it for more than a second.

LLMs produce output in exactly three ways:

- text

- images

- video

What you think is trivial is to convert that output into an arbitrary function call for any arbitrary OS-level API. And that is _before_ we start thinking about things like "what do we do with incomplete LLM output" and "what to do when the LLM hallucinates".

You can literally try and implement this yourself, today, to see how trivial it is. You already have access to tens of thousands of OS APIs, so you can try and implement a very small subset of what you're thinking about.

BTW, if your answer is "but function calls", they are not function calls. They are structured JSON responses that you have to manually convert to actual function calls.
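
For a minimal sketch of what that manual conversion looks like (the { name, arguments } shape, the "open_url" case, and the argument names here are all invented for illustration; the point is just that the model's output is text you have to parse and dispatch yourself):

    import Foundation

    // Hypothetical shape of the "function call" JSON the model emits.
    struct LLMFunctionCall: Decodable {
        let name: String
        let arguments: [String: String]
    }

    // Nothing executes until we parse the text and dispatch it by hand.
    func dispatch(_ rawJSON: Data) throws {
        let call = try JSONDecoder().decode(LLMFunctionCall.self, from: rawJSON)
        switch call.name {
        case "open_url":
            // Map the string arguments onto a real API call ourselves.
            guard let url = URL(string: call.arguments["url"] ?? "") else { return }
            print("would open \(url)")  // e.g. hand off to NSWorkspace on macOS
        default:
            // Hallucinated or unknown function names end up here.
            print("no mapping for \(call.name)")
        }
    }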


> BTW, if your answer is "but function calls", they are not function calls. They are structured JSON responses that you have to manually convert to actual function calls.

You could in theory send ASTs as JSON.


In theory, yes. Someone still needs to compile them and match them to functions etc.
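
As a toy sketch of that "compile and match" step (the node shape and the whitelisted operation names are made up; a real version would bind to OS calls rather than arithmetic):

    // A recursive node the model might send as JSON.
    indirect enum Node {
        case literal(Double)
        case call(name: String, args: [Node])
    }

    struct UnknownFunction: Error { let name: String }

    // Something on the receiving side still has to walk the tree,
    // whitelist the names, and bind each one to a real function.
    func eval(_ node: Node) throws -> Double {
        switch node {
        case .literal(let v):
            return v
        case .call(let name, let args):
            let values = try args.map(eval)
            switch name {                  // the manual "match to functions" step
            case "add": return values.reduce(0, +)
            case "mul": return values.reduce(1, *)
            default: throw UnknownFunction(name: name)  // hallucinated nodes land here
            }
        }
    }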


Mapping JSON responses to OS APIs is the easiest part. It's trivial compared to understanding what users actually want.


> Mapping JSON responses to OS APIs is the easiest part.

Please show me how you will do that for the tens of thousands of OS APIs and data structures.

Edit: because it's not just "a function call", is it? It's often:

    struct1 = set up/get hold of a complex struct 1
    struct2 = set up/get hold of a complex struct 2
    struct3 = set up/get hold of a complex struct 3 using one or both of the previous ones
    call some specific function 1
    call some specific function 2 using some or all of the structs above
    free the structs above, often in a specific order

So the question becomes: how can this:

    { "function_name": "X", "parameters": [...] }

be easily converted to all that?

Don't forget about failure modes, where you have to check the validity of some, but not all, of the structs passed around.

Repeat that for any combination of any of the 10k+ APIs and structures.
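
To give a rough flavour of that, here is a sketch in Swift against the real macOS Accessibility API (the pid, the coordinates, and the "move_front_window" mapping are made up for this sketch, and real code also needs the Accessibility permission granted). A single small "function call" has to be expanded by hand into something like:

    import ApplicationServices

    // Roughly what a single LLM "call" such as
    //   { "function_name": "move_front_window", "parameters": { "pid": 4242, "x": 100, "y": 100 } }
    // has to be expanded into by hand. Every step can fail independently.
    func moveFrontWindow(pid: pid_t, x: CGFloat, y: CGFloat) -> Bool {
        // "struct 1": the accessibility element for the target app
        let app = AXUIElementCreateApplication(pid)

        // "struct 2": the app's focused window, fetched via an out-parameter
        var windowRef: CFTypeRef?
        guard AXUIElementCopyAttributeValue(app, kAXFocusedWindowAttribute as CFString, &windowRef) == .success,
              let window = windowRef else {
            return false  // stale pid, no window, missing permission, ...
        }

        // "struct 3": the new position, wrapped in an AXValue
        var point = CGPoint(x: x, y: y)
        guard let position = AXValueCreate(.cgPoint, &point) else { return false }

        // Finally, the actual call
        return AXUIElementSetAttributeValue(window as! AXUIElement,
                                            kAXPositionAttribute as CFString,
                                            position) == .success
    }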


If Apple and Google want to give access to the OS to an LLM such as Siri, they'd make it easy.

It's the trivial part of all this. You're getting too much into the details of what is available to developers today. Instead, you should focus on what Apple and Google would do internally to make things as easy for an LLM as possible.

Up until now, the hardest part was always understanding what exactly users want. GPT-4 is a thousand times better than what Siri and Google Assistant are currently doing.

Again, mapping OS APIs is the easiest part. By far.


> If Apple and Google want to give access to the OS to an LLM such as Siri, they'd make it easy.

You keep skipping the question of how.

> It's the trivial part of all this. You're getting too much into the details of what is available to developers today.

Somehow you have this mystical magical idea of "oh, it's just this small insignificant little thing".

Literally this:

   LLMs understand human requests
   * magic *
   Things happen in the OS

You, yes you, already have basically the same access as the developers of the OS have. You already have access to tens of thousands of OS APIs and to the LLMs.

And yet we haven't seen a single implementation that does what you want.

> Again, mapping OS APIs is the easiest part. By far.

If it is, then it would be trivially easy to do this for a small subset of those APIs, wouldn't it? Can you show me how you would do it?


You should try it. Go ask ChatGPT to generate a program that finds all windows in XYZ project and moves them to a new workspace, and I guarantee it'll fail. Maybe it'll get it right if you feed it back the error messages, but you can't do that in production.


The problem is that Google walked back their integrations for antitrust reasons, and Apple is now being sued by the US government for having too many Apple-only integrations.

AI assistants only barely had enough data to understand your requests, and now have even less.



