My understanding of this tech is pretty minimal, so please bear with me, but is the basic idea something like this?
Before: Evaluate the image in a little region around each pixel against the prompt as a whole -- e.g. how well does a little 10x10 chunk of pixels map to a prompt about a "red sphere and blue cube". This is problematic because maybe all the pixels are red but you can't "see" whether it's the sphere or the cube.
After: Evaluate the image as a whole against chunks of the prompt. So now we're looking at a room, and then we patch in (layer?) a "red sphere" and then do it again with a "blue cube".
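If I've got that right, here's a rough sketch of the contrast I'm imagining, with throwaway stub embedders standing in for a real image/text encoder (none of this is the actual model, just an illustration):

    import hashlib
    import numpy as np

    def embed(name, dim=64):
        # Stand-in embedder: a deterministic random unit vector keyed on the
        # input string, pretending to be a real image/text encoder.
        seed = int(hashlib.sha256(name.encode()).hexdigest(), 16) % (2**32)
        v = np.random.default_rng(seed).normal(size=dim)
        return v / np.linalg.norm(v)

    def score(a, b):
        # Cosine similarity between two (unit-norm) embeddings.
        return float(a @ b)

    prompt = "a red sphere and a blue cube"
    prompt_chunks = ["a red sphere", "a blue cube"]
    patches = ["patch_0_0", "patch_0_1", "patch_1_0"]  # stand-ins for 10x10 pixel chunks

    # "Before": each small patch is scored against the whole prompt, so a patch
    # full of red pixels can't tell whether it belongs to the sphere or the cube.
    before = [score(embed(p), embed(prompt)) for p in patches]

    # "After": the whole image is scored against each prompt chunk separately,
    # so "red sphere" and "blue cube" are checked as distinct constraints.
    whole_image = embed("whole_image")
    after = [score(whole_image, embed(chunk)) for chunk in prompt_chunks]

    print(before, after)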
A more subtle aspect of customer service here is that, as the dev or PM responding to the customer, you have a lot more power to give the customer what they want.
A BigCo can hire a lot of CS people but the best they can do sometimes is "we hear you and we'll pass along your feedback".
Alas, you’ve found the reason this won’t scale. Doing customer support as the CTO is a superpower (up to a point!) for both user growth and product design. But there’s going to come a point where we have to hand some portion of support over to a dedicated support team, and no matter how well we train those folks they’re just not going to be quite as empowered and effective. The longer I can kick that can down the road, though, the better!
(At least for the business. My sleep schedule would improve amazingly!)
You want Support Engineers who are up to speed on your code review processes and standards and are given access to commit "straightforward" bug fixes. They exist :)
From there, you need to build a culture that is welcoming of "strangers" contributing code. If you get those two things, you get nits and gotchas fixed directly from the pain points customers are hitting, while product engineering is (mostly) focusing on feature dev.
Tier 0 are effectively secretaries who can file structured info about an issue.
Tier 1 are dedicated support people who can follow scenarios and guide customers through the product.
Tier 2 are what you call "support engineers". They know the product, features, code, upcoming features and so on. For an in house product they are capable of making straightforward bugfixes.
Tier 3 is sometimes called "vendor support". For an in-house product this is effectively the product development team.
As you can see, good support bleeds into or blends with product development. This is how you get support answers like "this feature is planned to go live in Y24Q1, but you can sign up for the beta in exchange for feedback" or at least "This is not supported, but you can use features x and y to achieve a similar result", instead of "Sorry, such a workflow is not supported".
I tried that but it's just really hard to keep up over time -- e.g. I used a rule based on the domain name but domain names change somewhat often. Toss in things like "ugh, which of my three emails did I use on this site" or "which high school teacher did I say was my favorite for this site" and it ends up being a big hairy mess that screams for an encrypted place to stick my notes.
Also, what I consider "non obvious" isn't that non-obvious. Given enough of a sample size, a committed attacker can guess a lot of rules. And if the prize (a crypto wallet) is big enough, they might be motivated enough to give it a go.
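To make the fragility concrete, the sort of rule I'm talking about is roughly this (the derivation scheme and the master phrase are made up for illustration, not my actual setup):

    import base64
    import hashlib
    import hmac

    def site_password(master_secret: str, domain: str, length: int = 16) -> str:
        # Deterministic per-site password: HMAC the domain with a master secret.
        # No vault needed... until the inputs drift out from under you.
        digest = hmac.new(master_secret.encode(), domain.encode(), hashlib.sha256).digest()
        return base64.urlsafe_b64encode(digest).decode()[:length]

    secret = "correct horse battery staple"  # illustrative master phrase
    print(site_password(secret, "example.com"))
    print(site_password(secret, "example.io"))  # same site after a rebrand, totally different password

And none of that captures the "which email did I use" or "which fake favorite teacher did I give them" state, which is the part that really wants an encrypted notes field.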
So under U.S. law, the more "transformative" something is (as opposed to "derivative"), the more likely it is to be deemed fair use. The line between derivative and transformative is fuzzy but generally, something like a movie adaptation of a book is derivative whereas parody is transformative.
Given that, suppose I have a cat pic. Google creates a thumbnail of that cat pic, which is, by itself, obviously derivative. But when Google includes my cat pic alongside many other cat pics in response to a query for cats, the thing as a whole starts looking more transformative.
Now suppose my cat pic gets sucked into a generative AI. I suppose it could be used to create a merely derivative copy of the original cat pic, albeit with reduced quality, like a thumbnail. But the whole point of these models is to recombine features from millions of images to create something unique. That is, if I tell the model to draw a cat, it combines features from thousands of other cat pics. Which seems at least as transformative as simply showing the same cat pics in a grid.
From an ethics standpoint, the main difference between Image Search and a generative AI is attribution. Google Image Search is just links whereas Stable Diffusion is sort of opaque about its sources. But attribution isn't one of the factors of fair use -- a parody which makes fun of the original without ever directly identifying the original is still parody.
That said, I suppose you could argue it affects the economic impact of the copying, one of the other fair use factors -- it seems plausible to me that AI-generated images impact the market for, say, Getty Images in a way that image search does not. But it's anyone's guess how a court would balance those two things -- courts very often pretend all of the fair use factors point one way rather than discussing how the factors are balanced.
The key aspects that the US courts looked at when they decided Authors Guild v. Google were whether the use of the work supersedes, supplants, or becomes a replacement for the original works. They also looked at whether Google sold portions of the copies, and whether the activity enhanced the sale of the original work for the benefit of the copyright holder.
Some parodies have been found to be non-fair use when they supplant or become a replacement for the original work. People often get upset when that happens, with headlines like "they are outlawing parody!", but from the law's perspective the outcome of a situation is actually very important when determining fair use. In the general case, parodies do not replace original works, so the law generally works fine. I guess one could also view it as: a transformative work in general does not supplant the original work, while a derivative usually does. A generative AI version of Grumpy Cat might behave more like a derivative than a generative AI version of a generic cat, even if both are using the same technology.
Even if the AI didn't make stuff up, it's unclear to me what value it adds over...
- The existing human edited text already on the page
- The playground to run the code in question
- One of the many existing autocomplete / static analysis tools for HTML / CSS / JS out there. If you want a tooltip explaining what the second argument to a built-in JS function in a code snippet does, this is a solved problem!
If you're going to use an ML model, you might as well use an existing language translation model rather than asking ChatGPT or the like to convert words one by one. You'd probably get better results treating American English and British English as entirely different languages rather than assuming one is the same as the other but with a handful of different words.
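To make the word-by-word failure mode concrete, here's a toy version of that approach (the mapping is a made-up sample, not a real dataset):

    # A toy word-for-word mapping -- the kind of thing you end up with if you ask
    # a model to convert terms one at a time instead of translating whole sentences.
    US_TO_UK = {
        "apartment": "flat",
        "sidewalk": "pavement",
        "fall": "autumn",   # the season -- but also the verb "to fall"
    }

    def naive_convert(text: str) -> str:
        # Swap one word at a time, ignoring grammar, phrasing, and word sense.
        return " ".join(US_TO_UK.get(word.lower(), word) for word in text.split())

    print(naive_convert("I found an apartment last fall"))
    # -> "I found an flat last autumn"   ("an flat": articles never get fixed up)
    print(naive_convert("Try not to fall on the sidewalk"))
    # -> "Try not to autumn on the pavement"   (wrong sense of "fall")

A model that treats the two dialects as a sentence-level translation problem at least has a shot at handling phrasing and word sense instead of mangling them.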
I'm still annoyed they got rid of the ultrasonic sensors on the latest model. I test drove a 2022 (or maybe a 2021) and the park assist was pretty good. And then I get the 2023 delivered and there's no park assist for the first 6 months because they removed the ultrasonics but the vision-only software wasn't ready yet. A vision-based park assist finally came in via an OTA update but it's nowhere near as precise as the ultrasonic version was. Like, the estimates of how much distance is in front of me seem to jump around a lot more than I'm actually moving, and it sometimes reports it's degraded when trying to pull out of a tight spot.
Well, it sort of is OpenAI's fault that it presented the interface as a chat bot though.
> It was given a sequence of words and tasked with producing a subsequent sequence of words that satisfy with high probability the constraints of the model.
This is just autocorrect / autocomplete. And people are pretty good at understanding the limitations of generative text in that context (enough that "damn you autocorrect" is a thing). But for whatever reason, people assign more trust to conversational interfaces.
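That "next likely words" framing really is just autocomplete scaled up. Here's a deliberately tiny toy version of the same idea (a bigram counter, nothing like the real architecture):

    import random
    from collections import defaultdict

    # Count which word tends to follow which in some training text.
    corpus = "the cat sat on the mat and the cat ate the fish".split()
    next_words = defaultdict(list)
    for current, following in zip(corpus, corpus[1:]):
        next_words[current].append(following)

    def complete(word: str, length: int = 5) -> str:
        # Keep appending a statistically plausible next word. There is no notion
        # of "true" or "false" anywhere in here -- only "likely to come next".
        out = [word]
        for _ in range(length):
            candidates = next_words.get(out[-1])
            if not candidates:
                break
            out.append(random.choice(candidates))
        return " ".join(out)

    print(complete("the"))

Wrap that in a chat bubble with a friendly persona and people start reading the output as answers rather than as completions -- that's the trust gap.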