The arguments for not limiting yourself to prompt engineering are compelling, but I wouldn't call the whole practice snake oil. Fundamentally, it's about learning to communicate your desires and requirements effectively.
There's definitely a fair number of simple tricks with existing models which drastically improve output quality. It's usually simple things like making requirements explicit, e.g. "generate a few foo" vs "generate 5 foo". The single most effective phrase I've picked up has been to tell the model that whatever is being output was written by an expert. But a lot of my experimentation has been dealing with generating and exploring fantasy ideas.
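A toy sketch of the kind of thing I mean, assuming the pre-1.0 openai Python SDK (the prompt wording and the fantasy-item example are just illustrations, not a recipe):

    import openai  # pip install "openai<1.0"; assumes OPENAI_API_KEY is set

    vague = "Generate a few magic items for my campaign."
    explicit = (
        "The following list was written by an expert fantasy game designer. "
        "Generate exactly 5 magic items, each with a name, a one-sentence "
        "description, and a drawback."
    )

    # Same model, same settings; only the prompt changes.
    for label, prompt in [("vague", vague), ("explicit", explicit)]:
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        print(label, "->", resp["choices"][0]["message"]["content"][:200])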
Dismissing prompting is dismissing one of the biggest value propositions of RLHF in that you can affect future output in a steerable way using natural language.
Meanwhile the SOTA model in question didn't support fine tuning until last month, still only supports SFT, and costs significantly more than the base model.
The fact that the article concludes with "hire us" says it all. Great marketing strategy with the title, though.
>In just five days, through thorough data collection and fine-tuning, we not only reduced the system message to under 200 tokens but also transformed the LLM into a robust reasoning tool using a custom JSON format.
That sure sounds like it involved prompt engineering.
Well yeah, for eons, snake oils were known for their efficacious healing properties, wherever those snakes lived. It only took a few Wild West colonists to ruin it for everyone, and most of us forgot how snake oil is the real deal. So now re-read the headline with this in mind...
> We've all heard complaints about GPT-3.5 Turbo, particularly when compared to its successor, GPT-4, seemingly struggling to follow instructions. Guess what? In our experience, this is a non-issue with a properly fine-tuned GPT-3.5 Turbo model. In fact, GPT-4 can serve as the "prompt engineer" that assists in generating the training data.
This omits the very, very important detail that a finetuned gpt-3.5-turbo costs 8x as much as a normal gpt-3.5-turbo, and the output is not 8x better, especially once you apply, you guessed it, prompt engineering (such as gpt-3.5-turbo's function calling/structured data support, which is prompt engineering at its core).
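For what it's worth, here is roughly what that function calling / structured output path looks like - a minimal sketch, assuming the pre-1.0 openai Python SDK and a made-up schema. The function definitions get serialized into the model's context, which is why I'd still call it prompt engineering under the hood:

    import openai  # pip install "openai<1.0"; assumes OPENAI_API_KEY is set

    # Hypothetical schema, just for illustration.
    functions = [{
        "name": "extract_item",
        "description": "Pull a structured item out of free-form text",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
            },
            "required": ["name", "price"],
        },
    }]

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=[{"role": "user", "content": "The widget costs $4.99."}],
        functions=functions,
        function_call={"name": "extract_item"},  # force the structured path
    )
    # The reply carries JSON arguments matching the schema above.
    print(resp["choices"][0]["message"]["function_call"]["arguments"])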
It's also missing the detail that properly finetuning a model is very hard to do well.
This article is snake oil. Its entire semantic value nets out to: "Prompt engineering is bad! You should hire us to do something else instead." It's a long-form advertisement.
I would find this article a lot more convincing if it went into more detail about the fine-tuning process they used and the results they got.
As it stands, it reads more like lead generation content for their agency as opposed to being a genuinely convincing explanation of what kind of problems can be better solved with a fine-tuned GPT 3.5 model.
I'm disappointed, because I'm desperately hungry for useful, detailed real-world success stories that use the new OpenAI fine-tuning mechanisms. This isn't that - it lacks the detail.
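For reference, the mechanism itself is mechanically simple - a rough sketch, assuming the pre-1.0 openai Python SDK, with a made-up file name. The hard part, and the part the article skips, is what goes into train.jsonl and how you evaluate the resulting model:

    import openai  # pip install "openai<1.0"; assumes OPENAI_API_KEY is set

    # train.jsonl: one chat-formatted example per line, e.g.
    # {"messages": [{"role": "system", "content": "..."},
    #               {"role": "user", "content": "..."},
    #               {"role": "assistant", "content": "..."}]}
    f = openai.File.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
    job = openai.FineTuningJob.create(training_file=f["id"], model="gpt-3.5-turbo")
    print(job["id"])  # poll until done; you get back a ft:gpt-3.5-turbo-... model name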
Fine tuning for gpt-3.5 came out like a month ago. Saying that anyone who doesn't default to that doesn't know what they are doing is disingenuous. I don't think he really believes that. The fact that they had 700+ tokens in the prompt is not a reason to discount the previous attempt without fine tuning.
I feel like I should be trying out fine tuning on more problems. But this kind of article is just insulting in an inaccurate way.
I think it's fair enough if you are getting better results with fine tuning because not a lot of people do that with OpenAI models. But I would like to see some details and proof in terms of the performance benefits.
I would love to see an article like this with details that was reasonable, fair and truthful. But I think this is so far off the mark, I am flagging it.
This is an extremely bad article, written with fancy words, filled with bad advice, and delivered with a holier-than-thou tone. Are people just upvoting this submission based on the headline?
Have fun fine tuning a new model every 6 months as ClosedAI deprecates models. Have fun paying 8x the cost for inference.
It's absolutely the case that LLMs give output that's inconsistent in quality, and the determining factor is your prompt. This is true for base models, fine tuned models, and everything else.
Obviously other things matter too, but the lowest hanging fruit is usually "prompt better."
I think the author of this post probably meant to caveat that what we call "prompt engineering" TODAY might tend towards snake oil, but prompt engineering _doesn't have to_ be snake oil, and it _doesn't have to_ promote black-box mentalities[1]. What's more, fine-tuning is certainly not a panacea - it's not particularly great at injecting net-new context into these foundation models. It's great when you want to "close the aperture" a bit on model outputs. Even suggesting that fine-tuning is somehow a replacement for crafting prompts is just incorrect.
A prompt is everything to an LLM; there is no other interaction. It would be like saying that knowing the details of bash is snake oil for a sysadmin. If you are not doing some sort of prompt engineering, you are not getting the most out of your LLM.
And sure, prompt engineering doesn't replace fine-tuning. To continue with the bash analogy, you can't do everything effectively with bash, sometimes you need to write code in C or other languages. But that you can write C code doesn't make using bash effectively snake oil.
Fine tuning for GPT came out a whole month ago. Before that, the choice was mostly between prompt engineering with GPT or fine-tuning dramatically inferior "open source" models, right?
... Fine tuning for OpenAI's GPT LLMs has been available for years now, at least since the GPT-3 private beta if not earlier (and obviously you could train the open models yourself).
That's true, but it was expensive and until recently you could only tune older versions of GPT-3 lacking both instruction tuning and the code pre-training of the Codex models (from which GPT-3.5 is thought to descend). You had to want tuning so badly you were willing to 6x the token cost and go back in time 2 years.
So... the article says that bad prompt engineering is bad and they engineered the prompt to be better and therefore prompt engineering is snake oil? I'm confused.
Ehhh this article basically says “we are better at prompt engineering than some random other guy who sucked. Hire us.”
It’s a clever ad, but an ad nonetheless.
The author tries to turn the idea of prompt engineering into something so banal as to be meaningless, but then describes their follow up work in a way which I would consider to also be classified under “prompt engineering.”
Prompt engineering directly correlates with good communication and knowing how to steer a conversation. There are no tricks to the game, but the number of people I talk to who don't know how to get what they want from an LLM clearly shows that it's a skill set to be acquired.
The issue is with the name. Perhaps prompt tinkering would have been a better choice.
Yeah in general I'm a little repulsed by things with 'engineering' tacked on the end. Tech seems to love borrowing credibility by trading on existing names.
I am sorry, but being $5,000 deep into LangChain means you had already entered a failure mode.
You fixed it and went further, etc., but most teams do not. So the conclusions about what Prompt Engineering is or isn't may seem definitive, but what you've described barely scratches the surface of Prompt Engineering.