My work gives us Copilot access since We're a Microsoft Shop™ It's too bad Copil...

NordSteve · 2025-06-24T21:38:52 1750801132

I pasted your prompt:

  ffmpeg command to convert movie.mov into a reasonably sized mp4

into the Copilot app just now. Here's the response:

  ffmpeg -i movie.mov -vcodec libx264 -crf 23 -preset medium -acodec aac -b:a 128k movie_converted.mp4

with this explanation

  Here’s the breakdown:
  - -vcodec libx264: Uses the H.264 video codec, great for compression and compatibility.
  - -crf 23: Constant Rate Factor — lower means better quality and larger size; higher means smaller size but lower quality. 23 is a nice middle ground.
  - -preset medium: Controls encoding speed vs. compression efficiency. You can try slow or faster depending on your needs.
  - -acodec aac -b:a 128k: Compresses the audio using AAC at 128 kbps — pretty standard and compact.
  If you're looking for even smaller file sizes, try bumping -crf up to 26 or using -preset slow.
  Want to bake in scaling, remove metadata, or trim the video too? Happy to tailor the command further.

Aurornis · 2025-06-25T01:56:50 1750816610

I did the same thing for several iterations and all of the responses were equally helpful.

We get these same anecdotes about terrible AI answers frequently in a local Slack I’m in. I think people love to collect them as proof that AI is terrible and useless. Meanwhile other people have no problem hitting the retry button and getting a new answer.

Some of the common causes of bad or weird responses that I’ve learned from having this exact same conversation over and over again:

- Some people use one never-ending singular session with Copilot chat, unaware that past context is influencing the answer to their next question. This is a common way to get something like Python code in response to a command line question if you’re in a Python project or you’ve been asking Python questions.

- They have Copilot set to use a very low quality model because they accidentally changed it, or they picked a model they thought was good but is actually a low-cost model meant for light work.

- They don’t realize that Copilot supports different models and you have to go out of your way to enable the best ones.

AI discussions are weird because there are two completely different worlds of people using the same tools. Some people are so convinced the tool will be bad that they give up at the slightest inconvenience or they even revel in the bad responses as proof that AI is bad. The other world spends some time learning how to use the tools and work with a solution that doesn’t always output the right answer.

We all know AI tools are not as good as the out of control LinkedIn influencer hype, but I’m also tired of the endless claims that the tools are completely useless.

muppetman · 2025-06-25T04:34:31 1750826071

The "pick your model" thing is so stupid.

"How dumb do you want your AI to be?"

"Why do I have to select?"

"Because smart costs money"

"So... I can have dumb AI but it's cheaper?"

"Yes"

"How would the average person know which to pick?"

"Oh you can't know."

I hope they can invent an AI that knows which AI model my question should target cheaply.

runako · 2025-06-25T13:30:46 1750858246

And then the model names & descriptions are virtually useless at providing any guidance.

ChatGPT lets me choose between GPT-4o ("Great for most tasks"), o3 ("Uses advanced reasoning"), o4-mini ("Fastest at advanced reasoning"), and o4-mini-high ("Great at coding and visual reasoning").

Is what I'm doing "most tasks"? How do I know when I want "advanced reasoning"? Great, I want advanced reasoning, so I should choose the faster one with the higher version number, right? etc.

antonkochubey · 2025-06-25T15:15:09 1750864509

Then there's GPT-4.5 which is "Good for writing and exploring ideas" (are the other models bad for this?), and GPT-4.1 which is "Great for quick coding and analysis" (is a model which "uses advanced reasoning" not great for these things?)

tzs · 2025-06-25T15:49:08 1750866548

Can you describe your task and then ask ChatGPT which model you should use?

runako · 2025-06-25T16:33:29 1750869209

This presents the same problem, since none of the models are indicated to be best at choosing the model to use for a task.

hluska · 2025-06-25T17:38:17 1750873097

Try different ones out and learn which works best for what type of work?

runako · 2025-06-25T18:57:06 1750877826

Without getting too much into semantics, I would suspect that most individuals would have trouble classifying their "type of work" against an opaque set of "type of work" classifiers buried in a model.

elbear · 2025-06-25T19:23:18 1750879398

He was suggesting that you try different models for the same thing and see which output you like best. It's tedious but at least you get an answer.

dataflow · 2025-06-25T13:38:11 1750858691

Can't you just run a few examples by hand to see how they perform for your tasks, before committing to any for production?

runako · 2025-06-25T14:40:23 1750862423

> before committing to any for production

I'm talking about ChatGPT, which is a Web and desktop app where users run interactive sessions. What does "production" mean in this sense?

hluska · 2025-06-25T17:39:03 1750873143

It’s simple - practice using them instead of complaining. Maybe you’ll figure out the differences on your own.

runako · 2025-06-25T18:54:42 1750877682

As a person who uses LLMs daily, I do in fact do this. Couple problems with this approach:

- there are billions of people who are not accustomed to using software this way, who are in the expected target market for this software. Most people cannot tell you the major version number of their mobile OS.

- this approach requires each individual to routinely perform experiments with the expanding firmament of models and versions. This is obviously user-hostile.

Anyway, my hot take here is that making things easier for users is better. I understand that is controversial on this site.

BobaFloutist · 2025-06-25T17:47:31 1750873651

Imagine if this is what people suggested when I asked what kind of screwdriver I should use for a given screw, because they're all labelled, like, "Phillips. Phillips 2.0. Phillips.2.second. Phillips.2.second.version 2.0. Phillips Head Screwdriver. Phillips.2.The.Second.Version. Phillips.2.the.second.Version 2.0"

dataflow · 2025-06-25T15:34:13 1750865653

I think I misunderstood what people were talking about. Somehow I thought it was about their APIs, for specific uses in other apps.

runako · 2025-06-25T16:34:58 1750869298

To their credit, they did get this part correct. "ChatGPT" is the user-facing apps. The models have terrible names that do not include "ChatGPT".

Anthropic, by contrast, uses the same name for the user-facing app and the models. This is confusing, because the user-facing apps have capabilities not native to the models themselves.

HappMacDonald · 2025-06-25T06:54:38 1750834478

You bring up the important point that for a company who earns money off of tokens wasted, a confusing selection of models can translate into extra spend to experiment with tweaking them.

Some users may not appreciate that, but many more might be drawn to the "adjust the color balance on the TV" vibes.

setopt · 2025-06-25T13:16:22 1750857382

> I hope they can invent an AI that knows which AI model my question should target cheaply.

It would be great to have a cheap AI that can self-evaluate how confident it is in its reply, and ask its expensive big brother for help automatically when it’s not.

taikahessu · 2025-06-25T13:24:11 1750857851

That would actually be the AGI we are waiting for, since we - as humans, in surprisingly big portion of all cases - don't know how or can't seem to do that either!

setopt · 2025-06-26T08:16:28 1750925788

On the other hand, ChatGPT seems to be getting better at knowing when it should Google something for me rather than hallucinate something.

Shouldn’t asking a more expensive model for input be a similar level of «tool use»?

reilly3000 · 2025-06-25T04:50:38 1750827038

I think you make a good point. Cursor is doing a basic “auto” model selection feature and it could probably get smarter, but to gauge the complexity of the response you might need to run it first. You could brute force it with telemetry and caching if you can trust the way you measure success.

bastard_op · 2025-06-25T18:49:15 1750877355

I usually feel with chatgpt picking a model is like "Which of the three stooges would you like to talk to, curly, larry, or moe (or worse, curly joe)?" I usually only end up using o3 because gpt-40 is just that bad, so why would I ever want to talk to a lesser stooge?

If paying by API use it probably makes more sense to talk to a lesser stooge where possible, but for a standard pro plan I just find the lesser models aren't worth the time to use in frustration they cause.

prepend · 2025-06-25T11:48:35 1750852115

I imagine that we need a bootstrap ai to help you optimize the right ai for each task.

I don’t think I’d trust the vendor’s ai to optimize when they will likely bias toward revenue. So a good case for a local ai that only has my best interests at heart.

Currently, the guidance from vendors is “try it and see which yields the best results” which is kind of like “buy this book, read it, and see if you like it” and how of course the publisher wants you to take this action because they get their money.

addandsubtract · 2025-06-25T08:18:00 1750839480

> I hope they can invent an AI that knows which AI model my question should target cheaply.

Isn't that the idea of OpenRouter?

oersted · 2025-06-25T09:38:49 1750844329

Not exactly, but yeah. OpenRouter is a unified API, directory and billing system for LLM providers.

I think you are getting confused by the term "Model Routing", which to be fair OpenRouter does support, but it's a secondary feature and it's not their business focus. Actually OpenRouter is more focused on helping you choose the best provider for a specific open model based on their history of price, speed, reliability, privacy...

The model routing is simply provided by NotDiamond.ai, there are a number of other startups in this space.

https://openrouter.ai/docs/features/model-routing

JohnMakin · 2025-06-25T02:51:11 1750819871

The thing responses like this miss I am pretty sure is that this is a nondeterministic machine, and nondeterministic machines that are hidden by a complete blackbox wrapper can produce wildly different results based on context and any number of independent unknown variables. so pasting “i did the same thing and it worked fine” is essentially this argument’s version of “it worked on my local.” Or it essentially boils down to “well sure, but you’re just not doing it right” when the “right” way is undefined and also context specific.

josephg · 2025-06-25T04:35:25 1750826125

You’re both right. Some problems should be solved with better user education. And some should be solved with better UX. It’s not always clear which is which. It’s too simple to blame everything on user error, and it’s too simple to blame everything on the software.

Cell phones are full of examples. So much of this stuff is obvious now we’ve been using them for awhile, but it wasn’t obvious when they were new. “My call dropped because I went in a tunnel” is user error. “My call cut out randomly and I had to call back” is a bug. And “my call cut out because my phone battery ran out” is somewhere in the middle.

For chatbots, lots of people don’t know the rules yet. And we haven’t figured out good conventions. It’s not obvious that you can’t just continue a long conversation forever. Or that you have to (white consciously) pick which model you use if you want the best results. When my sister first tried ChatGPT, she asked it for YouTube video recommendations that would help when teaching a class. But none of the video links worked - they were all legitimate looking hallucinations.

We need better UX around this stuff. But also, people do just need to learn how to use chatbots properly. Eventually everyone learns that calls will probably drop when you go into a tunnel. It’s not one or the other. It’s both.

lukan · 2025-06-25T10:42:04 1750848124

"I’m also tired of the endless claims that the tools are completely useless."

Who claimed that here?

I read a claim that Copilot is dumber than claude and ChatGPT and I tend to confirm this.

"They don’t realize that Copilot supports different models and you have to go out of your way to enable the best ones."

So possible that none of us who thinks that, went out of our way to find outy when there were working alternatives, but it would be still on Microsoft making it hard to make good use of their tool.

hbn · 2025-06-25T16:35:05 1750869305

Yeah I'm not sure why they'd think my point was that LLMs are useless. Clearly I'm integrating them into my work, I just think Copilot is the dumbest. It's given me the most nonsensical answers like the example I provided, and it's the one I use the least. Which is even crazier when you consider we're on a paid version of Copilot and I just use free ChatGPT and Claude.

hluska · 2025-06-25T17:37:09 1750873029

Your entire comment sure read a lot like you were calling the tools useless. You even used the worst possible prompt to make your point. That’s likely why people are reacting badly.

hbn · 2025-06-25T20:57:02 1750885022

I said the Copilot is the worst competitor in the space.

Where did I say anything in general about LLMs being useless?

msgodel · 2025-06-25T10:21:30 1750846890

This is part of why I really like local models. I always use the same random seed with mine so unless I'm using aider the responses are 100% deterministic. I can actually hit c-r in my shell to reproduce them without having to do anything special.

muzani · 2025-06-25T11:58:43 1750852723

Some are more deterministic than others, e.g. Gemini Flash.

msgodel · 2025-06-25T15:29:49 1750865389

The non-determinism comes from the sampler not the model.

muzani · 2025-06-26T02:36:29 1750905389

I always thought it was packaged with the model.

otabdeveloper4 · 2025-06-25T13:31:34 1750858294

"Spin the chatroulette again and see if you vibe something better" is not a foundation for a business.

Well, unless your business is selling vibes.

ryanar · 2025-06-25T12:58:00 1750856280

The memory feature also can be a problem, it injects stuff into the prompt context that you didnt explicitly write with the intent it will help because it knows you are a python programmer so lets respond with a python script instead of our usual ffmpeg cli command.

jiggawatts · 2025-06-25T03:44:55 1750823095

Everything is like this.

I saw an IT professional google “My PC crashed” to diagnose a server bluescreen stop error.

rienbdj · 2025-06-25T05:16:21 1750828581

Reminds me of

I’m Feeling Lucky -> bad result -> Google search is useless

darkwater · 2025-06-25T10:14:20 1750846460

1. I would say that nobody did that, so you are making up a straw man

2. The Copilot or ChatGPT or Claude "Ask" buttons should then be renamed to "I'm feeling lucky". And that would be the only button available.

krige · 2025-06-25T06:57:15 1750834635

Yeah except Feeling Lucky is the only button you can press and people blame you if they got lucky

sunaookami · 2025-06-25T04:12:26 1750824746

  Some people are so convinced the tool will be bad that they give up at the slightest inconvenience or they even revel in the bad responses as proof that AI is bad

AI derangement syndrome

csomar · 2025-06-25T06:00:35 1750831235

I am 67.87% certain they make it dumber/smarter during the day. I think it gets faster/better during non-business hours. This needs to be tested more to confirmed, though. However, they have exactly ZERO transparency (especially the subscription model) into how much you are consuming and what you are consuming. So it doesn't really help with the suspicions.

DHRicoF · 2025-06-25T08:48:14 1750841294

I remember reading an article about different behavior between summer and winter. So, working better/worst in business hours doesn't sound completely crazy.

But they turning some knobs based on load also looks razonable.

0points · 2025-06-25T12:16:22 1750853782

What you and many other seem to miss is that the LLM is not deterministic.

glenstein · 2025-06-24T22:01:21 1750802481

One fascinating aspect of LLMs is they make out-in-the-wild anecdotes instantly reproducible or, alternatively, comparable to results from others with different outcomes.

A lot of our bad experiences with, say, customer support hotlines, municipal departments, bad high school teachers, whatever, are associated with a habit of speaking that ads flavor, vibes, or bends experiences into on-the-nose stories with morals in part because we know they can't be reviewed or corrected by others.

Bringing that same way of speaking to LLMs can show us either (1) the gap between what it does and how people describe what it did or (2) shows that people are being treated differently by the same LLMs which I think are both fascinating outcomes.

celeritascelery · 2025-06-24T23:35:16 1750808116

LLMs are definitely not instantly reproducible. The temperature setting adjust randomness and the models are frequently optimized and fine tuned. You will very different results depending on what you have in your context. And with a tool like Microsoft copilot, you have no idea what is in the context. There are also bugs in the tools that wrap the LLM.

Just because other people on here say “worked for me” doesn’t invalidate OPs claim. I have had similar times where an LLM will tell me “here is a script that does X” and there is no script to be found.

glenstein · 2025-06-25T01:45:43 1750815943

I was intentionally broad in my claim to account for those possibilities, but also I would reject the idea that instant reproducibility is generally out of reach on account of contextual variance for a number of reasons.

Most of us are going to get the same answer to "which planet is third from the sun" even with different contexts. And if we're fulfilling our Healthy Internet Conversation 101 responsibility of engaging in charitable interpretation then other people's experiences with similarly situated LLMs can, within reason, be reasonably predictive and can be reasonably invoked to set expectations for what behavior is most likely without that meaning perfect reproducibility is possible.

reilly3000 · 2025-06-25T04:53:51 1750827231

I think it really depends on the UI, like if it was in some desktop native experience maybe it accidentally produced a response assuming there would have a code canvas or something and sent the code response under a different JSON key.

myhf · 2025-06-24T22:28:06 1750804086

We're also seeing a new variant of Cunningham's law:

The best way to get the right answer from an LLM is not to ask it the right question; it's to post online that it got the wrong answer.

cdblades · 2025-06-25T15:29:13 1750865353

> One fascinating aspect of LLMs is they make out-in-the-wild anecdotes instantly reproducible

How? I would argue they do the exact opposite of that.

glenstein · 2025-06-25T20:50:34 1750884634

Asking the number of Rs in the word Strawberry is probably the most famous one.

bluSCALE4 · 2025-06-25T01:39:02 1750815542

AI probably hates him so it acts dumb.

baxtr · 2025-06-25T05:21:44 1750828904

Same here (MS Shop).

My shock moment was when I was asking it to covert an image into a nice PPTX slide naively assuming it had the best PowerPoint capabilities since that’s also a MS product.

It returned a non formatted text box on one slide. I had to laugh so hard people on the office noticed.

hbn · 2025-06-25T16:38:45 1750869525

Oh yeah, one time I uploaded a couple of files to figure out an issue I was having, and it offered to rewrite the files to fix the issue. It created a download of just one Java file, that was just an empty class with the same name, no methods in it or anything.

DannyBee · 2025-06-25T13:23:49 1750857829

Gemini-in-sheets is the same way

i asked it to make a sheet look nicer and it couldn't without me explicitly telling it what i wanted done.

When i told it to do certain things, it's like "that's not implemented yet, stay tuned!"

TeMPOraL · 2025-06-25T09:28:04 1750843684

Yeah, working in an MS shop in the past couple years, that's what I've been saying ever since first iteration Copilot for MS Office came out, and it's true to this very day: you want AI to help you with your MS Office files? Ask ChatGPT. Or get API keys and use some frontend with a half-assed third party plugin that knows how to convert Markdown into .docx or such. Results are still an order of magnitude better than the Copilot in MS Office.

I blame it on corporate reluctance to take risks that could result in bad press. They put a lot of work into screwing Copilot up. I mean, they had it running GPT-4 back when GPT-4 was the new hot thing. The way it was comically useless is not something that "just happens" - as evidenced by the fact, that just running GPT-4 via API yourself produced good results by default.

hypercube33 · 2025-06-25T13:41:37 1750858897

Thats a good note. I have all of my documentation in markdown (which Microsoft parades on with Github, VSCode, docs.microsoft.com etc) but Copilot cant or wont read these files. I had to pandoc everything over to docx files before it even saw them. Pretty wild.

Also in line with this, Copilot 365 seems to not get how charts work. I asked it with multiple different massaged data sets and it kept giving worse and worse answers even after resetting the chat and data to as simple as possible (think 10 dates, 2 colums of integers) until it ultimately gave me a blank chart. I gave up and asked GPT.

hangonhn · 2025-06-24T23:00:55 1750806055

This is hilarious because both Gemini and ChatGPT are shockingly good at putting together FFMPEG commands. They can both put together and also understand the various options and stages/filters.

canyp · 2025-06-25T01:01:46 1750813306

I really like the final remark, "or is there anything else that I can help you with"?

Yeah, like how about answering the fucking question? lol

35123123 · 2025-06-25T02:58:04 1750820284

sed, awk, docker, ffmpeg, etc... are problebly the most Google-ed commands. It's kinda shocking bad that MS LLM model is bad at this .

what-the-grump · 2025-06-24T21:36:16 1750800976

I cannot reproduce this in any version of copilot?

Copilot with outlook.com

Copilot base one that comes with M365,

And the add-on one for 30$/mo.

Copilot in VS code

All produce: ffmpeg -i movie.mov -vcodec libx264 -crf 23 -preset medium -acodec aac -b:a 128k output.mp4

Which is not surprising because its just an Open AI 4o call... so how are you getting this?

cdblades · 2025-06-25T15:31:05 1750865465

> I cannot reproduce this in any version of copilot?

Because LLM-based service outputs are fundamentally not-reproduceable. We have no insight into any of the model settings, the context, what model is being run, etc.

nolok · 2025-06-25T13:14:40 1750857280

The fact they have so many different version is hilarious to me; whern they're in the context of getting every day entreprise customers.

Already gemini giving me flash or pro like i'm supposed to know and decide which i need is missing the point, but 4 of them is crazy.

what-the-grump · 2025-06-26T14:27:09 1750948029

Wait whats so confusing?

GPT3.5, 4.0, 4o, 4o-mini, 4.1 o3, 4,5?

I want to slap every person at OpenAI.

call freaking tuned models what they are, basic, advanced, coding, research, etc. etc. Slow, Fast. Like cmon.

svachalek · 2025-06-24T22:23:20 1750803800

Copilot is not OpenAI

District5524 · 2025-06-25T12:25:37 1750854337

The pipeline Microsoft is using for Copilot products hides actually what models they are using, and you have no influence over it. Sometimes they use smaller models, but I have no clear source from Microsoft saying this ...

adastra22 · 2025-06-24T22:32:05 1750804325

It is OpenAI weights under the hood, unless something changed recently?

cubefox · 2025-06-25T00:11:54 1750810314

At least Bing Chat was GPT-4-base with Microsoft's own fine-tuning.

KoolKat23 · 2025-06-25T09:17:26 1750843046

And there's been a number of iterations of 4o. So it could be a really old, pruned one if upgraded from gpt-4.

0points · 2025-06-25T12:21:05 1750854065

Copilot is OpenAI in branding since day 1.

https://en.wikipedia.org/wiki/GitHub_Copilot#Implementation

hu3 · 2025-06-24T21:02:26 1750798946

What model? It got right first try here with your exact prompt using the free GPT-4.1 model.

https://i.imgur.com/toLzwCk.png

ffmpeg -i movie.mov -c:v libx264 -preset medium -crf 23 -c:a aac -b:a 128k movie.mp4

BUT, I have this in my copilot-instructions.md file:

    # Always follow these steps when responding to any request
    1. Please do a round of thinking in <thinking></thinking> tags
    2. Then a round of self-critique in <critique></critique> tags
    3. Then a final round of <thinking>, before responding.
    4. If you need more information, ask for it.

easton · 2025-06-24T21:05:00 1750799100

That's GitHub Copilot, not "microsoft copilot", the bot built into the Microsoft 365 landing site. it doesn't give you model options, for example.

clvx · 2025-06-24T21:12:04 1750799524

Whoever decided to use the same brand for different experiences should be demoted at minimum. What a way to destroy trust.

airstrike · 2025-06-24T21:17:01 1750799821

I suggest you never visit https://www.office.com/

9cb14c1ec0 · 2025-06-25T01:05:06 1750813506

Microsoft Office is one of the most recognizable and valuable brands ever. I'm quite terrible at marketing, and even I can recognize how stupid the rebrand was.

OccamsMirror · 2025-06-25T07:26:29 1750836389

This is the same company that thought it would be an awesome idea to rename "Microsoft Remote Desktop" to "Windows App" on MacOS.

It wasn't. It's the dumbest name ever.

thewebguyd · 2025-06-25T20:34:51 1750883691

Don't forget Windows Mail->Outlook (New), replacing Outlook with Outlook (Classic). Same with "Teams (Classic)" and "Teams (New)"

Or the Teams download page that had two different versions - Teams for Home and Teams for Work/School.

Or .NET->.NET Core & .NET Framework->Back to .NET again.

TeMPOraL · 2025-06-25T09:31:47 1750843907

Maybe they figured their brand was too recognizable and valuable, and had to knee-cap it to restore the cosmic balance of the Great Material Continuum.

EDIT:

There's even a rule of acquisition that could possibly apply here: #239 - Never be afraid to mislabel a product.

airstrike · 2025-06-25T01:05:56 1750813556

Literally this. It's one of the strongest names in all of software. It really boggles the mind.

olyjohn · 2025-06-25T07:39:56 1750837196

I thought that renaming Active Directory to Entra ID was bad. Every single tech person who ever touched a Windows server knows what AD is. Then they change to name to something that sounds like it's going to give you an anal probe. What a dumpster fire...

bradstewart · 2025-06-25T14:24:06 1750861446

Thank you for this. As someone who recently had to stumble back into turning a few knobs in (what I thought would be) AD for Office 365 licensing needs, after ~10 years outside of the MS sandbox, I had no earthly idea what Entra was. Until right now.

TeaBrain · 2025-06-25T18:38:58 1750876738

I think there's little chance it won't be changed back. Changing the name was probably motivated by someone in management pushing the name change so that they could list it as a personal achievement as one of the "new" AI products they'd overseen the release of in the current zeitgeist.

DangitBobby · 2025-06-24T22:35:05 1750804505

Good lord

https://news.ycombinator.com/item?id=42751726

onemoresoop · 2025-06-25T00:47:28 1750812448

Microsoft is the worst offender at renaming their products and services with such bad confusing names I don't think it's helping anyone, including Microsoft.

airstrike · 2025-06-25T01:03:19 1750813399

Imagine literally squandering the brand name "OFFICE"

aydyn · 2025-06-25T09:52:47 1750845167

Next they are going to rename Windows to Microsoft Copilot.

snickerbockers · 2025-06-25T07:57:58 1750838278

I keep missing meetings because I foolishly confuse Teams with Teams (classic).

kpil · 2025-06-24T22:55:58 1750805758

I got unreasonably triggered by this.

tiluha · 2025-06-25T00:55:01 1750812901

You are not alone. It evoked a physical reaction in me

paradite · 2025-06-25T01:07:12 1750813632

"To continue, please install the Microsoft 365 Copilot app"

I got this on mobile. Seems to be pretty apt.

Lutger · 2025-06-25T13:07:58 1750856878

Wait, what? Is this a joke? Did they really rebrand Office to '365 Copilot app'? I feel like I'm missing the plot, they can't be serious.

airstrike · 2025-06-25T16:26:40 1750868800

Yes, they literally did that. It's absolutely moronic. Whoever came up with this idea should be fired along with everyone else who greenlit it.

mike_hearn · 2025-06-25T14:15:14 1750860914

Don't worry, the other Office homepage says that it's actually been renamed to just Microsoft 365, not Microsoft 365 Copilot App:

https://www.microsoft.com/en-us/microsoft-365/microsoft-offi...

I got there by going to office.com and clicking Products > Microsoft Office. Lol. Rofl, even. This has made my day. And we all thought calling their third generation console Xbox One was the worst possible branding decision.

darkwater · 2025-06-25T10:20:06 1750846806

Oh. My. Fucking. God.

Are they aware that people will struggle to find if Office is installed and that they will keep calling it Office til the end of times (aka the next rebranding that will revert back things) anyway?

johnebgd · 2025-06-24T22:34:16 1750804456

Microsoft has the worst branding in tech. Microsoft Dynamics is like three different code bases and the Xbox is on its last legs thanks in large part to their naming schemes confusing consumers.

ackfoobar · 2025-06-24T22:29:36 1750804176

Ha. Microsoft does it all the time.

https://news.ycombinator.com/item?id=40419292

shermantanktop · 2025-06-24T21:19:02 1750799942

Having established brand awareness is a double-edged sword. Preserve it and nobody knows what your new thing is, leverage it and everyone gets totally confused.

dcminter · 2025-06-24T22:37:20 1750804640

Ah yes, I call this "brandfucking."

IBM used to be a repeat offender. I recall trying to buy the WebSphere (Java) application server for a client and then finding out that IBM had slapped "WebSphere" on all their products including things like¹ MQ Series (a queue) and DB/2 (a database). It took me an hour to figure out the right item and it was an online purchase!

¹I might be misremembering the exact products but it was similarly absurd.

SoftTalker · 2025-06-25T01:41:59 1750815719

They were sticking “Watson” on all their product names for a while too.

dcminter · 2025-06-25T07:12:40 1750835560

Yep, and they got very overexcited about "VisualAge" for this, that, and the other at one point. "VisualAge for C++ for OS/2" being one of the more coherent examples I guess...

vdfs · 2025-06-24T22:00:53 1750802453

Probably the same one responsible for Office -> 365 naming

DangitBobby · 2025-06-24T22:35:27 1750804527

It gets worse

https://news.ycombinator.com/item?id=42751726

wodenokoto · 2025-06-25T03:50:29 1750823429

It’s because in Microsoft terminology a “copilot” is a chatbot or LLM agent.

So you get your copilot for m365 subscription and add copilot studio which you use to create co pilots

tsimionescu · 2025-06-25T05:37:56 1750829876

This almost makes sense, but it is certainly not how Microsoft marketing did things. "Microsoft 365 Copilot app" is a suite of productivity apps, most well known for Word, Excel, and PowerPoint. It was formerly known as "Office". Microsoft 365 Copilot app includes Copilot as one of the apps.

This is all information taken from office.com, not some joke or exaggeration...

TeMPOraL · 2025-06-25T09:44:58 1750844698

Can confirm - I'm looking at my Android phone now; the "Office Hub" app I knew as "Office" or "Microsoft 365" has, at some point, renamed itself to "M365 Copilot". To make things more obvious and less confusing, it's sitting right next to an app named "Copilot", which is their ChatGPT interface, and as far as I can tell, doesn't do anything copiloty with the Office suite.

Looking at the two side by side in an app manager, I see:

- Copilot - com.microsoft.copilot

- M365 Copilot - com.microsoft.office.officehubrow

And they both have identical icon, except the latter has a tiny black rectangle with tiny white "M365" label tucked in the corner.

jerf · 2025-06-25T13:37:12 1750858632

It's amazing to me how too much marketing education and/or experience seems to rot the brain. You learn on like day 4 of Marketing 101 that your brands should be distinct and recognizable, and hopefully some solid tips on how to do that. Cool. Solid. Seems obvious but there's plenty of things that seem obvious in hindsight that education can help you with.

Somewhere between that and a master's degree and 10 years at a prestigious marketing firm, though, apparently there's some lessons about how you should smear all your brands all over each other in some bid to, I presume, transfer any good will one may have had to all of them, but it seems to me that they could stand to send those people back to MKT101 again, because the principle of labeling what your product actually is seems to elude them after Too Much Education.

TeMPOraL · 2025-06-26T07:05:53 1750921553

Think is, it's the latter lessons that are correct, because the ultimate arbiter of which marketing practices work or not is the market itself.

If anything, Marketing 101 works as a scaffolding but you learn the real lessons later on (basically like with every other vocational training wrapped in a degree, including especially computer science) - but also, and perhaps more importantly, it serves as a fig leaf. You can point to that and say, it's a Science and an Art and is Principled and done well It Is For The Good Of All Mankind, and keep the veneer of legitimacy over what's in practice a more systematized way of bringing harm to your fellow humans.

Also specifically wrt. brands - brands as quality signals mostly died out a decade or more ago; mixing them up is just a way to get their decaying corpses to trick more people for a little longer.

hu3 · 2025-06-24T22:59:32 1750805972

I see. Still worked for me.

Openned: https://copilot.microsoft.com

Same prompt: ffmpeg command to convert movie.mov into a reasonably sized mp4

https://i.imgur.com/CuaxIlL.png

jug · 2025-06-25T00:20:19 1750810819

Yeah it’s really annoying how opaque they are about the model there. Always just ”GPT 4 based” or ”GPT 4o based” if you dig in their blog archives. Makes one unable to check it against benchmarks or see when it’s updated. Setting expectations. Is it a distill? Lower precision quant? An old revision? Who knows.

vel0city · 2025-06-24T22:15:55 1750803355

I put your exact prompt into Copilot and it gave me the command

ffmpeg -i movie.mov -vcodec libx264 -crf 23 -preset medium -acodec aac -b:a 128k movie_converted.mp4

Along with a pretty detailed and decently sounding reasoning as to why it picked these options.

12345hn6789 · 2025-06-25T02:11:15 1750817475

It's been increasingly more obvious people on hacker news literally do not run these supposed prompts through LLMs. I bet you could run that prompt 10 times and it would never give up without producing a (probably fine) sh command.

Read the replies. Many folks have called gpt-4.1 through copilot and get (seemingly) valid responses.

Deukhoofd · 2025-06-25T07:36:57 1750837017

What is becoming more obvious is that people on Hacker News apparently do not understand the concept of non-determinism. Acting as if the output of an LLM is deterministic, and that it returns the same result for the same prompt every time is foolish.

12345hn6789 · 2025-06-25T14:17:27 1750861047

Run the prompt 100 times. I'll wait. I'll estimate you won't get a shell command 1-2% of the time. Please post snark on reddit. This site is for technical discussion.

cedws · 2025-06-24T23:53:35 1750809215

Microsoft has really screwed up on branding yet again. Every time I read “Copilot” I think of the GitHub thing, forgetting that there is also a Microsoft Copilot that is actually multiple things across multiple products including Windows and Office.

It’s also a website like ChatGPT apparently? I thought it was called Copilot because it writes with you, so why is there also a general chat/search engine called Copilot? Jesus.

thedougd · 2025-06-25T00:07:48 1750810068

I think you may be confusing Microsoft Copilot with Microsoft365 Copilot? The first doesn’t give you access to Microsoft Copilot Studio but that might also be available with Microsoft Copilot Pro.

cubefox · 2025-06-25T00:25:57 1750811157

I believe you might be confusing Copilot Search with Copilot+? Which is of course different from Copilot Pro though not necessarily entirely distinct.

cedws · 2025-06-25T00:16:14 1750810574

My confusion has only increased.

jug · 2025-06-25T00:26:17 1750811177

They have even renamed Office to Microsoft 365 Copilot. Yes. Microsoft Office.

https://www.windowslatest.com/2025/01/18/microsoft-just-rena...

mrweasel · 2025-06-25T12:55:36 1750856136

Yeah that's not going to work.

jerf · 2025-06-25T13:39:14 1750858754

Someone within Microsoft really needs to forward this entire thread over to their marketing department.

worldsayshi · 2025-06-25T00:27:54 1750811274

And Sam Altman thought that they were bad at naming things. Good thing they were bought up by the king of naming things. /s

Edit: They are doubling down on bad naming conventions so hard that it makes me think it's some kind of dark pattern sales strategy..

canyp · 2025-06-25T01:07:56 1750813676

Classic HN psychology would say: Occam's razor would suggest mediocrity before an elaborate dark pattern scheme.

And I would agree with them in this case.

worldsayshi · 2025-06-25T07:39:44 1750837184

I agree with you but i also think mediocrity and semi accidental dark patterns can go hand in hand. In a "if it sells don't fix it" kind of way.

conradfr · 2025-06-25T08:37:34 1750840654

You just made me realize Copilot does not always refer to (Microsoft's) Github Copilot ... apparently.

jollyllama · 2025-06-25T17:03:30 1750871010

I call the GitHub Copilot "Copilot" and the Microsoft Copilot "Bing/Copilot." I address it as that in my prompts. It works pretty well for me.

jay_kyburz · 2025-06-25T01:33:33 1750815213

People are responding with..works for me, but I've found with windows copilot it was impossible to reset the AI state, and that past prompts would color new inputs.

The new chat, or new conversation buttons seem to do nothing.

esskay · 2025-06-24T22:53:24 1750805604

This is pretty interesting, I had a very similar experience with Github Copilot's plugin inside Jetbrains IDE today (set to use 4o). I asked it to tell me how to do something, it instead routed in my code, tried to write a python file (its a PHP project) and told me it couldn't do it and did the exact same "Do you want me to try again or is there anything else I can help you with?"

Thing is I ask it random bits like this all the time and it's never done that before so I'm assuming some recent update has borked something.

robots0only · 2025-06-24T21:00:29 1750798829

ohh wow, that's bad, just tried this with Gemini 2.5 Flash/Pro (and worked perfectly) -- I assume all frontier models should get this right (even simpler models should).

quantadev · 2025-06-24T21:07:18 1750799238

I'd be willing to bet a more clear prompt would've given a good answer. People generally tend to overlook the fact that AIs aren't like "google". They're not really doing pure "word search" similar to Google. They expect a sensible sentence structure in order to work their best.

roywiggins · 2025-06-24T21:12:31 1750799551

Maybe, but this sort of prompt structure doesn't bamboozle the better models at all. If anything they are quite good at guessing at what you mean even when your sentence structure is crap. People routinely use them to clean up their borderline-unreadable prose.

quantadev · 2025-06-24T23:00:26 1750806026

I wish I had a nickle for every time I've seen someone get a garbage response from a garbage prompt and then blame the LLM.

macNchz · 2025-06-24T21:41:24 1750801284

I'm all about clear prompting, but even using the verbatim prompt from the OP "ffmpeg command to convert movie.mov into a reasonably sized mp4", the smallest current models from Google and OpenAI (gemini-2.5-flash-lite and gpt-4.1-nano) both produced me a working output with explanations for what each CLI arg does.

Hell, the Q4 quantized Mistral Small 3.1 model that runs on my 16GB desktop GPU did perfectly as well. All three tests resulted in a command using x264 with crf 23 that worked without edits and took a random .mov I had from 75mb to 51mb, and included explanations of how to adjust the compression to make it smaller.

quantadev · 2025-06-24T23:04:13 1750806253

There's as much variability in LLM AI as there is in human intelligence. What I'm saying is that I bet if that guy wrote a better prompt his "failing LLM" is much more likely to stop failing, unless it's just completely incompetent.

What I always find hilarious too is when the AI Skeptics try to parlay these kinds of "failures" into evidence LLMs cannot reason. If course they can reason.

Kiro · 2025-06-25T14:09:50 1750860590

I get better result when I intentionally omit parts and give them more playroom to figure it out.

quantadev · 2025-06-25T14:54:38 1750863278

Less clarity in a prompt _never_ results in better outputs. If the LLM has to "figure out" what your prompt likely even means its already wasted a lot of computations going down trillions of irrelevant neural branches that could've been spent solving the actual problem.

Sure you can get creative interesting results from something like "dog park game run fun time", which is totally unclear, but if you're actually solving an actual problem that has an actual optimal answer, then clarity is _always_ better. The more info you supply about what you're doing, how, and even why, the better results you'll get.

Kiro · 2025-06-25T15:06:47 1750864007

I disagree. Less clarity gives them more freedom to choose and utilize the practices they are better trained on instead of being artificially restricted to something that might not be a necessary limit.

quantadev · 2025-06-25T16:06:49 1750867609

The more info you give the AI the more likely it is to utilize the practices it was trained on as applied to _your_ situation, as opposed to random stereotypical situations that don't apply.

LLMs are like humans in this regard. You never get a human to follow instructions better by omitting parts of the instructions. Even if you're just wanting the LLM to be creative and explore random ideas, you're _still_ better off to _tell_ it that. lol.

Kiro · 2025-06-25T18:02:20 1750874540

Not true and the trick for you to get better results is to let go of this incorrect assumption you have. If a human is an expert in JavaScript and you tell them to use Rust for a task that can be done in JavaScript, the results will be worse than if you just let them use what they know.

quantadev · 2025-06-25T18:25:25 1750875925

The only way that analogy remotely maps onto reality in the world of LLMs would be in a `Mixture of Experts` system where small LLMs have been trained on a specific area like math or chemistry, and a sort of 'Router pre-Inference' is done to select which model to send to, so that if there was a bug in a MoE system and it routed to the wrong 'Expert' then quality would reduce.

However _even_ in a MoE system you _still_ always get better outputs when your prompting is clear with as much relevant detail as you have. They never do better because of being unconstrained as you mistakenly believe.

pid-1 · 2025-06-25T03:51:02 1750823462

I think the biggest issue is M365 Copilot was sold as something that would integrate with business data (teams, files, mail, etc.) and that never worked out quite well.

So you end up with a worse ChatGPT that also doesn't have work context.

mvATM99 · 2025-06-25T05:44:12 1750830252

When you do have that work context MS copilot performs quite well. But outside of that usecase it's easy to see their model is pretty bad.

aydyn · 2025-06-25T09:44:53 1750844693

It absolutely does not perform well with work context.

wkat4242 · 2025-06-25T02:52:12 1750819932

Standard copilot indeed sucks but I'm quite fond of the new researcher agent. It spends much more time than any of the others I've tried, like Perplexity pro and openai.

From a one line question it made me a relevant document of 45 pages examining the issue from all different sides, many of which I hadn't even thought of. It spent 30 mins working. I've never seen Perplexity spend more than 5.

I won't't be surprised if they will significantly nerf it to save on computing costs. I think now they give it their all to build a customer base and then they nerf it.

quantadev · 2025-06-24T21:04:57 1750799097

I love Copilot in VSCode. I always select model "Claude Sonnet 3.7", when in Copilot since it lets me choose the LLM. What I love about Copilot is the tight integration with VSCode. I can just ask it to do something and it relies on the intelligence of Claude to get the right code generated, and then all Copilot is really doing is editing my code for me, reading whatever code Claude tells it to, to build context, etc.

KoolKat23 · 2025-06-24T21:10:04 1750799404

That's a different, more useable copilot.

quantadev · 2025-06-24T22:53:45 1750805625

That's why I said "in VSCode" because I have no idea what this guy is running, but it's almost a certainty the problem isn't copilot but it's a bad LLM and/or his bad prompt.

kemotep · 2025-06-24T23:05:02 1750806302

The Copilot integrated with Microsoft 365 doesn’t have a model switcher it just is what it is. You are talking about a completely different product that Microsoft calls the same names.

quantadev · 2025-06-25T00:30:31 1750811431

I'll say it for the third time: "in VSCode". There's no ambiguity about which Copilot that is.

willhslade · 2025-06-25T00:45:46 1750812346

VSCode Copilot or Copilot Chat?

quantadev · 2025-06-25T02:53:51 1750820031

imo, any VSCode user needs both extensions: "GitHub Copilot" for inline completions, and "GitHub Copilot Chat" for interactive, multi-turn coding chat/agent.

KoolKat23 · 2025-06-25T07:55:01 1750838101

This discussion isnt about that all, completely unrelated. It's talking about a chatbot.

quantadev · 2025-06-25T13:56:21 1750859781

And "GitHub Copilot Chat" VSCode extension is _also_ a chatbot.

KoolKat23 · 2025-06-25T14:52:02 1750863122

But it's not Microsoft 365 copilot

quantadev · 2025-06-25T16:32:58 1750869178

--OR-- I mentioned a Copilot Product in a conversation about Copilot Products.

MangoCoffee · 2025-06-24T21:26:38 1750800398

Claude Sonnet 3.7 is my default as well in Visual Studio. i have been playing with their new default GPT-4.1. its not bad.

quantadev · 2025-06-25T00:33:18 1750811598

I haven't tried GPT-4.1 yet in VSCode Copilot. I was using 'Claude Sonnet 4' until it was struggling on something yesterday which 3.7 seemed to easily do. So I reverted back to 3.7. I'm not so sure Sonnet 4 was a step forward in coding. It might be a step back.

mrangle · 2025-06-25T12:55:40 1750856140

I don't have experience with CoPilot, but I do with other LLMs. I'm not sure that omitting "provide me with" is enough to get the job done, generally, aside from being lucky that it correctly interprets the prompt. In my experience, other LLMs are just as prone to incorrect divination of what one means given telegraphic prompts.

rzzzt · 2025-06-24T22:51:09 1750805469

Your conversations are notebooks and the code it conjured up should be behind a dropdown arrow. For visualization it seems to work fine (i.e. Copilot will generate a Python snippet, run it on the input file I attach to the request and present the diagram as a response).

dakiol · 2025-06-25T07:11:45 1750835505

In my experience Microsoft Copilot (free version in Deep Think mode) is way better than ChatGPT (free version) in most of things I throw at them (improving text, generating code, etc).

moelf · 2025-06-24T21:31:27 1750800687

even gemma3:12b gets it correct:

~> ollama run gemma3:12b-it-qat >>> ffmpeg command to convert movie.mov into a reasonably sized mp4

Here's a good ffmpeg command to convert `movie.mov` to a reasonably sized MP4, along with explanations to help you adjust it:

```bash ffmpeg -i movie.mov -c:v libx264 -crf 23 -preset medium -c:a aac -b:a 128k movie.mp4 ```

*Explanation of the command and the options:*

magicalhippo · 2025-06-24T22:22:02 1750803722

Even the 1B variant gave me that one, along with good explanations of the various options and what to play with to tweak the result.

cmrx64 · 2025-06-24T22:32:35 1750804355

i’m pretty surprised 1B parameters is enough for it to still fluently remember ffmpeg-fu

magicalhippo · 2025-06-24T23:27:43 1750807663

Yeah the newer small models continue to surprise me as well. I uploaded the full output from gemma3:1b-it-q8_0 here[1].

[1]: https://rentry.co/yu36i4d3

IronyMan100 · 2025-06-25T08:02:36 1750838556

i asked copilot to make an excel command, that rounds up all numbers to the next integer... took 4 back and forth messages and 15 minutes until it was working... Google was 5 minutes

whirlwin · 2025-06-24T20:44:26 1750797866

So much for the quality testing done by Microsoft...

hluska · 2025-06-25T17:34:48 1750872888

First off, that’s a really bad prompt - LLMs don’t have this magic ability to read your mind. Second, despite how bad it is, Copilot just gave me the correct code.

FergusArgyll · 2025-06-24T21:06:28 1750799188

People think that llms are an excuse to be lazy. You have to put some effort into the prompt!

KoolKat23 · 2025-06-24T21:08:41 1750799321

Ironically copilot is lazy, have to prompt it to death, whereas the others are aligned and actually provide answers with the same prompt.

(Perhaps copilot is not lazy, just stupid relative to its peers.)

roywiggins · 2025-06-24T21:09:55 1750799395

No, you don't. ChatGPT easily answers this question posed verbatim.