Re: trust. It just works with Sonnet 3.5; it's earned my trust. I do read the output afterwards (again, my role is closer to code reviewer). People make mistakes too, and I think its error rate on repetitive tasks is below most people's. I also learned how to prompt it: I'd tell it to add formatting without changing content in the first pass, then in a separate pass ask it to fix spelling/grammar issues. The diffs are easy to read.
Re: Pandoc. Sure, if that were the only task I used it for. But I use it for 10 different ones per day (write a JSON schema for this JSON file, write a Pydantic validator that does X, write a GitHub workflow doing Y, add syntax highlighting to this JSON, etc.). Re: this specific case - I prefer real HTML built with my preferred tools (DaisyUI + Tailwind) so I can edit it afterwards. I find myself using far fewer boilerplate-saving libraries and knowing a few tools more deeply.
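To make the first task on that list concrete, here is a minimal sketch of what "write a JSON schema for this JSON file" amounts to. The function name and the exact schema conventions are my own for illustration; this is not code from the thread, just the kind of one-off helper being described:

```python
import json

def infer_schema(value):
    """Infer a minimal JSON Schema fragment for a parsed JSON value."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        # Use the first element as the representative item type.
        items = infer_schema(value[0]) if value else {}
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
            "required": sorted(value),
        }
    raise TypeError(f"unsupported value: {value!r}")

doc = json.loads('{"name": "demo", "tags": ["a", "b"], "count": 3}')
schema = infer_schema(doc)
```

Each of the ten daily tasks is roughly this size: small enough to review line by line, varied enough that no single deterministic tool covers them all.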
Why compare its error rate on repetitive tasks with most people's? For such mechanical tasks we already have fully deterministic algorithms, and their error rate is zero. You wouldn't normally ask a junior assistant to do such a conversion by hand, so comparing the model's error rate with a human's doesn't make sense.
Normalizing this kind of computer error, where there should be none, makes the world a worse place bit by bit. The productivity increase you get here doesn't seem worth it.
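The "deterministic algorithms with zero error rate" point can be shown with a toy sketch. The function name and markup rules are my own, not from the thread; it stands in for tools like Pandoc, where identical input always produces identical output and no content can be silently altered:

```python
import html

def paragraphs_to_html(text):
    """Deterministically wrap blank-line-separated paragraphs in <p> tags.

    The same input always yields the same output, and html.escape
    guarantees the content itself is never changed, only encoded.
    """
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paras)

result = paragraphs_to_html("Hello & welcome.\n\nSecond paragraph.")
```

A model doing the same conversion has some nonzero chance of dropping or rewording a sentence on any given run; a transform like this has none.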