I made exactly this point at the inaugural Portland AI Tinkerers meetup. I had been messing with large-document understanding. Converting PDFs to text and then sending that to GPT was too expensive. It was cheaper to just upload the image and ask it questions directly, and about as accurate.
https://portland.aitinkerers.org/talks/rsvp_fGAlJQAvWUA