There's a picture for which Google's Vision API returned Gooğle. I was wondering...

goshx · on Dec 19, 2017

IMHO, only Google's API got that correct.

The test they did is similar to providing:

MICROSOET and expecting the APIs to find MICROSOFT.

nothrabannosir · on Dec 19, 2017

If I put in a “typo” (read: ambiguous letter), I want the meaning that was originally intended. Technically correct doesn’t buy me anything; we can play that game all day, but I just don’t have much use for a document full of errors. In a way: I want the AI to do what a human would do.

This is like speech recognition using context to fill in the gaps. Without this it would be unusable.
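To make the "use context to fill in the gaps" idea concrete, here is a minimal sketch that snaps an OCR token to the nearest entry in a known-word list using Python's standard `difflib`. The word list, the `snap_to_lexicon` name, and the 0.8 cutoff are all assumptions for illustration; a real OCR engine would more likely use a language model than a fixed list.

```python
import difflib

# Hypothetical lexicon; stands in for whatever context the engine has available.
KNOWN_WORDS = ["MICROSOFT", "GOOGLE", "AMAZON"]

def snap_to_lexicon(token, lexicon=KNOWN_WORDS, cutoff=0.8):
    """Return the closest known word, or the raw token if nothing is close enough."""
    matches = difflib.get_close_matches(token.upper(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else token

print(snap_to_lexicon("MICROSOET"))  # prints "MICROSOFT"
print(snap_to_lexicon("XYZZY"))      # no close match, so the token is returned unchanged
```

The cutoff is the trade-off being argued about in this thread: set it too low and genuinely different strings get "corrected" into known words, set it too high and ambiguous letters pass through untouched.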

goshx · on Dec 19, 2017

Then it will need to be told the valid character set and language.

nothrabannosir · on Dec 19, 2017

Yes. By its own training data, and context from the image. E.g. from this very thread: “Google” is going to be Google :)

spunker540 · on Dec 19, 2017

Pretty funny though that google couldn’t recognize its own name as well as the other engine!

mohi13 · on Dec 19, 2017

Seems these APIs might work better if they took the expected language as input :)

boulos · on Dec 19, 2017

They do! For example, Vision takes languageHints [1]. With the Speech API, I took an hour-long tour of Rouffignac (in French) and translated it into English. I considered adding some SpeechContext [2] for things like mammoth and so on, but actually it did a fine enough job as is (besides, I had to listen to it later, as there was obviously no coverage in the cave).

Disclosure: I work on Google Cloud (but not these APIs).

[1] https://cloud.google.com/vision/docs/reference/rest/v1/image...

[2] https://cloud.google.com/speech/reference/rest/v1/Recognitio...
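For anyone wondering what passing languageHints looks like, below is a rough sketch that only builds the JSON body for a text-detection request; it does not send anything. The field names (`imageContext.languageHints`, `features`, `image.content`) follow the Vision v1 REST reference as I understand it, while the `build_text_detection_request` helper and the placeholder image bytes are made up for illustration. Authentication and the actual HTTP call are left out.

```python
import base64
import json

def build_text_detection_request(image_bytes, language_hints):
    """Build a JSON body for a Vision text-detection request with language hints."""
    return {
        "requests": [
            {
                # The REST API expects inline image data as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
                # Hint that the text is expected to be English.
                "imageContext": {"languageHints": list(language_hints)},
            }
        ]
    }

# Placeholder bytes; in practice this would be the contents of an image file.
body = build_text_detection_request(b"<raw image bytes>", ["en"])
print(json.dumps(body, indent=2))
```

Whether a hint of "en" would have turned Gooğle into Google in the picture discussed above is not something this sketch can show; it only illustrates where the hint goes.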

mattnewton · on Dec 19, 2017

That, or have some canonicalization algorithm/lookup table on top of the model. It should be able to attempt to map them to Roman characters if wanted.
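A canonicalization pass of that kind could look roughly like the sketch below, which combines Unicode decomposition from the standard `unicodedata` module with a small lookup table for letters that do not decompose. The `FALLBACK` table and the `to_roman` name are assumptions for illustration, not anything the APIs are known to do.

```python
import unicodedata

# Hypothetical lookup table for letters that have no Unicode decomposition.
FALLBACK = {"ı": "i", "ø": "o", "ł": "l", "ß": "ss"}

def to_roman(text):
    """Map accented or variant letters to plain Roman characters where possible."""
    out = []
    for ch in text:
        if ch in FALLBACK:
            out.append(FALLBACK[ch])
            continue
        # NFKD splits e.g. "ğ" into "g" plus a combining breve; drop the combining mark.
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)

print(to_roman("Gooğle"))  # prints "Google"
```

This is lossy by design, so it only makes sense when the caller has said the text should be in Roman characters; applied blindly it would mangle legitimate Turkish or Polish text.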