Hacker News

There's a picture for which Google's Vision API returned Goo\u011fle.

I was wondering why there's a Unicode character in the middle:

    In [5]: c = '\u011f'
    
    In [6]: c
    Out[6]: 'ğ'
This does closely resemble the characters in the picture :-)
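For anyone curious, Python's standard `unicodedata` module can name the character and show its canonical decomposition. This is a small sketch using only the standard library; the variable name is just for illustration.

```python
import unicodedata

c = "\u011f"
# Official Unicode name of the character the API returned
print(unicodedata.name(c))         # LATIN SMALL LETTER G WITH BREVE
# Canonical decomposition: plain 'g' (U+0067) plus a combining breve (U+0306)
print(unicodedata.decomposition(c))  # 0067 0306
print(unicodedata.name("\u0306"))  # COMBINING BREVE
```

So the OCR read a plain "g" with a small mark above it, which is plausible if the logo's letterform has a notch or highlight there.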


IMHO, only Google's API got that correct.

The test they did is similar to providing:

MICROSOET and expecting the APIs to find MICROSOFT.


If I put in a “typo” (read: ambiguous letter), I want the meaning that was originally intended. Technically correct doesn’t buy me anything; we can play that game all day, but I just don’t have much use for a document full of errors. In a way, I want the AI to do what a human would do.

This is like speech recognition using context to fill in the gaps. Without this it would be unusable.
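As a toy illustration of "using context to fill in the gaps", here is a sketch that snaps an OCR token to the closest word in a known vocabulary. This is a swapped-in technique (fuzzy string matching with Python's `difflib`), not how Google's or anyone else's OCR actually works; real engines use learned language models. The `VOCAB` list and `snap_to_vocab` helper are hypothetical.

```python
import difflib

# Hypothetical vocabulary; a real system would use a far larger lexicon
VOCAB = ["GOOGLE", "MICROSOFT", "AMAZON"]

def snap_to_vocab(word, vocab=VOCAB, cutoff=0.6):
    """Return the closest vocabulary word, or the input unchanged if none is close enough."""
    matches = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else word

print(snap_to_vocab("MICROSOET"))  # MICROSOFT
```

The `cutoff` parameter is the trade-off in this thread: set it low and the system "fixes" genuine rare words; set it high and it reports the literal, technically correct reading.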


Then it will need to be told what the valid character set and language are.


Yes. By its own training data, and context from the image. E.g. from this very thread: “Google” is going to be Google :)


Pretty funny, though, that Google couldn’t recognize its own name as well as the other engine did!


Seems these APIs might work better if they took the expected language as input :)


They do! For example, Vision takes languageHints [1]. With the Speech API, I took an hour-long tour of Rouffignac (in French) and translated it into English. I considered adding some SpeechContext [2] for things like mammoth and so on, but it actually did a fine enough job as is (besides, I had to listen to it later, as there was obviously no coverage in the cave).
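A rough sketch of where those hints go in the REST request bodies, assuming the v1 JSON shapes: `imageContext.languageHints` on a Vision `images:annotate` request, and `speechContexts[].phrases` inside a Speech `RecognitionConfig`. The helper functions, the placeholder image content, and the example phrases are hypothetical; this only builds the JSON and does not call either API.

```python
import json

def vision_request(image_b64, hints):
    """Hypothetical helper: body for a Vision images:annotate call with language hints."""
    return {
        "requests": [
            {
                "image": {"content": image_b64},
                "features": [{"type": "TEXT_DETECTION"}],
                # Tells the OCR which language(s) to expect in the image
                "imageContext": {"languageHints": hints},
            }
        ]
    }

def speech_config(language_code, phrases):
    """Hypothetical helper: a RecognitionConfig fragment with phrase hints."""
    return {
        "languageCode": language_code,
        # Phrase hints bias recognition toward domain words
        "speechContexts": [{"phrases": phrases}],
    }

print(json.dumps(vision_request("<base64 image>", ["en"]), indent=2))
# "mammouth" is French for mammoth; the phrase list is illustrative only
print(json.dumps(speech_config("fr-FR", ["Rouffignac", "mammouth"]), indent=2))
```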

Disclosure: I work on Google Cloud (but not these APIs).

[1] https://cloud.google.com/vision/docs/reference/rest/v1/image...

[2] https://cloud.google.com/speech/reference/rest/v1/Recognitio...


That, or have some canonicalization algorithm/lookup table on top of the model. It should be able to map them to Roman characters if wanted.
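A minimal sketch of that canonicalization step, assuming "map to Roman characters" means stripping diacritics: decompose with Unicode NFKD, then drop combining marks. The function name is hypothetical. Letters with no decomposition (for example "ø" or "ł") pass through unchanged, which is where an explicit lookup table would still be needed.

```python
import unicodedata

def strip_diacritics(text):
    """Hypothetical helper: NFKD-decompose, then drop combining marks such as the breve."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("Goo\u011fle"))  # Google
```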



