I once tried a few examples of eye tracking as an input device during a visit to Tobii some years ago. One of these was an eye-tracking shooting game. It felt completely wrong and unnatural.
My eyes are my input devices. If I have to constrain my gaze and look only at the object I want to control, then I have to strain myself to perceive the blurry objects outside the centre of my vision just to be able to see them at all.
IMHO, eye tracking should only be applied to make the experience feel more natural, not the opposite. For example: in combination with gesture controls, to find the object that the gesture is intended for. Or in VR headsets, to render at the highest resolution only where the user is looking, to keep the frame rate up.
I used to work on EEG and eye-tracking software (for neuro-motor rehabilitation and control of robotic arms). Using gaze for fine control is very difficult and terribly unnatural due to saccades and general scene scanning. One easy way to demonstrate how difficult it is: have a cursor appear at the point of gaze. Your vision "chases" the cursor away from your target like it does a floater. Controlling anything directly with gaze is very, very hard.
> One easy way to demonstrate how difficult it is: have a cursor appear at the point of gaze. Your vision "chases" the cursor away from your target like it does a floater.
That sounds like a slight miscalibration of the eye tracker. Assuming perfect eye control, if the tracker estimates your gaze slightly incorrectly then the cursor will always move away when you try to look at it.
No such thing. The eye makes constant micro-movements and the brain filters them out. You can compensate for this in software, but it lowers the targeting accuracy.
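A minimal sketch of that tradeoff (the parameter names and values are assumptions, not any particular tracker's API): a simple exponential moving average suppresses the micro-movement jitter, but the heavier the smoothing, the more the reported gaze point lags behind a real gaze shift.

```typescript
// Sketch: smoothing raw gaze samples to hide micro-movement jitter.
// `alpha` close to 1 follows the raw signal (jittery but responsive);
// `alpha` close to 0 smooths heavily (stable but laggy, so less accurate on fast shifts).
interface GazeSample { x: number; y: number; }

class GazeSmoother {
  private filtered: GazeSample | null = null;

  constructor(private alpha: number) {}

  update(raw: GazeSample): GazeSample {
    if (this.filtered === null) {
      this.filtered = { ...raw };
    } else {
      this.filtered = {
        x: this.alpha * raw.x + (1 - this.alpha) * this.filtered.x,
        y: this.alpha * raw.y + (1 - this.alpha) * this.filtered.y,
      };
    }
    return this.filtered;
  }
}

// Usage: feed tracker samples through the filter before driving any cursor or target.
const smoother = new GazeSmoother(0.2);
// const cursorPos = smoother.update(trackerSample);
```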
- setting the focus plane of the virtual cameras, so that when you look at a close virtual object the background blurs as it naturally would (see the sketch after this list). This of course depends on the brightness of the screen and the picture, but any blurring already feels better than no blurring at all
- interactive stories can make use of knowing where you looked and for how long. This doesn't necessarily have to be very obvious
- virtual NPCs can react to you gazing at or past them
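For the focus-plane idea in the first bullet, a rough engine-agnostic sketch (the function names, smoothing factor, and blur scale are all assumptions, not any real engine's API): take whatever distance the engine's gaze raycast reports, ease the focal distance toward it, and blur pixels in proportion to how far their depth is from that focal plane.

```typescript
// Sketch: gaze-contingent depth of field.
// `gazeHitDistance` is the distance reported by a gaze raycast into the scene,
// or null when the gaze ray hits nothing.
function updateFocusDistance(
  gazeHitDistance: number | null,
  previousFocus: number,
  smoothing = 0.1,
): number {
  const target = gazeHitDistance ?? previousFocus;              // hold focus if nothing is hit
  return previousFocus + smoothing * (target - previousFocus);  // ease, don't snap
}

// Blur strength for a depth-of-field pass: grows with distance from the focal plane.
function blurAmount(depth: number, focusDistance: number, strength = 0.05): number {
  return Math.min(1, strength * Math.abs(depth - focusDistance));
}
```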
Tobii's UI stuff is lame. But it is possible to make good UI for eye tracking. It requires rethinking how UI should work.
What everyone has been doing so far is like slapping a touchscreen on a desktop operating system. Doesn't work! You need to redesign all the interactions for touch, like the iPhone did. Change how scrolling works, how selection works, how menus work. Redo the foundations. Eye tracking requires the same treatment.
Touchscreens sucked before the iPhone. Eye tracking can have an iPhone moment too, but someone needs to spend years rethinking everything first.
Is that conjecture, or are there UIs you have in mind that work?
I made an eye-tracking UI for my master's thesis, with a few different components. Although they all worked, having my eyes tracked to control the UI felt very intense; it caused a lot of mental and physical strain.
I'm not saying that my thesis is evidence it absolutely can't be done, but it was surprisingly uncomfortable even for what seemed like basic functionality, for example autofocusing the input boxes you are looking at. Maybe there are tweaks that can make it work, but I think the main issue is the screen giving feedback on wherever you happen to be looking, which gives this unnatural feeling that where you look is causing side effects.
I'd be interested to see your master's thesis! I made my own rudimentary eye tracker many years ago [1], which got me a job at a startup called Eyefluence, where I made a much better eye tracker. The deep learning revolution was very new at the time, and this was possibly the first deep-learning-based eye tracker.
We did some really cool experimentation with eye controlled UI, in some ways more advanced than anything I've seen since. We were acquired by Google and I believe the technology is now shelved. But I still think that there is a path to a good eye controlled UI. Comfort is a matter of knowing the constraints of the eye and designing to them.
This is really cool; the hardware for this is far in advance of what I was using. I was using the classic webcam-type setup, with the 1/4-inch accuracy you alluded to in your post.
My thesis focused more on the UI components than the hardware. I had my gaze set up to appear as a cursor input, and used some CSS to hide the cursor on a webpage. Then I used hover effects to open menus, focus inputs, etc.
My question was, could it replace the mouse on a desktop? And I wanted to build something for anyone to use, not as an accessibility input. I used eye gaze with the spacebar on the keyboard as a primary input action.
The components had large targets, so the accuracy didn't matter too much. I used some basic components to build an email UI, which worked purely through gaze and the keyboard.
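A minimal sketch of that kind of setup, assuming a tracker that delivers screen-space gaze coordinates (the CSS class name and key handling are illustrative, not the thesis code; the thesis drove the actual cursor and relied on ordinary :hover rules):

```typescript
// Sketch: gaze drives an invisible pointer; spacebar is the explicit action key.
document.body.style.cursor = "none";   // hide the visible pointer
let gazed: HTMLElement | null = null;

function onGazeSample(x: number, y: number): void {
  const target = document.elementFromPoint(x, y);
  if (!(target instanceof HTMLElement) || target === gazed) return;

  gazed?.classList.remove("gaze-hover");   // stand-in for :hover styling
  target.classList.add("gaze-hover");      // e.g. opens menus, highlights buttons
  if (target instanceof HTMLInputElement) target.focus(); // autofocus looked-at inputs
  gazed = target;
}

// Spacebar performs the action, so looking alone never "clicks" anything.
document.addEventListener("keydown", (e) => {
  if (e.code === "Space" && gazed) {
    e.preventDefault();
    gazed.click();
  }
});
```

The point is that looking only highlights and focuses; the keyboard stays the explicit action, matching the gaze-plus-spacebar design described above.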
The UI was perfectly functional, but it really drained me and others to use it. Possibly that was due to accuracy, or to some of the UI component design, but my gut feeling was that the UI reacting to my eyes was the real problem: there was a strange feeling knowing that you needed to look at certain components to use them. The way your eye works, it wants to jump around to whatever is interesting, and having a UI that needs the eye to look at a certain place isn't pleasant.
I think any UI that wants to utilize the eye would have to be very subtle, and designed not to feel restrictive to your focus. I'm not convinced that the mouse or your fingers could be replaced by eye tracking, but for rendering higher resolution where the user is looking in VR goggles and that sort of thing, it makes a lot of sense.
Really cool solution with the hot mirror! I always wondered if we will end up with cameras behind microdisplays. Some phones are already doing this, but doing that for VR is probably years away.
The point you made about rethinking all UI interactions for eye tracking: bull's eye! Have you done some work along these lines at Eyefluence? I think the fruit company will gladly introduce a new UI paradigm. They did some impressive work on improving eye-tracking accuracy by refining synthetic images into real-looking ones with GANs. [0]
Yes, rethinking UI for eye tracking is what Eyefluence was working on. And in fact we showed Apple all of our stuff before the acquisition. I believe we were in acquisition talks with them as well as Google. I spent a lot of time on the Apple campus tweaking our neural nets to their liking. This was prior to their publishing of that eye tracking work.
The technique called "dual gaze" in the article has some similarity to some of the stuff we were doing. This was long before that paper was published, and I think there were several aspects of our design that were better than the one in that paper.
Holy shit, that looks like magic! The slide-to-confirm interaction utilizing smooth pursuit was a really nice touch. I added the extended demo to the article.
You say the dual gaze worked similarly to your implementation; however, I don't see any confirmation flags. I really can't figure out how this works. :) Is the tutorial available somewhere, please?
Thanks! I should say I didn't have much to do with that specific demo; I mostly worked on reimplementing everything from the ground up for VR. I don't think there's any good video of that system, but there are some accounts in the press from journalists trying it. (Any journalist that tried it was using my VR system. We didn't let journalists try the AR version because the calibration wasn't reliable enough on new people, but the deep learning based tracking of the VR version was more reliable).
As far as exactly how it works, I probably shouldn't go around disclosing all the secret sauce. After all, Google bought it fair and square. AFAIK there's nobody left in Google VR that knows much about it, but I haven't worked there for many years so I don't know the current state of things.
OK, I read 20+ articles describing the VR demo you worked on. Some of them explicitly state that they can't disclose how the interaction works. One of them mentioned that an extra saccade is used to "click", but it isn't very revealing. Some demos, e.g. the 360 virtual displays, have an explicit gaze target to trigger the action, but the AR demo lacks them, so my take is that the 2-minute tutorial teaches the person that there is an invisible but standardized target (say, the upper-right corner) that is the trigger. No idea about scrolling. But damn, everybody that tried the demo was super-convinced that this is the way to go and that by 2017 there would be headsets with such tech. Here we are, 6 years later, and I am still waiting.
Google should start open sourcing their canned and dead projects.
Thank you for pointing me towards this research and for your work. :) Very cool!
Yeah I wish Google would do something with it too. The good news is headsets with good eye tracking are finally about to become generally available and I expect that within a few years people will be using it in very creative ways. Wide availability will trigger more experimentation than we could ever do as a single startup.
Did you try having your gaze put elements into easy reach of a cursor or pointer instead of using the gaze as a pointer?
Gaze is often an indicator of intent, so rather than have it do something directly, it would be more natural for it to make something else easier to do.
An example: using a laser-pointer-style interface for VR controllers can sometimes be awkward because small motions can move the pointer a great distance, making it hard to pick out a precise target. Gaze could be used as an aim mode where motion of the pointer would be limited/mapped to a smaller area surrounding the spot being looked at.
It could also be used as a meta key to change the function of buttons, look to the right of a window and a button click fast forwards, look left and the same button rewinds.
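A small sketch of the aim-mode idea above (the region size and sensitivity values are placeholder assumptions): while aiming, the pointer is confined to a box around the gaze point, and controller motion is scaled down inside it for precision.

```typescript
// Sketch: gaze picks the coarse target region, the controller does the fine aiming.
interface Point { x: number; y: number; }

function aimedPointer(
  gaze: Point,            // where the user is looking (coarse target)
  controllerDelta: Point, // raw controller / laser-pointer motion this frame
  current: Point,         // current fine pointer position
  radius = 0.1,           // half-size of the fine-aim region (normalized screen units)
  sensitivity = 0.25,     // how much controller motion is damped in aim mode
): Point {
  const clamp = (v: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, v));
  return {
    x: clamp(current.x + sensitivity * controllerDelta.x, gaze.x - radius, gaze.x + radius),
    y: clamp(current.y + sensitivity * controllerDelta.y, gaze.y - radius, gaze.y + radius),
  };
}
```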
I would also be interested in reading that thesis. I was thinking a while ago that going to grad school to work on something like that would be cool.
One thing that occurred to me is that shifting focus may work better as a gradual activation than an immediate one. In other words, focus would be an average of eye position over time, and after reaching a certain threshold on a certain element, the focus would shift.
Not sure it would work in practice, but perhaps it could ameliorate the chaotic effects you were mentioning.
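A hedged sketch of that gradual-activation idea (the threshold, gain, and decay values are placeholders): each element accumulates "charge" while it is being looked at and loses it otherwise, and focus only shifts once an element's charge crosses the threshold, so brief glances elsewhere don't steal it.

```typescript
// Sketch: focus follows a time-averaged gaze rather than the instantaneous gaze point.
class DwellFocus {
  private charge = new Map<string, number>();
  focused: string | null = null;

  constructor(private threshold = 0.4, private gain = 1.0, private decay = 2.0) {}

  // Call once per frame with the id of the element under the gaze (or null) and
  // the elapsed time in seconds; returns the currently focused element id.
  update(lookedAtId: string | null, dt: number): string | null {
    for (const id of this.charge.keys()) {
      const delta = id === lookedAtId ? this.gain * dt : -this.decay * dt;
      this.charge.set(id, Math.max(0, (this.charge.get(id) ?? 0) + delta));
    }
    if (lookedAtId && !this.charge.has(lookedAtId)) {
      this.charge.set(lookedAtId, this.gain * dt);
    }
    if (lookedAtId && (this.charge.get(lookedAtId) ?? 0) >= this.threshold) {
      this.focused = lookedAtId; // focus shifts only after sustained attention
    }
    return this.focused;
  }
}
```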
Interesting. I'll have to ask my friend who covers Canon to some degree about this. Yeah, they dropped it after trying it out in one or two film camera models in the 90s. They were always a bit cagey about why, although in my experience it didn't work all that well.
Yes, what you describe is the Golden Gaze problem (related to the golden touch of King Midas). There are several techniques mentioned in the article to mitigate it. A combination of gaze and hand gestures (and voice) seems like the most efficient and natural.
Foveated rendering is also interesting, but that is a pretty straightforward application of gaze contingency.
Talon allows using the Tobii with other triggers, so that gaze does nothing on its own: a trigger (a keyword or noise) zooms into the area being looked at, and another trigger then performs the click.
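A generic sketch of that trigger-zoom-click flow (this is not Talon's actual API, just the interaction states the description implies): gaze alone does nothing, the first trigger zooms in around the gaze point, and a second trigger clicks at the gaze position mapped back through the zoom.

```typescript
// Sketch: two-stage triggered gaze click with an intermediate zoom for precision.
type Mode = "idle" | "zoomed";

interface Point { x: number; y: number; }

class TriggeredGazeClick {
  private mode: Mode = "idle";
  private zoomCenter: Point | null = null;

  constructor(
    private zoomFactor = 4,
    private click: (p: Point) => void = (p) => console.log("click at", p),
  ) {}

  // Called whenever the trigger (keyword, noise, key) fires, with the current gaze point.
  onTrigger(gaze: Point): void {
    if (this.mode === "idle") {
      this.zoomCenter = gaze;        // first trigger: zoom in around the gaze point
      this.mode = "zoomed";
    } else if (this.zoomCenter) {
      // second trigger: map the gaze point in the zoomed view back to screen space
      const target = {
        x: this.zoomCenter.x + (gaze.x - this.zoomCenter.x) / this.zoomFactor,
        y: this.zoomCenter.y + (gaze.y - this.zoomCenter.y) / this.zoomFactor,
      };
      this.click(target);
      this.mode = "idle";
      this.zoomCenter = null;
    }
  }
}
```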
Eyes are very important in non-verbal communication, so I don't think this is the full story; they are also used for output with deliberate control. But as far as gaze being bad as a fine-grained cursor goes, that may hold.