There's a lot more going on between those actions (image to display, mouse to computer) that has nothing to do with internal meatbag reactions. Especially when we're already measuring in milliseconds.
I'm referring to the paper cited on the linked site, which quotes a difference of about 50 ms in reaction time between auditory and visual stimuli, not 200 ms as stated in the comment above.