Instruction alignment can be very important for performance. I remember a similar slowdown when working on a VM for Itanium. Processor architecture manuals usually describe alignment effects in detail.
The Itanium was a somewhat special case. It was very difficult to optimise for, which is why it performed so poorly in practice. In general, x86 is far less sensitive to alignment than other architectures, and it has been becoming even less sensitive with each new generation.