Exactly. And that's what happens in LayerNorm too. So I figured the best basis for comparison would be to leave that bit out when looking at their differences or similarities, because obviously the bits that share an implementation will be the same.