-march=native plus mimalloc (or jemalloc) should be enough, without the risk of exposing latent undefined behavior that comes with -O3 and most other extra optimization flags.
Nope, I'm not sure about it. When I was using Gentoo about 10 years ago, this was the common reason given for using -O2 instead of -O3 in your build flags, and I'm just going from that memory.