To add to the tutorial: if you're on a modern Intel chipset, it's recommended to compile with AVX, SSE, and FMA instructions enabled. They give a pretty big boost to calculations that need to run on the CPU.
The pip build of TF doesn't ship with AVX or FMA enabled (presumably to stay compatible with older CPUs), so this is one of the perks of compiling from source.
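Before committing to a from-source build, it's worth checking which SIMD extensions your CPU actually advertises. A minimal sketch, assuming Linux (where `/proc/cpuinfo` exposes a `flags` line; other platforms report CPU features differently):

```python
# Sketch: list which SIMD extensions the local CPU supports, to decide
# whether a from-source TensorFlow build is worth the trouble.
# The `wanted` tuple and the /proc/cpuinfo parsing are assumptions for Linux.

def supported_extensions(cpuinfo_flags, wanted=("sse4_2", "avx", "avx2", "fma")):
    """Return the subset of `wanted` extensions present in a cpuinfo flags string."""
    flags = set(cpuinfo_flags.split())
    return [ext for ext in wanted if ext in flags]

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    print(supported_extensions(line.split(":", 1)[1]))
                    break
    except FileNotFoundError:
        print("Not Linux; check CPU features another way.")
```

If the flags are there, building with bazel's `-c opt` plus `--copt=-march=native` lets the compiler target whatever instruction sets the build machine supports.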
I've been playing around with a couple of DL frameworks recently, so I was wondering: what is the performance tradeoff between what you mention and PyTorch? Is it significantly different? I enjoy the pythonic style of PyTorch far more than TensorFlow's graph creation/precomputation approach.
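For readers unfamiliar with the distinction being drawn, here is a toy sketch (not real TF/PyTorch code) of the two styles: eager frameworks compute values as the statements run, while graph-based frameworks first build a symbolic graph and only evaluate it later.

```python
# Eager style (PyTorch-like): each operation runs immediately.
def eager_add_mul(a, b, c):
    s = a + b        # computed right now
    return s * c     # computed right now

# Graph style (TF 1.x-like): build nodes first, evaluate the graph later.
class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def run(self, feed):
        if self.op == "const":
            return feed[self.inputs[0]]          # look up a placeholder value
        vals = [n.run(feed) for n in self.inputs]
        return vals[0] + vals[1] if self.op == "add" else vals[0] * vals[1]

a, b, c = Node("const", "a"), Node("const", "b"), Node("const", "c")
graph = Node("mul", Node("add", a, b), c)        # nothing computed yet
print(eager_add_mul(2, 3, 4))                    # → 20
print(graph.run({"a": 2, "b": 3, "c": 4}))       # → 20
```

The graph form is what lets TF 1.x optimize and precompile the whole computation before running it, at the cost of the less pythonic feel the comment describes.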
It looks like they didn’t register python37.com, though, which is probably a big deal since Python 3.7 is in beta.