It's possible, but unlikely. The issue is that your training examples are essentially a noisy representation of the general function you're trying to get the model to learn. Any fit that is too close will incorporate the noise, and that distorts the general function (in the case of a NN, this usually means memorising the training data). Most function-fitting approaches are vulnerable to this.
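To make that concrete, here's a minimal sketch (assuming numpy is available) fitting polynomials to noisy samples of a known function. The high-degree fit can pass through every noisy training point, yet it drifts further from the underlying curve than the low-degree fit; the specific function and noise level are made up for illustration:

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 15)
y_true = np.sin(2 * np.pi * x)                   # the "general function"
y_noisy = y_true + rng.normal(0, 0.2, x.shape)   # noisy training examples

x_test = np.linspace(0, 1, 200)
for degree in (3, 14):
    # degree 14 with 15 points can fit every noisy sample near-exactly
    coefs = P.polyfit(x, y_noisy, degree)
    mse = np.mean((P.polyval(x_test, coefs) - np.sin(2 * np.pi * x_test)) ** 2)
    print(f"degree {degree:2d}: MSE against the true function = {mse:.3f}")
```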
Hm, I see. But ultimately overfitting is a consequence of having too many parameters available to absorb the noise. Perhaps one could fit smaller models, or inject artificial noise during training.
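Injecting input noise is in fact a known regularisation technique. A rough sketch of the idea (a hypothetical PyTorch-style training loop; `model`, `optimizer`, `loss_fn`, and the data tensors are all assumed to exist already):

```python
import torch

def train_with_input_noise(model, optimizer, loss_fn, x, y,
                           epochs=100, noise_std=0.1):
    """Jitter the inputs on each pass so the model can't lock onto exact points."""
    for _ in range(epochs):
        optimizer.zero_grad()
        x_jittered = x + noise_std * torch.randn_like(x)  # fresh noise every epoch
        loss = loss_fn(model(x_jittered), y)
        loss.backward()
        optimizer.step()
    return model
```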
The global optimum is defined with respect to the training data (because that's all you have to set the weights with). Unless the training data represents real-world data perfectly, fully optimizing for it will pessimize the model on some set of real-world data.
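You can watch that divergence directly by holding data out. A sketch (assumes scikit-learn; the synthetic data and hyperparameters are invented for illustration): training error keeps falling as you optimise harder, while held-out error typically flattens and can start rising once the net begins fitting the noise.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.3, 200)   # noisy samples of sin(3x)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Optimise the same architecture for progressively more iterations.
for iters in (50, 500, 5000):
    m = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=iters,
                     tol=0.0, random_state=0).fit(X_tr, y_tr)
    tr = np.mean((m.predict(X_tr) - y_tr) ** 2)
    val = np.mean((m.predict(X_val) - y_val) ** 2)
    print(f"{iters:5d} iterations: train MSE {tr:.3f}, validation MSE {val:.3f}")
```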