An MLP must be compilable to some arrangement of logic gates, so you could always try a tack like initializing everything as randomly wired MLPs, perhaps doing some pretraining, and then compiling to the logic-gate version and training the logic gates directly. Or take the MLP's random initialization and imitate its distribution as your logic-gate initialization distribution.
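
The second idea can be sketched in a few lines. This is only a minimal illustration under strong assumptions that are mine, not part of the suggestion itself: each "MLP unit" is a single 2-input threshold neuron with standard-normal weights and bias, inputs are binary, and the function names (`neuron_truth_table`, `gate_distribution`) are hypothetical. It samples random neurons, reads off which of the 16 possible two-input truth tables each one computes, and returns the empirical frequencies, which could then serve as the sampling distribution for initial gates.

```python
import random
from collections import Counter

# All four input combinations for a 2-input gate, in a fixed order.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def neuron_truth_table(w1, w2, bias):
    """Truth table of the threshold neuron [w1*a + w2*b + bias > 0] on binary inputs."""
    return tuple(int(w1 * a + w2 * b + bias > 0) for a, b in INPUTS)

def gate_distribution(n_samples=10000, seed=0):
    """Empirical distribution over 2-input gates induced by random neurons.

    Assumption: weights and bias are i.i.d. standard normal; a real MLP
    initializer (and wider fan-in) would give a different distribution.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        w1, w2, bias = (rng.gauss(0.0, 1.0) for _ in range(3))
        counts[neuron_truth_table(w1, w2, bias)] += 1
    return {table: c / n_samples for table, c in counts.items()}

dist = gate_distribution()
for table, p in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(table, round(p, 3))
```

One thing this toy version makes visible: a single threshold neuron can only compute linearly separable functions, so XOR `(0, 1, 1, 0)` and XNOR `(1, 0, 0, 1)` get zero probability. If you wanted those gates represented at initialization, you would presumably need to compile small multi-neuron subnetworks rather than single units, or mix in some uniform mass over all 16 gates.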