The principled way of doing this is via ensemble learning, combining the predictions of multiple separately-trained models. But perhaps there are ways of improving that by including "global" training as well, where the "separate" models are allowed to interact while limiting overall training costs.