Yes, that’s kind of a given. The model has to have all the knowledge components needed to solve a task, so a capable base model is required, and the only thing being learned here is how to stitch that base knowledge together to plan an attack.
No amount of RL on a dumb base model would have worked, for example.