Otherwise, if you want to reuse an existing LLM (or just want to see how a large one is implemented in practice), you can check out the models in KerasNLP. For instance, this is BERT, which is basically just a stack of TransformerEncoders: https://github.com/keras-team/keras-nlp/blob/master/keras_nl...
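To make the "stack of TransformerEncoders" claim concrete, here is a minimal NumPy sketch of what one encoder block does and how a BERT-style model stacks them. This is an illustration of the idea, not KerasNLP's actual implementation; all weight names and sizes here are made up for the example, and it uses single-head attention with post-layer-norm for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token's features to zero mean / unit variance.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def transformer_encoder(x, p):
    # 1) single-head self-attention with a residual connection
    q, k, v = x @ p["wq"], x @ p["wk"], x @ p["wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v
    x = layer_norm(x + attn @ p["wo"])
    # 2) position-wise feed-forward network, also with a residual
    h = np.maximum(0, x @ p["w1"]) @ p["w2"]  # ReLU MLP
    return layer_norm(x + h)

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 16, 32, 5  # toy sizes, not BERT's

def make_params():
    # Random weights for one encoder block (illustrative only).
    return {
        "wq": rng.normal(size=(d_model, d_model)) * 0.1,
        "wk": rng.normal(size=(d_model, d_model)) * 0.1,
        "wv": rng.normal(size=(d_model, d_model)) * 0.1,
        "wo": rng.normal(size=(d_model, d_model)) * 0.1,
        "w1": rng.normal(size=(d_model, d_ff)) * 0.1,
        "w2": rng.normal(size=(d_ff, d_model)) * 0.1,
    }

# "BERT" here = embeddings in, N identical encoder blocks applied in sequence.
x = rng.normal(size=(seq_len, d_model))
for params in [make_params() for _ in range(4)]:
    x = transformer_encoder(x, params)
print(x.shape)  # → (5, 16): same shape in and out, which is what lets blocks stack
```

Because each block maps a `(seq_len, d_model)` array to another array of the same shape, composing a deeper model is literally just applying more blocks in a loop, which is all the KerasNLP BERT model is doing at a larger scale.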
Stupid question: can this also be used for composing transformer-based LLMs?