Supports data, tensor, pipeline, and sequence parallelism, as well as activation checkpointing, distributed optimizers, fused kernels, and more.