Resolving that issue would help reduce (not eliminate) the size of the context. The model will still only just barely fit in 16 GB, which is what the parent comment asked about.
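As a rough back-of-the-envelope sketch (the thread doesn't pin down the exact model or quantization, so the numbers below are illustrative assumptions): weight memory is roughly parameter count times bytes per parameter, plus KV cache and runtime overhead.

    # Rough VRAM estimate -- illustrative assumptions, not from the thread.
    params_b = 13            # assume a 13B-parameter model
    bytes_per_param = 1.0    # assume 8-bit quantization
    weights_gb = params_b * bytes_per_param       # ~13 GB of weights
    kv_cache_gb = 1.6        # assume a few thousand tokens of fp16 KV cache
    overhead_gb = 1.0        # CUDA context, activations, fragmentation
    total_gb = weights_gb + kv_cache_gb + overhead_gb
    print(f"~{total_gb:.1f} GB")  # ~15.6 GB: just barely fits in 16 GB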
Best to have two or more low-end 16GB GPUs, giving 32GB or more of total VRAM, to run most of the better local models.
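One common way to pool VRAM across two cards is layer-wise sharding, for example with Hugging Face transformers' device_map="auto", which spreads layers over all visible GPUs. A minimal sketch (the model name is a placeholder, not from this thread):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder: any model too big for a single 16GB card.
    model_id = "some-org/some-30b-model"

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",  # shards layers across all visible GPUs, e.g. 2x16GB
    )

    inputs = tok("Hello", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))

Note this splits the model by layers rather than running the GPUs in parallel, so it buys capacity, not proportional speed.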