Was it perhaps a multi-threaded task? Because that would almost certainly crawl.
In general, unmapping is expensive, much more expensive than mapping memory, because you need to do a TLB shootdown/flush/whatever to make sure no CPU keeps using a cached version of the old mapping. A read/write does a copy instead, so there is no need to mess with mappings and TLBs, hence it can scale very well.
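A minimal C sketch of the two access patterns being contrasted, assuming a POSIX system. The function names `sum_via_pread` and `sum_via_mmap` are hypothetical, and this is not a benchmark: both functions compute the same byte sum, but one copies data into a private buffer while the other creates and then tears down a mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper. Read path: the kernel copies file data into a
 * caller-owned buffer. No page-table entries are created or removed,
 * so there is nothing to invalidate in any TLB afterwards.
 * Returns 0 on error (sketch-level error handling). */
uint64_t sum_via_pread(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return 0;
    unsigned char buf[4096];
    uint64_t sum = 0;
    off_t off = 0;
    ssize_t n;
    while ((n = pread(fd, buf, sizeof buf, off)) > 0) {
        for (ssize_t i = 0; i < n; i++) sum += buf[i];
        off += n;
    }
    close(fd);
    return sum;
}

/* Hypothetical helper. Map path: mmap only sets up the mapping, and pages
 * are faulted in on first touch. munmap must then remove those entries,
 * and CPUs that may hold cached translations for this address space have
 * to flush them (a TLB shootdown). Returns 0 on error or empty file. */
uint64_t sum_via_mmap(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return 0;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED) return 0;
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++) sum += p[i];
    munmap(p, len); /* the costly step: tear down the mapping */
    return sum;
}
```

On a single core the two paths may cost about the same. The difference would be expected under contention, e.g. many threads of one process each mapping and unmapping files, since each `munmap` may then need to flush TLB entries on every CPU running a thread of that process.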
It was multi-processing. I guess mmap (or at least munmap) also needs to send an IPI even if no other processor is currently running in the same address space (VM), to avoid race conditions.