According to the article, the custom threading implementation was about 20% faster than rayon. So not nothing, but not the biggest win. If they were able to rejig the algorithm so they could parallelize the outer loop, there's probably a bigger win to be had.
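
To make "parallelize the outer loop" concrete, here's a rough sketch. It is not the article's algorithm: `row_sum`, `serial`, and `parallel_outer` are made-up names, the workload is a toy, and I'm assuming the outer iterations are independent of each other (which is exactly the property the rejig would have to create). It uses scoped threads from the standard library rather than rayon or the article's custom pool, just to keep it dependency-free. The point is that each thread gets a whole chunk of outer iterations, so the per-task overhead is paid once per chunk instead of once per inner loop.

```rust
use std::thread;

/// Hypothetical inner loop: some per-row work (here, a sum of squares).
fn row_sum(row: &[u64]) -> u64 {
    row.iter().map(|x| x * x).sum()
}

/// Serial baseline: the outer loop runs one row after another.
fn serial(rows: &[Vec<u64>]) -> Vec<u64> {
    rows.iter().map(|r| row_sum(r)).collect()
}

/// Outer-loop parallelism: split the rows into contiguous chunks and
/// hand each chunk to one scoped thread. Assumes rows are independent.
fn parallel_outer(rows: &[Vec<u64>], workers: usize) -> Vec<u64> {
    let workers = workers.max(1);
    // Ceiling division, with a floor of 1 so `chunks` never sees 0.
    let chunk_len = ((rows.len() + workers - 1) / workers).max(1);
    let mut out = vec![0u64; rows.len()];

    thread::scope(|s| {
        // Pair each input chunk with the matching slice of the output,
        // so threads write to disjoint memory and need no locking.
        for (in_chunk, out_chunk) in rows.chunks(chunk_len).zip(out.chunks_mut(chunk_len)) {
            s.spawn(move || {
                for (row, slot) in in_chunk.iter().zip(out_chunk.iter_mut()) {
                    *slot = row_sum(row);
                }
            });
        }
    }); // all spawned threads are joined here

    out
}
```

Calling `parallel_outer(&rows, 4)` should give the same result as `serial(&rows)`. Whether it's actually faster depends on how much work each row carries, which is something you'd have to measure.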