rubymamis 14 days ago | on: Skywork-OR1: new SOTA 32B thinking model with open...

I mostly like to evaluate local models whenever I ask a remote model (Claude 3.7, ChatGPT 4.5), to see how far they have progressed. From my tests, Qwen 2.5 Coder 32B is still the best local model for coding tasks. I've also tried Phi 4, Nemotron, Mistral Small, and QwQ 32B. I'm using a MacBook Pro M4 with 46GB RAM.