For those using Aider, it's performing extremely well on Aider's benchmark, scoring a 63.9% pass rate versus 51.9% with the old model weights. For reference, the 405B Llama 3.1 behemoth scores 66.2% with the same edit format (whole). Can't wait to try out the 32B model!
If you want to see how it stacks up against other models, here's the leaderboard for reference: https://aider.chat/docs/leaderboards/