Sonnet-3.7 is best non-thinking model in the Misguided Attention eval.
Misguided Attention is a collection of prompts to challenge the reasoning abilities of large language models in presence of misguiding information. It consists of slightly modified well known logical problems and riddles. Many model are overfit to these problems and will therefore report a response to the unmodified problem.
Claude-3.7-Sonnet was evaluated in the non-thinking mode in the long eval with 52 prompt. It almost beats o3-mini despite not using the thinking mode. This is a very impressive result.