https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
It's only a minor upgrade, but the benchmark gains look fairly big.
Its Aider Polyglot score has already caught up with the just-released Claude 4:
https://www.reddit.com/r/LocalLLaMA/s/vMayDPvtDB
R1 original : 56.9
R1-0528 (official API) : 70.7
Claude 4 Opus Thinking : 72.0
Gemini Pro 0526 : 76.9
o3 High : 79.6
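That said, we'll have to wait for people's hands-on testing. For example, the Qwen3-235B-A22B I host myself is reported at 59.6, but for coding I find it falls far short of Claude 4 Sonnet (no think).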