Llama-3.3-70B-Instruct
128K context, multilingual, enhanced tool calling, outperforms Llama 3.1 70B and comparable to Llama 405B 🔥
Comparable performance to 405B with 6x LESSER parameters
Improvements (3.3 70B vs 405B):
GPQA Diamond (CoT): 50.5% vs 49.0%
Math (CoT): 77.0% vs 73.8%
Steerability (IFEval): 92.1% vs 88.6%
Improvements (3.3 70B vs 3.1 70B):
Code Generation:
HumanEval: 80.5% → 88.4% (+7.9%)
MBPP EvalPlus: 86.0% → 87.6% (+1.6%)
Steerability:
IFEval: 87.5% → 92.1% (+4.6%)
Reasoning & Math:
GPQA Diamond (CoT): 48.0% → 50.5% (+2.5%)
MATH (CoT): 68.0% → 77.0% (+9%)
Multilingual Capabilities:
MGSM: 86.9% → 91.1% (+4.2%)
MMLU Pro:
MMLU Pro (CoT): 66.4% → 68.9% (+2.5%)
https://www.reddit.com/r/LocalLLaMA/comments/1h85ld5/llama3370binstruct_hugging_face/