1. Pick a model you like (e.g. Qwen2.5-Coder-Instruct 7B), then grab a quantized version of it. GGUF lets you split the model across CPU + RAM + GPU, while exl2/GPTQ/AWQ run purely on GPU. If you go GGUF, don't drop below Q4 (see the download sketch after this list).
2. Pick an OpenAI-API-compatible backend (e.g. ollama / text-generation-webui / tabbyAPI), download the model, and start the server (see the sketch after this list).
3. Pick a frontend (e.g. the VSCode Continue extension, SillyTavern) and point it at your backend server.
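
For step 1, here is a minimal Python sketch of fetching a single quantized GGUF file with huggingface_hub. The repo id and filename below are only illustrative assumptions; check the actual model page for the exact names and quant variants.

```python
# Download one quantized GGUF file from Hugging Face.
# Repo id and filename are illustrative -- check the model page for the real ones.
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="Qwen/Qwen2.5-Coder-7B-Instruct-GGUF",    # assumed repo id
    filename="qwen2.5-coder-7b-instruct-q4_k_m.gguf", # Q4_K_M: stays at/above Q4
)
print(gguf_path)  # local path you can point your backend at
```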
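To check that steps 2 and 3 are wired up, here is a rough sketch that talks to the backend the same way a frontend would, assuming an ollama-style server on its default port 11434 and a model tagged qwen2.5-coder:7b; adjust the base URL and model name for text-generation-webui or tabbyAPI.

```python
# Hit the local OpenAI-compatible backend the same way a frontend would.
# Assumes ollama's default port 11434; other backends expose a different port.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # your backend's OpenAI-compatible endpoint
    api_key="not-needed-locally",          # most local backends ignore the key
)

resp = client.chat.completions.create(
    model="qwen2.5-coder:7b",  # the model tag you downloaded in step 2
    messages=[{"role": "user", "content": "Write a Python hello world."}],
)
print(resp.choices[0].message.content)
```

If this prints a completion, pointing Continue or SillyTavern at the same base URL should work the same way.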