ValueError: The input size is not aligned with the quantized weight shape. This can be caused by too large tensor parallel size.


4n3mone 2024. 6. 18. 17:27

This error occurred when I tried to serve QWEN2-72B, quantized with GPTQ (8-bit), using vLLM.

from vllm import LLM
llm = LLM(model=model_path, tokenizer=model_path, tensor_parallel_size=2, quantization='gptq')

I got the same error whether I set tensor_parallel_size to 1, 2, or 4.

The model's intermediate_size must be a multiple of group_size * tensor_parallel_size, where group_size is the group size used for quantization.

QWEN2's intermediate_size is 29568, and I had quantized with a GPTQ group_size of 128. (Almost every example code sets it to 128.)

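29568 / 128 = 231, which is odd, so it is not divisible by 2 or 4. (231 is divisible by 1, so this rule by itself accounts for the failures at tensor_parallel_size 2 and 4, not the one at 1.)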
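The arithmetic can be checked with a few lines of Python. This is a minimal sketch of the divisibility rule as stated in this post, not vLLM's actual code; the function name `is_aligned` is hypothetical.

```python
def is_aligned(intermediate_size: int, group_size: int, tp_size: int) -> bool:
    """Return True if intermediate_size is a multiple of group_size * tp_size.

    Hypothetical helper: it only encodes the rule described in this post.
    """
    return intermediate_size % (group_size * tp_size) == 0


INTERMEDIATE_SIZE = 29568  # QWEN2-72B value quoted in the post
GROUP_SIZE = 128           # GPTQ group size used for the first quantization

# Check each tensor-parallel size that was tried.
results = {tp: is_aligned(INTERMEDIATE_SIZE, GROUP_SIZE, tp) for tp in (1, 2, 4)}
print(results)  # {1: True, 2: False, 4: False}
```

Under this rule alone, only a tensor-parallel size of 1 passes with group_size=128.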

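I solved it by redoing the GPTQ quantization with group_size=64. 29568 / 64 = 462, which is divisible by 2, so tensor_parallel_size=2 now satisfies the rule.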
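To see why a smaller group size helps, the sketch below lists which tensor-parallel sizes satisfy the same rule for a few group sizes. The helper `valid_tp_sizes` is hypothetical, and group_size=32 and a tensor-parallel size of 8 appear only for comparison; neither was tried in this post.

```python
INTERMEDIATE_SIZE = 29568  # QWEN2-72B value quoted in the post


def valid_tp_sizes(group_size: int, candidates=(1, 2, 4, 8)) -> list:
    """List the candidate tensor-parallel sizes for which
    INTERMEDIATE_SIZE is a multiple of group_size * tp_size."""
    return [tp for tp in candidates
            if INTERMEDIATE_SIZE % (group_size * tp) == 0]


for gs in (128, 64, 32):
    print(gs, valid_tp_sizes(gs))
# 128 [1]
# 64 [1, 2]
# 32 [1, 2, 4]
```

So, by this rule, group_size=64 is enough for two GPUs, while four GPUs would need an even smaller group size.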
