[Error] RuntimeError: Failed to import transformers.integrations.bitsandbytes because of the following error (look up to see its traceback): CUDA Setup failed despite GPU being available. Please run the following command to get more information:

When finetuning an LLM, running on a server with a GPU is usually the only realistic option. Even the smallest Llama 2 model has 7 billion parameters, so the work is done on a GPU server, and a quantized model is used to reduce GPU memory usage and, ideally, speed up inference.

When I moved a model that had been finetuned on Linux over to Windows and tried to run inference, I ran into the following error.

[Error] RuntimeError: Failed to import transformers.integrations.bitsandbytes because of the following error (look up to see its traceback): CUDA Setup failed despite GPU being available. Please run the following command to get more information:

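The message says that CUDA setup failed even though a GPU was detected. The model was finetuned with 4-bit quantization and is loaded the same way in the code below, so the bitsandbytes library is required, and at the time of writing bitsandbytes raises the error above when used on Windows.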
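Before changing anything, it can help to confirm which pieces are actually present in the failing environment. The following is a minimal sketch using only the standard library plus an optional torch import; the helper names `describe_environment` and `cuda_visible_to_torch` are hypothetical and not part of any library.

```python
import importlib.metadata
import platform


def describe_environment(packages=("torch", "transformers", "peft", "bitsandbytes")):
    """Return the OS name and the installed version (or None) of each package."""
    report = {"os": platform.system()}  # "Windows", "Linux", "Darwin", ...
    for name in packages:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = None  # package is not installed here
    return report


def cuda_visible_to_torch():
    """Return True/False if torch is importable, None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
    print("CUDA visible to torch:", cuda_visible_to_torch())
```

If the OS is reported as Windows and torch can see CUDA, yet importing bitsandbytes still fails, the situation matches the one described above.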

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel, PeftConfig

model_id = "meta-llama/Llama-2-7b-hf"
peft_model_id = "./<your_finetune_model>/checkpoint-4329"   # checkpoint trained without data augmentation

config = PeftConfig.from_pretrained(peft_model_id)

# 4-bit NF4 quantization settings; this is what pulls in bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype="float16",
)

# Load the quantized base model onto GPU 0, then attach the finetuned adapter
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})
model = PeftModel.from_pretrained(model, peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Switch to inference mode
model.eval()

The fix is surprisingly simple. Run the commands below, which replace the installed package with a build of bitsandbytes distributed for use on Windows.

pip uninstall bitsandbytes
pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.0-py3-none-win_amd64.whl
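After reinstalling, it may be worth checking that the environment really picked up the new build. The sketch below compares the installed version with `0.41.0`, which is taken from the wheel filename above; that the package metadata reports exactly this string is an assumption, and `bitsandbytes_matches` is a hypothetical helper name.

```python
import importlib.metadata

EXPECTED_VERSION = "0.41.0"  # assumption: taken from the wheel filename


def bitsandbytes_matches(expected=EXPECTED_VERSION):
    """Return True only if bitsandbytes is installed with the expected version."""
    try:
        installed = importlib.metadata.version("bitsandbytes")
    except importlib.metadata.PackageNotFoundError:
        return False  # nothing installed, e.g. the install step failed
    return installed == expected


if __name__ == "__main__":
    print("bitsandbytes matches expected build:", bitsandbytes_matches())
```

bitsandbytes can also be run as a module with `python -m bitsandbytes`, which prints its own CUDA setup diagnostics.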

For the technical details, see the link below.

https://github.com/TimDettmers/bitsandbytes/issues/175


Zero to Nine's Blog
