How To Use 8-Bit Optimizers in PyTorch

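In this short tutorial, we learn how to use 8-bit optimizers in PyTorch. Code and interactive visualizations are provided so you can try it yourself. This article is an AI translation. It may contain mistranslations, so please let us know in the comments.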
Saurav Maheshkar
Created on September 15|Last edited on September 15
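In this article, we look at how to use 8-bit optimizers in PyTorch to write memory-efficient training loops.
Facebook Research has released a great repository (facebookresearch/bitsandbytes) that provides 8-bit optimizers and quantization routines. Its optimizers can be used as drop-in replacements for the torch.optim optimizers to save memory.
For more details, see the official repository.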
Table of Contents
Show Me the Code
Summary
Recommended Reading

Show Me the Code
If you have been training deep learning models in PyTorch for a while, especially large language models (LLMs), you have probably run into the following CUDA error:
RuntimeError: CUDA out of memory. Tried to allocate .. MiB (.. GiB total capacity; ... GiB already allocated; 
... MiB free; ... GiB reserved in total by PyTorch)
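One way to avoid running out of memory during training is to use an optimizer with a smaller memory footprint. By default, PyTorch optimizers keep their state and perform gradient updates in 32-bit precision. With the optimizers from bitsandbytes, you can swap a PyTorch optimizer for its 8-bit counterpart and reduce memory usage.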
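To get a rough feel for the savings, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate, not a measurement: it assumes an Adam-style optimizer that stores two state values per parameter, it ignores small overheads such as quantization metadata, and the parameter count is a hypothetical value chosen only for illustration. Model weights, gradients, and activations still take their usual space.

```python
def adam_state_bytes(num_params: int, bits_per_state: int) -> int:
    """Bytes needed for Adam's two per-parameter state tensors."""
    states_per_param = 2  # Adam tracks a first and a second moment per parameter
    return num_params * states_per_param * bits_per_state // 8


# Hypothetical model size, chosen only for illustration.
num_params = 1_000_000_000

bytes_32bit = adam_state_bytes(num_params, bits_per_state=32)
bytes_8bit = adam_state_bytes(num_params, bits_per_state=8)

gib = 1024 ** 3
print(f"32-bit Adam state: {bytes_32bit / gib:.2f} GiB")
print(f" 8-bit Adam state: {bytes_8bit / gib:.2f} GiB")
print(f"Reduction factor:  {bytes_32bit / bytes_8bit:.0f}x")
```

Under these assumptions, the optimizer state shrinks by a factor of four, which can be the difference between fitting a model on a GPU and hitting the error above.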
For installation instructions that match your CUDA version, see the official repository.

Let's see what changes the training loop needs:
import bitsandbytes as bnb

model = ...
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=0.001)  # instead of torch.optim.Adam

for epoch in range(...):
    for i, sample in enumerate(dataloader):
        inputs, labels = sample
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        # Compute the loss and backpropagate
        loss = loss_fn(outputs, labels)
        loss.backward()
        # Update the parameters
        optimizer.step()
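Yes, that really is all there is to it! You just swap in an optimizer from a different Python package and you are done. It is much simpler than you might expect!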
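Because the swap is a single line, it is also easy to make it optional. The sketch below is one possible way to fall back to the regular 32-bit optimizer on machines where bitsandbytes is not installed. The helper names `pick_adam` and `build_adam` are hypothetical and not part of either library, and the check only tests whether the package is installed, not whether its CUDA setup works.

```python
import importlib
import importlib.util
from typing import Tuple


def pick_adam(prefer_8bit: bool = True) -> Tuple[str, str]:
    """Return (module name, class name) of the Adam variant to use."""
    # find_spec checks whether the package is installed without importing it.
    if prefer_8bit and importlib.util.find_spec("bitsandbytes") is not None:
        return ("bitsandbytes.optim", "Adam8bit")
    return ("torch.optim", "Adam")


def build_adam(params, lr: float = 0.001, prefer_8bit: bool = True):
    """Create the optimizer chosen by pick_adam (hypothetical helper)."""
    module_name, class_name = pick_adam(prefer_8bit)
    # Import the chosen module only now, so a missing package is never imported.
    optimizer_cls = getattr(importlib.import_module(module_name), class_name)
    return optimizer_cls(params, lr=lr)
```

In the training loop, the optimizer line would then become `optimizer = build_adam(model.parameters(), lr=0.001)`, and the rest of the loop stays unchanged.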
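Summary
In this article, we saw how to use 8-bit optimizers in PyTorch to write memory-efficient training loops.
To see the full suite of W&B features, check out this short 5-minute guide. If you would like more reports covering the math and from-scratch code implementations, let us know in the comments below or on our forum ✨!
Check out these other reports on Fully Connected, which cover other fundamental development topics such as GPU utilization and saving models.
Recommended Reading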
Preventing The CUDA Out Of Memory Error In PyTorch
A short tutorial on how you can avoid the "RuntimeError: CUDA out of memory" error while using the PyTorch framework.
How To Use GradScaler in PyTorch
In this article, we explore how to implement automatic gradient scaling (GradScaler) in a short tutorial complete with code and interactive visualizations.
How To Use Autocast in PyTorch
In this article, we learn how to implement Tensor Autocasting in a short tutorial, complete with code and interactive visualizations, so you can try it yourself. 
How to Set Random Seeds in PyTorch and Tensorflow
Learn how to set the random seed for everything in PyTorch and Tensorflow in this short tutorial, which comes complete with code and interactive visualizations.
How To Calculate Number of Model Parameters for PyTorch and TensorFlow Models
This article provides a short tutorial on calculating the number of parameters for TensorFlow and PyTorch deep learning models, with examples for you to follow.
How To Implement Gradient Accumulation in PyTorch
In this article, we learn how to implement gradient accumulation in PyTorch in a short tutorial complete with code and interactive visualizations so you can try for yourself. 
﻿
﻿