How to Calculate the Number of Parameters in PyTorch and TensorFlow Models

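This article briefly explains how to calculate the number of parameters in TensorFlow and PyTorch deep learning models, with examples you can follow along. This article is an AI translation. If you suspect a mistranslation, please let us know in the comments.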
Saurav Maheshkar
Created on September 15|Last edited on September 15
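We live in an era of easy access to large models. Anyone can spin up a Kaggle Kernel with a pretrained DeBERTa v3 model and fine-tune it on an arbitrary dataset. What many people don't realize is that they are using a model with 75–100M parameters that was pretrained on more than 100 GB of training data.
Over-parameterization can certainly improve performance, but it also increases storage size and, in turn, inference time. That makes the parameter count of a model worth logging.
Wouldn't it be interesting to see a model with 10x or 20x fewer parameters perform almost as well? Whether you are plotting performance against parameter count or running a simple benchmark, the parameter count is a key number to know.
Let's walk through how to calculate the number of parameters in a model, with examples for both PyTorch and TensorFlow.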
Table of Contents
Code
PyTorch
TensorFlow
Summary
Recommended Reading
﻿
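Code
PyTorch
PyTorch does not currently ship a utility function for counting a model's parameters, but you can retrieve the parameters through the model class and count them yourself.
Use the following snippet to count all of a model's parameters: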
total_params = sum(
	param.numel() for param in model.parameters()
)
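Let's quickly walk through this snippet.
model.parameters(): Every PyTorch module has a parameters() method, which returns an iterator over all of the module's parameters.
param.numel(): For each parameter yielded by the model.parameters() iterator, .numel() returns the number of elements it contains.
sum(...): Adds up the element counts of all parameters (a Module may contain submodules as layers, and their parameters are included).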
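To make the arithmetic concrete, here is a framework-free sketch of what the sum adds up. The parameter names and shapes below are hypothetical (a tiny two-layer MLP, not taken from any real model); the only assumption is that the element count of a tensor is the product of its dimensions, which is what .numel() reports.

```python
import math

# Hypothetical parameter shapes for a tiny two-layer MLP
# (illustrative only; not taken from a real model).
param_shapes = {
    "layer1.weight": (5, 10),  # 10 inputs -> 5 outputs
    "layer1.bias": (5,),
    "layer2.weight": (2, 5),   # 5 inputs -> 2 outputs
    "layer2.bias": (2,),
}

# The element count of a tensor is the product of its dimensions.
numel_per_param = {
    name: math.prod(shape) for name, shape in param_shapes.items()
}

# Summing the per-tensor counts gives the model's parameter count.
total_params = sum(numel_per_param.values())

print(numel_per_param)
print(total_params)  # 50 + 5 + 10 + 2 = 67
```

Each weight matrix contributes inputs × outputs elements and each bias contributes one element per output, so the total is simply the sum over all tensors.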
Note: This snippet counts every parameter in the Module, trainable and non-trainable alike. If you only want the trainable parameters, use the following snippet:
trainable_params = sum(
	p.numel() for p in model.parameters() if p.requires_grad
)
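The extra check uses the .requires_grad attribute of a Tensor to decide whether it is a trainable parameter. When a tensor's requires_grad is set to True, the autograd engine computes gradients for it, so an optimizer can update it; in other words, it is "trainable".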
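As a quick check of both snippets, the sketch below builds a small model, freezes its first layer, and compares the two counts. The model itself is a hypothetical toy example chosen so the numbers are easy to verify by hand.

```python
import torch.nn as nn

# Hypothetical toy model, used only for illustration.
model = nn.Sequential(
    nn.Linear(10, 5),  # 10*5 weights + 5 biases = 55 parameters
    nn.ReLU(),         # no parameters
    nn.Linear(5, 2),   # 5*2 weights + 2 biases = 12 parameters
)

# Freeze the first layer so its parameters are no longer trainable.
for p in model[0].parameters():
    p.requires_grad = False

total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(
    p.numel() for p in model.parameters() if p.requires_grad
)

print(total_params)      # 67
print(trainable_params)  # 12
```

The difference between the two numbers (55) is the size of the frozen layer, which is a handy sanity check when fine-tuning only part of a pretrained model.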
TensorFlow
TensorFlow provides a utility function for counting parameters. It is called count_params and is available in the Keras utilities (keras.utils.layer_utils).
Use the following snippet to count both the trainable and the non-trainable parameters of a TensorFlow model:
from keras.utils.layer_utils import count_params

model = ...

# count_params takes an iterable of weights and returns the total number of scalars
trainable_params = count_params(model.trainable_weights)
non_trainable_params = count_params(model.non_trainable_weights)
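If the keras.utils.layer_utils import path is not available in your Keras version (an assumption worth checking for your install), the same counts can be derived directly from the weight shapes. The sketch below swaps in that manual approach instead of count_params; the toy model and the count helper are hypothetical names used only for illustration.

```python
import math

import tensorflow as tf

# Hypothetical toy model, used only for illustration.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(5),  # 10*5 weights + 5 biases = 55 parameters
    tf.keras.layers.Dense(2),  # 5*2 weights + 2 biases = 12 parameters
])

# Freeze the first Dense layer so its weights become non-trainable.
model.layers[0].trainable = False


def count(weights):
    """Sum the number of scalars across an iterable of weight variables."""
    return sum(math.prod(tuple(w.shape)) for w in weights)


trainable_params = count(model.trainable_weights)
non_trainable_params = count(model.non_trainable_weights)

print(trainable_params)      # 12
print(non_trainable_params)  # 55
```

The built-in model.count_params() method returns the combined total, so it can serve as a cross-check on the two separate counts.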
﻿
So where can you use this information? With Weights & Biases, you can log the parameter count either as a wandb.config entry or as a summary value on the W&B run, so you can review and compare it later.
wandb.config.update({"Model Parameters": trainable_params})
######################           OR          #####################
wandb.run.summary["Model Parameters"] = trainable_params
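Summary
In this article, you saw how to calculate the number of parameters for both TensorFlow and PyTorch models. To see the full suite of W&B features, check out this short 5-minute guide. If you would like more reports covering the math and "from-scratch" code implementations, let us know in the comments below or on our forum ✨!
Check out these other reports on Fully Connected covering other fundamental development topics such as GPU utilization and saving models.
Recommended Reading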
Setting Up TensorFlow And PyTorch Using GPU On Docker
A short tutorial on setting up TensorFlow and PyTorch deep learning models on GPUs using Docker.
How to Compare Keras Optimizers in Tensorflow for Deep Learning
A short tutorial outlining how to compare Keras optimizers for your deep learning pipelines in Tensorflow, with a Colab to help you follow along.
Preventing The CUDA Out Of Memory Error In PyTorch
A short tutorial on how you can avoid the "RuntimeError: CUDA out of memory" error while using the PyTorch framework.
How to Initialize Weights in PyTorch
A short tutorial on how you can initialize weights in PyTorch with code and interactive visualizations.
Recurrent Neural Network Regularization With Keras
A short tutorial teaching how you can use regularization methods for Recurrent Neural Networks (RNNs) in Keras, with a Colab to help you follow along.
Tutorial: Regression and Classification on XGBoost
A short tutorial on how you can use XGBoost with code and interactive visualizations.
﻿
﻿