
Benchmarking Domestic Chinese Accelerators Against Nvidia


Training details:

  • Dataset: gw_train_data_new3.jsonl (4,000 examples, each containing an outline and the corresponding body text; a sketch of the record shape follows below)
  • Epochs: 10
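
For orientation, a record in this kind of SFT file typically looks like the following; the field names are assumptions (LLaMA-Factory's alpaca-style instruction/input/output schema), since the report does not show the actual data:

```python
import json

# Hypothetical record shape -- the report does not show the actual schema.
# LLaMA-Factory's alpaca-style SFT format uses instruction/input/output fields.
sample = {
    "instruction": "Write the body text based on the outline below.",
    "input": "<outline>",
    "output": "<body text>",
}

# Sanity-check the training file referenced in the report.
with open("gw_train_data_new3.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
print(len(records))  # expected: 4000
```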

Training parameters, training time, loss, and throughput:


Run metadata

Training arguments for the three runs. All share the same SFT/LoRA recipe (LoRA rank 8, alpha 30, target c_attn; per-device batch size 1 with gradient accumulation 4; max_length 32000; cosine schedule, learning rate 5e-5, warmup ratio 0.05, weight decay 0.01, fp16, 10 epochs); they differ in code base, chat template, dataset registration, paths, and save_steps, as the comparison script below makes explicit.

Run 1 — Haiguang DCU, qwen-torch code base (template chatml):
["--stage","sft","--model_name_or_path","/workspace/sunjinfeng/models/qwen/Qwen-7B-Chat","--do_train","--dataset","gw_train_data_new3","--template","chatml","--finetuning_type","lora","--lora_alpha","30","--lora_rank","8","--lora_target","c_attn","--output_dir","/workspace/sunjinfeng/qwen-torch/Qwen-7B-Chat-title-haiguang","--overwrite_output_dir","--overwrite_cache","--per_device_train_batch_size","1","--gradient_accumulation_steps","4","--max_length","32000","--lr_scheduler_type","cosine","--logging_steps","1","--save_steps","800","--learning_rate","5e-5","--num_train_epochs","10","--warmup_ratio","0.05","--weight_decay","0.01","--plot_loss","--run_name","Qwen-7B-Chat-title-haiguang","--fp16"]

Run 2 — Huawei ModelArts (EulerOS, aarch64), LLaMA-Factory (template qwen; note this run registers the dataset as gw_train_data rather than gw_train_data_new3):
["--stage","sft","--model_name_or_path","/home/ma-user/work/sunjinfeng/base_llms/Qwen-7B-Chat","--do_train","--dataset","gw_train_data","--template","qwen","--finetuning_type","lora","--lora_alpha","30","--lora_rank","8","--lora_target","c_attn","--output_dir","/home/ma-user/work/sunjinfeng/LLaMA-Factory/Qwen-7B-Chat-gw1225","--overwrite_output_dir","--overwrite_cache","--per_device_train_batch_size","1","--gradient_accumulation_steps","4","--max_length","32000","--lr_scheduler_type","cosine","--logging_steps","1","--save_steps","100","--learning_rate","5e-5","--num_train_epochs","10","--warmup_ratio","0.05","--weight_decay","0.01","--plot_loss","--report_to","wandb","--fp16"]

Run 3 — Nvidia A100, LLaMA-Factory (template qwen):
["--stage","sft","--model_name_or_path","/workspace/share_data/base_llms/Qwen-7B-Chat","--do_train","--dataset","gw_train_data_new3","--template","qwen","--finetuning_type","lora","--lora_alpha","30","--lora_rank","8","--lora_target","c_attn","--output_dir","/workspace/sunjinfeng/github_projet/LLaMA-Factory/Qwen-a100-test-chat","--overwrite_output_dir","--overwrite_cache","--per_device_train_batch_size","1","--gradient_accumulation_steps","4","--max_length","32000","--lr_scheduler_type","cosine","--logging_steps","1","--save_steps","100","--learning_rate","5e-5","--num_train_epochs","10","--warmup_ratio","0.05","--weight_decay","0.01","--plot_loss","--report_to","wandb","--fp16"]

Program entry point for all three runs: src/train_bash.py

CPU core counts as exported: 128, 192, 128, 192, 128 (physical and logical counts interleaved across runs; the exact per-run mapping did not survive the export)
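
To see at a glance what actually differs between the configurations, the argument lists above can be diffed with a small script; the run1.json/run3.json file names are placeholders for wherever you save the lists:

```python
import json

def parse_args(argv):
    """Turn ["--key", "value", ...] into a dict; bare flags map to True."""
    out, i = {}, 0
    while i < len(argv):
        key = argv[i].lstrip("-")
        if i + 1 < len(argv) and not argv[i + 1].startswith("--"):
            out[key] = argv[i + 1]
            i += 2
        else:
            out[key] = True
            i += 1
    return out

def diff(a, b):
    """Keys whose values differ between two runs, as {key: (a_val, b_val)}."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}

# Usage: save the three JSON argument lists above as run1.json..run3.json, then:
run1 = parse_args(json.load(open("run1.json")))  # Haiguang DCU
run3 = parse_args(json.load(open("run3.json")))  # Nvidia A100
print(diff(run1, run3))
# Expected for runs 1 vs 3: the model/output paths, --template (chatml vs qwen),
# --save_steps (800 vs 100), and --run_name / --report_to.
```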
{"remote":"http://developer.hpccube.com/codes/modelzoo/qwen-torch.git","commit":"ddd8bac0afe5f3d985983413ae87d5c1a28d8e83","__typename":"GitInfo"}
{"remote":"https://mirror.ghproxy.com/https://github.com/hiyouga/LLaMA-Factory.git","commit":"f86857bd9ef456e77ad79a584f1fa08a129e5270","__typename":"GitInfo"}
{"remote":"https://gitclone.com/github.com/Tendo33/LLaMA-Factory.git","commit":null,"__typename":"GitInfo"}
NVIDIA A100-SXM4-80GB
8
Linux-3.10.0-957.el7.x86_64-x86_64-with-glibc2.17
Linux-4.19.90-vhulk2211.3.0.h1543.eulerosv2r10.aarch64-aarch64-with-glibc2.28
Linux-4.19.72-300.el7.x86_64-x86_64-with-glibc2.31
/workspace/sunjinfeng/qwen-torch/src/train_bash.py
/home/ma-user/work/sunjinfeng/LLaMA-Factory/src/train_bash.py
/workspace/sunjinfeng/github_projet/LLaMA-Factory/src/train_bash.py
3.9.12
3.10.13
3.9.18
1d 11h 35m 24s
8h 12m
4h 30m 36s
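
Given the 4,000-example dataset and 10 epochs, relative throughput can be estimated from the wall-clock runtimes alone; this is a rough sketch that assumes every run processed the full 40,000 samples and ignores startup and checkpointing overhead:

```python
# Rough relative-throughput estimate from the reported wall-clock times.
# Assumes each run completed all 10 epochs over the 4,000-example dataset
# (40,000 samples total) and ignores startup and checkpointing overhead.
runs_seconds = {
    "Haiguang DCU":     (24 + 11) * 3600 + 35 * 60 + 24,  # 1d 11h 35m 24s
    "Huawei ModelArts": 8 * 3600 + 12 * 60,               # 8h 12m
    "Nvidia A100":      4 * 3600 + 30 * 60 + 36,          # 4h 30m 36s
}
total_samples = 4000 * 10

for name, sec in runs_seconds.items():
    throughput = total_samples / sec
    speedup = runs_seconds["Nvidia A100"] / sec  # relative speed vs the A100 run
    print(f"{name}: {throughput:.2f} samples/s ({speedup:.2f}x A100 speed)")
```

Under these assumptions the A100 run works out to roughly 2.5 samples/s, the ModelArts run to about 0.55x the A100's speed, and the Haiguang DCU run to about 0.13x.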
config

  • auto_map: configuration_qwen.QWenConfig / modeling_qwen.QWenLMHeadModel (identical across all three runs)
  • label2id: 0 and 1 (identical across all three runs)
  • model_name_or_path:
      Run 1: /workspace/sunjinfeng/models/qwen/Qwen-7B-Chat
      Run 2: /home/ma-user/work/sunjinfeng/base_llms/Qwen-7B-Chat
      Run 3: /workspace/share_data/base_llms/Qwen-7B-Chat
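
The LoRA hyperparameters shared by all three runs (rank 8, alpha 30, target module c_attn) correspond roughly to the following peft configuration; this is an illustrative sketch, not the exact object LLaMA-Factory constructs internally:

```python
from peft import LoraConfig, TaskType

# Illustrative sketch of the LoRA setup shared by all three runs, taken from
# the training arguments above; LLaMA-Factory builds its own config internally.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # --lora_rank 8
    lora_alpha=30,              # --lora_alpha 30
    target_modules=["c_attn"],  # --lora_target c_attn (Qwen's fused QKV projection)
)
```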