
vLLM vs. Transformers Inference Speed Table

Created on September 27 | Last edited on September 27

L4_24GB_vllm


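| # | model size | precision | max_model_len | tensor parallel | run | (inference) | (init) | avg score |
|---|------------|-----------|---------------|-----------------|-----|-------------|--------|-----------|
| 1 | 8B | | 2048 | 8 | 6857.371 | 6822.747 | 34.624 | 0.4853 |
| 2 | 8B | | 2048 | 4 | 7242.26 | 7209.679 | 32.581 | 0.4859 |
| 3 | 8B | | 2048 | 2 | 8770.189 | 8721.984 | 48.205 | 0.4874 |
| 4 | 8B | | 2048 | 4 | 7042.752 | 7010.338 | 32.414 | 0.4879 |
| 5 | 8B | | 2048 | 1 | 19450.007 | 19417.978 | 32.029 | 0.4865 |

Run: warm-brook-386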
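One way to read the L4_24GB_vllm rows above is sketched below. It assumes that the three timing columns are total run time, inference time, and initialization time in the same unit (the source does not state the unit), and that the tensor parallel 1 row is a sensible baseline. The names `rows`, `check_totals`, and `speedups` are hypothetical helpers introduced only for this illustration.

```python
# Rows copied from the L4_24GB_vllm table:
# (tensor parallel, run, inference, init). Units are not stated in the source.
rows = [
    (8, 6857.371, 6822.747, 34.624),
    (4, 7242.26, 7209.679, 32.581),
    (2, 8770.189, 8721.984, 48.205),
    (4, 7042.752, 7010.338, 32.414),
    (1, 19450.007, 19417.978, 32.029),
]


def check_totals(rows, tol=1e-3):
    """Return True if run == inference + init (within tol) for every row."""
    return all(abs(run - (inf + init)) <= tol for _, run, inf, init in rows)


def speedups(rows):
    """Inference-time speedup of each row relative to the tensor parallel 1 row."""
    baseline = next(inf for tp, _, inf, _ in rows if tp == 1)
    return [(tp, round(baseline / inf, 2)) for tp, _, inf, _ in rows]


if __name__ == "__main__":
    print("run = inference + init for all rows:", check_totals(rows))
    for tp, ratio in speedups(rows):
        print(f"tensor parallel {tp}: {ratio}x vs. tensor parallel 1")
```

The ratio uses the inference column rather than the total so that initialization time, which is small and roughly constant across rows, does not blur the comparison.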


L4_24GB_transformers


Run: warm-brook-386