Benchmarks
We tested nebullvm on popular AI models and on hardware from leading vendors. For each model-hardware pair, we measured the response time as an average over 100 predictions.
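As an illustration of the measurement described above, here is a minimal Python sketch of averaging the response time over 100 predictions. It is not the benchmark script used for these results: the function name `average_response_time_ms`, the `predict` callable, and the untimed warm-up calls are all assumptions made for the example.

```python
import time
from statistics import mean
from typing import Any, Callable


def average_response_time_ms(
    predict: Callable[[Any], Any],
    sample: Any,
    n_runs: int = 100,
    n_warmup: int = 10,
) -> float:
    """Return the mean latency of predict(sample) in milliseconds."""
    # Warm-up calls are an assumption of this sketch; they are not timed.
    for _ in range(n_warmup):
        predict(sample)

    timings_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        predict(sample)
        # perf_counter() returns seconds, so convert to milliseconds.
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(timings_ms)
```

On a GPU, work is often queued asynchronously, so `predict` would need to block until the result is ready for the timing to be meaningful.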
The experiments compare baseline performance against a nebullvm optimization without accuracy loss.
  • Baseline: nebullvm optimization is not applied
  • Optimized: nebullvm optimization without accuracy loss (metric_drop_ths = 0%), with optimization_time set to "unconstrained"
The average response time is reported in milliseconds (ms), and the speedup is dimensionless. The speedup is defined as the average response time of the baseline model divided by the average response time of the nebullvm-optimized model, so values above 1 mean the optimized model is faster.
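The speedup values in the results are consistent with dividing the baseline time by the optimized time. A small Python sketch of that calculation follows, using the EfficientNetB0 figures on M1 Pro from the results; the helper name `speedup` is hypothetical and chosen only for this example.

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Return the dimensionless speedup: baseline time over optimized time."""
    if optimized_ms <= 0:
        raise ValueError("optimized_ms must be positive")
    return baseline_ms / optimized_ms


# EfficientNetB0 on M1 Pro: 214.95 ms baseline, 24.4 ms optimized.
print(f"{speedup(214.95, 24.4):.1f}x")  # prints 8.8x
```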

Complete overview of the results

| Model | M1 Pro baseline (ms) | M1 Pro optimized (ms) | M1 Pro speedup | Intel Xeon baseline (ms) | Intel Xeon optimized (ms) | Intel Xeon speedup | AMD EPYC baseline (ms) | AMD EPYC optimized (ms) | AMD EPYC speedup | NVIDIA T4 baseline (ms) | NVIDIA T4 optimized (ms) | NVIDIA T4 speedup |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EfficientNetB0 | 214.95 | 24.4 | 8.8x | 36.07 | 12.15 | 3.0x | 86.29 | 38.64 | 2.2x | 12.92 | 9.59 | 1.3x |
| EfficientNetB1 | 278.81 | 33.62 | 8.3x | 50.47 | 17.33 | 2.9x | 96.65 | 59.93 | 1.6x | 17.99 | 14.19 | 1.3x |
| EfficientNetB2 | 284.88 | 36.77 | 7.7x | 50.33 | 19.06 | 2.6x | 97.32 | 65.93 | 1.5x | 36.91 | 13.46 | 2.7x |
| EfficientNetB3 | 370.11 | 50.37 | 7.3x | 67.98 | 26.74 | 2.5x | 207.95 | 89.61 | 2.3x | 20.26 | 14.33 | 1.4x |
| EfficientNetB4 | 558.86 | 70.99 | 7.9x | 91.43 | 35.89 | 2.5x | 274.93 | 119.17 | 2.3x | 24.89 | 17.08 | 1.5x |
| EfficientNetB5 | 704.25 | 99.84 | 7.1x | 125.69 | 53.91 | 2.3x | 481.7 | 188.63 | 2.6x | 31.23 | 17.94 | 1.7x |
| EfficientNetB6 | 1124 | 157.38 | 7.1x | 165.15 | 71.99 | 2.3x | 630.95 | 256.65 | 2.5x | 35.79 | 21.27 | 1.7x |
| EfficientNetB7 | 1521.71 | 212.12 | 7.2x | 223.15 | 106.86 | 2.1x | 766.61 | 395.57 | 1.9x | 45.65 | 23.32 | 2.0x |
| Resnet18 | 18.48 | 15.75 | 1.2x | 32.2 | 17.79 | 1.8x | 147.04 | 93.43 | 1.6x | 25.23 | 12.39 | 2.0x |
| Resnet34 | 42.06 | 34.4 | 1.2x | 61.67 | 36.54 | 1.7x | 180.18 | 166.13 | 1.1x | 27.41 | 5.36 | 5.1x |
| Resnet50 | 62.22 | 54.25 | 1.1x | 83.1 | 46.81 | 1.8x | 311.44 | 197.68 | 1.6x | 10.5 | 7.81 | 1.3x |
| Resnet101 | 118.95 | 92.01 | 1.3x | 152.52 | 82.99 | 1.8x | 545.65 | 364.74 | 1.5x | 20.22 | 12.82 | 1.6x |
| Resnet152 | 166.89 | 129.81 | 1.3x | 220.78 | 129.86 | 1.7x | 810.95 | 540.86 | 1.5x | 32.51 | 17.86 | 1.8x |
| SqueezeNet | 15.25 | 7.86 | 1.9x | 23.63 | 8.7 | 2.7x | 86.78 | 43.49 | 2.0x | 3.48 | 2.7 | 1.3x |
| Convnext tiny | 305.58 | 95.55 | 3.2x | 79.91 | 62.01 | 1.3x | 404.75 | 220.91 | 1.8x | 38.29 | 9.58 | 4.0x |
| Convnext small | 615.25 | 167.78 | 3.7x | 145.05 | 110.69 | 1.3x | 735037 | 544.47 | 1.4x | 24.31 | 17.02 | 1.4x |
| Convnext base | 815.01 | 240.4 | 3.4x | 230.72 | 187.39 | 1.2x | 1237.36 | 966.58 | 1.3x | 76.53 | 25.79 | 3.0x |
| Convnext large | 1266.87 | 394.85 | 3.2x | 444.82 | 396.62 | 1.1x | 2537.23 | 1868.43 | 1.4x | 108.12 | 38.41 | 2.8x |
| GPT2 - 10 tokens | 29.67 | 10.75 | 2.8x | 38.45 | 31.88 | 1.2x | 138.11 | 55.31 | 2.5x | 15.31 | 4.42 | 3.5x |
| GPT2 - 1024 tokens | 546.74 | - | - | 1564.67 | 924.58 | 1.7x | 9423.16 | 5076.11 | 1.9x | 84.47 | - | - |
| Bert - 8 tokens | 39.39 | 6.2 | 6.4x | 31.31 | 14.87 | 2.1x | 164.9 | 38.12 | 4.3x | 10.35 | 3.78 | 2.7x |
| Bert - 512 tokens | 489.52 | 276.35 | 1.8x | 494.21 | 376.13 | 1.3x | 2985.27 | 1847.31 | 1.6x | 31.25 | 27.37 | 1.1x |
​
Below are the models and hardware tested in the experiments.
Models
  • EfficientNet B0, B1, B2, B3, B4, B5, B6, B7
  • Resnet 18, 34, 50, 101, 152
  • SqueezeNet
  • Convnext tiny, small, base, large
  • GPT2 with 10 and 1024 tokens
  • Bert with 8 and 512 tokens
Hardware
  • M1 Pro: Apple M1 Pro with 16 GB of RAM
  • Intel Xeon: EC2 instance on AWS - t2.large
  • AMD EPYC: EC2 instance on AWS - t4a.large
  • NVIDIA T4: EC2 instance on AWS - g4dn.xlarge