Edge TPU performance benchmarks
An individual Edge TPU is capable of performing 4 trillion operations (tera-operations) per second (TOPS), using 0.5 watts for each TOPS (2 TOPS per watt). How that translates to performance for your application depends on a variety of factors. Every neural network model has different demands, and if you're using the USB Accelerator device, total performance also varies based on the host CPU, USB speed, and other system resources.
With that said, table 1 below compares the time spent to perform a single inference with several popular models on the Edge TPU. For the sake of comparison, all models running on both CPU and Edge TPU are the TensorFlow Lite versions.
This represents a small selection of model architectures that are compatible with the Edge TPU (they are all trained using the ImageNet dataset with 1,000 classes). If you want to test your own models, read the model architecture requirements.
Model architecture | Desktop CPU 1 | Desktop CPU 1 + USB Accelerator (USB 3.0) with Edge TPU |
Embedded CPU 2 | Dev Board 3 with Edge TPU |
---|---|---|---|---|
Unet Mv2 (128x128) |
27.7 | 3.3 | 190.7 | 5.7 |
DeepLab V3 (513x513) |
394 | 52 | 1139 | 241 |
DenseNet (224x224) |
380 | 20 | 1032 | 25 |
Inception v1 (224x224) |
90 | 3.4 | 392 | 4.1 |
Inception v4 (299x299) |
700 | 85 | 3157 | 102 |
Inception-ResNet V2 (299x299) |
753 | 57 | 2852 | 69 |
MobileNet v1 (224x224) |
53 | 2.4 | 164 | 2.4 |
MobileNet v2 (224x224) |
51 | 2.6 | 122 | 2.6 |
MobileNet v1 SSD (224x224) |
109 | 6.5 | 353 | 11 |
MobileNet v2 SSD (224x224) |
106 | 7.2 | 282 | 14 |
ResNet-50 V1 (299x299) |
484 | 49 | 1763 | 56 |
ResNet-50 V2 (299x299) |
557 | 50 | 1875 | 59 |
ResNet-152 V2 (299x299) |
1823 | 128 | 5499 | 151 |
SqueezeNet (224x224) |
55 | 2.1 | 232 | 2 |
VGG16 (224x224) |
867 | 296 | 4595 | 343 |
VGG19 (224x224) |
1060 | 308 | 5538 | 357 |
EfficientNet-EdgeTpu-S* | 5431 | 5.1 | 705 | 5.5 |
EfficientNet-EdgeTpu-M* | 8469 | 8.7 | 1081 | 10.6 |
EfficientNet-EdgeTpu-L* | 22258 | 25.3 | 2717 | 30.5 |
1 Desktop CPU: Single 64-bit Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
2 Embedded CPU: Quad-core Cortex-A53 @ 1.5GHz
3 Dev Board: Quad-core Cortex-A53 @ 1.5GHz + Edge TPU
* Latency on CPU is high for these models because the TensorFlow Lite runtime is not fully
optimized for quantized models on all platforms.
Is this content helpful?