
Supported features and roadmap

Here we explore the input frameworks supported by nebullvm, its four building blocks (Model Converter, Compressor, Optimizer and Learner), and the supported hardware.

Input frameworks

nebullvm supports deep learning models in the following frameworks.
  • Hugging Face
  • ONNX
  • PyTorch
  • TensorFlow
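
Whatever the input framework, the optimization entry point is the same. The sketch below is a minimal usage example and makes a few assumptions: the `optimize_model` function and its `nebullvm.api.functions` import path reflect recent releases and may differ in your version, and the model and sample inputs are placeholders.

```python
import torch
from nebullvm.api.functions import optimize_model  # entry point assumed from recent releases

# Placeholder PyTorch model; any of the supported input frameworks works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10)
)

# Sample inputs used to profile and benchmark the candidate optimizations
# (here: a list of ((inputs,), label) tuples).
input_data = [((torch.randn(1, 128),), 0) for _ in range(100)]

# nebullvm converts, compresses, compiles and benchmarks the model, then
# returns the fastest version found for the current hardware.
optimized_model = optimize_model(model, input_data=input_data)
```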

Model converter

The Converter (source code) converts the input model from its original framework to the framework backends supported by nebullvm. This allows the Compressor and Optimizer to apply any optimization technique to the neural network.
As an example, let's assume that for your specific use case, the best optimization technique is a specific type of dynamic quantization only supported by PyTorch. If you feed a Hugging Face model into nebullvm, the Converter will first transform your model into a PyTorch model. nebullvm will then quantize it and finally return it as a Hugging Face model.
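
To make that example concrete, the snippet below shows the kind of PyTorch dynamic quantization the backend can apply once the model has been converted. nebullvm runs this step internally; the toy model here is only a placeholder illustrating the underlying technique.

```python
import torch

# Toy model standing in for the converted Hugging Face model.
model = torch.nn.Sequential(
    torch.nn.Linear(768, 768), torch.nn.ReLU(), torch.nn.Linear(768, 2)
)

# Dynamic quantization: weights are stored as int8 and activations are
# quantized on the fly at inference time; only the listed layer types change.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```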

Supported backends and conversions

To date, not all cross-conversions from input frameworks to each nebullvm backend are supported. More precisely, nebullvm now includes 3 backends:
  • ONNX backend, which supports models in any input framework.
  • PyTorch backend, which supports input models in PyTorch, ONNX and Hugging Face.
  • TensorFlow backend, which supports input models in TensorFlow and ONNX.
For example, when a PyTorch model is optimized, nebullvm first tries the compilers available in the PyTorch pipeline, then converts the model to ONNX and also tries the compilers available in that pipeline. Finally, the best performing result is chosen and returned as the optimized model.
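
Under the hood, moving a model into the ONNX pipeline relies on an export like the one sketched below. nebullvm performs this conversion automatically; the model, input shape and file name are placeholders.

```python
import torch

# Placeholder model and a dummy input with the expected shape.
model = torch.nn.Sequential(torch.nn.Linear(128, 10))
dummy_input = torch.randn(1, 128)

# Export to ONNX so that the ONNX-based compilers (ONNX Runtime, TensorRT,
# OpenVINO, Apache TVM, ...) can be tried on the same model.
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
)
```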

Compressor

The Compressor (source code) applies various compression techniques to the model.

Supported high-level optimization techniques

  • Block-wise un/structured sparsity (🎉 launched in 0.4.0 🎉)
  • Knowledge distillation (to be supported)
  • Layer replacement (to be supported)
  • Low-rank compression (to be supported)
  • Quantization-aware training (to be supported)
  • SparseML (🎉 launched in 0.4.0 🎉)
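
nebullvm applies these techniques through its own compression pipeline (for example via SparseML), but as a minimal, framework-level illustration of what unstructured sparsity means, plain PyTorch pruning looks roughly like this:

```python
import torch
import torch.nn.utils.prune as prune

# Toy layer; in practice the compressor sparsifies the whole network.
layer = torch.nn.Linear(256, 256)

# Zero out 50% of the weights with the smallest L1 magnitude (unstructured sparsity).
prune.l1_unstructured(layer, name="weight", amount=0.5)
prune.remove(layer, "weight")  # make the sparsity permanent in the weight tensor

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.0%}")
```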

Optimizer

The Optimizer (source code) converts the compressed models to the intermediate representation (IR) of the supported deep learning compilers. The compilers perform both low-level optimizations, which mostly consist of the various quantization techniques supported by each compiler, and graph optimizations, and then produce the compiled binaries.

Supported deep learning compilers

  • Apache TVM
  • BladeDISC (🎉 launched in 0.4.0 🎉)
  • DeepSparse (🎉 launched in 0.4.0 🎉)
  • MLIR (open pull request 👩‍💻)
  • ONNX Runtime
  • OpenVINO
  • TensorRT
  • TF Lite / XLA
  • TorchScript
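
As a small taste of what one of these compilers does, the sketch below compiles a placeholder model with TorchScript. nebullvm runs the equivalent step (and the other available compilers) for you and keeps only the fastest result.

```python
import torch

# Placeholder model and example input used for tracing.
model = torch.nn.Sequential(torch.nn.Linear(128, 10)).eval()
example_input = torch.randn(1, 128)

# TorchScript tracing records the operations executed on the example input
# and produces a serializable, graph-optimized version of the model.
traced_model = torch.jit.trace(model, example_input)
traced_model = torch.jit.freeze(traced_model)  # fold constants, drop training-only code
```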

Supported low-level optimizations

  • Static quantization
  • Dynamic quantization
  • Half precision
  • Low-bit quantization on TVM (to be supported)
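
These low-level optimizations are applied by the compilers themselves. As an illustration of the dynamic-quantization case at the ONNX level, the standalone ONNX Runtime quantization tool works roughly as follows; the file names are placeholders.

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic quantization of an ONNX model: weights are converted to int8
# offline, activations are quantized at runtime.
quantize_dynamic("model.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)
```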

Support Matrix

Each framework has its own pipeline of compilers, and the same compiler may be available in more than one framework. For example, TensorRT is available in both the PyTorch and ONNX pipelines, with some possible differences. This section details which compilers are available for each framework and which low-level optimizations each one supports.
  • PyTorch

| Optimizer | No quantization | Half precision | Dynamic quantization | Static quantization |
| --- | --- | --- | --- | --- |
| TorchScript | YES | YES | YES (No GPU) | YES (No GPU) |
| DeepSparse | YES | NO | NO | NO |
| TensorRT | YES (No CPU) | YES (No CPU) | NO | YES (No CPU) |
| Apache TVM | YES | YES | YES | YES |
| BladeDISC | YES | YES | YES | YES |
| Intel Neural Compressor | NO | YES (No GPU) | YES (No GPU) | YES (No GPU) |

  • TensorFlow

| Optimizer | No quantization | Half precision | Dynamic quantization | Static quantization |
| --- | --- | --- | --- | --- |
| TF Lite / XLA | YES | YES | YES | YES |

  • ONNX

| Optimizer | No quantization | Half precision | Dynamic quantization | Static quantization |
| --- | --- | --- | --- | --- |
| TensorRT | YES (No CPU) | YES (No CPU) | NO | YES (No CPU) |
| ONNX Runtime | YES | YES | YES | YES |
| Apache TVM | YES | YES | YES | YES |
| OpenVINO | YES (No GPU) | YES (No GPU) | NO | YES (No GPU) |

Learner

The Learner (source code), or Inference Learner, selects the best performing compiled model on your hardware and wraps it in the same interface as the original input model.
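
In practice this means the optimized model is a drop-in replacement for the original one. Continuing the earlier PyTorch sketch (with the assumed `optimize_model` entry point and placeholder model), usage stays unchanged:

```python
import torch

sample = torch.randn(1, 128)

# The returned object exposes the same call interface as the original model,
# so existing inference code does not need to change.
original_output = model(sample)
optimized_output = optimized_model(sample)

# Outputs should match up to the numerical error introduced by the selected
# low-level optimization (e.g. half precision or quantization).
print(torch.allclose(original_output, optimized_output, atol=1e-2))
```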

Hardware

nebullvm has been tested mostly on Nvidia GPUs and on Intel and AMD CPUs. The library may also work on other hardware on which it has not been tested. Please let us know if you find that nebullvm works well on other hardware, or if you run into issues.

Supported hardware

  • AMD CPU
  • Intel CPU
  • Intel GPU (open issue 👩‍💻)
  • Nvidia GPU
  • Apple M1