nebullvm is built around four building blocks and uses a modular design to foster scalability and the integration of new acceleration components across the stack:
- Converter: converts the input model from its original framework to the framework backends supported by nebullvm, namely PyTorch, TensorFlow, and ONNX. This allows the Compressor and Optimizer modules to apply any optimization technique to the model.
- Compressor: applies various compression techniques to the model, such as pruning, knowledge distillation, or quantization-aware training.
- Optimizer: converts the compressed models to the intermediate representation (IR) of the supported deep learning compilers. The compilers apply both post-training quantization and graph optimizations to produce compiled binary files.
- Inference Learner: takes the best-performing compiled model and converts it to the same interface as the original input model.
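
The flow through the four blocks can be pictured with a minimal sketch. It is a toy illustration of the data flow only, not nebullvm's implementation: every function and class name, and the compiler names `compiler_a` and `compiler_b`, are hypothetical, the conversion, compression, and compilation steps are stubbed out, and it assumes that "best performing" means lowest measured latency.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names below are hypothetical and are NOT part of the nebullvm API.
Model = Callable[[float], float]


@dataclass
class Candidate:
    name: str     # "<backend>+<compiler>" label
    model: Model  # callable with the same interface as the input model


def convert(model: Model) -> Dict[str, Model]:
    """Converter: expose the model under each supported backend (stubbed)."""
    return {backend: model for backend in ("pytorch", "tensorflow", "onnx")}


def compress(models: Dict[str, Model]) -> Dict[str, Model]:
    """Compressor: placeholder for pruning, distillation, or QAT (identity here)."""
    return dict(models)


def optimize(models: Dict[str, Model], compilers: List[str]) -> List[Candidate]:
    """Optimizer: pair every backend model with every compiler (compilation stubbed)."""
    return [
        Candidate(f"{backend}+{compiler}", model)
        for backend, model in models.items()
        for compiler in compilers
    ]


def measure_latency(model: Model, sample: float, runs: int = 100) -> float:
    """Average wall-clock seconds per call over `runs` calls."""
    start = time.perf_counter()
    for _ in range(runs):
        model(sample)
    return (time.perf_counter() - start) / runs


def build_inference_learner(
    candidates: List[Candidate],
    sample: float,
    latency_fn: Callable[[Model, float], float] = measure_latency,
) -> Candidate:
    """Inference Learner: keep the candidate with the lowest latency (an assumption)."""
    return min(candidates, key=lambda c: latency_fn(c.model, sample))


def run_pipeline(model: Model, sample: float, compilers: List[str]) -> Candidate:
    """Chain the four blocks: convert, compress, optimize, select."""
    candidates = optimize(compress(convert(model)), compilers)
    return build_inference_learner(candidates, sample)
```

Calling `run_pipeline(lambda x: 2 * x, 3.0, ["compiler_a", "compiler_b"])` returns a `Candidate` whose `model` is called exactly like the original function. Because each stage only consumes the previous stage's output, a new backend or compiler can be added in one place without touching the other blocks, which is the point of the modular design.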