Get started
Nebullvm is very simple to use. Just provide your deep learning model, some input_data (a sample or a dataset) and your preferred metric (e.g. "accuracy"). You also need to set metric_drop_ths, the maximum drop in the metric you are willing to trade for a further reduction in response time (it can also be zero, i.e. optimization with no loss of "accuracy"), and optimization_time, an indicator of how long you want nebullvm to spend identifying the best optimization strategy for the model.
Given these inputs, nebullvm searches for the best strategy to optimize the model, taking into account your preference on optimization_time. Nebullvm tests the optimized models on your input_data to verify that the optimization has not reduced the selected metric beyond the metric loss threshold (metric_drop_ths). Nebullvm then outputs the fastest model on your hardware: an optimized model with the same interface as the input model.

Nebullvm API

def optimize_model(
    model: Any,
    input_data: Union[Iterable, Sequence],
    metric_drop_ths: float,
    metric: Union[str, Callable],
    optimization_time: str,
    dynamic_info: Dict,
    config_file: str,
    ignore_compilers: List,
)

Arguments

model: Any
The input model.
input_data: Iterable or Sequence
Input data to be used for model optimization; it can be one or more data samples. Note that if optimization_time is set to "unconstrained", it is preferable to provide at least 100 data samples so that the nebullvm techniques that require data (pruning, etc.) can also be activated. The data can be passed either as a sequence (data accessible by element, e.g. data[i]) or as an iterable (data accessible with a loop, e.g. for x in data). For an input model in PyTorch, TensorFlow or ONNX, tensors must be passed in the torch.Tensor, tf.Tensor and np.ndarray formats, respectively. Each input sample must be a tuple containing a tuple as the first element, the inputs, and the label as the second element. The inputs must be passed as a tuple even when the model takes a single input; in that case, the input tuple contains only one element. Hugging Face models can take both dictionaries and strings as data samples. If a list of strings is passed as input_data, a tokenizer must also be provided as an extra argument with the keyword 'tokenizer'; the strings will then be converted into data samples by the Hugging Face tokenizer.
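To make the expected layout concrete, here is a minimal sketch of a valid input_data list for a vision model that takes a single input. The helper name and the 224x224 shape are illustrative, not part of the nebullvm API; np.ndarray is the format expected for ONNX models (use torch.Tensor or tf.Tensor for PyTorch or TensorFlow models instead).

```python
import numpy as np

# Each sample is a tuple ((input_1, ..., input_n), label): the first
# element is a tuple of inputs (here just one), the second the label.
def make_sample_dataset(n_samples=100):
    return [
        ((np.random.randn(1, 3, 224, 224).astype(np.float32),), 0)
        for _ in range(n_samples)
    ]

dataset = make_sample_dataset(100)
sample_inputs, sample_label = dataset[0]
```

Note that the inner tuple is kept even for a single input, as described above; 100 samples are generated so that data-dependent techniques can also run in "unconstrained" mode.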
metric_drop_ths: float, optional
Maximum accepted drop in the specified metric. No model with a higher error will be accepted, i.e. all optimized models having a larger error with respect to the original one will be discarded, without even considering their possible speed-up. Default: 0.
metric: str or Callable, optional
Metric used to estimate the error that may arise from the optimization techniques and to evaluate whether that error exceeds metric_drop_ths, in which case the optimization is rejected. metric accepts a string, a user-defined function, or none. As a string, it must contain the name of the metric; "numeric_precision" and "accuracy" are currently supported. As a user-defined metric, it must be a function that takes as input two tuples of tensors, which will be generated from the base model and the optimized model, and their original labels. For more information, see nebullvm.measure.compute_relative_difference and nebullvm.measure.compute_accuracy_drop. If none is given but a metric_drop_ths is received, the nebullvm.measure.compute_relative_difference metric will be used as the default one. Default: "numeric_precision".
optimization_time: OptimizationTime, optional
The optimization time mode. It can be "constrained" or "unconstrained". In "constrained" mode, nebullvm takes advantage only of compilers and precision reduction techniques, such as quantization. "unconstrained" mode also allows nebullvm to exploit more time-consuming techniques, such as pruning and distillation. Note that most techniques activated in "unconstrained" mode require fine-tuning, so it is recommended that at least 100 samples be provided as input_data. Default: "constrained".
dynamic_info: Dict, optional
Dictionary containing dynamic axis information. It should contain as keys both "input" and "output" and as values two lists of dictionaries, where each dictionary represents the dynamic axis information for an input/output tensor. Each inner dictionary should have an integer as key, i.e. the dynamic axis (also counting the batch size), and a string as value giving it a tag, e.g. "batch_size". Default: None.
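As an illustration, a dynamic_info dictionary for a model with one image input and one output could be sketched as follows. The axis indices and tag names are assumptions for this example: axis 0 is taken to be the batch size and axes 2 and 3 the spatial dimensions.

```python
# One inner dict per input/output tensor; keys are dynamic axis indices,
# values are free-form tags for those axes.
dynamic_info = {
    "input": [{0: "batch_size", 2: "height", 3: "width"}],
    "output": [{0: "batch_size"}],
}
```

This dictionary would then be passed to optimize_model via the dynamic_info argument, so that the optimized model accepts inputs whose tagged axes vary in size.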
config_file: str, optional
Configuration file containing the parameters needed to define the CompressionStep in the pipeline. Default: None.
ignore_compilers: List, optional
List containing compilers to be ignored when running the OptimizerStep. Default: None.

Returns

InferenceLearner
Optimized version with the same interface as the input model. For example, optimizing a PyTorch model will return an InferenceLearner object that can be called exactly like a PyTorch model (either with model.forward(input) or model(input)). The optimized model will therefore take torch.Tensors as inputs and return torch.Tensors as outputs.

Examples of use

Here is an example of an application of nebullvm with a PyTorch ResNet50.
import torch
import torchvision.models as models
from nebullvm.api.functions import optimize_model

# Load a resnet as example
model = models.resnet50()

# Provide input data for the model
input_data = [((torch.randn(1, 3, 256, 256), ), 0)]

# Run nebullvm optimization in one line of code
optimized_model = optimize_model(
    model, input_data=input_data, optimization_time="constrained"
)

# Try the optimized model
x = torch.randn(1, 3, 256, 256)
res = optimized_model(x)
We are building examples for the other frameworks. In the meantime, to try nebullvm and measure its performance, check out Notebooks for testing nebullvm.