
Get started


Nebullvm is very simple to use. Just input the deep learning **model**, some **input_data** (a data sample or a dataset), your preferred **metric** (e.g., "accuracy"), the **metric_drop_ths**, i.e. the maximum drop in the `metric` you are willing to trade off to achieve a further increase in response time (it could also be zero, i.e. optimization with no loss of "accuracy"), and the **optimization_time**, an indicator of how long you want nebullvm to take to identify the best optimization strategy for the model.

Given these inputs, nebullvm will search for the best strategy to optimize the **model**, taking into account your preferences on **optimization_time**. Nebullvm will test the optimized models on your **input_data** to verify that the optimization has not reduced the selected **metric** below the metric loss threshold (**metric_drop_ths**). Nebullvm will then output the fastest model on your hardware, which will be an optimized model with the same interface as the input model.

Nebullvm API

```python
def optimize_model(
    model: Any,
    input_data: Union[Iterable, Sequence],
    metric_drop_ths: float = 0,
    metric: Union[str, Callable] = "numeric_precision",
    optimization_time: str = "constrained",
    dynamic_info: Dict = None,
    config_file: str = None,
    ignore_compilers: List[str] = None,
)
```

Arguments

`model`: The input model.

`input_data`: Input data to be used for model optimization, which can be one or more data samples. Note that if `optimization_time` is set to "unconstrained", it is preferable to provide at least 100 data samples so that the nebullvm techniques that require data (pruning, etc.) can also be activated. The data can be entered either as a sequence (data accessible by "element", e.g. `data[i]`) or as an iterable (data accessible with a loop, e.g. `for x in data`). In the case of an input model in PyTorch, TensorFlow or ONNX, tensors must be passed in the `torch.Tensor`, `tf.Tensor` and `np.ndarray` formats, respectively. Note that each input sample must be a tuple containing a tuple as the first element, the `inputs`, and the `label` as the second element. The inputs must be passed as a tuple even in the case of a single input; in such a case, the input tuple will contain only one element. Hugging Face models can take both dictionaries and strings as data samples. If a list of strings is passed as `input_data`, a tokenizer must also be provided as an extra argument with the keyword `tokenizer`. The strings will then be converted into data samples by the Hugging Face tokenizer.

`metric_drop_ths`: Maximum accepted drop in the specified metric. No model with a higher error will be accepted, i.e. all optimized models having a larger error with respect to the original one will be discarded, without even considering their possible speed-up. Default: 0.
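As an illustration, the sample format described for `input_data` might look as follows for a hypothetical single-input ONNX model (the shape and number of samples here are arbitrary, not prescribed by nebullvm):

```python
import numpy as np

# Each sample is a tuple ((inputs, ...), label); with a single-input model,
# the inner inputs tuple has exactly one element.
input_data = [
    ((np.random.randn(1, 3, 224, 224).astype(np.float32),), 0)
    for _ in range(100)  # >= 100 samples recommended for "unconstrained" mode
]
```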

`metric`: Metric to be used for estimating the error that may arise from using optimization techniques, and for evaluating whether that error exceeds the `metric_drop_ths`, in which case the optimization has to be rejected. `metric` accepts a string, a user-defined metric, or `None`. As a string, it must contain the name of the metric; "numeric_precision" and "accuracy" are currently supported. A user-defined metric can be passed as a function that takes as input two tuples of tensors, which will be generated from the base model and the optimized model, and their original labels. For more information, see `nebullvm.measure.compute_relative_difference` and `nebullvm.measure.compute_accuracy_drop`. If `None` is given but a `metric_drop_ths` is received, `nebullvm.measure.compute_relative_difference` will be used as the default metric. Default: "numeric_precision".

`optimization_time`: The optimization time mode. It can be "constrained" or "unconstrained". In "constrained" mode, nebullvm takes advantage only of compilers and precision-reduction techniques, such as quantization. "unconstrained" mode allows it to exploit more time-consuming techniques, such as pruning and distillation. Note that most techniques activated in "unconstrained" mode require fine-tuning, so it is recommended to provide at least 100 samples as `input_data`. Default: "constrained".

`dynamic_info`: Dictionary containing dynamic axis information. It should contain both "input" and "output" as keys, and as values two lists of dictionaries, where each dictionary represents the dynamic axis information for one input/output tensor. Each inner dictionary should have an integer as key, i.e. the dynamic axis (counting the batch size as well), and a string as value, giving it a tag, e.g. "batch_size". Default: None.

`config_file`: Configuration file containing the parameters needed to define the `CompressionStep` in the pipeline. Default: None.

`ignore_compilers`: List containing compilers to be ignored when running the `OptimizerStep`. Default: None.

Returns

Optimized version of the model with the same interface as the input model. For example, optimizing a PyTorch model will return an `InferenceLearner` object that can be called exactly like a PyTorch model (either with `model.forward(input)` or with `model(input)`). The optimized model will therefore take `torch.Tensor` objects as input and return `torch.Tensor` objects as output.

Examples of use

Here is an example of an application of nebullvm with a PyTorch ResNet50.

```python
import torch
import torchvision.models as models
from nebullvm.api.functions import optimize_model

# Load a resnet as example
model = models.resnet50()

# Provide input data for the model
input_data = [((torch.randn(1, 3, 256, 256), ), 0)]

# Run nebullvm optimization in one line of code
optimized_model = optimize_model(
    model, input_data=input_data, optimization_time="constrained"
)

# Try the optimized model
x = torch.randn(1, 3, 256, 256)
res = optimized_model(x)
```
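As described in the Arguments section, `metric` can also take a user-defined function. The sketch below is only illustrative: the signature (two tuples of model outputs plus the original labels) is inferred from the description above, and `relative_output_difference` is a hypothetical name, not part of the nebullvm API.

```python
import numpy as np

# Hypothetical user-defined metric: the largest relative difference between
# the outputs of the base model and those of the optimized model (the labels
# are ignored by this particular metric).
def relative_output_difference(base_outputs, optimized_outputs, labels=None):
    diffs = []
    for base, opt in zip(base_outputs, optimized_outputs):
        base, opt = np.asarray(base), np.asarray(opt)
        denom = np.abs(base).max() + 1e-8  # avoid division by zero
        diffs.append(float(np.abs(base - opt).max() / denom))
    return max(diffs)
```

Such a function could then be passed as `metric=relative_output_difference` together with a nonzero `metric_drop_ths`.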

We are building examples for the other frameworks. In the meantime, if you want to test nebullvm and check its performance, check out Notebooks for testing nebullvm.
