Automatically apply SOTA optimization techniques to achieve the maximum inference speed-up on your hardware.
Speedster is an open-source module designed to speed up AI inference in just a few lines of code. The library aims to give your model the greatest acceleration that your hardware can deliver.
We are building a new AI inference acceleration product that leverages state-of-the-art open-source optimization tools to optimize the whole software-to-hardware stack. If you like the idea, give us a star to support the project ⭐
The core Speedster workflow consists of the following steps:
  • Select: you input your model in your preferred DL framework and express your preferences regarding:
    • Accuracy loss: do you want to trade off a little accuracy for much higher performance?
    • Optimization time: stellar accelerations can be time-consuming. Can you wait, or do you need an instant answer?
  • Search: the library automatically tests every combination of optimization techniques across the software-to-hardware stack (sparsity, quantization, compilers, etc.) that is compatible with your needs and local hardware.
  • Serve: finally, Speedster chooses the best configuration of optimization techniques and returns an accelerated version of your model in the DL framework of your choice (just on steroids 🚀).
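The three steps above can be pictured with a small, self-contained sketch. This is not Speedster's API: `Candidate`, `measure_latency`, and `select_search_serve` are hypothetical names invented for illustration, the "models" are plain Python callables, and each candidate's accuracy loss is assumed to be known in advance. The optimization-time preference is left out to keep the sketch short.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical stand-in for a model: any callable from inputs to an output.
Model = Callable[[Sequence[float]], float]


@dataclass(frozen=True)
class Candidate:
    """One optimized variant of the model (illustrative, not a Speedster type)."""
    name: str
    model: Model
    accuracy_loss: float  # assumed to be measured beforehand


def measure_latency(model: Model, sample: Sequence[float], repeats: int = 50) -> float:
    """Average wall-clock seconds per call over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        model(sample)
    return (time.perf_counter() - start) / repeats


def select_search_serve(
    candidates: List[Candidate],
    sample: Sequence[float],
    max_accuracy_loss: float,
    measure: Callable[[Model, Sequence[float]], float] = measure_latency,
) -> Candidate:
    # Select: keep only the variants that respect the accuracy preference.
    allowed = [c for c in candidates if c.accuracy_loss <= max_accuracy_loss]
    if not allowed:
        raise ValueError("no candidate satisfies the accuracy constraint")
    # Search: benchmark every remaining variant on the local machine.
    latencies = {c.name: measure(c.model, sample) for c in allowed}
    # Serve: return the fastest variant that passed the constraint.
    return min(allowed, key=lambda c: latencies[c.name])


if __name__ == "__main__":
    variants = [
        Candidate("baseline", lambda xs: sum(x * x for x in xs), 0.0),
        # Rounding the inputs is a toy stand-in for a lossy technique.
        Candidate("rounded", lambda xs: sum(round(x) ** 2 for x in xs), 0.02),
    ]
    best = select_search_serve(variants, [0.5] * 1000, max_accuracy_loss=0.05)
    print("serving:", best.name)
```

Passing the latency function as a parameter keeps the search logic separate from the benchmarking method, so the same loop works with a real profiler or a fixed lookup table.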