
Learn how using the Open Neural Network Exchange (ONNX) can help optimize the inference of your machine learning model. Inference, or model scoring, is the phase where the deployed model is used for prediction, most commonly on production data.

Optimizing machine learning models for inference (or model scoring) is difficult, since you need to tune the model and the inference library to make the most of the hardware capabilities. The problem becomes extremely hard if you want optimal performance on different kinds of platforms (cloud/edge, CPU/GPU, and so on), since each one has different capabilities and characteristics. The complexity increases if you have models from a variety of frameworks that need to run on a variety of platforms, and it's very time consuming to optimize all the different combinations of frameworks and hardware. What's needed is a way to train once in your preferred framework and run anywhere on the cloud or edge.

Microsoft and a community of partners created ONNX as an open standard for representing machine learning models. Models from many frameworks, including TensorFlow, PyTorch, scikit-learn, Keras, Chainer, MXNet, MATLAB, and SparkML, can be exported or converted to the standard ONNX format. Once the models are in the ONNX format, they can be run on a variety of platforms and devices.
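
For example, a PyTorch model can be exported to ONNX with `torch.onnx.export`. The following is only a minimal sketch: the toy network, the `model.onnx` file name, the input shape, and the input/output names are placeholders for your own model.

```python
import torch
import torch.nn as nn

# Toy network standing in for a trained model.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

# Example input with the shape the model expects; the exporter traces the model with it.
dummy_input = torch.randn(1, 4)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",                 # placeholder output path
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},  # allow variable batch size
)
```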

ONNX Runtime is a high-performance inference engine for deploying ONNX models to production. It's optimized for both cloud and edge and works on Linux, Windows, and Mac. Written in C++, it also has C, Python, C#, Java, and JavaScript (Node.js) APIs for use in a variety of environments. ONNX Runtime supports both DNN and traditional ML models and integrates with accelerators on different hardware, such as TensorRT on NVIDIA GPUs, OpenVINO on Intel processors, DirectML on Windows, and more.
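
With the ONNX Runtime Python API, scoring a model is a matter of loading the file into an inference session and calling `run`. This minimal sketch assumes the `model.onnx` file and input shape from the export example above.

```python
import numpy as np
import onnxruntime as ort

# Create an inference session; the default CPU execution provider is used here, but
# accelerator-specific providers (TensorRT, OpenVINO, DirectML, ...) can be listed
# instead when the corresponding ONNX Runtime build is installed.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Inputs are passed as a dict keyed by input name; outputs come back as a list of NumPy arrays.
input_name = session.get_inputs()[0].name
x = np.random.randn(1, 4).astype(np.float32)
outputs = session.run(None, {input_name: x})

print(outputs[0].shape)
```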

