Using triton-inference-server as the inference backend

CPU-Only Build

If you want to build without GPU support, you must specify individual feature flags and must not include the --enable-gpu and --enable-gpu-metrics flags. Only the following backends are available for a non-GPU / CPU-only build: identity, repeat, ensemble, square, tensorflow2, pytorch, onnxruntime, openvino, python, and fil.

To include the TensorFlow2 backend in your CPU-only build, you must provide this additional flag to build.py: --extra-backend-cmake-arg=tensorflow2:TRITON_TENSORFLOW_INSTALL_EXTRA_DEPS=ON.

CPU-only builds of the TensorFlow and PyTorch backends require some CUDA stubs and runtime dependencies that are not present in the CPU-only base container. These are retrieved from a GPU base container, which can be changed with the --image=gpu-base,nvcr.io/nvidia/tritonserver:<xx.yy>-py3-min flag.
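Putting these flags together, a CPU-only invocation of build.py might look like the sketch below. This is a minimal, illustrative example, not a definitive recipe: the build directory and the particular backend selection are assumptions to adapt to your setup, and <xx.yy> must be replaced with a concrete release tag.

    # A CPU-only build: note that --enable-gpu and --enable-gpu-metrics
    # are deliberately omitted. The backend list and build directory
    # here are illustrative; replace <xx.yy> with a concrete release.
    python build.py \
        --build-dir=/tmp/tritonbuild \
        --enable-logging --enable-stats --enable-metrics \
        --endpoint=http --endpoint=grpc \
        --backend=onnxruntime \
        --backend=python \
        --backend=tensorflow2 \
        --extra-backend-cmake-arg=tensorflow2:TRITON_TENSORFLOW_INSTALL_EXTRA_DEPS=ON \
        --image=gpu-base,nvcr.io/nvidia/tritonserver:<xx.yy>-py3-min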

Triton Inference Server provides a cloud and edge inferencing solution optimized for both CPUs and GPUs. Triton supports HTTP/REST and gRPC protocols that allow remote clients to request inferencing for any model being managed by the server. For edge deployments, Triton is available as a shared library with a C API that allows the full functionality of Triton to be embedded directly in an application. Five Docker images are available (a quickstart example follows the list):

  • The xx.yy-py3 image contains the Triton Inference Server with support for TensorFlow, PyTorch, TensorRT, ONNX, and OpenVINO models.
  • The xx.yy-py3-sdk image contains Python and C++ client libraries, client examples, and the Model Analyzer.
  • The xx.yy-py3-min image is used as the base for creating custom Triton server containers as described in Customize Triton Container.
  • The xx.yy-pyt-python-py3 image contains the Triton Inference Server with support for PyTorch and Python backends only.
  • The xx.yy-tf2-python-py3 image contains the Triton Inference Server with support for TensorFlow 2.x and Python backends only.
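As a quick orientation to those images, the sketch below pulls the full server image, starts it against a local model repository, and checks readiness over the HTTP/REST endpoint. It follows the standard quickstart flow; /path/to/model_repository is a placeholder for your own model repository, <xx.yy> must be replaced with a concrete release tag, and the port mappings assume the default HTTP (8000), gRPC (8001), and metrics (8002) endpoints.

    # Pull a release image (substitute a concrete release for <xx.yy>).
    docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3

    # Start the server. /path/to/model_repository is a placeholder for
    # a local model repository mounted into the container at /models.
    docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
        -v /path/to/model_repository:/models \
        nvcr.io/nvidia/tritonserver:<xx.yy>-py3 \
        tritonserver --model-repository=/models

    # Check server readiness over the HTTP/REST endpoint.
    curl -v localhost:8000/v2/health/ready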

For more information, refer to the Triton Inference Server GitHub repository.

Need enterprise support? NVIDIA global support is available for Triton Inference Server with the NVIDIA AI Enterprise software suite. Check out NVIDIA LaunchPad for free access to a set of hands-on labs with Triton Inference Server hosted on NVIDIA infrastructure.