triton-inference-server notes

This document collects technical articles, recent updates, and other information related to triton-inference-server.

Newer Triton releases require a matching NVIDIA GPU driver; check the support matrix before upgrading: https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html

Release updates

Official release notes: https://github.com/triton-inference-server/server/releases

Notable triton-inference-server changelog entries (the features added in each release can help you pick the version you need):

23.05

The 23.05 release may add support for decoupled mode (a model can return zero, one, or many responses per request, which is useful for streaming workloads).
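For reference, decoupled mode is enabled per model through the transaction policy in that model's config.pbtxt. A minimal sketch is shown below; the model name, backend choice, and tensor names are hypothetical placeholders, not from this document:

```protobuf
name: "streaming_model"
backend: "python"
max_batch_size: 0

# Decoupled transaction policy: each request may produce zero, one,
# or many responses, delivered asynchronously to the client.
model_transaction_policy {
  decoupled: true
}

input [
  {
    name: "INPUT_TEXT"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "OUTPUT_TEXT"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
```

Clients consuming a decoupled model must use a streaming-capable API (for example, Triton's gRPC streaming interface) rather than plain request/response inference.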

Recent technical articles:

pytriton

https://github.com/triton-inference-server/pytriton

triton tutorials

https://github.com/triton-inference-server/tutorials

Recent Triton releases also offer an application for early access (EA) to the Triton Management Service (TMS):

This release of the management service is a pre-release version under the early-access program. It is considered alpha-quality software and is not recommended for production deployment. For instance, security features such as TLS are not supported at the moment.

New releases happen every month. Currently supported functionalities for alpha release include:

  • Automates deploying and managing Triton on Kubernetes (k8s) with requested models
  • Avoids unnecessary Triton Inference Server instances by loading models onto already-running Triton instances when possible
  • Enables more efficient GPU utilization by allowing multiple models to share the same Triton instance in a single pod
  • Unloads models when not in use
  • Groups models from different frameworks together so they coexist efficiently without out-of-memory issues
  • Allows loading models from multiple sources, such as a secure registry, HTTPS, etc.
  • Allows custom resource allocation per model or per set of models
  • REST and JSON gRPC services

Application link: https://developer.nvidia.com/tms-early-access