This page collects technical articles and recent updates related to triton-inference-server.
When using a newer Triton release, check GPU driver compatibility: Frameworks Support Matrix - NVIDIA Docs
Release versions
Official release information:
Notable triton-inference-server changelog entries (you can use the newly added features to pick the version you need):
- 23.01 Support for custom batching strategies.
- 23.02 Support for ensemble models in Model Analyzer.
- 23.04 Triton's ragged batching support has been extended to the PyTorch backend.
- 23.05 Python backend supports Custom Metrics, allowing users to define and report counters and gauges similar to the C API.
- 23.06 The statistics extension now includes the memory usage of the loaded models. This statistic is currently implemented only for the TensorRT and ONNXRuntime backends. Added support for batch inputs in ragged batching for the PyTorch backend.
- 23.08 Python backend supports directly loading and serving PyTorch models with torch.compile().
- 23.09 The TensorRT backend now supports TensorRT version compatibility across models generated with the same major version of TensorRT. Use the --backend-config=tensorrt,version-compatible=true flag to enable this feature.
- 23.11 The backend API has been enhanced to support rescheduling a request. Currently, only the Python backend and custom C++ backends support request rescheduling.