New features in TensorRT include multi-GPU, multi-node inference, performance and hardware optimizations, and more.
TensorRT can be used to run multi-GPU, multi-node inference for large language models (LLMs). It supports GPT-3 175B, 530B, and 6.7B models. These models do not require ONNX conversion; instead, a simple Python API is available to optimize them for multi-GPU inference. This capability is now available in private early access; contact your NVIDIA account team for more details.
TensorRT 8.6 is now available in early access and includes the following key features:
- Performance optimizations for generative AI diffusion and transformer models
- Hardware compatibility, so an engine built on one GPU architecture can run on others (NVIDIA Ampere architecture and later)
- Version compatibility, so an engine built with one TensorRT release can run on later releases (TensorRT 8.6 and later)
- Optimization levels to trade off build time against inference performance
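The three builder features above are exposed through the TensorRT Python builder configuration. The sketch below shows how they might be set when building an engine; it assumes TensorRT 8.6 or later is installed and a network has already been populated (the ONNX file path `model.onnx` is a placeholder). It is a minimal illustration, not a complete build pipeline.

```python
# Sketch: configuring hardware compatibility, version compatibility, and
# optimization level with the TensorRT 8.6+ Python API.
# Assumes a GPU system with TensorRT installed; "model.onnx" is a placeholder path.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)

# Populate the network, e.g. by parsing an ONNX model.
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()

# Hardware compatibility: the engine can run on Ampere and later architectures,
# not only the GPU it was built on.
config.hardware_compatibility_level = trt.HardwareCompatibilityLevel.AMPERE_PLUS

# Version compatibility: the engine can run on later TensorRT versions.
config.set_flag(trt.BuilderFlag.VERSION_COMPATIBLE)

# Optimization level 0-5: lower builds faster, higher searches more tactics
# for better inference performance (default is 3).
config.builder_optimization_level = 3

serialized_engine = builder.build_serialized_network(network, config)
```

Compatible engines may give up some performance relative to engines specialized for a single GPU and TensorRT version, which is why these options are opt-in rather than defaults.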