Taking Down TensorRT Series (1): Understand TRT, Get Familiar with TRT, Become TRT

imoldpan · September 24, 2023, 14:44

This is the first post in a series of articles.

The inference library can rely on TensorRT and TorchScript, while the compilation library can rely on TorchInductor plus Triton.
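As a rough illustration of the TorchScript side of that split, here is a minimal sketch that traces a small model with `torch.jit.trace` and checks it against eager execution. `TinyNet` and its layer sizes are hypothetical, made up for this example; the sketch assumes PyTorch is installed and runs on CPU. It does not cover TensorRT or TorchInductor.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical toy model, used only for illustration."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))


torch.manual_seed(0)
model = TinyNet().eval()
example = torch.randn(1, 4)

with torch.no_grad():
    # Trace the model into TorchScript by recording the ops run on `example`.
    scripted = torch.jit.trace(model, example)
    eager_out = model(example)
    scripted_out = scripted(example)

# The traced module should reproduce the eager result.
print(torch.allclose(eager_out, scripted_out))
```

The traced module can then be saved with `scripted.save(...)` and loaded without the Python class definition, which is one reason an inference library might build on TorchScript.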

Main features

Do users need to install a Python environment and its libraries?

On Linux, if it is not installed, requests can be sent with curl instead.

Operator fusion, graph rewriting, and memory optimization.
High-performance fused operator library: a series of high-performance fused operators implemented with cuDNN, cuBLAS, CUDA C++, and OpenAI Triton. These operators support both forward and backward propagation, so they can also accelerate training.
CUDA Graph capture, optimized beam search, and an optimized attention layer, which further improve performance when combined with the techniques above.
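To make "operator fusion" and "graph rewriting" a bit more concrete, here is a toy sketch in pure Python. It is not how TensorRT or Triton implement fusion; the `Op` class, `run`, and `fuse_elementwise` are hypothetical names invented for this example. The idea it shows is only that a chain of elementwise ops can be rewritten into a single op, so the data is traversed once instead of once per operator.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Op:
    """Hypothetical toy IR node: a named elementwise function."""
    name: str
    fn: Callable[[float], float]


def run(ops: List[Op], data: List[float]) -> Tuple[List[float], int]:
    """Apply each op in its own full pass over the data (like one kernel per op)."""
    passes = 0
    for op in ops:
        data = [op.fn(x) for x in data]
        passes += 1
    return data, passes


def fuse_elementwise(ops: List[Op]) -> List[Op]:
    """Graph rewrite: collapse a chain of elementwise ops into one fused op."""
    if not ops:
        return []

    def fused(x: float) -> float:
        # Apply every original op to one element before moving to the next.
        for op in ops:
            x = op.fn(x)
        return x

    return [Op("+".join(op.name for op in ops), fused)]


ops = [
    Op("scale", lambda x: x * 2.0),
    Op("bias", lambda x: x + 1.0),
    Op("relu", lambda x: max(x, 0.0)),
]
data = [-2.0, 0.5, 3.0]

unfused_out, unfused_passes = run(ops, data)
fused_ops = fuse_elementwise(ops)
fused_out, fused_passes = run(fused_ops, data)

print(fused_ops[0].name, unfused_passes, fused_passes)
```

On a GPU the saving comes from fewer kernel launches and fewer round trips to global memory rather than fewer Python loops, but the rewrite has the same shape: same results, fewer passes over the data.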