TensorRT Series: A Guide to the Polygraphy Tool

Basic Operations

Inspection

Inspecting an ONNX Model

(pytorch) oldpan@oldpan-ROG-Flow-X13-GV301QE-GV301QE:~/software/tensorrt/TensorRT-8.4.1.5/bin$ polygraphy inspect model /home/oldpan/code/data/orignal_model/Resnet34_3inputs_448x448_20200609.onnx 
[I] Loading model: /home/oldpan/code/data/orignal_model/Resnet34_3inputs_448x448_20200609.onnx
[I] ==== ONNX Model ====
    Name: torch-jit-export | ONNX Opset: 9
    
    ---- 3 Graph Input(s) ----
    {input.1 [dtype=float32, shape=(1, 3, 448, 448)],
     input.4 [dtype=float32, shape=(1, 3, 448, 448)],
     input.7 [dtype=float32, shape=(1, 3, 448, 448)]}
    
    ---- 4 Graph Output(s) ----
    {504 [dtype=float32, shape=(1, 48, 28, 28)],
     499 [dtype=float32, shape=(1, 24, 28, 28)],
     530 [dtype=float32, shape=(1, 2016, 28, 28)],
     516 [dtype=float32, shape=(1, 672, 28, 28)]}
    
    ---- 337 Initializer(s) ----
    
    ---- 191 Node(s) ----

Inspecting a TensorRT Network

  • Show the layer information of the TensorRT network before optimization:
(pytorch) oldpan@oldpan-ROG-Flow-X13-GV301QE-GV301QE:~/software/tensorrt/TensorRT-8.4.1.5/bin$ polygraphy inspect model /home/oldpan/code/data/orignal_model/Resnet34_3inputs_448x448_20200609.onnx --show layers --display-as=trt
[08/11/2022-22:13:05] [TRT] [W] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I] ==== TensorRT Network ====
    Name: Unnamed Network 0 | Explicit Batch Network
    
    ---- 3 Network Input(s) ----
    {input.1 [dtype=float32, shape=(1, 3, 448, 448)],
     input.4 [dtype=float32, shape=(1, 3, 448, 448)],
     input.7 [dtype=float32, shape=(1, 3, 448, 448)]}
    
    ---- 4 Network Output(s) ----
    {504 [dtype=float32, shape=(1, 48, 28, 28)],
     499 [dtype=float32, shape=(1, 24, 28, 28)],
     530 [dtype=float32, shape=(1, 2016, 28, 28)],
     516 [dtype=float32, shape=(1, 672, 28, 28)]}
    
    ---- 191 Layer(s) ----
    Layer 0    | node_of_340 [Op: LayerType.CONVOLUTION]
        {input.1 [dtype=float32, shape=(1, 3, 448, 448)]}
         -> {340 [dtype=float32, shape=(1, 64, 224, 224)]}
    
    Layer 1    | node_of_341 [Op: LayerType.SCALE]
        {340 [dtype=float32, shape=(1, 64, 224, 224)]}
         -> {341 [dtype=float32, shape=(1, 64, 224, 224)]}
    
    Layer 2    | node_of_342 [Op: ActivationType.RELU]
        {341 [dtype=float32, shape=(1, 64, 224, 224)]}
         -> {342 [dtype=float32, shape=(1, 64, 224, 224)]}
    
    ...
    Layer 189  | node_of_529 [Op: LayerType.CONVOLUTION]
        {528 [dtype=float32, shape=(1, 1344, 28, 28)]}
         -> {529 [dtype=float32, shape=(1, 2016, 28, 28)]}
    
    Layer 190  | node_of_530 [Op: ActivationType.TANH]
        {529 [dtype=float32, shape=(1, 2016, 28, 28)]}
         -> {530 [dtype=float32, shape=(1, 2016, 28, 28)]}
  • Show the layer information after optimization (once the engine has been built):
(pytorch) oldpan@oldpan-ROG-Flow-X13-GV301QE-GV301QE:~/software/tensorrt/TensorRT-8.4.1.5/bin$ polygraphy inspect model Resnet34_3inputs_448x448_20200609.trt --model-type engine
[I] Loading bytes from /home/oldpan/software/tensorrt/TensorRT-8.4.1.5/targets/x86_64-linux-gnu/bin/Resnet34_3inputs_448x448_20200609.trt
[I] ==== TensorRT Engine ====
    Name: Unnamed Network 0 | Explicit Batch Engine
    
    ---- 3 Engine Input(s) ----
    {input.1 [dtype=float32, shape=(1, 3, 448, 448)],
     input.4 [dtype=float32, shape=(1, 3, 448, 448)],
     input.7 [dtype=float32, shape=(1, 3, 448, 448)]}
    
    ---- 4 Engine Output(s) ----
    {499 [dtype=float32, shape=(1, 24, 28, 28)],
     504 [dtype=float32, shape=(1, 48, 28, 28)],
     516 [dtype=float32, shape=(1, 672, 28, 28)],
     530 [dtype=float32, shape=(1, 2016, 28, 28)]}
    
    ---- Memory ----
    Device Memory: 25690112 bytes
    
    ---- 1 Profile(s) (7 Binding(s) Each) ----
    - Profile: 0
        Binding Index: 0 (Input)  [Name: input.1] | Shapes: min=(1, 3, 448, 448), opt=(1, 3, 448, 448), max=(1, 3, 448, 448)
        Binding Index: 1 (Input)  [Name: input.4] | Shapes: min=(1, 3, 448, 448), opt=(1, 3, 448, 448), max=(1, 3, 448, 448)
        Binding Index: 2 (Input)  [Name: input.7] | Shapes: min=(1, 3, 448, 448), opt=(1, 3, 448, 448), max=(1, 3, 448, 448)
        Binding Index: 3 (Output) [Name: 499]     | Shape: (1, 24, 28, 28)
        Binding Index: 4 (Output) [Name: 504]     | Shape: (1, 48, 28, 28)
        Binding Index: 5 (Output) [Name: 516]     | Shape: (1, 672, 28, 28)
        Binding Index: 6 (Output) [Name: 530]     | Shape: (1, 2016, 28, 28)
    
    ---- 73 Layer(s) ----
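The per-binding memory in the table above follows directly from each binding's shape and dtype. A quick sketch of the arithmetic (float32 = 4 bytes per element), using shapes from the engine output:

```python
# Size in bytes of one binding = product of its dims * bytes per element.
def binding_bytes(shape, bytes_per_elem=4):
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_elem

print(binding_bytes((1, 3, 448, 448)))   # size of input.1
print(binding_bytes((1, 2016, 28, 28)))  # size of output tensor "530"
```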

Running

polygraphy run /home/oldpan/code/models/GPT/LLAMA/alpaca.onnx/decoder-merge-5.onnx --onnxrt
pip install colored
polygraphy run /home/oldpan/code/models/GPT/LLAMA/alpaca.onnx/decoder-merge-5.onnx --onnxrt --data-loader-script /home/oldpan/code/convert/tools/data_loader.py
polygraphy run /home/oldpan/code/models/GPT/LLAMA/alpaca.onnx/decoder-merge-5.onnx --onnxrt --data-loader-script /home/oldpan/code/convert/tools/data_loader.py --save-results=/home/oldpan/code/data/debug_data/output-debug/decoder-merge-5-onnx-output.json
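The `--data-loader-script` option used above points at a Python file that Polygraphy imports to generate input data; by default it looks for a function named `load_data` that yields feed_dicts mapping input names to numpy arrays. A minimal sketch (the input name and shape below are made up for illustration, not taken from the model above):

```python
# Minimal data_loader.py sketch for --data-loader-script.
# Polygraphy calls load_data() and feeds each yielded dict to the runner.
import numpy as np

def load_data():
    for _ in range(2):  # one yield per inference iteration
        # Keys must match the model's input names; these are hypothetical.
        yield {"input_ids": np.random.randint(0, 32000, size=(1, 8)).astype(np.int64)}
```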

Debugging

Save model outputs for comparison during debugging

# Dump the results of every output node of the ONNX model
polygraphy run /home/oldpan/code/data/orignal_model/yolov5s.onnx --onnxrt --onnx-outputs mark all --save-results=onnx_output_all.json


polygraphy run --onnxrt /home/oldpan/code/models/GPT/LLAMA/alpaca.onnx/decoder-merge-4.onnx --save-results=/home/oldpan/code/data/debug_data/output-debug/decoder-merge-4-onnx-output.json --data-loader-script /home/oldpan/code/convert/tools/data_loader.py

polygraphy run /home/oldpan/code/models/normal/tensorrt_engine/llama-trt/decoder-merge-4.trt --model-type engine --trt --data-loader-script /home/oldpan/code/convert/tools/data_loader.py --load-outputs /home/oldpan/code/data/debug_data/output-debug/decoder-merge-4-onnx-output.json  --atol 1e-2 --rtol 1e-3
The saved JSON results can be loaded back in Python via `RunResults`:

from polygraphy.comparator import RunResults

results = RunResults.load("/home/oldpan/code/data/debug_data/onnx_output_all.json")

for runner_name, iterations in results.items():
    for iteration in iterations:
        for tensor_name, value in iteration.items():
            print(tensor_name, value)
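For reference, `--atol`/`--rtol` are not applied like `np.isclose`: in Polygraphy's default comparison, an element is reported as a mismatch only when it exceeds both tolerances. A simplified scalar sketch (the real implementation operates on whole numpy arrays):

```python
# Simplified sketch of Polygraphy's default elementwise comparison:
# a mismatch requires exceeding BOTH the absolute and relative tolerance.
def is_mismatch(out, ref, atol=1e-2, rtol=1e-3):
    absdiff = abs(out - ref)
    reldiff = absdiff / abs(ref) if ref != 0 else float("inf")
    return absdiff > atol and reldiff > rtol
```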

Check whether the model outputs contain NaN

polygraphy run --onnxrt identity_fp16.onnx --validate
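`--validate` scans the outputs for NaN/Inf values. Conceptually it boils down to something like the following (a simplified sketch over plain Python lists, not Polygraphy's actual implementation):

```python
import math

def validate_outputs(outputs):
    # Return False if any output value is NaN or Inf.
    for name, values in outputs.items():
        for v in values:
            if math.isnan(v) or math.isinf(v):
                return False
    return True
```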
  • Mark every layer of the TRT network as an output layer; because the network has not yet been optimized, every intermediate layer is still present to be marked:

polygraphy run /home/oldpan/code/data/orignal_model/yolov5s.onnx --trt --validate --trt-outputs mark all --save-results=trt_out.json

[08/14/2022-10:13:32] [TRT] [W] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/14/2022-10:13:32] [TRT] [W] onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
[08/14/2022-10:13:32] [TRT] [W] onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
[08/14/2022-10:13:32] [TRT] [W] onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
[08/14/2022-10:13:33] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.2.4
[08/14/2022-10:13:58] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[08/14/2022-10:13:58] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[I] trt-runner-N0-08/14/22-10:13:29     | Activating and starting inference
[I]     Configuring with profiles: [Profile().add('images', min=[1, 3, 640, 640], opt=[1, 3, 640, 640], max=[1, 3, 640, 640])]
[I] Building engine with configuration:
    Workspace            | 16777216 bytes (16.00 MiB)
    Precision            | TF32: False, FP16: False, INT8: False, Obey Precision Constraints: False, Strict Types: False
    Tactic Sources       | ['CUBLAS', 'CUBLAS_LT', 'CUDNN', 'EDGE_MASK_CONVOLUTIONS']
    Safety Restricted    | False
    Profiles             | 1 profile(s)
[I] Finished engine building in 26.167 seconds
[I] trt-runner-N0-08/14/22-10:13:29    
    ---- Inference Input(s) ----
    {images [dtype=float32, shape=(1, 3, 640, 640)]}
[I] trt-runner-N0-08/14/22-10:13:29    
    ---- Inference Output(s) ----
    {onnx::Reshape_517 [dtype=int32, shape=(3,)],
     onnx::Reshape_507 [dtype=int32, shape=(3,)],
     onnx::Resize_487 [dtype=float32, shape=(4,)],
     onnx::Sigmoid_125 [dtype=float32, shape=(1, 32, 320, 320)],
     onnx::Mul_126 [dtype=float32, shape=(1, 32, 320, 320)],
    ...
     onnx::Concat_484 [dtype=float32, shape=(1, 1200, 85)],
     output [dtype=float32, shape=(1, 25200, 85)]}
[I] trt-runner-N0-08/14/22-10:13:29     | Completed 1 iteration(s) in 182.3 ms | Average inference time: 182.3 ms.
[I] Saving inference results to trt_out.json
[I] Output Validation | Runners: ['trt-runner-N0-08/14/22-10:13:29']

Model Conversion

Convert ONNX to TensorRT

Build an engine with 3 separate profiles

polygraphy convert dynamic_identity.onnx -o dynamic_identity.engine \
    --trt-min-shapes X:[1,3,28,28] --trt-opt-shapes X:[1,3,28,28] --trt-max-shapes X:[1,3,28,28] \
    --trt-min-shapes X:[1,3,28,28] --trt-opt-shapes X:[4,3,28,28] --trt-max-shapes X:[32,3,28,28] \
    --trt-min-shapes X:[128,3,28,28] --trt-opt-shapes X:[128,3,28,28] --trt-max-shapes X:[128,3,28,28]

TIP: If we want to use only a single profile where min == opt == max, we can use the runtime input-shapes option --input-shapes as a convenient shorthand instead of setting min/opt/max separately.
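For dynamic shapes, TensorRT can only run an input shape under a profile whose min/max range contains it elementwise. A small sketch of that selection rule, using the three profiles from the convert command above:

```python
# The three profiles from the polygraphy convert command above (min/max only).
PROFILES = [
    {"min": (1, 3, 28, 28),   "max": (1, 3, 28, 28)},
    {"min": (1, 3, 28, 28),   "max": (32, 3, 28, 28)},
    {"min": (128, 3, 28, 28), "max": (128, 3, 28, 28)},
]

def usable_profiles(shape):
    # A profile is usable iff min <= shape <= max holds in every dimension.
    return [
        i for i, p in enumerate(PROFILES)
        if all(lo <= d <= hi for lo, d, hi in zip(p["min"], shape, p["max"]))
    ]
```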

Advanced Operations

Modifying the ONNX Structure

Source Code Walkthrough

One thing worth noting is the following check in the debug precision subtool:

/data/oldpan/code/TensorRT/tools/Polygraphy/polygraphy/tools/debug/subtool/precision.py

        marked_indices = set()
        for index in indices:
            layer = network.get_layer(index)
            if layer.type == trt.LayerType.SHAPE or layer.type == trt.LayerType.IDENTITY or layer.type == trt.LayerType.SLICE or \
                layer.type == trt.LayerType.SHUFFLE or layer.type == trt.LayerType.CONCATENATION or layer.type == trt.LayerType.CONSTANT \
                or layer.type == trt.LayerType.GATHER:
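The chained `or` comparison quoted above is equivalent to a set-membership test, which is easier to read and extend. A sketch (plain strings stand in for `trt.LayerType` members here):

```python
# Layer types skipped by the precision subtool, as a set instead of chained ors.
# Strings are stand-ins for trt.LayerType enum members.
SKIPPED_LAYER_TYPES = {"SHAPE", "IDENTITY", "SLICE", "SHUFFLE",
                       "CONCATENATION", "CONSTANT", "GATHER"}

def should_skip(layer_type):
    return layer_type in SKIPPED_LAYER_TYPES
```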

About Debugging

pip install colored polygraphy --extra-index-url https://pypi.ngc.nvidia.com

polygraphy run demo_simplify.onnx \
    --trt --onnxrt \
    --trt-outputs mark all \
    --onnx-outputs mark all \
    --atol 1e-2 --rtol 1e-3 \
    --fail-fast \
    --val-range [0,1]

Many people only ever use the tool, but few look at how it works. When Polygraphy is actually invoked, it runs code along these lines:

G_LOGGER.verbose(f"Loaded extension modules: {self.loaded_plugins}")

if self.arg_groups[ModelArgs].path is None and self.arg_groups[RunnerSelectArgs].runners:
    G_LOGGER.critical(
        "One or more runners was specified, but no model file was provided. Make sure you've specified the model path, "
        "and also that it's not being consumed as an argument for another parameter"
    )

script = Script(
    summary=generate_summary(
        self.arg_groups[ModelArgs].path,
        self.arg_groups[RunnerSelectArgs].runners.values(),
        self.arg_groups[ComparatorCompareArgs].load_outputs_paths,
    )
)

self.arg_groups[LoggerArgs].add_to_script(script)

self.arg_groups[RunnerSelectArgs].add_to_script(script)

RESULTS_VAR_NAME = self.arg_groups[ComparatorRunArgs].add_to_script(script)
SUCCESS_VAR_NAME = self.arg_groups[ComparatorCompareArgs].add_to_script(script, results_name=RESULTS_VAR_NAME)

script.add_import(imports=["sys"])

cmd_run = inline(safe("' '.join(sys.argv)"))
exit_status = safe(
    "# Report Results\n"
    "cmd_run = {cmd}\n"
    "if not {success}:\n"
    '\tG_LOGGER.critical(f"FAILED | Command: {{cmd_run}}")\n'
    'G_LOGGER.finish(f"PASSED | Command: {{cmd_run}}")\n',
    cmd=cmd_run,
    success=SUCCESS_VAR_NAME,
)
script.append_suffix(exit_status)

if args.gen_script:
    script.save(args.gen_script)
else:
    exec(str(script))
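The branch above is the core of `--gen-script`: the same generated source is either written to disk or handed to `exec`. A minimal stand-alone sketch of that pattern (the helper names here are simplified stand-ins, not Polygraphy's actual classes):

```python
# Sketch of the "build a script, then save it or exec it" flow.
def build_script(model_path):
    # In Polygraphy this is assembled by the Script class; here it is a string.
    return f"result = 'ran {model_path}'\n"

def run(model_path, gen_script=None):
    src = build_script(model_path)
    if gen_script is not None:       # --gen-script: save the source instead of running it
        with open(gen_script, "w") as f:
            f.write(src)
        return None
    namespace = {}
    exec(src, namespace)             # otherwise execute the generated source in place
    return namespace["result"]
```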

Printing the generated script shows what will actually be executed:

print(str(script))
#!/usr/bin/env python3
# Template auto-generated by polygraphy [v0.38.0] on 08/14/22 at 10:30:37
# Generation Command: /home/oldpan/anaconda3/envs/pytorch/bin/polygraphy run /home/oldpan/code/data/orignal_model/yolov5s.onnx --trt --validate --trt-outputs mark all --save-results=trt_out.json
# This script runs /home/oldpan/code/data/orignal_model/yolov5s.onnx using TensorRT.

from polygraphy.logger import G_LOGGER

from polygraphy import constants, util
from polygraphy.backend.trt import EngineFromNetwork, ModifyNetworkOutputs, NetworkFromOnnxPath, TrtRunner
from polygraphy.comparator import Comparator
import sys

# Loaders
parse_network_from_onnx = NetworkFromOnnxPath('/home/oldpan/code/data/orignal_model/yolov5s.onnx')
modify_network = ModifyNetworkOutputs(parse_network_from_onnx, outputs=constants.MARK_ALL)
build_engine = EngineFromNetwork(modify_network)

# Runners
runners = [
    TrtRunner(build_engine),
]

# Runner Execution
results = Comparator.run(runners)

# Save results
results.save('trt_out.json')

success = True
# Validation
success &= Comparator.validate(results, check_inf=True, check_nan=True)

# Report Results
cmd_run = ' '.join(sys.argv)
if not success:
    G_LOGGER.critical(f"FAILED | Command: {cmd_run}")
G_LOGGER.finish(f"PASSED | Command: {cmd_run}")

When run through exec, the script is compiled down to Python bytecode:

LOAD_CONST(0), LOAD_CONST(('G_LOGGER',)), IMPORT_NAME(polygraphy.logger), IMPORT_FROM(G_LOGGER), STORE_NAME(G_LOGGER), POP_TOP

LOAD_CONST(0), LOAD_CONST(('constants', 'util')), IMPORT_NAME(polygraphy), IMPORT_FROM(constants), STORE_NAME(constants), IMPORT_FROM(util), STORE_NAME(util), POP_TOP
LOAD_CONST(0), LOAD_CONST(('EngineFromNetwork', 'ModifyNetworkOutputs', 'NetworkFromOnnxPath', 'TrtRunner')), IMPORT_NAME(polygraphy.backend.trt), IMPORT_FROM(EngineFromNetwork), STORE_NAME(EngineFromNetwork), IMPORT_FROM(ModifyNetworkOutputs), STORE_NAME(ModifyNetworkOutputs), IMPORT_FROM(NetworkFromOnnxPath), STORE_NAME(NetworkFromOnnxPath), IMPORT_FROM(TrtRunner), STORE_NAME(TrtRunner), POP_TOP
LOAD_CONST(0), LOAD_CONST(('Comparator',)), IMPORT_NAME(polygraphy.comparator), IMPORT_FROM(Comparator), STORE_NAME(Comparator), POP_TOP
LOAD_CONST(0), LOAD_CONST(None), IMPORT_NAME(sys), STORE_NAME(sys)


parse_network_from_onnx = NetworkFromOnnxPath('/home/oldpan/code/data/orignal_model/yolov5s.onnx')
LOAD_NAME(ModifyNetworkOutputs), LOAD_NAME(parse_network_from_onnx), LOAD_NAME(constants.MARK_ALL), LOAD_CONST(('outputs',)), CALL_FUNCTION_KW{2}, STORE_NAME(modify_network)
build_engine = EngineFromNetwork(modify_network)

BUILD_LIST{1}, STORE_NAME(runners)
TrtRunner(build_engine)

LOAD_NAME(Comparator), LOAD_METHOD(run), LOAD_NAME(runners), CALL_METHOD{1}, STORE_NAME(results)

LOAD_NAME(results), LOAD_METHOD(save), LOAD_CONST('trt_out.json'), CALL_METHOD{1}, POP_TOP

success = True

LOAD_NAME(success), LOAD_NAME(Comparator.validate), LOAD_NAME(results), LOAD_CONST(True), LOAD_CONST(True), LOAD_CONST(('check_inf', 'check_nan')), CALL_FUNCTION_KW{3}, INPLACE_AND, STORE_NAME(success)


LOAD_CONST(' '), LOAD_METHOD(join), LOAD_NAME(sys.argv), CALL_METHOD{1}, STORE_NAME(cmd_run)
LOAD_NAME(success), POP_JUMP_IF_TRUE{188}
LOAD_NAME(G_LOGGER), LOAD_METHOD(critical), LOAD_CONST('FAILED | Command: '), LOAD_NAME(cmd_run), FORMAT_VALUE{(None, False)}, BUILD_STRING{2}, CALL_METHOD{1}, POP_TOP
LOAD_NAME(G_LOGGER), LOAD_METHOD(finish), LOAD_CONST('PASSED | Command: '), LOAD_NAME(cmd_run), FORMAT_VALUE{(None, False)}, BUILD_STRING{2}, CALL_METHOD{1}, POP_TOP, return None
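Strictly speaking, `exec` first compiles the generated source to bytecode; a listing like the one above can be reproduced with the stdlib `dis` module:

```python
import dis

# A fragment of the generated script, compiled the same way exec() compiles it.
src = "import sys\ncmd_run = ' '.join(sys.argv)\n"
code = compile(src, "<polygraphy-script>", "exec")

# Instruction names such as IMPORT_NAME / STORE_NAME match the listing above.
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)
```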
