Basic Operations
Inspection
Inspecting an ONNX Model
(pytorch) oldpan@oldpan-ROG-Flow-X13-GV301QE-GV301QE:~/software/tensorrt/TensorRT-8.4.1.5/bin$ polygraphy inspect model /home/oldpan/code/data/orignal_model/Resnet34_3inputs_448x448_20200609.onnx
[I] Loading model: /home/oldpan/code/data/orignal_model/Resnet34_3inputs_448x448_20200609.onnx
[I] ==== ONNX Model ====
Name: torch-jit-export | ONNX Opset: 9
---- 3 Graph Input(s) ----
{input.1 [dtype=float32, shape=(1, 3, 448, 448)],
input.4 [dtype=float32, shape=(1, 3, 448, 448)],
input.7 [dtype=float32, shape=(1, 3, 448, 448)]}
---- 4 Graph Output(s) ----
{504 [dtype=float32, shape=(1, 48, 28, 28)],
499 [dtype=float32, shape=(1, 24, 28, 28)],
530 [dtype=float32, shape=(1, 2016, 28, 28)],
516 [dtype=float32, shape=(1, 672, 28, 28)]}
---- 337 Initializer(s) ----
---- 191 Node(s) ----
Inspecting a TensorRT Network
- Show the layer information of the TensorRT network before optimization:
(pytorch) oldpan@oldpan-ROG-Flow-X13-GV301QE-GV301QE:~/software/tensorrt/TensorRT-8.4.1.5/bin$ polygraphy inspect model /home/oldpan/code/data/orignal_model/Resnet34_3inputs_448x448_20200609.onnx --show layers --display-as=trt
[08/11/2022-22:13:05] [TRT] [W] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I] ==== TensorRT Network ====
Name: Unnamed Network 0 | Explicit Batch Network
---- 3 Network Input(s) ----
{input.1 [dtype=float32, shape=(1, 3, 448, 448)],
input.4 [dtype=float32, shape=(1, 3, 448, 448)],
input.7 [dtype=float32, shape=(1, 3, 448, 448)]}
---- 4 Network Output(s) ----
{504 [dtype=float32, shape=(1, 48, 28, 28)],
499 [dtype=float32, shape=(1, 24, 28, 28)],
530 [dtype=float32, shape=(1, 2016, 28, 28)],
516 [dtype=float32, shape=(1, 672, 28, 28)]}
---- 191 Layer(s) ----
Layer 0 | node_of_340 [Op: LayerType.CONVOLUTION]
{input.1 [dtype=float32, shape=(1, 3, 448, 448)]}
-> {340 [dtype=float32, shape=(1, 64, 224, 224)]}
Layer 1 | node_of_341 [Op: LayerType.SCALE]
{340 [dtype=float32, shape=(1, 64, 224, 224)]}
-> {341 [dtype=float32, shape=(1, 64, 224, 224)]}
Layer 2 | node_of_342 [Op: ActivationType.RELU]
{341 [dtype=float32, shape=(1, 64, 224, 224)]}
-> {342 [dtype=float32, shape=(1, 64, 224, 224)]}
...
Layer 189 | node_of_529 [Op: LayerType.CONVOLUTION]
{528 [dtype=float32, shape=(1, 1344, 28, 28)]}
-> {529 [dtype=float32, shape=(1, 2016, 28, 28)]}
Layer 190 | node_of_530 [Op: ActivationType.TANH]
{529 [dtype=float32, shape=(1, 2016, 28, 28)]}
-> {530 [dtype=float32, shape=(1, 2016, 28, 28)]}
- Show the layer information of the optimized TensorRT engine (after the engine has been built):
(pytorch) oldpan@oldpan-ROG-Flow-X13-GV301QE-GV301QE:~/software/tensorrt/TensorRT-8.4.1.5/bin$ polygraphy inspect model Resnet34_3inputs_448x448_20200609.trt --model-type engine
[I] Loading bytes from /home/oldpan/software/tensorrt/TensorRT-8.4.1.5/targets/x86_64-linux-gnu/bin/Resnet34_3inputs_448x448_20200609.trt
[I] ==== TensorRT Engine ====
Name: Unnamed Network 0 | Explicit Batch Engine
---- 3 Engine Input(s) ----
{input.1 [dtype=float32, shape=(1, 3, 448, 448)],
input.4 [dtype=float32, shape=(1, 3, 448, 448)],
input.7 [dtype=float32, shape=(1, 3, 448, 448)]}
---- 4 Engine Output(s) ----
{499 [dtype=float32, shape=(1, 24, 28, 28)],
504 [dtype=float32, shape=(1, 48, 28, 28)],
516 [dtype=float32, shape=(1, 672, 28, 28)],
530 [dtype=float32, shape=(1, 2016, 28, 28)]}
---- Memory ----
Device Memory: 25690112 bytes
---- 1 Profile(s) (7 Binding(s) Each) ----
- Profile: 0
Binding Index: 0 (Input) [Name: input.1] | Shapes: min=(1, 3, 448, 448), opt=(1, 3, 448, 448), max=(1, 3, 448, 448)
Binding Index: 1 (Input) [Name: input.4] | Shapes: min=(1, 3, 448, 448), opt=(1, 3, 448, 448), max=(1, 3, 448, 448)
Binding Index: 2 (Input) [Name: input.7] | Shapes: min=(1, 3, 448, 448), opt=(1, 3, 448, 448), max=(1, 3, 448, 448)
Binding Index: 3 (Output) [Name: 499] | Shape: (1, 24, 28, 28)
Binding Index: 4 (Output) [Name: 504] | Shape: (1, 48, 28, 28)
Binding Index: 5 (Output) [Name: 516] | Shape: (1, 672, 28, 28)
Binding Index: 6 (Output) [Name: 530] | Shape: (1, 2016, 28, 28)
---- 73 Layer(s) ----
Running
polygraphy run /home/oldpan/code/models/GPT/LLAMA/alpaca.onnx/decoder-merge-5.onnx --onnxrt
pip install colored
polygraphy run /home/oldpan/code/models/GPT/LLAMA/alpaca.onnx/decoder-merge-5.onnx --onnxrt --data-loader-script /home/oldpan/code/convert/tools/data_loader.py
polygraphy run /home/oldpan/code/models/GPT/LLAMA/alpaca.onnx/decoder-merge-5.onnx --onnxrt --data-loader-script /home/oldpan/code/convert/tools/data_loader.py --save-results=/home/oldpan/code/data/debug_data/output-debug/decoder-merge-5-onnx-output.json
Debugging
Save model outputs for later comparison and debugging
# Dump the results of every output node of the ONNX model
polygraphy run /home/oldpan/code/data/orignal_model/yolov5s.onnx --onnxrt --onnx-outputs mark all --save-results=onnx_output_all.json
polygraphy run --onnxrt /home/oldpan/code/models/GPT/LLAMA/alpaca.onnx/decoder-merge-4.onnx --save-results=/home/oldpan/code/data/debug_data/output-debug/decoder-merge-4-onnx-output.json --data-loader-script /home/oldpan/code/convert/tools/data_loader.py
polygraphy run /home/oldpan/code/models/normal/tensorrt_engine/llama-trt/decoder-merge-4.trt --model-type engine --trt --data-loader-script /home/oldpan/code/convert/tools/data_loader.py --load-outputs /home/oldpan/code/data/debug_data/output-debug/decoder-merge-4-onnx-output.json --atol 1e-2 --rtol 1e-3
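The `--data-loader-script` option used above points at a Python file that defines a `load_data()` function (Polygraphy's default function name) yielding one feed dict of NumPy arrays per inference iteration. A minimal sketch of such a script; the input names and shapes here are hypothetical placeholders, not the real decoder inputs:

```python
# data_loader.py -- a minimal sketch of a custom data loader for
# `polygraphy run ... --data-loader-script data_loader.py`.
# Polygraphy looks for a function named `load_data` that yields
# one feed_dict (input name -> numpy array) per iteration.
import numpy as np

# Hypothetical input names/shapes -- replace with your model's real inputs.
INPUT_SHAPES = {
    "input_ids": (1, 32),
    "attention_mask": (1, 32),
}

def load_data():
    for _ in range(2):  # number of iterations to feed
        yield {
            name: np.random.randint(0, 100, size=shape).astype(np.int64)
            for name, shape in INPUT_SHAPES.items()
        }
```

Pointing both the ONNX and TRT runs at the same data loader script guarantees that the saved and compared outputs were produced from identical inputs.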
from polygraphy.comparator import RunResults

results = RunResults.load("/home/oldpan/code/data/debug_data/onnx_output_all.json")
for runner_name, iterations in results.items():
    for iteration in iterations:
        for tensor_name, value in iteration.items():
            print(tensor_name, value)
Check whether the model outputs contain NaN or Inf
polygraphy run --onnxrt identity_fp16.onnx --validate
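`--validate` checks each output tensor for NaN and Inf values. The check is roughly equivalent to this NumPy sketch (a simplified stand-in, not Polygraphy's actual implementation):

```python
import numpy as np

def output_is_valid(arr: np.ndarray) -> bool:
    """Return True if the output contains no NaN and no Inf values."""
    return not np.isnan(arr).any() and not np.isinf(arr).any()

# Example: a tensor poisoned by an overflow, e.g. from FP16 conversion.
good = np.ones((2, 2), dtype=np.float16)
bad = good.copy()
bad[0, 0] = np.float16(np.inf)
```

This is exactly the failure mode that shows up when converting a model to FP16: values that overflow the FP16 range become Inf and then propagate as NaN through later layers.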
- Mark every layer of the TRT network as an output layer. Because this is the pre-optimization network, its layers still correspond closely to the original graph, which makes per-layer comparison possible:
polygraphy run /home/oldpan/code/data/orignal_model/yolov5s.onnx --trt --validate --trt-outputs mark all --save-results=trt_out.json
[08/14/2022-10:13:32] [TRT] [W] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/14/2022-10:13:32] [TRT] [W] onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
[08/14/2022-10:13:32] [TRT] [W] onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
[08/14/2022-10:13:32] [TRT] [W] onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
[08/14/2022-10:13:33] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.2.4
[08/14/2022-10:13:58] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[08/14/2022-10:13:58] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[I] trt-runner-N0-08/14/22-10:13:29 | Activating and starting inference
[I] Configuring with profiles: [Profile().add('images', min=[1, 3, 640, 640], opt=[1, 3, 640, 640], max=[1, 3, 640, 640])]
[I] Building engine with configuration:
Workspace | 16777216 bytes (16.00 MiB)
Precision | TF32: False, FP16: False, INT8: False, Obey Precision Constraints: False, Strict Types: False
Tactic Sources | ['CUBLAS', 'CUBLAS_LT', 'CUDNN', 'EDGE_MASK_CONVOLUTIONS']
Safety Restricted | False
Profiles | 1 profile(s)
[I] Finished engine building in 26.167 seconds
[I] trt-runner-N0-08/14/22-10:13:29
---- Inference Input(s) ----
{images [dtype=float32, shape=(1, 3, 640, 640)]}
[I] trt-runner-N0-08/14/22-10:13:29
---- Inference Output(s) ----
{onnx::Reshape_517 [dtype=int32, shape=(3,)],
onnx::Reshape_507 [dtype=int32, shape=(3,)],
onnx::Resize_487 [dtype=float32, shape=(4,)],
onnx::Sigmoid_125 [dtype=float32, shape=(1, 32, 320, 320)],
onnx::Mul_126 [dtype=float32, shape=(1, 32, 320, 320)],
...
onnx::Concat_484 [dtype=float32, shape=(1, 1200, 85)],
output [dtype=float32, shape=(1, 25200, 85)]}
[I] trt-runner-N0-08/14/22-10:13:29 | Completed 1 iteration(s) in 182.3 ms | Average inference time: 182.3 ms.
[I] Saving inference results to trt_out.json
[I] Output Validation | Runners: ['trt-runner-N0-08/14/22-10:13:29']
Model Conversion
Convert ONNX to TRT
Build an engine with 3 separate profiles
polygraphy convert dynamic_identity.onnx -o dynamic_identity.engine \
--trt-min-shapes X:[1,3,28,28] --trt-opt-shapes X:[1,3,28,28] --trt-max-shapes X:[1,3,28,28] \
--trt-min-shapes X:[1,3,28,28] --trt-opt-shapes X:[4,3,28,28] --trt-max-shapes X:[32,3,28,28] \
--trt-min-shapes X:[128,3,28,28] --trt-opt-shapes X:[128,3,28,28] --trt-max-shapes X:[128,3,28,28]
TIP: If we want to use only a single profile where min == opt == max, we can use the runtime input shapes option --input-shapes as a convenient shorthand instead of setting min/opt/max separately.
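One thing to keep in mind with multiple profiles: TensorRT does not pick a profile automatically at runtime; the application selects one (e.g. via `IExecutionContext.set_optimization_profile_async`) before setting input shapes. A toy helper sketching such a selection for the three batch ranges built above; the "closest opt" policy is a hypothetical illustration, not TensorRT behavior:

```python
# (min, opt, max) batch sizes of the three profiles built above.
PROFILES = [(1, 1, 1), (1, 4, 32), (128, 128, 128)]

def pick_profile(batch: int) -> int:
    """Return the index of a profile whose [min, max] range covers `batch`,
    preferring the profile whose `opt` value is closest to it."""
    candidates = [(i, p) for i, p in enumerate(PROFILES) if p[0] <= batch <= p[2]]
    if not candidates:
        raise ValueError(f"No profile covers batch size {batch}")
    return min(candidates, key=lambda ip: abs(ip[1][1] - batch))[0]
```

A request with batch 16 lands in the second profile (range 1..32), while batch 128 can only be served by the third.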
Advanced Operations
Modifying the ONNX Graph
Source Code Walkthrough
One thing worth noting is the layer-type filter in:
/data/oldpan/code/TensorRT/tools/Polygraphy/polygraphy/tools/debug/subtool/precision.py
marked_indices = set()
for index in indices:
    layer = network.get_layer(index)
    if layer.type == trt.LayerType.SHAPE or layer.type == trt.LayerType.IDENTITY or layer.type == trt.LayerType.SLICE or \
            layer.type == trt.LayerType.SHUFFLE or layer.type == trt.LayerType.CONCATENATION or layer.type == trt.LayerType.CONSTANT \
            or layer.type == trt.LayerType.GATHER:
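The chain of `==` comparisons above reads more cleanly as a set-membership test. A sketch using an `Enum` stand-in for `trt.LayerType`, so it runs without TensorRT installed; the excluded types appear to be data-movement ops rather than numerically significant compute, which would explain why `precision.py` skips them:

```python
from enum import Enum, auto

class LayerType(Enum):  # stand-in for tensorrt.LayerType
    SHAPE = auto()
    IDENTITY = auto()
    SLICE = auto()
    SHUFFLE = auto()
    CONCATENATION = auto()
    CONSTANT = auto()
    GATHER = auto()
    CONVOLUTION = auto()

# The layer types skipped by the filter in precision.py.
EXCLUDED_LAYER_TYPES = {
    LayerType.SHAPE, LayerType.IDENTITY, LayerType.SLICE,
    LayerType.SHUFFLE, LayerType.CONCATENATION, LayerType.CONSTANT,
    LayerType.GATHER,
}

def should_skip(layer_type: LayerType) -> bool:
    return layer_type in EXCLUDED_LAYER_TYPES
```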
About Debugging
pip install colored polygraphy --extra-index-url https://pypi.ngc.nvidia.com
polygraphy run demo_simplify.onnx \
    --trt --onnxrt \
    --trt-outputs mark all \
    --onnx-outputs mark all \
    --atol 1e-2 --rtol 1e-3 \
    --fail-fast \
    --val-range [0,1]
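The `--atol`/`--rtol` pair maps onto the standard elementwise tolerance check `|a - b| <= atol + rtol * |b|`. Polygraphy's comparison is more configurable than this, but a minimal NumPy approximation of the check looks like:

```python
import numpy as np

def outputs_match(result: np.ndarray, reference: np.ndarray,
                  atol: float = 1e-2, rtol: float = 1e-3) -> bool:
    """Elementwise check: |result - reference| <= atol + rtol * |reference|."""
    return bool(np.allclose(result, reference, atol=atol, rtol=rtol))

reference = np.array([1.0, 100.0])
```

Note that `rtol` scales with the magnitude of the reference value, so large activations tolerate proportionally larger absolute error; `atol` sets the floor for values near zero.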
Many people only use Polygraphy from the command line without knowing what happens underneath. When Polygraphy is actually invoked, it runs code like the following:
G_LOGGER.verbose(f"Loaded extension modules: {self.loaded_plugins}")

if self.arg_groups[ModelArgs].path is None and self.arg_groups[RunnerSelectArgs].runners:
    G_LOGGER.critical(
        "One or more runners was specified, but no model file was provided. Make sure you've specified the model path, "
        "and also that it's not being consumed as an argument for another parameter"
    )

script = Script(
    summary=generate_summary(
        self.arg_groups[ModelArgs].path,
        self.arg_groups[RunnerSelectArgs].runners.values(),
        self.arg_groups[ComparatorCompareArgs].load_outputs_paths,
    )
)

self.arg_groups[LoggerArgs].add_to_script(script)
self.arg_groups[RunnerSelectArgs].add_to_script(script)

RESULTS_VAR_NAME = self.arg_groups[ComparatorRunArgs].add_to_script(script)
SUCCESS_VAR_NAME = self.arg_groups[ComparatorCompareArgs].add_to_script(script, results_name=RESULTS_VAR_NAME)

script.add_import(imports=["sys"])
cmd_run = inline(safe("' '.join(sys.argv)"))
exit_status = safe(
    "# Report Results\n"
    "cmd_run = {cmd}\n"
    "if not {success}:\n"
    '\tG_LOGGER.critical(f"FAILED | Command: {{cmd_run}}")\n'
    'G_LOGGER.finish(f"PASSED | Command: {{cmd_run}}")\n',
    cmd=cmd_run,
    success=SUCCESS_VAR_NAME,
)
script.append_suffix(exit_status)

if args.gen_script:
    script.save(args.gen_script)
else:
    exec(str(script))
Printing the generated script with `print(str(script))` gives:
#!/usr/bin/env python3
# Template auto-generated by polygraphy [v0.38.0] on 08/14/22 at 10:30:37
# Generation Command: /home/oldpan/anaconda3/envs/pytorch/bin/polygraphy run /home/oldpan/code/data/orignal_model/yolov5s.onnx --trt --validate --trt-outputs mark all --save-results=trt_out.json
# This script runs /home/oldpan/code/data/orignal_model/yolov5s.onnx using TensorRT.
from polygraphy.logger import G_LOGGER
from polygraphy import constants, util
from polygraphy.backend.trt import EngineFromNetwork, ModifyNetworkOutputs, NetworkFromOnnxPath, TrtRunner
from polygraphy.comparator import Comparator
import sys
# Loaders
parse_network_from_onnx = NetworkFromOnnxPath('/home/oldpan/code/data/orignal_model/yolov5s.onnx')
modify_network = ModifyNetworkOutputs(parse_network_from_onnx, outputs=constants.MARK_ALL)
build_engine = EngineFromNetwork(modify_network)
# Runners
runners = [
    TrtRunner(build_engine),
]
# Runner Execution
results = Comparator.run(runners)
# Save results
results.save('trt_out.json')
success = True
# Validation
success &= Comparator.validate(results, check_inf=True, check_nan=True)
# Report Results
cmd_run = ' '.join(sys.argv)
if not success:
    G_LOGGER.critical(f"FAILED | Command: {cmd_run}")
G_LOGGER.finish(f"PASSED | Command: {cmd_run}")
When `exec(str(script))` runs, you can see that the script is translated into Python bytecode:
LOAD_CONST(0), LOAD_CONST(('G_LOGGER',)), IMPORT_NAME(polygraphy.logger), IMPORT_FROM(G_LOGGER), STORE_NAME(G_LOGGER), POP_TOP
LOAD_CONST(0), LOAD_CONST(('constants', 'util')), IMPORT_NAME(polygraphy), IMPORT_FROM(constants), STORE_NAME(constants), IMPORT_FROM(util), STORE_NAME(util), POP_TOP
LOAD_CONST(0), LOAD_CONST(('EngineFromNetwork', 'ModifyNetworkOutputs', 'NetworkFromOnnxPath', 'TrtRunner')), IMPORT_NAME(polygraphy.backend.trt), IMPORT_FROM(EngineFromNetwork), STORE_NAME(EngineFromNetwork), IMPORT_FROM(ModifyNetworkOutputs), STORE_NAME(ModifyNetworkOutputs), IMPORT_FROM(NetworkFromOnnxPath), STORE_NAME(NetworkFromOnnxPath), IMPORT_FROM(TrtRunner), STORE_NAME(TrtRunner), POP_TOP
LOAD_CONST(0), LOAD_CONST(('Comparator',)), IMPORT_NAME(polygraphy.comparator), IMPORT_FROM(Comparator), STORE_NAME(Comparator), POP_TOP
LOAD_CONST(0), LOAD_CONST(None), IMPORT_NAME(sys), STORE_NAME(sys)
parse_network_from_onnx = NetworkFromOnnxPath('/home/oldpan/code/data/orignal_model/yolov5s.onnx')
LOAD_NAME(ModifyNetworkOutputs), LOAD_NAME(parse_network_from_onnx), LOAD_NAME(constants.MARK_ALL), LOAD_CONST(('outputs',)), CALL_FUNCTION_KW{2}, STORE_NAME(modify_network)
build_engine = EngineFromNetwork(modify_network)
BUILD_LIST{1}, STORE_NAME(runners)
TrtRunner(build_engine)
LOAD_NAME(Comparator), LOAD_METHOD(run), LOAD_NAME(runners), CALL_METHOD{1}, STORE_NAME(results)
LOAD_NAME(results), LOAD_METHOD(save), LOAD_CONST('trt_out.json'), CALL_METHOD{1}, POP_TOP
success = True
LOAD_NAME(success), LOAD_NAME(Comparator.validate), LOAD_NAME(results), LOAD_CONST(True), LOAD_CONST(True), LOAD_CONST(('check_inf', 'check_nan')), CALL_FUNCTION_KW{3}, INPLACE_AND, STORE_NAME(success)
LOAD_CONST(' '), LOAD_METHOD(join), LOAD_NAME(sys.argv), CALL_METHOD{1}, STORE_NAME(cmd_run)
LOAD_NAME(success), POP_JUMP_IF_TRUE{188}
LOAD_NAME(G_LOGGER), LOAD_METHOD(critical), LOAD_CONST('FAILED | Command: '), LOAD_NAME(cmd_run), FORMAT_VALUE{(None, False)}, BUILD_STRING{2}, CALL_METHOD{1}, POP_TOP
LOAD_NAME(G_LOGGER), LOAD_METHOD(finish), LOAD_CONST('PASSED | Command: '), LOAD_NAME(cmd_run), FORMAT_VALUE{(None, False)}, BUILD_STRING{2}, CALL_METHOD{1}, POP_TOP, return None
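The bytecode listing above can be reproduced for any snippet with the standard-library `dis` module, since `exec(str(script))` compiles the generated source exactly like ordinary Python code. A minimal sketch, using a simplified stand-in for the tail of the generated script (opcode names vary slightly across Python versions):

```python
import dis

# Simplified stand-in for the "Report Results" tail of the generated script.
src = (
    "success = True\n"
    "cmd_run = 'polygraphy run ...'\n"
    "if not success:\n"
    "    print(f'FAILED | Command: {cmd_run}')\n"
    "print(f'PASSED | Command: {cmd_run}')\n"
)

# compile() turns source into a code object; dis exposes its instructions,
# which is where opcodes like LOAD_CONST and STORE_NAME come from.
code = compile(src, "<polygraphy-script>", "exec")
opnames = [ins.opname for ins in dis.get_instructions(code)]
```

Running `dis.dis(code)` prints the full human-readable disassembly, matching the style of the dump above.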