Model Analyzer Configuration in triton-inference-server

Configuring Model Analyzer

Model Analyzer can be configured using a YAML config file, the command line interface (CLI), or a combination of both.

  • Every flag supported by Model Analyzer can be configured using a YAML config file
  • Only a subset of flags can be configured using the CLI

The placeholders listed below are used throughout the configuration:

  • <boolean>: a boolean that can take true or false as value
  • <string>: a regular string
  • <comma-delimited-list>: a list of comma separated items
  • <int>: a regular integer value
  • <list>: a list of values
  • <range>: an object containing start and stop keys, with an optional step value
    • If step is not defined, 1 is the default step value
    • A value of type <range> can be given either as an explicit list or with the
      structure below, which sets batch_sizes to the array [2, 4, 6]:
batch_sizes:
  start: 2
  stop: 6
  step: 2
  • <dict>: a set of key-value pairs, for example:
triton_server_flags:
  log_verbose: True
  exit_timeout_secs: 120
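
To make the <range> placeholder concrete, here is a minimal Python sketch of how such a value could be expanded into an explicit list. The function name expand_range is hypothetical and is not part of Model Analyzer. The sketch assumes stop is inclusive, which is what the [2, 4, 6] example above implies.

```python
def expand_range(spec):
    """Expand a <range>-style value into an explicit list of integers.

    Hypothetical helper for illustration; not Model Analyzer's own code.
    """
    # A plain list is already explicit, so return a copy of it.
    if isinstance(spec, list):
        return list(spec)
    start = spec["start"]
    stop = spec["stop"]
    step = spec.get("step", 1)  # step defaults to 1 when it is omitted
    # Assumption: stop is inclusive, matching the [2, 4, 6] example.
    return list(range(start, stop + 1, step))


batch_sizes = expand_range({"start": 2, "stop": 6, "step": 2})
print(batch_sizes)  # [2, 4, 6]
```

Passing a list such as [1, 8, 16] returns it unchanged, which mirrors the note that a <range> type can also be described by a list.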

Source: model_analyzer/config.md at main · triton-inference-server/model_analyzer · GitHub

Checkpointing in Model Analyzer

Model Analyzer writes the collected measurements to checkpoint files while profiling. These files are located in the specified checkpoint directory (see the Config Defaults section for the default location). Checkpoint files are used to create data tables, summaries, and detailed reports.

When is Checkpointing Done?

Model Analyzer saves a checkpoint in multiple circumstances:

  1. Model Analyzer will save a checkpoint after all the perf analyzer runs for a given model are complete.
  2. The user can initiate an early exit from profiling using CTRL-C (SIGINT). This will wait for the current perf analyzer run to finish and then save a checkpoint before exiting.
  3. If the user needs to exit immediately, they can send SIGINT 3 times. In this case, Model Analyzer will save a checkpoint and exit immediately.
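
The early-exit behavior in items 2 and 3 can be pictured as a counter of received SIGINTs. The following Python sketch shows one way to model it. It is not Model Analyzer's actual implementation: the class ExitState, the function install_handler, and the action strings are hypothetical names, and the handling of a second SIGINT (treated the same as the first) is an assumption, since the text only describes the first and third signals.

```python
import signal


class ExitState:
    """Hypothetical SIGINT counter; not Model Analyzer's own class."""

    MAX_SIGINTS = 3  # the third SIGINT triggers an immediate exit

    def __init__(self):
        self.sigint_count = 0

    def handle_sigint(self, signum=None, frame=None):
        # Signature matches what signal.signal() expects from a handler.
        self.sigint_count += 1
        return self.action()

    def action(self):
        if self.sigint_count == 0:
            return "continue"
        if self.sigint_count < self.MAX_SIGINTS:
            # Wait for the current perf analyzer run, then checkpoint.
            return "finish_current_run_then_checkpoint"
        # Save a checkpoint and exit without waiting.
        return "checkpoint_and_exit_now"


def install_handler(state):
    """Register the counter as the process-wide SIGINT handler."""
    signal.signal(signal.SIGINT, state.handle_sigint)


# Simulate three CTRL-C presses without sending real signals.
demo_state = ExitState()
for _ in range(3):
    print(demo_state.handle_sigint())
```

In a real profiling loop, the loop body would check action() between runs and decide whether to continue, finish and save, or save and stop at once.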

Checkpoint Naming Scheme

When a profiling run completes:

$ model-analyzer profile -m example_model_repo --profile-models example_model_1,example_model_2
2021-05-13 19:57:05.87 INFO[entrypoint.py:98] Starting a local Triton Server...
2021-05-13 19:57:05.92 INFO[server_local.py:64] Triton Server started.
2021-05-13 19:57:09.234 INFO[server_local.py:81] Triton Server stopped.
2021-05-13 19:57:09.235 INFO[analyzer_state_manager.py:118] No checkpoint file found, starting a fresh run.
.
.
.
2021-05-13 19:58:01.625 INFO[analyzer.py:110] Finished profiling. Obtained measurements for models: ['example_model_1', 'example_model_2']

In the checkpoint directory, there will be 2 checkpoints.

$ ls -l checkpoints
-rw-r--r-- 1 root root 11356 May 11 20:00 0.ckpt
-rw-r--r-- 1 root root 11356 May 13 19:58 1.ckpt

Checkpoints are named using consecutive non-negative integers. On startup, Model Analyzer identifies the latest checkpoint (highest integer) and loads it. If there are any changes to the data in the checkpoint, the checkpoint index is incremented before it is saved again, thus creating a new latest checkpoint.
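
The naming scheme above can be illustrated with a short Python sketch that finds the latest checkpoint and computes the name of the next one. The helper names latest_checkpoint_index and next_checkpoint_path are hypothetical, and the sketch assumes checkpoint files are named exactly <N>.ckpt, as in the directory listing. Model Analyzer's own logic may differ.

```python
import os
import re
import tempfile

# Assumption: checkpoint files are named "<non-negative integer>.ckpt".
CKPT_PATTERN = re.compile(r"^(\d+)\.ckpt$")


def latest_checkpoint_index(checkpoint_dir):
    """Return the highest integer index among '<N>.ckpt' files, or None."""
    indices = []
    for name in os.listdir(checkpoint_dir):
        match = CKPT_PATTERN.match(name)
        if match:
            indices.append(int(match.group(1)))
    return max(indices) if indices else None


def next_checkpoint_path(checkpoint_dir):
    """Return the path a new 'latest' checkpoint would be written to."""
    latest = latest_checkpoint_index(checkpoint_dir)
    next_index = 0 if latest is None else latest + 1
    return os.path.join(checkpoint_dir, f"{next_index}.ckpt")


# Usage: recreate the two-checkpoint directory from the listing above.
with tempfile.TemporaryDirectory() as ckpt_dir:
    for name in ("0.ckpt", "1.ckpt"):
        open(os.path.join(ckpt_dir, name), "w").close()
    print(latest_checkpoint_index(ckpt_dir))  # 1
    print(os.path.basename(next_checkpoint_path(ckpt_dir)))  # 2.ckpt
```

Comparing indices as integers rather than as strings matters once there are ten or more checkpoints, because "10.ckpt" sorts before "9.ckpt" lexicographically.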

Note: Model Analyzer does not clean up old checkpoints. It only guarantees that the checkpoint with the highest integer index is the one with the most up-to-date measurements. To start a fresh run, remove the checkpoint directory between consecutive runs of the model-analyzer profile command.