Multiple ways to build PyTorch C++ extensions

This article collects techniques related to PyTorch C++ extension deployment, with the goal that C++ code built on them can be embedded seamlessly into a C++ runtime.


torch.library is a collection of APIs for extending PyTorch’s core library of operators. It contains utilities for creating new custom operators as well as extending operators defined with PyTorch’s C++ operator registration APIs (e.g. aten operators).

/// This object provides the API for defining operators and providing
/// implementations at dispatch keys.  Typically, a torch::Library
/// is not allocated directly; instead it is created by the
/// TORCH_LIBRARY() or TORCH_LIBRARY_IMPL() macros.
///
/// Most methods on torch::Library return a reference to itself,
/// supporting method chaining.
/// ```
/// // Examples:
/// TORCH_LIBRARY(torchvision, m) {
///    // m is a torch::Library
///    m.def("roi_align", ...);
///    ...
/// }
/// TORCH_LIBRARY_IMPL(aten, XLA, m) {
///    // m is a torch::Library
///    m.impl("add", ...);
///    ...
/// }
/// ```

Use torch.library.define() to define new custom operators. Use the impl methods, such as torch.library.impl() and torch.library.impl_abstract(), to add implementations for any operators (whether they were created with torch.library.define() or via PyTorch's C++ operator registration APIs).

All calls to custom operators must be made through the PyTorch dispatcher. That is, they must query the PyTorch Dispatcher for a TypedOperatorHandle and invoke TypedOperatorHandle::call:

#include <torch/library.h>

// Define the operator
TORCH_LIBRARY(your_namespace, m) {
    m.def("sin(Tensor x) -> Tensor");
}
// If you define operator schemas in multiple places, use TORCH_LIBRARY_FRAGMENT instead of TORCH_LIBRARY

How to add CPU/CUDA/Backend implementations

To provide backend-specific implementations for an operator, use TORCH_LIBRARY_IMPL.

Tensor custom_sin_cpu(const Tensor& x) {
    // Replace this with at::sin if you want to test it out.
    return my_custom_sin_implementation_on_cpu(x);
}

// Register the CPU implementation for the operator
TORCH_LIBRARY_IMPL(your_namespace, CPU, m) {
    m.impl("sin", &custom_sin_cpu);
}

Tensor custom_sin_cuda(const Tensor& x) {
    // Replace this with at::sin if you want to test it out.
    return my_custom_sin_implementation_on_cuda(x);
}

// Register the CUDA implementation for the operator
TORCH_LIBRARY_IMPL(your_namespace, CUDA, m) {
    m.impl("sin", &custom_sin_cuda);
}

How to invoke the custom op from C++

To invoke the custom operator from C++, first query its handle from the PyTorch dispatcher and then call it:

static auto custom_sin_op = torch::Dispatcher::singleton()
    .findSchemaOrThrow("your_namespace::sin", "")
    .typed<Tensor(const Tensor&)>();
Tensor result = custom_sin_op.call(x);

How to invoke the custom op from Python

The C++ custom op gets compiled into a shared library. Use torch.ops.load_library(path_to_shared_library) to load the shared library.

Once the shared library has been loaded, the custom op is available in the torch.ops namespace:

x = torch.randn(3)
y = torch.ops.your_namespace.sin(x)
assert torch.allclose(y, torch.sin(x))




Now that we have implemented our custom operator in C++ and written its registration code, it is time to build the operator into a (shared) library that we can load into Python for research and experimentation, or into C++ for inference in a no-Python environment. There exist multiple ways to build our operator, using either pure CMake, or Python alternatives like setuptools. For brevity, the paragraphs below only discuss the CMake approach. The appendix of this tutorial dives into other alternatives.
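As a sketch of the CMake approach, a minimal CMakeLists.txt might look like the following. The project name `custom_ops` and source file `op.cpp` are placeholders for your own names, and `CMAKE_PREFIX_PATH` is assumed to point at your libtorch installation:

```cmake
# Minimal CMake build for a libtorch-based operator library (a sketch;
# project and file names are placeholders).
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
project(custom_ops)

# Requires -DCMAKE_PREFIX_PATH=/path/to/libtorch at configure time.
find_package(Torch REQUIRED)

add_library(custom_ops SHARED op.cpp)
target_compile_features(custom_ops PRIVATE cxx_std_17)
target_link_libraries(custom_ops "${TORCH_LIBRARIES}")
```

After configuring and building, the resulting shared library can be loaded from Python with torch.ops.load_library, or linked into a C++ application directly.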