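• TorchScript will not be deprecated and will continue to be supported; in the future it may also be possible to use Dynamo on top of TorchScript
  • There is currently no plan for TorchInductor to replace the TorchScript graph. The developers are also looking for a way to obtain kernels through Triton and export them to C++, and are exploring whether those kernels can be placed into a TorchScript graph
  • Writing PyTorch-style code in C++. These C++ API users like PyTorch's Python API and want to write their models directly in C++ the same way they would in Python, but using torch::empty(), C++ NN modules, and so on, to lower overhead or eliminate the GIL. Models written this way cannot use Dynamo directly, because Dynamo is based entirely on Python bytecode analysis, and the developers have no plans to solve this
  • If the move to C++ was made to eliminate Python-side overhead, PT2 may eventually handle that well; in that case it would be better to develop directly in Python
  • If a graph region in the network can be compiled end-to-end with Inductor for a gain, you can capture those operators in Python and then have Inductor export the fused kernel ahead of time so it can be called from C++. This feature does not exist yet; it is planned for the second half of the year
  • For progress on running exported models in C++ (with no Python environment), follow "Torch.compile’s deployment story to non-Python host processes"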

There are a number of different use cases for the C++ frontend, which are worth stepping through individually, since PT2 has different implications for each of them.

Write PyTorch-style code in C++. These users of the C++ API liked PyTorch’s Python API and want to code their models directly in C++ the same way they did in Python, using torch::empty(), C++ NN Modules, etc., for lower overhead or to get rid of the GIL. There is no way for models written this way to use Dynamo directly, since Dynamo is entirely predicated on Python bytecode analysis, and we have no plans to solve this. Additionally, in the limit, PT2 is supposed to remove all of the Python-side overhead that might originally have induced you to port your code to C++, so if you don’t have requirements for Python-less deployment (more on this below), we would hope that the next models you write can be written in Python again.
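To make the bytecode point concrete, here is a minimal sketch that uses only Python's standard `dis` module (it is not Dynamo itself). It shows that a Python function carries an inspectable bytecode stream, which is the kind of input a bytecode-based analysis starts from. The function `toy_model` is a hypothetical stand-in for a model's forward pass; a model written against the C++ API has no equivalent stream to analyze.

```python
import dis


def toy_model(x, w, b):
    # Hypothetical stand-in for a forward pass; plain Python numbers, no tensors.
    return x * w + b


# Every Python function owns a code object whose bytecode can be listed.
opnames = [ins.opname for ins in dis.get_instructions(toy_model)]

# The exact opcode names differ between Python versions, so none are shown here.
print(opnames)
```

A compiled C++ function exposes nothing comparable at runtime, which is why the capture step has to happen on the Python side.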

That being said, it is still possible to make use of PT2 as a tool. You have a few pathways for doing this:

  • You have identified a region of your graph which can profitably be compiled end-to-end with Inductor. You can capture these operators in Python, and then have Inductor export the fused kernel ahead-of-time, to be invoked from C++. This does not exist today but is on our roadmap for this half.
  • You could use lazy tensors to capture all of the operations and then hand the captured graph to our compiler stack. The compiler stack is still in Python, but at runtime, in principle, Python can be kept out of the hot path. You would run into some trouble if you needed dynamic shapes, but C++ code can be manually rewritten to symbolically trace integers if necessary.
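The capture-then-compile idea behind both pathways can be sketched with a toy example in pure Python. This is not PyTorch's lazy tensor implementation or any real API: `LazyValue` and `evaluate` are hypothetical names, and the "compiler" here is just an interpreter that replays the recorded graph. In the pathways described above, the captured graph would instead be handed to the compiler stack.

```python
class LazyValue:
    """Toy lazy value: arithmetic records a graph node instead of computing."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op          # "input", "add", or "mul"
        self.inputs = inputs  # upstream LazyValue nodes
        self.name = name      # only set for "input" nodes

    def __add__(self, other):
        return LazyValue("add", (self, other))

    def __mul__(self, other):
        return LazyValue("mul", (self, other))


def evaluate(node, env):
    """Walk the recorded graph and compute it from the concrete inputs in env."""
    if node.op == "input":
        return env[node.name]
    lhs, rhs = (evaluate(i, env) for i in node.inputs)
    return lhs + rhs if node.op == "add" else lhs * rhs


# Capture once: no arithmetic runs here, only graph construction.
x, w, b = (LazyValue("input", name=n) for n in ("x", "w", "b"))
graph = x * w + b

# Replay the captured graph with different concrete inputs.
print(evaluate(graph, {"x": 2.0, "w": 3.0, "b": 1.0}))  # 7.0
print(evaluate(graph, {"x": 4.0, "w": 0.5, "b": 1.0}))  # 3.0
```

The point of the split is that tracing happens once, while the replay step no longer depends on the code that built the graph.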

C++ API as a deployment mechanism. We fully intend to support this via the “export” workflow. In export, we trace an entire model written in Python and serialize it to some format, which can then be loaded and executed by a C++ runtime. The exported model may or may not have had optimizations applied to it; this is still undecided. Our current work is on serialization for mobile devices, where Inductor-style compilation doesn’t make sense, but in this half we are also working on server-side export. You should be able to chain the resulting exported models with other modules.

Hope that helps.