Introduction
It is sometimes important to create portable CUDA applications and libraries that work across a variety of NVIDIA platforms and software environments. NVIDIA provides CUDA compatibility at several different levels.
In this blog post, I would like to discuss CUDA forward and backward compatibility from the perspectives of CUDA application or library compatibility (with GPU architectures), CUDA Runtime compatibility (with CUDA applications or libraries), and CUDA Driver compatibility (with the CUDA Runtime library).
CUDA Application Compatibility
For simplicity, let’s first assume our CUDA application or library has no dependency on other CUDA libraries, such as cuDNN, cuBLAS, etc. Suppose we have a computer with an NVIDIA GPU of an old architecture, and we want to build a CUDA application or library that also runs on a computer with an NVIDIA GPU of a newer, or even future, architecture. When we build the CUDA application or library on the computer with the old-architecture GPU, NVCC can embed PTX code in the compiled binary file. When the CUDA application or library is executed on the computer with the new-architecture GPU, the PTX code will be JIT compiled by the CUDA Runtime into binaries for the new architecture. Therefore, an application or library built on a computer with an old-architecture NVIDIA GPU can be forward compatible with a computer that has a new-architecture NVIDIA GPU.
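As a minimal sketch, assuming the old-architecture build machine has, say, a compute capability 7.0 (Volta) GPU, embedding PTX in the fat binary could look like the following (the source file name `app.cu` is a hypothetical placeholder):

```shell
# code=compute_70 asks NVCC to embed PTX (not SASS) for compute capability 7.0.
# On a GPU of a newer architecture, e.g. sm_86, the CUDA Runtime JIT-compiles
# this PTX into native binaries at load time, giving forward compatibility.
nvcc -gencode arch=compute_70,code=compute_70 -o app app.cu
```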
Of course, the drawback of being forward compatible is that the PTX code generated for the old architecture cannot take advantage of the new features of the new architecture, which could potentially bring a large performance gain. We will not discuss performance in this article, as compatibility is what we want to achieve.
Conversely, suppose we have a computer with an NVIDIA GPU of a new architecture, and we want to build a CUDA application or library that also runs on a computer with an NVIDIA GPU of an old architecture. NVCC allows us to generate not only PTX code but also compiled binaries for an old architecture. When the CUDA application or library is executed on the computer with the old-architecture GPU, it will run directly if it has been compiled to binaries for that architecture; otherwise, the PTX code will be JIT compiled by the CUDA Runtime into binaries for the old architecture. Therefore, an application or library built on a computer with a new-architecture NVIDIA GPU can be backward compatible with a computer that has an old-architecture NVIDIA GPU.
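A sketch of such a build, assuming (as an illustration) we want native binaries for compute capabilities 6.0, 7.0, and 8.0 plus PTX for the newest of those:

```shell
# Generate native SASS for three architectures (sm_60, sm_70, sm_80) so the
# binary runs directly on those GPUs, plus compute_80 PTX so it can still be
# JIT-compiled for architectures newer than the toolkit knows about.
nvcc -gencode arch=compute_60,code=sm_60 \
     -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_80,code=sm_80 \
     -gencode arch=compute_80,code=compute_80 \
     -o app app.cu
```

The cost of this fat-binary approach is a larger binary file and longer compilation time, which is the usual trade-off for wider architecture coverage.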
Forward compatibility can be disabled by disabling the generation of PTX code as part of the binary file. Backward compatibility can be disabled by disabling the generation of both PTX code and old-architecture-specific binaries as part of the binary file.
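For example, a build that ships only native binaries for a single architecture, with no PTX, is neither forward nor backward compatible (again using a hypothetical `app.cu`):

```shell
# SASS only, no PTX: the resulting binary runs only on sm_70-compatible GPUs
# and cannot be JIT-compiled for any other architecture.
nvcc -gencode arch=compute_70,code=sm_70 -o app app.cu
```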
Now, if our CUDA application or library depends on other CUDA libraries, such as cuDNN or cuBLAS, then in order to achieve forward or backward compatibility, those CUDA libraries should also be built with the same forward or backward compatibility as our CUDA application or library. However, this is sometimes not the case, which makes it impossible for our CUDA application or library to be forward or backward compatible. The application developer should always carefully check the compatibility of the dependency libraries beforehand.
CUDA Runtime Compatibility
The CUDA Runtime library is a library that CUDA applications or libraries almost always have to link to during building, sometimes without having to specify it explicitly. The exceptions are CUDA applications or libraries that link to the CUDA Driver library instead. Therefore, released CUDA software, such as cuDNN, will always mention the version of CUDA (Runtime library) it links to. Sometimes there can also be multiple builds for different versions of the CUDA Runtime library. So CUDA application compatibility also depends on the CUDA Runtime.
However, there are sometimes scenarios where the CUDA Runtime that our CUDA application or library links to at build time is different from the CUDA Runtime library in the execution environment. To address these problems, the CUDA Runtime library provides minor version (forward and/or backward) compatibility, provided that the NVIDIA Driver requirement is satisfied.
The reason why CUDA explicitly calls this minor version compatibility is that the CUDA Runtime API might differ between major versions, so an application or library that uses those APIs and builds against the CUDA Runtime of one major version might not be able to run with a CUDA Runtime of another major version. For example, cuDNN 8.6 for CUDA 10.2 cannot work with CUDA 11.2. In fact, if we check the linked shared libraries of a CUDA application or library using ldd, we will often see that only the CUDA Runtime library major version, and no minor version, is specified. For CUDA Runtime libraries that differ only in minor versions, the CUDA Runtime APIs are usually the same, which is what makes minor version compatibility possible.
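For instance, inspecting a hypothetical dynamically linked CUDA application might look like the following; note that on CUDA 11.x systems the CUDA Runtime soname stays `libcudart.so.11.0` regardless of the minor toolkit version it was built with:

```shell
# List the shared libraries the application links to and filter for the
# CUDA Runtime; the soname does not pin a specific minor toolkit version.
ldd ./app | grep cudart
# Typical output (exact path and version vary by system):
#   libcudart.so.11.0 => /usr/local/cuda/lib64/libcudart.so.11.0
```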
CUDA Driver Compatibility
The CUDA Runtime library prepares application components before they run, and the CUDA Driver library is what actually runs the application. So CUDA Runtime compatibility also depends on the CUDA Driver. Although each CUDA Toolkit release ships a CUDA Runtime library and a CUDA Driver library that are compatible with each other, the two can come from different sources and be installed separately.
The CUDA Driver library is always backward compatible: using the latest driver allows us to run CUDA applications built against old CUDA Runtime libraries. CUDA Driver forward compatibility, which is sometimes required for data center computers that prioritize stability, is more complicated and requires installing additional libraries. We will not elaborate on this in this article.
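As a quick sketch, we can check the installed driver version, along with the highest CUDA version that driver supports, using nvidia-smi:

```shell
# The nvidia-smi banner reports "Driver Version" and "CUDA Version"; any
# CUDA Runtime up to that CUDA version can run on top of this driver.
nvidia-smi

# Or query just the driver version in machine-readable form.
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```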
NVIDIA Docker
NVIDIA Docker is a convenient tool that allows the user to develop and deploy CUDA applications in a portable, reproducible, and scalable way. With NVIDIA Docker, we can build and run any CUDA application in any CUDA environment we want, provided that the driver and GPU architecture requirements are satisfied.
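As a sketch, assuming the NVIDIA Container Toolkit is installed on the host and the chosen image tag is still published by NVIDIA, running a container with a specific CUDA Runtime looks like this:

```shell
# Run a container whose CUDA Runtime (11.2 here) is independent of the host's
# CUDA installation; only the host driver and GPU need to be compatible.
docker run --rm --gpus all nvidia/cuda:11.2.2-base-ubuntu20.04 nvidia-smi
```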
NVIDIA Docker
From the above diagram, we can see that the CUDA Driver library comes from the host operating system, sits under the Docker engine, and is mapped into the Docker container, which can install a CUDA Runtime library of any version. The entire CUDA application build and run flow follows “Application or Library → CUDA Runtime (Library) → CUDA Driver (Library) → NVIDIA GPU”. Fundamentally, it is CUDA Driver backward compatibility that allows us to run almost any CUDA Runtime library inside the Docker container, provided that the CUDA Driver library on the host is kept up to date.