NVIDIA Deep Learning Performance

A quick summary of the documentation, which is divided into many chapters.

NVIDIA Deep Learning Performance Documentation - Last updated February 1, 2023

  • Get Started With Deep Learning Performance
    This is the landing page for our deep learning performance documentation. This page provides recommendations that apply to most deep learning operations. It also links to the other performance documents, gives a short explanation of each, and shows how these pages fit together.

Training


  • Train With Mixed Precision
    Mixed precision methods combine the use of different numerical formats in one computational workload. This document describes the application of mixed precision to deep neural network training.
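
To make the idea of mixing numerical formats concrete, here is a minimal sketch of why mixed precision training is commonly paired with loss scaling: a small gradient underflows to zero when stored in half precision, but survives if it is scaled up first and unscaled in higher precision. The helper `to_fp16`, the gradient value, and the scale factor of 1024 are illustrative assumptions, not values taken from the NVIDIA document.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    Hypothetical helper for illustration; it uses the standard library's
    half-precision ('e') struct format to simulate FP16 storage.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Illustrative values (assumptions): a tiny gradient and a power-of-two scale.
grad = 1e-8
loss_scale = 1024.0

# Stored directly in FP16, the gradient is below the smallest subnormal
# (about 5.96e-8) and rounds to zero.
naive = to_fp16(grad)

# Scaling before the FP16 cast keeps the value representable; unscaling is
# done in higher precision (a Python float here).
scaled = to_fp16(grad * loss_scale)
recovered = scaled / loss_scale

print("FP16 without scaling:", naive)
print("FP16 with scaling, then unscaled:", recovered)
```

In a real framework the scaling is applied to the loss rather than to individual gradients, and the scale factor is usually adjusted dynamically; this sketch only shows the underlying numerical effect.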

Recommendation Systems


  • Best Practices for Building and Deploying Recommender Systems
    This document describes the best practices for building and deploying large-scale recommender systems using NVIDIA GPUs. These practices are the culmination of years of research and development in GPU-accelerated tools for recommender systems, as well as building recommender systems for our in-house products and top-performing solutions for international recommendation systems competitions.

Optimizing Performance


  • Linear/Fully-Connected Layers User’s Guide
    This guide provides tips for improving the performance of fully-connected (or linear) layers. It also provides an example of the impact of the parameter choice with layers in the Transformer network.
  • Convolutional Layers User’s Guide
    This guide provides tips for improving the performance of convolutional layers. It also provides details on the impact of parameters including batch size, input and filter dimensions, stride, and dilation.
  • Recurrent Layers User’s Guide
    This guide provides tips for improving the performance of recurrent layers. It also provides an example of use cases for persistence with layers in the GNMT system.
  • Memory-Limited Layers User’s Guide
    This guide describes the performance of memory-limited layers including batch normalization, activations, and pooling. It also provides tips for understanding and reducing the time spent on these layers within a network.
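
As a small illustration of how the convolution parameters named above (input and filter dimensions, stride, dilation) interact, the sketch below computes the output size along one spatial dimension using the standard formula. The function name `conv_output_size` and the `padding` parameter are assumptions made for this example; they do not come from the NVIDIA guides.

```python
def conv_output_size(in_size: int, kernel: int, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """Output length of a convolution along one spatial dimension.

    Hypothetical helper for illustration. A dilated filter covers
    dilation * (kernel - 1) + 1 input positions.
    """
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * padding - effective_kernel) // stride + 1


# A 7-wide filter with stride 2 and padding 3 halves a 224-wide input.
first_layer = conv_output_size(224, kernel=7, stride=2, padding=3)

# A 3-wide filter with dilation 2 and padding 2 preserves a 32-wide input.
dilated = conv_output_size(32, kernel=3, stride=1, padding=2, dilation=2)

print(first_layer, dilated)
```

The output size, together with the batch size and channel counts, sets the dimensions of the equivalent matrix multiplication, which is why these parameters affect performance.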

Performance Background


  • GPU Performance Background User’s Guide
    This guide provides background on the structure of a GPU, how operations are executed, and common limitations with deep learning operations.
  • Matrix Multiplication Background User’s Guide
    This guide describes matrix multiplications and their use in many deep learning operations. The trends described here form the basis of performance trends in fully-connected, convolutional, and recurrent layers, among others.
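
One way to see why matrix multiplication dimensions drive performance is arithmetic intensity: the ratio of floating-point operations to bytes moved. The sketch below assumes an M x K by K x N multiplication with 2-byte (FP16) elements, where each input and the output are accessed once. The function names and the `ops_per_byte` threshold are hypothetical; the threshold stands in for a particular GPU's ratio of math throughput to memory bandwidth.

```python
def gemm_arithmetic_intensity(m: int, n: int, k: int,
                              bytes_per_element: int = 2) -> float:
    """FLOPs per byte for C = A @ B with A of shape (m, k) and B of shape (k, n).

    Assumes every element of A, B, and C moves through memory exactly once.
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


def is_math_limited(m: int, n: int, k: int, ops_per_byte: float,
                    bytes_per_element: int = 2) -> bool:
    """True when the operation's intensity exceeds the hardware ratio.

    ops_per_byte is a hypothetical hardware parameter supplied by the caller.
    """
    return gemm_arithmetic_intensity(m, n, k, bytes_per_element) > ops_per_byte


large = gemm_arithmetic_intensity(8192, 8192, 8192)   # large square GEMM
skinny = gemm_arithmetic_intensity(8192, 128, 8192)   # small N dimension
matvec = gemm_arithmetic_intensity(8192, 1, 8192)     # matrix-vector product

print(large, skinny, matvec)
```

Under these assumptions, a matrix-vector product performs roughly one operation per byte and is limited by memory bandwidth on essentially any GPU, while large square multiplications are limited by math throughput.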

References

Additional related references: