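This series translates, summarizes, and simplifies the official tutorial: CUDA C++ Programming Guide
The goal is to condense the official guide, which is long and hard to get through as written. High-quality articles by other bloggers are also used as supplementary references.
The articles in this series are published here: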
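- CUDA C++ Programming Guide (编程指北), Chapter 1: Getting Started and the Programming Model
- CUDA C++ Programming Guide (编程指北), Chapter 2: Programming Interface
- CUDA C++ Programming Guide (编程指北), Chapter 3: GPU Hardware Implementation
- CUDA C++ Programming Guide (编程指北), Chapter 4: Performance Guidelines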
Outline of the original guide:
- Introduction is a general introduction to CUDA.
- Programming Model outlines the CUDA programming model.
- Programming Interface describes the programming interface.
- Hardware Implementation describes the hardware implementation.
- Performance Guidelines gives some guidance on how to achieve maximum performance.
- CUDA-Enabled GPUs lists all CUDA-enabled devices.
- C++ Language Extensions is a detailed description of all extensions to the C++ language.
- Cooperative Groups describes synchronization primitives for various groups of CUDA threads.
- CUDA Dynamic Parallelism describes how to launch and synchronize one kernel from another.
- Virtual Memory Management describes how to manage the unified virtual address space.
- Stream Ordered Memory Allocator describes how applications can order memory allocation and deallocation.
- Graph Memory Nodes describes how graphs can create and own memory allocations.
- Mathematical Functions lists the mathematical functions supported in CUDA.
- C++ Language Support lists the C++ features supported in device code.
- Texture Fetching gives more details on texture fetching.
- Compute Capabilities gives the technical specifications of various devices, as well as more architectural details.
- Driver API introduces the low-level driver API.
- CUDA Environment Variables lists all the CUDA environment variables.
- Unified Memory Programming introduces the Unified Memory programming model.