Lao Pan's AI Community
Quantization precision: FP8 or INT8?
部署不内卷 (Deployment)
tensorrt
imoldpan
April 18, 2024, 09:36
[Image: 1308×648, 20.8 KB]
Note the theoretical compute figures above; they are the RTX 4090's.
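For anyone weighing the two 8-bit options in practice, below is a minimal post-training quantization sketch using NVIDIA's TensorRT Model Optimizer (`nvidia-modelopt`), the kind of toolkit the NVIDIA blog linked below relies on. The toy model, the random calibration loop, and the specific config names (`INT8_DEFAULT_CFG`, `FP8_DEFAULT_CFG`) are assumptions based on modelopt's documented API, not something stated in this post:

```python
# Minimal PTQ sketch with NVIDIA TensorRT Model Optimizer (pip install nvidia-modelopt).
# Assumption: the mtq.quantize / *_DEFAULT_CFG API as documented by modelopt; the tiny
# model and random calibration batches are purely illustrative.
import torch
import torch.nn as nn
import modelopt.torch.quantization as mtq

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy network standing in for the real model (e.g. a Stable Diffusion UNet).
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)).to(device)

# Calibration loop: run a few representative batches so activation ranges can be collected.
def forward_loop(m):
    with torch.no_grad():
        for _ in range(8):
            m(torch.randn(4, 512, device=device))

# INT8 PTQ: integer weights/activations; narrow range, so accuracy depends heavily on
# calibration (SmoothQuant-style configs exist for harder models).
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# FP8 PTQ (E4M3): same 8-bit storage but a floating-point format, so more dynamic range
# and usually less sensitivity to activation outliers. Swap the config to try it:
# model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)
```

On Ada cards like the 4090, FP8 and INT8 are typically listed at the same peak 8-bit tensor-core throughput (presumably what the screenshot above highlights), so the choice mostly comes down to accuracy after quantization and kernel support in your TensorRT version.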
References
https://developer.nvidia.com/blog/tensorrt-accelerates-stable-diffusion-nearly-2x-faster-with-8-bit-post-training-quantization/
https://www.reddit.com/r/StableDiffusion/comments/1baeo5h/nvidia_tensorrt_int8_fp8_quantization/
https://zhuanlan.zhihu.com/p/574825662