Lao Pan's AI Community
Quantization precision: FP8 or INT8?
部署不内卷 (Deployment)
tensorrt
imoldpan
April 18, 2024, 09:36
[Image: 1308×648, 20.8 KB]
Note the theoretical compute figures above; they are the RTX 4090's.
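For anyone weighing the two 8-bit options in practice, below is a minimal post-training quantization sketch using NVIDIA's TensorRT Model Optimizer (`nvidia-modelopt`), the kind of toolkit the NVIDIA blog linked below relies on. The toy model, the random calibration loop, and the specific config names (`INT8_DEFAULT_CFG`, `FP8_DEFAULT_CFG`) are assumptions based on modelopt's documented API, not something stated in this post:

```python
# Minimal PTQ sketch with NVIDIA TensorRT Model Optimizer (pip install nvidia-modelopt).
# Assumption: the mtq.quantize / *_DEFAULT_CFG API as documented by modelopt; the tiny
# model and random calibration batches are purely illustrative.
import torch
import torch.nn as nn
import modelopt.torch.quantization as mtq

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy network standing in for the real model (e.g. a Stable Diffusion UNet).
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)).to(device)

# Calibration loop: run a few representative batches so activation ranges can be collected.
def forward_loop(m):
    with torch.no_grad():
        for _ in range(8):
            m(torch.randn(4, 512, device=device))

# INT8 PTQ: integer weights/activations; narrow range, so accuracy depends heavily on
# calibration (SmoothQuant-style configs exist for harder models).
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# FP8 PTQ (E4M3): same 8-bit storage but a floating-point format, so more dynamic range
# and usually less sensitivity to activation outliers. Swap the config to try it:
# model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)
```

On Ada cards like the 4090, FP8 and INT8 are typically listed at the same peak 8-bit tensor-core throughput (presumably what the screenshot above highlights), so the choice mostly comes down to accuracy after quantization and kernel support in your TensorRT version.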
References
https://developer.nvidia.com/blog/tensorrt-accelerates-stable-diffusion-nearly-2x-faster-with-8-bit-post-training-quantization/
https://www.reddit.com/r/StableDiffusion/comments/1baeo5h/nvidia_tensorrt_int8_fp8_quantization/
https://zhuanlan.zhihu.com/p/574825662