Original text: Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Marrying the Transformer-based detector DINO with grounded pre-training
The key solution of open-set object detection is introducing language to a closed-set detector for open-set concept generalization. To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality