Our Frankenstein-ish pipeline looks as follows:

1. Use Grounding DINO to detect the "thing" categories (categories with instances).
2. Get instance segmentation masks for the detected boxes using SAM.
3. Use ...
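The first two steps can be sketched as a small piece of glue code. This is a minimal sketch under stated assumptions, not the actual implementation of this pipeline. `detect` and `segment_box` are hypothetical callables that stand in for Grounding DINO (text-prompted box detection) and SAM (box-prompted mask prediction). `fake_grounding_dino` and `fake_sam` are toy stubs, also hypothetical, that let the example run without any model weights. The `min_score` cutoff of 0.3 is an assumed value and does not come from the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# (x0, y0, x1, y1) in pixels; x1 and y1 are exclusive.
Box = Tuple[int, int, int, int]
# H x W grid of booleans; True marks pixels belonging to the instance.
Mask = List[List[bool]]


@dataclass
class Detection:
    label: str
    score: float
    box: Box


@dataclass
class Instance:
    label: str
    score: float
    box: Box
    mask: Mask


def thing_instances(
    image,
    thing_prompts: Sequence[str],
    detect: Callable[[object, Sequence[str]], List[Detection]],
    segment_box: Callable[[object, Box], Mask],
    min_score: float = 0.3,  # assumed cutoff, not taken from the source
) -> List[Instance]:
    # Step 1: text-prompted detection of the "thing" categories (Grounding DINO's role).
    detections = [d for d in detect(image, thing_prompts) if d.score >= min_score]
    # Step 2: one box-prompted mask per kept detection (SAM's role).
    return [Instance(d.label, d.score, d.box, segment_box(image, d.box)) for d in detections]


def fake_grounding_dino(image, prompts: Sequence[str]) -> List[Detection]:
    # Hypothetical stub: ignores its inputs and returns fixed detections.
    return [Detection("cat", 0.9, (1, 1, 4, 3)), Detection("dog", 0.2, (0, 0, 2, 2))]


def fake_sam(image, box: Box) -> Mask:
    # Hypothetical stub: a real SAM mask follows object boundaries; this one just fills the box.
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = box
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(width)] for y in range(height)]


if __name__ == "__main__":
    image = [[0] * 5 for _ in range(4)]  # 4 x 5 dummy "image"
    instances = thing_instances(image, ["cat", "dog"], fake_grounding_dino, fake_sam)
    print([inst.label for inst in instances])  # → ['cat'] (the dog falls below min_score)
    print(sum(map(sum, instances[0].mask)))  # → 6 pixels (3 wide x 2 tall)
```

Because the detector and segmenter are passed in as callables, swapping the stubs for real Grounding DINO and SAM calls would leave `thing_instances` unchanged.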