DINO-XSeek is a referring object detection model built on a multimodal large language model, designed to precisely locate objects from natural language descriptions provided by the user.
DINO-XSeek can handle complex instructions involving attributes, positions, interactions, and reasoning, seamlessly integrating language with visual information. DINO-XSeek can be widely used in fields such as smart homes, augmented reality, and robotics, enhancing the intelligence of human-machine interactions.

Object detection is the cornerstone of computer vision (CV). Integrating cutting-edge perception and multimodal intelligence,
we build frontier AI models to empower a variety of scenarios,
including industrial, medical, agricultural, home, health management, retail, security, smart city, and traffic management applications.