YOLO frames object detection as a regression problem. Rather than only making discrete classification decisions, the network directly predicts continuous values for each bounding box: its center coordinates, width, height, and confidence. The loss function penalizes the gap between these predictions and the ground-truth values, so training pushes the network to fit both the position and the confidence of each object, which is how it localizes objects.
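To make the regression view concrete, here is a minimal sketch of a per-box loss in the style of the original YOLO (v1) formulation, which uses sum-squared error with square-rooted width and height. The function name `box_regression_loss`, the tuple layout, and the `has_object` flag are assumptions made for illustration. The weights 5.0 and 0.5 follow the YOLOv1 paper, and later YOLO versions use different losses. The class-probability term is omitted, and the target confidence is supplied by the caller rather than computed from IoU.

```python
import math


def box_regression_loss(pred, target, has_object,
                        lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-squared-error loss for one predicted box (YOLOv1-style sketch).

    pred, target: (x, y, w, h, confidence) tuples of continuous values,
                  with w and h non-negative.
    has_object:   True if a ground-truth object is assigned to this box.
    """
    px, py, pw, ph, pc = pred
    tx, ty, tw, th, tc = target

    if not has_object:
        # No object here: only the confidence is penalized, with a
        # reduced weight so empty cells do not dominate training.
        return lambda_noobj * (pc - tc) ** 2

    # Regression on the box center.
    center_err = (px - tx) ** 2 + (py - ty) ** 2

    # Regression on the box size. The square root makes a given
    # absolute error count more for small boxes than for large ones.
    size_err = ((math.sqrt(pw) - math.sqrt(tw)) ** 2
                + (math.sqrt(ph) - math.sqrt(th)) ** 2)

    # Regression on the confidence score.
    conf_err = (pc - tc) ** 2

    return lambda_coord * (center_err + size_err) + conf_err


# Example: the predicted center is off by 0.1 in x, everything else matches.
target = (0.5, 0.5, 0.25, 0.25, 1.0)
pred = (0.6, 0.5, 0.25, 0.25, 1.0)
loss = box_regression_loss(pred, target, has_object=True)
```

Every term is a squared difference between a predicted and a target continuous value, so minimizing the loss is an ordinary regression fit. A perfect prediction gives a loss of zero, and the loss grows smoothly as the box drifts from the ground truth.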





