YOLO divides the input image into grids of fixed size, and each grid is responsible for detecting objects whose centers fall within it. The network directly regresses bounding box coordinates, confidence and category probabilities. Redundant boxes are filtered out through non-maximum suppression, and finally the detection results of all objects in the image are output, realizing end-to-end rapid detection.





