Cold start refers to the state in which an inference service handles a request for the first time after startup. Because the model has not yet been loaded into memory or onto the hardware accelerator, the latency of the first request is significantly higher than that of subsequent requests. This makes cold start an important factor in an inference service's time-to-first-response.
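A minimal sketch of the effect, using a toy `InferenceService` class (all names here are hypothetical, and the model load is simulated with a sleep rather than real weight loading): with lazy loading, the first call to `infer` pays the model-load cost; preloading at startup moves that cost out of the request path.

```python
import time

class InferenceService:
    """Toy service illustrating cold start: the first request pays the
    model-load cost unless the model is preloaded at startup."""

    def __init__(self, preload=False):
        self._model = None
        if preload:
            self._load_model()  # eager load: cost is paid at startup

    def _load_model(self):
        # Stand-in for reading weights from disk and moving them onto
        # the accelerator; in practice this can take seconds to minutes.
        time.sleep(0.2)
        self._model = lambda x: x * 2

    def infer(self, x):
        if self._model is None:  # lazy load: first request pays the cost
            self._load_model()
        return self._model(x)

svc = InferenceService(preload=False)

t0 = time.perf_counter()
svc.infer(1)                       # cold start: includes model load
first_latency = time.perf_counter() - t0

t0 = time.perf_counter()
svc.infer(1)                       # warm: model already resident
second_latency = time.perf_counter() - t0

print(first_latency > second_latency)  # → True
```

A common mitigation, following the same pattern, is to construct the service with `preload=True` (or to send a synthetic warm-up request at startup) so that no user request ever observes the load latency.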





