Rate Limiting

Rate limiting is a strategy that caps how often an inference API can be called, usually expressed as "requests per minute" or "requests per hour". It prevents malicious requests or excessive calls from exhausting server resources, which keeps the service stable.
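As a rough illustration of the idea, the sketch below implements a fixed-window counter, one common way to enforce a "requests per minute" cap. It is not the mechanism used by any particular service: the class name `FixedWindowRateLimiter`, the per-client `key`, and the `limit` and `window_seconds` parameters are all assumptions made for this example.

```python
import time
from collections import defaultdict


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per `window_seconds` for each client key.

    Hypothetical example class; names and parameters are illustrative only.
    """

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        # client key -> [window_start_time, request_count]
        self._windows = defaultdict(lambda: [None, 0])

    def allow(self, key):
        """Return True if the request may proceed, False if the cap is reached."""
        now = self.clock()
        window = self._windows[key]
        # Start a fresh window on first use or once the current one has expired.
        if window[0] is None or now - window[0] >= self.window_seconds:
            window[0] = now
            window[1] = 0
        if window[1] >= self.limit:
            return False
        window[1] += 1
        return True


# Usage: a cap of 3 requests per minute for a single (made-up) client key.
limiter = FixedWindowRateLimiter(limit=3, window_seconds=60)
results = [limiter.allow("client-a") for _ in range(4)]
```

Here the first three calls are allowed and the fourth is rejected until the window resets. A server that rejects a call this way would typically respond with HTTP status 429 (Too Many Requests). A fixed window is simple but lets bursts through at window boundaries; sliding-window or token-bucket schemes smooth this out at the cost of a little more bookkeeping.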