Data Leakage

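Data leakage is a critical issue in machine learning in which information crosses between the training, validation, and test sets when it should not. Examples include test set samples being used for model training, data augmentation operations that span more than one set (for example, augmented copies of one sample landing in both the training and test sets), and preprocessing statistics such as a normalization mean and standard deviation calculated on the entire dataset rather than on the training set alone. Leakage inflates model evaluation metrics, so they no longer reflect the model's true generalization ability.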
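As a rough illustration of the preprocessing case, the sketch below splits a toy dataset first and only then computes standardization statistics from the training portion. It uses only the Python standard library. The helper names (split_indices, fit_standardizer, standardize) and the toy data are hypothetical, chosen for this example, and are not taken from any particular library.

```python
import random
import statistics


def split_indices(n, test_fraction, seed):
    """Shuffle indices once and split them into disjoint train/test lists."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = int(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def fit_standardizer(values):
    """Compute mean and standard deviation from the given values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against zero variance
    return mean, std


def standardize(values, mean, std):
    """Apply previously fitted statistics to a list of values."""
    return [(v - mean) / std for v in values]


# Toy one-feature dataset (illustrative values, not from the source).
rng = random.Random(0)
data = [rng.gauss(50.0, 10.0) for _ in range(100)]

# Split BEFORE any preprocessing so no sample appears in both sets.
train_idx, test_idx = split_indices(len(data), test_fraction=0.2, seed=0)
assert set(train_idx).isdisjoint(test_idx)

train = [data[i] for i in train_idx]
test = [data[i] for i in test_idx]

# Leak-free: statistics come from the training set only ...
mean, std = fit_standardizer(train)
train_scaled = standardize(train, mean, std)
# ... and are reused unchanged on the test set.
test_scaled = standardize(test, mean, std)

# Leaky variant, shown only for contrast: statistics from the entire dataset,
# which lets test-set information influence the training features.
leaky_mean, leaky_std = fit_standardizer(data)
```

The same ordering applies to other fitted preprocessing steps: split first, fit on the training set, then apply the fitted transform to the validation and test sets without refitting.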