Overfitting to the public leaderboard is one of the main reasons open-source models struggle in real-world use cases.
Here's an example: the data preparation for WizardCoder uses HumanEval pass@1 scores to decide whether to evolve the dataset further.
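
To make the concern concrete, here is a minimal sketch of what a benchmark-driven stopping rule could look like. It is not WizardCoder's actual code: `benchmark_pass_at_1` and `rounds_to_keep` are hypothetical helpers, and the "stop when the score no longer improves" rule is an assumption made for illustration. Only `pass_at_k` follows a known formula, the standard unbiased pass@k estimator, which for k = 1 reduces to the fraction of correct samples.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over problems; each item is (n_samples, n_correct).

    Hypothetical helper, not part of any benchmark harness.
    """
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


def rounds_to_keep(scores_per_round: list[float]) -> int:
    """Assumed stopping rule: keep evolving while the benchmark score improves.

    scores_per_round[i] is the benchmark pass@1 after evolution round i.
    Returns how many rounds are kept.
    """
    if not scores_per_round:
        return 0
    kept = 1
    for prev, cur in zip(scores_per_round, scores_per_round[1:]):
        if cur <= prev:
            break  # score stopped improving, so stop evolving the dataset
        kept += 1
    return kept
```

Under a rule like this, the benchmark effectively acts as a validation set: the dataset that gets kept is the one that scored best on it, so the score later reported on the same benchmark is likely to be optimistic.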