As the title suggests, some experiments in the paper were conducted on Qwen2.5, but existing research indicates that models in this family may suffer from data contamination.
Could you conduct experiments on additional model families, especially those known to be "clean" (i.e., not affected by such contamination)? This might even show greater improvements in the reported metrics.