Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching

Ziyao Guo1,3,4    Kai Wang1,🧐    George Cazenavette2   Hui Li4   Kaipeng Zhang3,😎   Yang You1,😎
1National University of Singapore    2Massachusetts Institute of Technology 3Shanghai Artificial Intelligence Laboratory    4Xidian University
🧐 Project Lead    😎 Corresponding Author

Overview

Dataset Distillation aims to synthesize a small synthetic dataset such that a model trained on this synthetic set performs as well as a model trained on the full, real dataset. Until now, no dataset distillation method has reached this completely lossless goal, in part because previous methods only remain effective when the size of the synthetic dataset is extremely small.
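To make this objective concrete, here is a minimal sketch of the evaluation protocol it implies: train a fresh network only on the synthetic set, then measure its accuracy on the real test set. The names (`evaluate_synthetic_set`, `synthetic_images`, `real_test_loader`, the hyperparameters) are illustrative placeholders, not part of any released code.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def evaluate_synthetic_set(synthetic_images, synthetic_labels, model,
                           real_test_loader, epochs=300, lr=0.01, device="cuda"):
    """Train `model` from scratch on the synthetic set, then report real test accuracy."""
    model = model.to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)
    train_loader = DataLoader(TensorDataset(synthetic_images, synthetic_labels),
                              batch_size=256, shuffle=True)

    model.train()
    for _ in range(epochs):                      # train only on the distilled data
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

    model.eval()
    correct = total = 0
    with torch.no_grad():                        # evaluate on the real test set
        for x, y in real_test_loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(1) == y).sum().item()
            total += y.numel()
    return correct / total
```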

In this work, we elucidate why existing methods fail to generate larger, high-quality synthetic sets, taking trajectory matching (TM) based distillation methods as an example. First, we empirically find that the training stage of the trajectories we choose to match (i.e., early or late) greatly affects the effectiveness of the distilled dataset.

Specifically, early trajectories (where the teacher network learns easy patterns) work well for a low-cardinality synthetic set since there are fewer examples wherein to distribute the necessary information. Conversely, late trajectories (where the teacher network learns hard patterns) provide better signals for larger synthetic sets since there are now enough samples to represent the necessary complex patterns.

Based on our findings, we propose to align the difficulty of the generated patterns with the size of the synthetic dataset. In doing so, we successfully scale TM-based methods to larger synthetic datasets, achieving lossless dataset distillation for the very first time.

Motivation and Findings

According to Cazenavette et al. and Arpit et al.:

1. TM-based methods embed informative patterns into synthetic data by matching expert training trajectories (see the sketch below this list).

2. DNNs tend to learn easy patterns early in training.
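For reference, below is a minimal PyTorch-style sketch of one outer step of an MTT-style trajectory-matching objective, as described in point 1. It assumes a functional network that accepts flattened parameters (`student_net(x, flat_params=...)`); names such as `expert_start`, `expert_target`, and `syn_lr` are placeholders, not our actual implementation.

```python
import torch
import torch.nn.functional as F

def trajectory_matching_loss(syn_images, syn_labels, syn_lr,
                             expert_start, expert_target, student_net, n_student_steps):
    """One outer step of trajectory matching (sketch).

    expert_start / expert_target: flattened expert parameters at training steps t and t+M.
    The student starts from expert_start, trains on the synthetic data for n_student_steps,
    and the loss is its normalized parameter distance to expert_target.
    """
    student_params = expert_start.clone().requires_grad_(True)   # start from the expert checkpoint

    for _ in range(n_student_steps):
        # forward pass with the current (differentiable) student parameters
        logits = student_net(syn_images, flat_params=student_params)
        ce = F.cross_entropy(logits, syn_labels)
        grad = torch.autograd.grad(ce, student_params, create_graph=True)[0]
        student_params = student_params - syn_lr * grad           # unrolled SGD step

    # match the end of the student segment to the expert's later checkpoint,
    # normalized by how far the expert itself moved over the same segment
    param_dist = (student_params - expert_target).pow(2).sum()
    expert_dist = (expert_start - expert_target).pow(2).sum()
    return param_dist / expert_dist
```

Gradients of this loss with respect to the synthetic images (and, optionally, the synthetic learning rate) are what embed the expert's patterns into the distilled data.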

Based on this, we infer that patterns generated by matching trajectories from earlier training phases are easier for DNNs to learn. We then explore the effect of matching trajectories from different training phases:

As can be observed, matching early trajectories, which generates easy patterns, performs well when IPC (images per class) is low but tends to be harmful as IPC increases. Conversely, matching late trajectories is beneficial in the high-IPC regime.

To keep dataset distillation effective across IPC settings, we propose to align the difficulty of the generated patterns with the size of the synthetic dataset.
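One simple way to realize such an alignment is to restrict the expert epochs that matching is allowed to start from, shifting the window toward later (harder) segments as IPC grows. The sketch below uses illustrative thresholds; the exact per-dataset ranges used in the paper differ.

```python
import random

def sample_expert_start_epoch(ipc, max_expert_epoch):
    """Pick the expert epoch to start matching from, aligned with synthetic set size.

    Small IPC -> sample from early (easy-pattern) trajectory segments;
    large IPC -> allow later (hard-pattern) segments as well.
    The thresholds below are illustrative, not the paper's exact schedule.
    """
    if ipc <= 10:                    # tiny synthetic set: match early trajectories only
        lo, hi = 0, int(0.2 * max_expert_epoch)
    elif ipc <= 50:                  # medium synthetic set: allow mid-training segments
        lo, hi = int(0.1 * max_expert_epoch), int(0.6 * max_expert_epoch)
    else:                            # large synthetic set: include late, hard patterns
        lo, hi = int(0.3 * max_expert_epoch), max_expert_epoch
    return random.randint(lo, hi)
```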

Comparison

As can be observed in the figure, previous distillation methods work well only when IPC is extremely small. Benefiting from our difficulty-alignment strategy, our method is effective across all IPC settings. Notably, we distill CIFAR-10 and CIFAR-100 to 1/5 and Tiny ImageNet to 1/10 of their original sizes without any performance loss when training ConvNet, offering the first lossless dataset distillation method.

Visualization

What do easy patterns and hard patterns look like?

Here we visualize the images synthesized by matching early and late trajectories, in which easy patterns and hard patterns are embedded, respectively.

As can be observed, matching early trajectories blends the target object into the background and blurs the details, which helps DNNs learn to identify common (easy) samples by their basic patterns.

Conversely, matching late trajectories preserves more details of the target, which helps DNNs learn to identify outlier (hard) samples.

Match Early Trajectories


Match Late Trajectories


BibTeX


@inproceedings{guo2024lossless,
      title={Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching}, 
      author={Ziyao Guo and Kai Wang and George Cazenavette and Hui Li and Kaipeng Zhang and Yang You},
      year={2024},
      booktitle={The Twelfth International Conference on Learning Representations}
}