Data augmentation is the practice of synthesizing additional training data by applying transformations, often noisy ones, to existing examples. Each transformation encodes a prior belief that the data will vary in a particular way and that the response is invariant to that variation. Well-chosen transformations can regularize the model, helping it generalize to more varied data.
Data augmentation can partially correct for class imbalance, but not without risk. Chief among the risks is systematic bias: augmenting an underrepresented class can suggest that certain characteristics are more strongly represented in that class than they actually are. In models that affect people, this can raise serious fairness problems. The same effect can also lead to overfitting.
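As a rough illustration of the class-imbalance use case, the sketch below tops up each minority class with augmented copies until every class matches the largest one. The function name `balance_by_augmentation`, the `augment` callback, and the toy data are all hypothetical and chosen only for this example.

```python
import random
from collections import Counter

def balance_by_augmentation(samples, labels, augment, rng):
    """Add augmented copies of minority-class samples until all classes are equal in size."""
    counts = Counter(labels)
    target = max(counts.values())

    # Group the original samples by label so we only augment real examples.
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)

    out_samples, out_labels = list(samples), list(labels)
    for label, originals in by_label.items():
        for _ in range(target - counts[label]):
            out_samples.append(augment(rng.choice(originals), rng))
            out_labels.append(label)
    return out_samples, out_labels

def jitter(x, rng):
    # Hypothetical augmentation: add a little Gaussian noise to a scalar feature.
    return x + rng.gauss(0.0, 0.05)

rng = random.Random(0)
samples = [0.1, 0.2, 0.15, 0.3, 0.9]
labels = ["a", "a", "a", "a", "b"]
new_samples, new_labels = balance_by_augmentation(samples, labels, jitter, rng)
```

In this toy run, every synthetic "b" sample is derived from the single original "b" example, which is exactly the situation in which the bias and overfitting risks described above are most acute.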
Data augmentation bears some resemblance to bootstrapping, which tends to increase (rather than decrease) the risk of overfitting, and indeed excessive data augmentation can have that effect. However, the introduction of appropriate variation can help the model discover invariant structure, which has a regularizing effect.
Applications and techniques
Although data augmentation is most often associated with computer vision, it is applicable to many domains with sparse input data, such as natural language processing, audio and video processing, and time series.
For all of these applications, it is important to ensure that transformations represent changes to which the response is invariant. This is particularly true of time series data, where a change in ordering can dramatically change the meaning in some applications, such as fault detection. (Though not in all: consider, for example, a model trained on network traffic, which can arrive out of order.)
The following are a few examples of techniques that can be used in each of those domains. Note that this list is nowhere near exhaustive.
Computer vision
- Position shift
- Color shift
- Skew
- Rotation
- Reflection
- Random noise
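Several of the image transformations listed above can be sketched with plain NumPy. This is a minimal illustration, assuming images are H x W x C float arrays with values in [0, 1]; the function name `augment_image` and its parameters are hypothetical, and skew and rotation are omitted because they need interpolation that an image library would normally provide.

```python
import numpy as np

def augment_image(image, rng, max_shift=4, noise_std=0.02):
    """Return a randomly shifted, possibly mirrored, color-shifted, noisy copy of an image."""
    out = image.astype(np.float64)  # astype returns a copy, so the input is not modified

    # Position shift: translate by a random offset along height and width.
    # np.roll wraps pixels around the edges; a real pipeline might pad or crop instead.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))

    # Reflection: mirror left-to-right half of the time.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]

    # Color shift: add a small random offset to each channel.
    out = out + rng.uniform(-0.05, 0.05, size=(1, 1, out.shape[2]))

    # Random noise: add independent Gaussian noise to every pixel.
    out = out + rng.normal(0.0, noise_std, size=out.shape)

    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # stand-in for a real image
augmented = augment_image(image, rng)
```

Whether a given transformation is appropriate depends on the task: horizontal reflection, for example, would not preserve the label in a model that reads text from images.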
Natural language processing
- Word deletion, insertion, replacement, or transposition
- Typo introduction
- Synonym replacement
- Word shuffling
- Paraphrasing
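A few of the token-level techniques above can be sketched with the Python standard library alone. The helper names and the tiny synonym table are hypothetical; a real system would draw synonyms from a lexical resource, and paraphrasing typically requires a separate model.

```python
import random

def delete_words(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def transpose_adjacent(tokens, rng):
    """Swap one randomly chosen pair of neighboring tokens."""
    out = list(tokens)
    if len(out) < 2:
        return out
    i = rng.randrange(len(out) - 1)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out

def replace_synonyms(tokens, synonyms, p, rng):
    """Replace tokens that have a known synonym with probability p."""
    return [
        rng.choice(synonyms[t]) if t in synonyms and rng.random() < p else t
        for t in tokens
    ]

rng = random.Random(0)
tokens = "the quick brown fox jumps over the lazy dog".split()
synonyms = {"quick": ["fast", "speedy"], "lazy": ["idle"]}  # toy table for illustration

deleted = delete_words(tokens, 0.2, rng)
transposed = transpose_adjacent(tokens, rng)
replaced = replace_synonyms(tokens, synonyms, 1.0, rng)
```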
Audio processing
- Pitch shifting
- Time expansion and compression
- Frequency filtering
- Random noise
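The audio techniques above can be approximated in a few lines of NumPy. This is a simplified sketch, assuming a mono signal stored as a 1-D float array; all function names are hypothetical. Note that `change_speed` resamples by linear interpolation, which changes duration and pitch together. Changing one without the other requires more involved methods, such as a phase vocoder, that are not shown here.

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (in dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

def change_speed(signal, rate):
    """Resample by linear interpolation; rate > 1 shortens the clip, rate < 1 lengthens it."""
    n_out = int(round(len(signal) / rate))
    old_positions = np.arange(len(signal))
    new_positions = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_positions, old_positions, signal)

def moving_average_lowpass(signal, width):
    """Crude low-pass filter: average each sample with its neighbors."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="same")

rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz test tone

noisy = add_noise(tone, snr_db=10, rng=rng)
faster = change_speed(tone, 2.0)
slower = change_speed(tone, 0.8)
smoothed = moving_average_lowpass(tone, width=5)
```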
Video processing
All of the techniques from audio processing and computer vision apply. Video also introduces the possibility of varying the parameters of visual transformations over time.
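One way to vary a visual transformation over time is to interpolate its parameter across frames. The sketch below applies a brightness offset that drifts linearly from the first frame to the last, assuming a clip stored as a T x H x W x C float array in [0, 1]. The function name `drift_brightness` is hypothetical.

```python
import numpy as np

def drift_brightness(frames, start_offset, end_offset):
    """Add a brightness offset that changes linearly from the first frame to the last."""
    # One offset per frame, broadcast over height, width, and channels.
    offsets = np.linspace(start_offset, end_offset, num=frames.shape[0])
    shifted = frames + offsets[:, None, None, None]
    return np.clip(shifted, 0.0, 1.0)

frames = np.full((5, 2, 2, 3), 0.5)  # five tiny mid-gray frames as a stand-in clip
drifted = drift_brightness(frames, -0.2, 0.2)
```

The same pattern extends to other per-frame parameters, such as a shift or rotation angle that changes gradually through the clip.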
Time series (see note above)
- Scaling
- Window slicing (creating shorter sequences)
- Insertion, deletion, or transposition of values
- Random noise (jittering)
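Scaling, window slicing, and jittering can be sketched as follows for a univariate series stored as a 1-D NumPy array. The function names and default parameters are hypothetical. Insertion, deletion, and transposition of values are left out because, as noted earlier, they are only safe when ordering does not carry meaning.

```python
import numpy as np

def scale(series, rng, sigma=0.1):
    """Multiply the whole series by one random factor drawn near 1."""
    return series * rng.normal(1.0, sigma)

def window_slice(series, length, rng):
    """Return a random contiguous window of the given length."""
    start = rng.integers(0, len(series) - length + 1)
    return series[start:start + length]

def jitter(series, rng, sigma=0.03):
    """Add independent Gaussian noise to every value."""
    return series + rng.normal(0.0, sigma, size=series.shape)

rng = np.random.default_rng(0)
series = np.arange(1.0, 101.0)  # toy series: 1, 2, ..., 100

scaled = scale(series, rng)
sliced = window_slice(series, 25, rng)
jittered = jitter(series, rng)
```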