Data augmentation is the practice of synthesizing additional training data by applying transformations, often noisy ones, to existing examples. Each transformation encodes a prior belief that the data will vary in a particular way and that the response is invariant to that variation. Well-chosen transformations can regularize the model, helping it generalize to more varied data.

Data augmentation can partially correct for class imbalance, though it carries risks. First and foremost, it can introduce systematic bias by suggesting that certain characteristics are more common in the augmented class than they actually are. In models that affect people, this can raise serious fairness problems. The same effect can also lead to overfitting.
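
As a rough illustration of using augmentation against class imbalance, the sketch below tops up each minority class with augmented copies until it matches the largest class. The names `balance_with_augmentation` and `jitter` are hypothetical, the feature vectors are made up, and Gaussian jitter is only a stand-in for whatever transformation suits the data. Note that every synthetic example is derived from the few real minority examples, which is exactly where the bias and overfitting risks described above come from.

```python
import random
from collections import Counter


def jitter(x, rng, sigma=0.05):
    """Hypothetical augmentation: add small Gaussian noise to each feature."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def balance_with_augmentation(examples, labels, augment, rng):
    """Add augmented copies of minority-class examples until all classes
    are as large as the largest class. Originals are kept unchanged."""
    counts = Counter(labels)
    target = max(counts.values())

    # Group the original examples by class label.
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)

    out_x, out_y = list(examples), list(labels)
    for y, xs in by_class.items():
        for _ in range(target - counts[y]):
            # Each synthetic example starts from a randomly chosen real one.
            out_x.append(augment(rng.choice(xs), rng))
            out_y.append(y)
    return out_x, out_y


# Made-up data: four examples of class "a", one of class "b".
examples = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.0], [0.9, 0.8]]
labels = ["a", "a", "a", "a", "b"]

rng = random.Random(0)
new_examples, new_labels = balance_with_augmentation(examples, labels, jitter, rng)
print(Counter(new_labels))
```

Keeping a count of how many synthetic examples each class received makes it easier to judge later how much of a class's apparent structure comes from a handful of originals.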

Data augmentation bears some resemblance to bootstrapping (resampling the training set with replacement). Because resampling repeats existing examples without adding new information, it tends to increase rather than decrease the risk of overfitting, and excessive data augmentation can have the same effect. Appropriate variation, however, can help the model discover invariant structure, which has a regularizing effect.

Applications and techniques

Although data augmentation is most often associated with computer vision, it is applicable to many domains with sparse input data, such as natural language processing, audio and video processing, time series, and so forth.

For all of these applications, it is important to ensure that transformations represent changes to which the response is invariant. This is particularly true of time series data, where a change in ordering can dramatically change the meaning in some applications, such as fault detection. (Though not in all: consider, for example, a model trained on network traffic, which can arrive out of order.)

The following are a few examples of techniques that can be used in each of those domains. Note that this list is nowhere near exhaustive.

Computer vision

Position shift
Color shift
Skew
Rotation
Reflection
Random noise
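
A minimal sketch of three of these techniques (reflection, position shift, and random noise), assuming a grayscale image stored as a nested list of 0-255 pixel values. The function names, the probabilities, and the noise level are all illustrative choices, and a real pipeline would normally use an image library rather than plain lists.

```python
import random


def reflect_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def shift_right(image, pixels, fill=0):
    """Shift the image right by `pixels` columns, padding the left edge with `fill`."""
    if pixels <= 0:
        return [list(row) for row in image]
    return [
        [fill] * min(pixels, len(row)) + list(row[: max(len(row) - pixels, 0)])
        for row in image
    ]


def add_noise(image, sigma, rng):
    """Add Gaussian noise to every pixel, then clamp to the 0-255 range."""
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in image
    ]


def augment(image, rng):
    """Compose the transformations with randomly drawn parameters."""
    out = image
    if rng.random() < 0.5:  # reflect about half of the time
        out = reflect_horizontal(out)
    out = shift_right(out, rng.randint(0, 1))  # shift by 0 or 1 pixel
    return add_noise(out, sigma=2.0, rng=rng)


image = [[0, 50, 100], [150, 200, 250]]
rng = random.Random(0)
augmented = [augment(image, rng) for _ in range(4)]
```

Whether a given transformation is safe depends on the task: reflection is harmless for many object classes but would change the label in, say, digit or text recognition.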

Natural language processing

Word deletion, insertion, replacement, or transposition
Typo introduction
Synonym replacement
Word shuffling
Paraphrasing
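
The word-level techniques can be sketched on a tokenized sentence as follows. The `SYNONYMS` table is a made-up toy dictionary and the function names are hypothetical; a real project would draw synonyms from a curated lexicon or a language model. These edits are not always label-preserving (deleting a negation can flip the meaning of a sentence), so the invariance caveat above applies.

```python
import random

# Hypothetical toy synonym table, for illustration only.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "cheerful"],
}


def delete_word(words, rng):
    """Drop one randomly chosen word (sentences shorter than two words are kept)."""
    if len(words) < 2:
        return list(words)
    i = rng.randrange(len(words))
    return words[:i] + words[i + 1:]


def transpose_words(words, rng):
    """Swap one randomly chosen pair of adjacent words."""
    if len(words) < 2:
        return list(words)
    i = rng.randrange(len(words) - 1)
    out = list(words)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def replace_synonyms(words, rng, table=SYNONYMS):
    """Replace every word found in the table with a randomly chosen synonym."""
    return [rng.choice(table[w]) if w in table else w for w in words]


sentence = "the quick dog looks happy".split()
rng = random.Random(42)
variants = [
    delete_word(sentence, rng),
    transpose_words(sentence, rng),
    replace_synonyms(sentence, rng),
]
for v in variants:
    print(" ".join(v))
```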

Audio processing

Pitch shifting
Time expansion and compression
Frequency filtering
Random noise
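
A rough sketch of noise injection and time expansion or compression, assuming audio is a plain list of float samples at a fixed sample rate. The names `add_noise` and `resample` are illustrative. The resampling here is naive linear interpolation, which changes pitch along with duration when played back at the original rate; changing one without the other needs a dedicated method such as a phase vocoder, which is not shown.

```python
import math
import random


def add_noise(samples, sigma, rng):
    """Add Gaussian noise with standard deviation `sigma` to each sample."""
    return [s + rng.gauss(0.0, sigma) for s in samples]


def resample(samples, rate):
    """Naive linear-interpolation resampling.

    rate > 1 compresses the signal (fewer samples), rate < 1 expands it.
    Played at the original sample rate, this also shifts the pitch.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not samples:
        return []
    n_out = max(1, int(len(samples) / rate))
    out = []
    for i in range(n_out):
        pos = i * rate                      # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)  # clamp at the final sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# A 440 Hz tone, 0.1 s long, at an assumed 8 kHz sample rate.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]

rng = random.Random(1)
noisy = add_noise(tone, sigma=0.01, rng=rng)
slower = resample(tone, 0.5)   # twice as many samples
faster = resample(tone, 2.0)   # half as many samples
```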

Video processing

All of the techniques from audio processing and computer vision apply. Video also introduces the possibility of varying the parameters of visual transformations over time.
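
To make the idea of a time-varying parameter concrete, here is a small sketch that applies a brightness shift whose strength moves linearly from one value to another across the frames of a clip. It assumes each frame is a nested list of 0-255 grayscale pixels; `brightness_ramp` is a hypothetical name, and brightness is just one example of a parameter that could be varied this way.

```python
def brightness_ramp(frames, start_shift, end_shift):
    """Shift brightness by an amount that moves linearly from `start_shift`
    on the first frame to `end_shift` on the last frame."""
    n = len(frames)
    out = []
    for t, frame in enumerate(frames):
        alpha = t / (n - 1) if n > 1 else 0.0  # 0.0 at the start, 1.0 at the end
        shift = start_shift + alpha * (end_shift - start_shift)
        out.append(
            [[min(255, max(0, round(p + shift))) for p in row] for row in frame]
        )
    return out


# Five identical 2x2 frames, brightened progressively by 0 to 40 levels.
frames = [[[100, 100], [100, 100]] for _ in range(5)]
ramped = brightness_ramp(frames, 0, 40)
print([f[0][0] for f in ramped])
```

The same pattern extends to other parameters, for example a rotation angle or a crop offset that drifts over the clip.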

Time series (see note above)

Scaling
Window slicing (creating shorter sequences)
Insertion, deletion, or transposition of values
Random noise (jittering)
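
Scaling, window slicing, and jittering can be sketched on a plain list of floats as below. The function names and parameter values are illustrative, not taken from any particular library. All three preserve the ordering of values, which is why they tend to be safer than insertion, deletion, or transposition when order carries meaning.

```python
import random


def scale(series, factor):
    """Multiply every value by a constant factor."""
    return [x * factor for x in series]


def window_slice(series, length, rng):
    """Return a random contiguous window of `length` values."""
    if length >= len(series):
        return list(series)
    start = rng.randrange(len(series) - length + 1)
    return series[start:start + length]


def jitter(series, sigma, rng):
    """Add independent Gaussian noise to each value."""
    return [x + rng.gauss(0.0, sigma) for x in series]


series = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
rng = random.Random(7)

scaled = scale(series, rng.uniform(0.9, 1.1))  # random factor near 1
window = window_slice(series, 4, rng)
jittered = jitter(series, sigma=0.1, rng=rng)
```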

David's raw ML reference notes
