Negative Data Augmentation

Negative Data Augmentation (ICLR 2021) https://arxiv.org/abs/2102.05113
Created on April 6 | Last edited on April 6

TLDR

Most important figures in the paper

Summary

Very interesting paper about data augmentation for generative models.
A common failure mode for generative models, especially GANs, is that they produce a significant amount of "false positives": samples that are not realistic.

Why?

Using an analogy with physics, standard GANs are trained by enforcing a sort of attractive potential between the generator distribution $G_{\theta}$ and samples from the true distribution $P_{data}$, using the discriminator $D_{\phi}$.
Theoretical guarantee: the generator distribution converges to the true distribution in the infinite-data limit.

$$\lim_{n_{g} \rightarrow \infty} G_{\theta} = P_{data}$$

I used $n_{g}$ to indicate the number of real samples, since 'g' stands for green dots (see later).

What happens in practice?

See Fig. 3: in the finite-data regime, if the green dots do not represent the green oval well, then the blue oval won't be able to overlap it well.



Simple solution: get more and better green dots, i.e., increase $n_{g}$. This is the obvious fix, since it gets us closer and closer to the infinite-data limit, but it is not always feasible.
Reason: sampling from $P_{data}$ is typically expensive, hard, and sometimes not possible at all.

The idea proposed in this paper is simple yet powerful: in addition to the attractive potential coming from the positive samples, add a repulsive potential coming from negative samples.

So here we have
  • $n_{g}$ green dots, responsible for the attractive potential
  • $n_{r}$ red dots, responsible for the repulsive potential
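
In GAN terms, one way to sketch the resulting objective (my paraphrase of the idea, with $\lambda$ as an assumed mixing weight, not necessarily the paper's exact notation) is to make the discriminator treat a mixture of generated samples and negative samples $\bar{P}$ (the red dots, obtained by transforming real samples) as fake:

$$\min_{\theta} \max_{\phi} \; \mathbb{E}_{x \sim P_{data}}\left[\log D_{\phi}(x)\right] + \mathbb{E}_{\bar{x} \sim \lambda G_{\theta} + (1-\lambda)\bar{P}}\left[\log\left(1 - D_{\phi}(\bar{x})\right)\right]$$

Intuitively, since $\bar{P}$ is constructed to lie off the data manifold, asking $D_{\phi}$ to reject it should not move the optimum away from $P_{data}$.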

Introducing the repulsive potential does not affect the original theoretical guarantees, but it makes the system more sample-efficient, since generating negative samples from positive samples is simple and cheap using a set of known transformations

$$\{T\}, \quad T : X \rightarrow X$$
See some examples in Fig. 2.

Examples of NDA Transformations
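
As a concrete illustration, here is a minimal sketch of one such transformation, a Jigsaw-style patch shuffle: it keeps local statistics intact while destroying global structure, so the result looks "off" and makes a good negative sample. This is my own toy implementation on a 2D list of pixel values (the function name `jigsaw_nda` and the patch layout are my assumptions, not the paper's code).

```python
import random

def jigsaw_nda(image, patch, seed=None):
    """Build a negative sample by shuffling non-overlapping square
    patches of an n x n image (a 2D list of pixel values).
    Toy sketch of a Jigsaw-style NDA transformation; the paper's
    actual implementation may differ."""
    rng = random.Random(seed)
    n = len(image)
    assert n % patch == 0, "image side must be divisible by patch size"
    k = n // patch  # number of patches per side

    # Extract the k*k patches in row-major order.
    patches = []
    for bi in range(k):
        for bj in range(k):
            patches.append([row[bj * patch:(bj + 1) * patch]
                            for row in image[bi * patch:(bi + 1) * patch]])

    # Shuffle the patch order, then reassemble into a new image.
    order = list(range(len(patches)))
    rng.shuffle(order)
    out = [[0] * n for _ in range(n)]
    for idx, src in enumerate(order):
        bi, bj = divmod(idx, k)
        for r in range(patch):
            for c in range(patch):
                out[bi * patch + r][bj * patch + c] = patches[src][r][c]
    return out
```

Note that the output contains exactly the same pixel values as the input, only rearranged, which is what makes these negatives "near misses" rather than obvious noise.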

So if the discriminator learns quickly enough about the negative samples, it can help the generator move away from them early in training.
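
To make this concrete, here is a hedged sketch of what a discriminator loss with negative samples could look like: real samples are labeled 1, while both generated samples and NDA-transformed real samples are labeled 0. The function name, the binary cross-entropy form, and the `lam` mixing weight are my illustrative assumptions, not the paper's exact loss.

```python
import math

def bce(p, y):
    # Binary cross-entropy for one prediction p in (0, 1) with label y in {0, 1}.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def nda_discriminator_loss(d_real, d_fake, d_neg, lam=0.5):
    """Sketch of a discriminator loss with negative data augmentation.

    d_real: discriminator outputs on real samples (target label 1)
    d_fake: outputs on generator samples (target label 0)
    d_neg:  outputs on NDA-transformed real samples, ALSO labeled 0,
            so rejecting them is what creates the repulsive potential.
    lam weights generated vs. negative samples; the exact mixing used
    in the paper may differ -- this is an illustrative assumption.
    """
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    neg_term = sum(bce(p, 0) for p in d_neg) / len(d_neg)
    return real_term + lam * fake_term + (1 - lam) * neg_term
```

A discriminator that confidently accepts real samples and rejects both kinds of fakes gets a small loss; one fooled by negatives gets a large loss, and its gradients then steer the generator away from those regions.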