Problem
Internal covariate shift (ICS)
The distribution of each layer's inputs changes during training
This is a problem because each layer has to continuously adapt to a new input distribution
Saturating non-linearity, vanishing gradient
usually addressed by ReLU, careful initialization, and small learning rates; BN is a more stable solution
Whitening the layer inputs would remove ICS, but it has to be recomputed after every parameter update and, applied naively, it cancels the effect of gradient descent
If the normalization is computed outside the gradient step, the update no longer affects the loss: the loss L stays constant while the bias b keeps growing
Including the normalization in gradient descent fixes this, but full whitening makes the Jacobian computation far too expensive, so simplifications are needed
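As a rough illustration of the bias example above, here is a minimal NumPy sketch (the squared-error loss, learning rate, and variable names are my own assumptions): mean subtraction is applied outside the gradient computation, so the update to b never changes the normalized output and the loss stays flat while b keeps moving.

```python
import numpy as np

# Layer x = u + b followed by mean-only normalization xhat = x - E[x].
# The gradient step below treats E[x] as a constant, i.e. the normalization
# is not part of the computation graph that gradient descent sees.
rng = np.random.default_rng(0)
u = rng.normal(size=32)                 # fixed layer inputs
target = rng.normal(loc=1.0, size=32)   # arbitrary regression targets (assumed for the demo)
b, lr = 0.0, 0.5

for step in range(5):
    x = u + b
    xhat = x - x.mean()                          # normalization outside the optimizer
    loss = 0.5 * np.mean((xhat - target) ** 2)
    grad_b = np.mean(xhat - target)              # dL/db while ignoring that E[x] depends on b
    b -= lr * grad_b
    print(f"step {step}: loss={loss:.4f}  b={b:.4f}")
# Output: the loss never changes, but b grows step after step.
```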
Batch Normalization
Definition: a normalization step that fixes the means and variances of layer inputs
First simplification: normalize each scalar feature independently (per-dimension normalization over the d dimensions)
Simply normalizing constrains the nonlinearity, e.g. the sigmoid is roughly linear near 0, so normalized inputs would stay in its linear regime
Learnable parameters γ and β are added to scale and shift the normalized value, restoring the representational power and keeping the nonlinearity usable
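A minimal NumPy sketch of this batch-normalizing transform (Algorithm 1); the function name and the toy mini-batch are my own, and γ, β would normally be learned by backpropagation:

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """x: (batch, d) activations; gamma, beta: (d,) learned scale and shift."""
    mu = x.mean(axis=0)                      # per-dimension mini-batch mean
    var = x.var(axis=0)                      # per-dimension mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalize each scalar feature
    return gamma * x_hat + beta, mu, var     # scale & shift the normalized value

x = np.random.randn(64, 100) * 3.0 + 2.0     # a mini-batch with non-unit statistics
gamma, beta = np.ones(100), np.zeros(100)
y, mu, var = batch_norm_train(x, gamma, beta)
print(y.mean(axis=0)[:3], y.std(axis=0)[:3]) # ~0 and ~1 until gamma, beta move away from 1, 0
```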
Second simplification: use the mean and variance of each mini-batch rather than of the whole training set
At inference time, use the average of the mini-batch means and the (unbiased) average of the mini-batch variances collected during training (Algorithm 2)
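A sketch of the corresponding inference-time transform, assuming the per-batch statistics were stored during training (the helper name and argument layout are my assumptions):

```python
import numpy as np

def batch_norm_infer(x, gamma, beta, batch_means, batch_vars, m, eps=1e-5):
    """x: (n, d); batch_means / batch_vars: per-mini-batch statistics from training; m: mini-batch size."""
    E_x = np.mean(batch_means, axis=0)                  # average of the mini-batch means
    Var_x = m / (m - 1) * np.mean(batch_vars, axis=0)   # unbiased average of the mini-batch variances
    # With fixed statistics, BN reduces to a per-dimension linear transform of x.
    return gamma * (x - E_x) / np.sqrt(Var_x + eps) + beta
```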
Advantages
dramatically accelerates training
networks converge faster when their inputs are whitened or normalized
ImageNet experiments: increase the learning rate, remove Dropout, shuffle training examples more thoroughly, reduce L2 regularization
regularizes the model and reduces the need for Dropout
BN provides a regularization effect similar to Dropout, because each example is normalized together with a randomly selected mini-batch
reduces the dependence of gradients on the scale of the parameters or of their initial values
This allows us to use much higher learning rates without the risk of divergence
Small changes to the parameters are not amplified through the layers, and backpropagation is unaffected by the scale of the parameters (see the sketch below)
prevents the network from getting stuck in the saturated regimes of the nonlinearity, so saturating nonlinearities can still be used
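A quick numerical check of the parameter-scale argument (my own sketch, with γ = 1 and β = 0 for simplicity): multiplying the weights by a constant leaves the normalized output essentially unchanged, BN(Wu) ≈ BN((aW)u), so the gradients flowing back through the layer do not grow with the weight scale.

```python
import numpy as np

def bn(x, eps=1e-5):
    # Plain normalization over the batch dimension, one mean/variance per feature
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

u = np.random.randn(128, 10)
W = np.random.randn(10, 5)
print(np.allclose(bn(u @ W), bn(u @ (7.0 * W)), atol=1e-4))  # True: the scale 7 cancels out
```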
Because of γ and β, the mean and variance of the layer's output are not fixed to 0 and 1; they are learned
The transform is differentiable, reduces ICS, can represent the identity transformation, and preserves the network's capacity
BN can also be used in CNNs: statistics are shared per feature map (computed jointly over the batch and all spatial locations), which preserves the convolutional property
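A sketch of that per-feature-map normalization in NumPy (the NCHW layout and names are my assumptions, and γ, β are again learned per channel):

```python
import numpy as np

def batch_norm_conv(x, gamma, beta, eps=1e-5):
    """x: (batch, channels, height, width); gamma, beta: (channels,)."""
    mu = x.mean(axis=(0, 2, 3), keepdims=True)   # one mean per feature map
    var = x.var(axis=(0, 2, 3), keepdims=True)   # one variance per feature map
    x_hat = (x - mu) / np.sqrt(var + eps)        # normalize over batch and spatial locations
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)
```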