Paper Review

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (Review)

gmlee729 2024. 12. 4. 21:11

Problem

Internal covariate shift (ICS)

The distribution of each layer's inputs changes during training.

This is a problem because the layers need to continuously adapt to the new distribution.

Saturating nonlinearities, vanishing gradients

Previously addressed with ReLU, careful initialization, and small learning rates; BN is a more stable solution.

Whitening the layer inputs would remove ICS. However, it requires the normalization to be updated at every training step, and if done outside the optimizer it reduces the effect of gradient descent.

If the normalization is computed outside of gradient descent, the parameter update has no effect on the loss: the loss L stays constant while the bias b keeps growing (see the sketch below).
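A tiny NumPy illustration of the paper's example (a layer x = u + b, normalized by subtracting the mean outside of gradient descent); the variable names and numbers here are my own:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=64)                 # fixed input to the layer
target = rng.normal(loc=1.0, size=64)   # arbitrary regression target
b = 0.0                                 # learnable bias, x = u + b
lr = 0.1

for step in range(5):
    x = u + b
    x_hat = x - x.mean()                # normalization applied OUTSIDE of gradient descent
    loss = 0.5 * np.mean((x_hat - target) ** 2)
    # gradient w.r.t. b, ignoring that mean(x) also depends on b:
    grad_b = np.mean(x_hat - target)
    b -= lr * grad_b
    print(f"step {step}: loss={loss:.4f}  b={b:.4f}")
# The printed loss never changes, while b keeps drifting away from 0.
```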

Including the normalization in the gradient descent computation solves this, but the required Jacobian (full whitening of every layer's inputs) makes it far too expensive, so simplifications are needed.

 

Batch Normalization

Definition: a normalization step that fixes the means and variances of layer inputs.

First simplification: normalize each scalar feature independently (per-dimension normalization over the d dimensions).

Simply normalizing can limit what the nonlinearity represents, e.g. the sigmoid is approximately linear near 0.

Add learnable parameters γ and β (scale and shift the normalized value) to restore the representation power and keep the nonlinearity usable.
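Written out (my restatement of the paper's per-dimension formula), the transform for the k-th feature is:

```latex
% Per-dimension normalization followed by the learned scale and shift
% (epsilon is the small constant added for numerical stability, as in Algorithm 1).
\hat{x}^{(k)} = \frac{x^{(k)} - \mathrm{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}] + \epsilon}},
\qquad
y^{(k)} = \gamma^{(k)}\,\hat{x}^{(k)} + \beta^{(k)}
```

Setting γ⁽ᵏ⁾ = √Var[x⁽ᵏ⁾] and β⁽ᵏ⁾ = E[x⁽ᵏ⁾] recovers the original activations, so the transform can represent the identity.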

Second simplification: use the mean and variance of each mini-batch rather than of the whole training set.

At inference time, use the average of the mini-batch means and the (unbiased) average of the mini-batch variances (Algorithm 2); a sketch follows below.
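A minimal NumPy sketch of the two algorithms as I read them; the function names and shapes are my own choices, not from the paper:

```python
import numpy as np

def bn_train(x, gamma, beta, eps=1e-5):
    """Algorithm 1: normalize with mini-batch statistics, then scale and shift.
    x has shape (batch_size, d); gamma and beta have shape (d,)."""
    mu = x.mean(axis=0)                  # per-dimension mini-batch mean
    var = x.var(axis=0)                  # per-dimension mini-batch variance (biased)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta, mu, var

def bn_inference(x, gamma, beta, batch_means, batch_vars, m, eps=1e-5):
    """Algorithm 2: use statistics aggregated over many training mini-batches.
    E[x] = average of mini-batch means; Var[x] = (m / (m - 1)) * average of
    mini-batch variances (unbiased correction), where m is the mini-batch size."""
    mean = np.mean(batch_means, axis=0)
    var = (m / (m - 1)) * np.mean(batch_vars, axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Usage: collect (mu, var) from each training mini-batch, then freeze them for inference.
rng = np.random.default_rng(0)
gamma, beta = np.ones(4), np.zeros(4)
means, vars_ = [], []
for _ in range(10):                      # pretend training loop
    batch = rng.normal(loc=3.0, scale=2.0, size=(32, 4))
    _, mu, var = bn_train(batch, gamma, beta)
    means.append(mu); vars_.append(var)
test_x = rng.normal(loc=3.0, scale=2.0, size=(5, 4))
print(bn_inference(test_x, gamma, beta, means, vars_, m=32))
```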

 

 

Advantages

Dramatically accelerates training of deep networks.

Networks converge faster when their inputs are whitened or normalized.

ImageNet experiments: increase the learning rate, remove Dropout, shuffle training examples more thoroughly, reduce the L2 weight regularization, among other changes.

Regularizes the model and reduces the need for Dropout.

BN provides regularization similar to Dropout, because each training example is normalized together with a randomly selected mini-batch.

Reduces the dependence of gradients on the scale of the parameters and of their initial values.

This allows us to use much higher learning rates without the risk of divergence

Prevents small parameter changes from being amplified through the layers; backpropagation through a BN layer is unaffected by the scale of its parameters (checked numerically below).
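A quick numerical check of this scale-invariance property, BN(Wu) = BN((aW)u); the code below is my own illustration, not from the paper:

```python
import numpy as np

def batchnorm(x, eps=1e-5):
    """Plain per-dimension batch normalization (no gamma/beta) over a mini-batch."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

rng = np.random.default_rng(0)
u = rng.normal(size=(128, 16))   # a mini-batch of layer inputs
W = rng.normal(size=(16, 8))     # layer weights
a = 7.3                          # arbitrary positive scale factor

out1 = batchnorm(u @ W)
out2 = batchnorm(u @ (a * W))
print(np.allclose(out1, out2))   # True (up to the small eps term)
```

Since scaling the weights leaves the BN output unchanged, the gradient flowing back through the layer does not blow up or vanish with the parameter scale, which is what permits the larger learning rates mentioned above.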

Keeps the network out of the saturated modes of its nonlinearities.

Because of γ and β, the mean and variance of the layer inputs are not rigidly fixed.

The transform is differentiable, reduces ICS, can represent the identity transformation, and preserves the network's capacity.

Also applicable to CNNs: BN is applied so that the convolutional property is preserved (statistics are shared across all spatial locations of a feature map, with one γ, β pair per feature map), as sketched below.
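A minimal sketch of the convolutional case under those assumptions (NCHW layout, NumPy only; the function name is mine):

```python
import numpy as np

def bn_conv_train(x, gamma, beta, eps=1e-5):
    """BN for conv layers: statistics are computed jointly over the batch AND all
    spatial locations, one (gamma, beta) pair per feature map (channel).
    x has shape (N, C, H, W); gamma and beta have shape (C,)."""
    mu = x.mean(axis=(0, 2, 3), keepdims=True)        # one mean per channel
    var = x.var(axis=(0, 2, 3), keepdims=True)        # one variance per channel
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]

rng = np.random.default_rng(0)
feature_maps = rng.normal(loc=2.0, scale=3.0, size=(8, 3, 32, 32))   # N=8, C=3, H=W=32
y = bn_conv_train(feature_maps, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=(0, 2, 3)), y.var(axis=(0, 2, 3)))  # ~0 means, ~1 variances per channel
```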