# #Regularization

## #L2 Regularization

### #Learning rule

The idea of L2 regularization is to add an extra term, called the **regularization term**, to the cost function. For example, the regularized cross-entropy cost is:

$$C= -\frac{1}{n}\sum\limits_{x}\sum\limits_j[y_j\ln a_j^L+ (1-y_j)\ln (1-a_j^L)]+ \frac{\lambda}{2n}\sum\limits_ww^2$$

$$C =C_0+\frac{\lambda}{2n}\sum\limits_ww^2$$

$$\frac{\partial C}{\partial w} =\frac{\partial C_0}{\partial w}+\frac{\lambda}{n}w$$

$$\frac{\partial C}{\partial b} =\frac{\partial C_0}{\partial b}$$

$$w\to (1-\frac{\lambda\eta}{n})w-\frac{\eta}{m}\sum\limits_x\frac{\partial C_x}{\partial w}$$

$$b\to b-\frac{\eta}{m}\sum\limits_x\frac{\partial C_x}{\partial b}$$
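The two update rules above can be sketched in plain Python. The factor $(1-\frac{\eta\lambda}{n})$ rescales ("decays") the weight before the usual gradient step, which is why L2 regularization is also called weight decay. All names (`l2_update`, `eta`, `lam`, `grad_w_sum`) are illustrative, not from the source:

```python
def l2_update(w, grad_w_sum, eta, lam, n, m):
    """One mini-batch step of the L2-regularized rule:
    w -> (1 - eta*lam/n) * w - (eta/m) * sum_x dC_x/dw.
    grad_w_sum is assumed to already be the gradient summed
    over the m examples in the mini-batch."""
    return (1 - eta * lam / n) * w - (eta / m) * grad_w_sum

def bias_update(b, grad_b_sum, eta, m):
    """Biases are not regularized: b -> b - (eta/m) * sum_x dC_x/db."""
    return b - (eta / m) * grad_b_sum
```

With `lam = 0` the weight rule reduces to ordinary mini-batch gradient descent, as the equations suggest.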

## #L1 Regularization

### #Learning rule

$$C = C_0+ \frac{\lambda}{n}\sum\limits_w|w|$$

$$\frac{\partial C}{\partial w} =\frac{\partial C_0}{\partial w}+\frac{\lambda}{n}\,\mathrm{sgn}(w)$$

### #Relation to L2 regularization

$$w\to w - \frac{\eta\lambda}{n}\,\mathrm{sgn}(w)-\frac{\eta}{m}\sum\limits_x\frac{\partial C_x}{\partial w}$$

$$w\to (1-\frac{\eta\lambda}{n})w-\frac{\eta}{m}\sum\limits_x\frac{\partial C_x}{\partial w}$$

Comparing the two rules: L1 regularization shrinks each weight toward 0 by a constant amount $\frac{\eta\lambda}{n}$, whereas L2 regularization (second rule, repeated for comparison) shrinks it by an amount proportional to $w$. So when $|w|$ is large, L1 shrinks it much less than L2 does, and when $|w|$ is small, L1 shrinks it much more, tending to drive many weights exactly to zero.
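A minimal sketch of the L1 update $w\to w - \frac{\eta\lambda}{n}\,\mathrm{sgn}(w)-\frac{\eta}{m}\sum_x\frac{\partial C_x}{\partial w}$, using NumPy's `sign` for $\mathrm{sgn}(w)$ (function and parameter names are illustrative assumptions):

```python
import numpy as np

def l1_update(w, grad_w_sum, eta, lam, n, m):
    """One mini-batch step of the L1-regularized rule:
    w -> w - (eta*lam/n) * sgn(w) - (eta/m) * sum_x dC_x/dw.
    Each weight is pulled toward 0 by the same constant amount,
    regardless of its magnitude."""
    return w - (eta * lam / n) * np.sign(w) - (eta / m) * grad_w_sum
```

Note that `np.sign(0) == 0`, a common convention for handling the non-differentiable point $w = 0$.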

## #Dropout

Dropout is a fairly radical technique. Unlike the regularization techniques above, it does not rely on modifying the cost function; instead, it modifies the network itself: during each training pass it randomly (and temporarily) deletes half of the hidden neurons, while leaving the input and output neurons untouched.
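The deletion step can be sketched as a random mask on the hidden activations. This version uses "inverted" dropout, which scales the surviving activations by $1/(1-p)$ during training so that no rescaling is needed at test time (one common convention; the function and parameter names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(a, p=0.5, train=True):
    """Zero each hidden activation independently with probability p
    during training, scaling survivors by 1/(1-p) so the expected
    activation matches the full (test-time) network."""
    if not train:
        return a  # at test time the full network is used unchanged
    mask = (rng.random(a.shape) >= p) / (1 - p)
    return a * mask
```

With `p=0.5` this drops roughly half of the hidden neurons on each pass, matching the description above.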

### #Why it works

This technique reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons. It is, therefore, forced to learn more robust features that are useful in conjunction with many different random subsets of the other neurons.