Refereed full papers (journals, book chapters, international conferences)

1992

Anders Krogh and John A. Hertz, A simple weight decay can improve generalization, Advances in Neural Information Processing Systems, 4, pp. 950-957, 1992.

It has been observed in numerical simulations that a weight decay can improve generalization in a feed-forward neural network. This paper explains why. It is proven that a weight decay has two effects in a linear network. First, it suppresses any irrelevant components of the weight vector by choosing the smallest vector that solves the learning problem. Second, if its size is chosen right, a weight decay can suppress some of the effects of static noise on the targets, which improves generalization considerably. It is then shown how to extend these results to networks with hidden layers and non-linear units. Finally, the theory is confirmed by some numerical simulations using the data from NetTalk.
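
The following is a minimal sketch (not the authors' code) illustrating the two effects the abstract describes for a linear network, using the closed-form weight-decay solution w = (X^T X + lam*I)^{-1} X^T y; the dimensions, noise level, and decay values are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Teacher weights: only the first 5 of 20 input components are relevant.
n_inputs, n_train, n_test = 20, 15, 500
w_true = np.zeros(n_inputs)
w_true[:5] = rng.normal(size=5)

X_train = rng.normal(size=(n_train, n_inputs))
X_test = rng.normal(size=(n_test, n_inputs))
noise = 0.5 * rng.normal(size=n_train)      # "static noise" on the targets
y_train = X_train @ w_true + noise
y_test = X_test @ w_true                    # clean targets for evaluation

def weight_decay_solution(X, y, lam):
    """Minimizer of ||Xw - y||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

for lam in [1e-8, 0.1, 1.0, 10.0]:
    w = weight_decay_solution(X_train, y_train, lam)
    test_err = np.mean((X_test @ w - y_test) ** 2)
    print(f"lam={lam:6.1e}  ||w||={np.linalg.norm(w):6.3f}  test MSE={test_err:.4f}")

# With lam near 0 the underdetermined system (15 examples, 20 weights) is
# fit exactly, noise included; a moderate lam shrinks the irrelevant weight
# components and typically lowers the test error, matching the two effects
# proven in the paper for linear networks.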