One way of simplifying neural networks so that they generalize better is to add an extra term to the error function that penalizes complexity. We propose a new penalty term in which the distribution of weight values is modelled as a mixture of Gaussians. Under this model, a set of weights is simple if the weights can be clustered into subsets such that the weights within each cluster have similar values. We allow the parameters of the mixture model to adapt while the network learns. Simulations demonstrate that this complexity term is more effective than previous ones.
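The abstract gives no equations, but the penalty it describes can be sketched concretely. Below is a minimal illustration in PyTorch, assuming the complexity term is the negative log-likelihood of the network's weight vector under a mixture of Gaussians whose mixing proportions, means, and variances are trained jointly with the weights; the component count, initializations, and trade-off coefficient `lam` are hypothetical choices for illustration, not values taken from the paper.

```python
import math
import torch
import torch.nn as nn

class GaussianMixturePenalty(nn.Module):
    """Complexity penalty: negative log-likelihood of the weights under a
    mixture of Gaussians. The mixture parameters are themselves trainable,
    so the weight clusters adapt while the network learns."""

    def __init__(self, n_components: int = 4):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_components))     # softmax -> mixing proportions
        self.means = nn.Parameter(torch.linspace(-0.5, 0.5, n_components))
        self.log_sigma = nn.Parameter(torch.zeros(n_components))  # exp -> std devs, kept positive

    def forward(self, weights: torch.Tensor) -> torch.Tensor:
        w = weights.reshape(-1, 1)                                # (n_weights, 1)
        log_pi = torch.log_softmax(self.logits, dim=0)            # (n_components,)
        sigma = self.log_sigma.exp()
        # Joint log-density of each weight under each Gaussian component.
        log_joint = (log_pi
                     - 0.5 * ((w - self.means) / sigma) ** 2
                     - self.log_sigma
                     - 0.5 * math.log(2 * math.pi))
        # Penalty = -sum_i log sum_j pi_j N(w_i | mu_j, sigma_j^2).
        return -torch.logsumexp(log_joint, dim=1).sum()


# Usage sketch: one training step with a joint optimizer over the network
# and the mixture parameters; `lam` trades off data error against complexity.
net = nn.Linear(10, 1)
penalty = GaussianMixturePenalty(n_components=4)
opt = torch.optim.Adam(list(net.parameters()) + list(penalty.parameters()), lr=1e-3)

x, y = torch.randn(32, 10), torch.randn(32, 1)
lam = 0.01
all_w = torch.cat([p.reshape(-1) for p in net.parameters()])
loss = nn.functional.mse_loss(net(x), y) + lam * penalty(all_w)
loss.backward()
opt.step()
```

Because the mixture parameters receive gradients from the same loss, components are free to narrow around clusters of similar weight values, which is what makes a clusterable set of weights "simple" under this penalty.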