We derive a self-organizing learning algorithm that maximizes the information transferred in a network of nonlinear units. The nonlinearities in the transfer function pick up higher-order moments of the input distributions and perform something akin to true redundancy reduction between units in the output representation. This enables the network to separate statistically independent components in the inputs: a higher-order generalization of principal components analysis. We apply the network to the source separation ("cocktail party") problem, successfully separating unknown mixtures of up to 10 speakers, and show that a variant of the network architecture can perform blind deconvolution. We also derive the dependencies of information transfer on time delays. Information maximization appears to provide a unifying framework for problems in blind signal processing.
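
To make the source-separation claim concrete, the following is a minimal sketch of an infomax-style unmixing rule on a toy two-source mixture. It uses the natural-gradient form of the update with a logistic nonlinearity; the source signals, mixing matrix, learning rate, and batch size are illustrative assumptions, not the paper's exact experimental setup.

```python
# Minimal infomax-style blind source separation sketch (illustrative only).
# Assumes logistic output units and the natural-gradient update
# dW ∝ (I + (1 - 2y) u^T) W, which maximizes output entropy for
# super-Gaussian sources; parameters below are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)

# Two statistically independent, super-Gaussian toy sources (zero mean, unit variance).
n = 20000
s = rng.laplace(size=(2, n))
s = (s - s.mean(axis=1, keepdims=True)) / s.std(axis=1, keepdims=True)

A = np.array([[1.0, 0.6], [0.4, 1.0]])   # unknown mixing matrix
x = A @ s                                 # observed mixtures ("microphones")

W = np.eye(2)                             # unmixing matrix to be learned
lr, batch = 0.01, 100
for epoch in range(50):
    perm = rng.permutation(n)
    for i in range(0, n, batch):
        xb = x[:, perm[i:i + batch]]
        u = W @ xb                        # linear outputs before the nonlinearity
        y = 1.0 / (1.0 + np.exp(-u))      # logistic units
        # Natural-gradient infomax update (maximizes entropy of y).
        dW = (np.eye(2) + (1.0 - 2.0 * y) @ u.T / batch) @ W
        W += lr * dW

# Recovered components should match the true sources up to sign and permutation.
u = W @ x
corr = np.abs(np.corrcoef(np.vstack([u, s]))[:2, 2:])
print(np.round(corr, 2))                  # each row has one entry near 1
```

In this sketch the learned W approximately inverts A, so each output correlates strongly with exactly one source; with sub-Gaussian sources the logistic nonlinearity would need to be replaced, which is one reason the choice of transfer function matters in the infomax framework.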