The most surprising aspect of neural networks is their simplicity. I don’t mean that the whole of a neural network is simple. It is not. But at any given instant you are either adding numbers or applying a function to one number (a single-variable function). Why these two kinds of operations should be enough to give you any old function of multiple variables is a mystery to me! Let me take a small step towards understanding this by reading this really old paper by Kolmogorov. The paper is in Russian; I found an English translation here. For more on the mathematical history of the problem, look at the Wikipedia page on the Kolmogorov-Arnold representation theorem (also called the superposition theorem). Essentially, this paper is important because it resolves Hilbert’s 13th problem for continuous functions.
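For reference, here is the statement of the theorem in its usual modern form (my paraphrase, not Kolmogorov’s original notation): every continuous function of $n$ variables on the unit cube can be built out of nothing but additions and continuous single-variable functions,

$$
f(x_1, \ldots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),
$$

where the outer functions $\Phi_q$ and the inner functions $\phi_{q,p}$ are continuous functions of a single variable, and the inner functions can be chosen independently of $f$.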
Kolmogorov-Arnold Representation Theorem
Understanding the Kolmogorov paper.