What do we know about the information bottleneck?


Alexander Kulitski

For ages, humankind has progressed by trial and error, driven by simple laziness or by inspiration. Deep neural networks are just another example: deep learning has taken over many spheres of our lives, puzzling even those who placed very little hope in its success.

We know that neural networks are loosely modeled on our brains: they have artificial layers resembling the thinking apparatus of human beings. After many training sessions, an artificial neural network gains the ability to identify objects in pictures as well as we do, confirming that the good old “practice makes perfect” rule applies to machines too.

Professor Naftali Tishby from Israel, recently an influential voice in AI, gave the world a sneak peek into his new theory, which elaborates on what exactly makes deep learning tick. The scientist first broke ground on this subject in the late 90s when, along with his colleagues, he laid the theoretical foundations of the information bottleneck method.

The famous talk can be found here: https://www.youtube.com/watch?v=bLqJHjXihK8&t=22s

Information bottleneck theory: what is it really about?

In this post we will walk you through the method and sum up the theory in a nutshell.

The Professor argues that a neural network sheds many of the details in its input by filtering the data as if squeezing it through a bottleneck. In this way, the network retains only the general features relevant to the task and discards miscellaneous and secondary information.
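For readers who like to see the idea written down, the information bottleneck method is usually stated as a trade-off between compression and prediction. The symbols here are our own notation and are not taken from the talk: X is the input, Y is the label we care about, T is the compressed representation, I(·;·) is mutual information, and β is a trade-off parameter.

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)
```

A small β rewards throwing information about the input away, while a large β rewards keeping whatever helps predict the label.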

Naftali Tishby and his team of researchers took a simple AI scenario, recognizing an animal in a photo, and trained a CNN. At the initial stages of the training process, the nodes in the top layer of the CNN were linked to the nodes in the next layer. Those were linked to the following layer in the same way, and so on until the very last output layer.

During the experiment, the scientists discovered that the neural network progressed from capturing all the features of a photo at the very start of training to keeping only the relevant ones after a while.
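To make "keeping only the relevant features" concrete, here is a minimal sketch in Python. It is our own toy illustration, not the researchers' experiment: the eight "photos", the backgrounds, and the two encoders are all hypothetical. Each photo is an animal plus a background, the label is the animal, and we measure how much information each representation keeps about the input and about the label.

```python
import math
from collections import Counter
from itertools import product

def mutual_information(pairs):
    """Mutual information in bits between the two coordinates of equally likely (a, b) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )

# Hypothetical toy "photos": an animal (the label) plus a background (irrelevant detail).
animals = ["cat", "dog"]
backgrounds = ["grass", "sand", "snow", "sofa"]
inputs = list(product(animals, backgrounds))  # 8 equally likely inputs X
labels = [animal for animal, _ in inputs]     # the label Y for each input

def keep_everything(x):
    """Representation that copies the whole input."""
    return x

def keep_animal_only(x):
    """Representation that forgets the background."""
    return x[0]

def summarize(encode):
    """Return (I(X;T), I(T;Y)) for the representation T = encode(X)."""
    codes = [encode(x) for x in inputs]
    i_xt = mutual_information(list(zip(inputs, codes)))
    i_ty = mutual_information(list(zip(codes, labels)))
    return i_xt, i_ty

full = summarize(keep_everything)
compressed = summarize(keep_animal_only)
print(f"copy everything:  I(X;T) = {full[0]:.1f} bits, I(T;Y) = {full[1]:.1f} bits")
print(f"animal only:      I(X;T) = {compressed[0]:.1f} bits, I(T;Y) = {compressed[1]:.1f} bits")
```

Both representations keep the full 1 bit of information about the label, but the animal-only representation holds just 1 of the 3 bits that describe the input. That is the bottleneck in miniature: less about the input, just as much about what matters.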

Deep learning: the mystery unveiled

Furthermore, Naftali Tishby's team brought to light a particular feature of deep neural networks: they showed that the accuracy of CNNs converges over time. How this convergence proceeds depends on a few factors, such as the sample size, the number of layers, and the number of training cycles.

Earlier in the year, the Huffington Post published an article on the same topic, discussing the emerging products that have been popping up since the bottleneck theory was introduced.

Here is what the opinion and news website noted: the “information bottleneck” makes an auto-encoder learn a compressed representation of the data in its hidden layer, which is then decoded back to the initial state by the remaining layers of the network. The experts from the Huffington Post add that such behavior of a neural network can present an opportunity for industry-specific data compression. The area of application of deep learning becomes enormous, from art to gaming and beyond.
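The auto-encoder idea can be sketched in a few lines of plain Python. This is a deliberately tiny toy under our own assumptions, not the setup from the article: the data points, the starting weights, the learning rate, and the function names are all hypothetical. A linear encoder squeezes each 2-D point into a single number, a linear decoder expands it back, and gradient descent trains both to reduce the reconstruction error.

```python
# Toy inputs: 2-D points that all lie on the line y = 2x, so one number
# is enough to describe each point (hypothetical data for illustration).
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

def encode(w, x):
    """Squeeze a 2-D input into a single number (the bottleneck)."""
    return w[0] * x[0] + w[1] * x[1]

def decode(v, z):
    """Expand the single number back into a 2-D reconstruction."""
    return (v[0] * z, v[1] * z)

def reconstruction_loss(w, v):
    """Mean squared distance between the inputs and their reconstructions."""
    total = 0.0
    for x in data:
        x_hat = decode(v, encode(w, x))
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)

def train(steps=2000, lr=0.05):
    """Plain gradient descent on the reconstruction loss."""
    w = [0.5, 0.5]  # encoder weights (arbitrary starting point)
    v = [0.5, 0.5]  # decoder weights (arbitrary starting point)
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for x in data:
            z = encode(w, x)
            x_hat = decode(v, z)
            err = (x_hat[0] - x[0], x_hat[1] - x[1])
            # Reconstruction error passed back through the decoder weights.
            back = v[0] * err[0] + v[1] * err[1]
            for i in range(2):
                grad_v[i] += 2.0 * err[i] * z / len(data)
                grad_w[i] += 2.0 * back * x[i] / len(data)
        for i in range(2):
            w[i] -= lr * grad_w[i]
            v[i] -= lr * grad_v[i]
    return w, v

loss_before = reconstruction_loss([0.5, 0.5], [0.5, 0.5])
w, v = train()
loss_after = reconstruction_loss(w, v)
print(f"loss before training: {loss_before:.4f}")
print(f"loss after training:  {loss_after:.2e}")
```

Because the toy points share one underlying degree of freedom, a one-number bottleneck is enough to rebuild them almost perfectly. Real data is messier, so a real auto-encoder has to decide which details to drop.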

“The information bottleneck thing is a really interesting direction. Coming from statistical physics as a field of research, it makes a lot of sense.”

AI enthusiasts are very positive about this theory, which aims to explain the tremendous success of deep learning. In the research paper, however, Professor Tishby and student researcher Noga Zaslavsky of Berkeley University emphasize that one connection still needs further investigation: the connection between the network architecture (the number of layers and their structure) and the structural phase transitions in the information bottleneck problem, since both are related to the spectral properties of the second-order correlations of the data at the critical points.

The author of the Inference blog, a machine learning researcher, called the theory “a beautifully elegant approach to representation learning”. We agree that this method is a great way to trim the fat: to get rid of irrelevant information and keep only the relevant aspects for further processing.