I have been doing some work recently on Restricted Boltzmann Machines (RBMs). Specifically, I have been looking at the evaluation of the log of the partition function.
RBMs consist of a layer of visible nodes and a layer of hidden nodes, with the number of hidden nodes typically being less than the number of visible nodes. Where both visible and hidden nodes have binary states we can think of the RBM as performing a discrete-to-discrete dimensionality reduction. Stacked RBMs provided some of the earliest examples of deep learning neural networks – see for example the work of Hinton and Salakhutdinov.
The partition function Z is the normalizing constant for the joint distribution over the states of the visible and hidden nodes, and is often used for model selection, i.e. when we want to control or select the right level of complexity in the RBM. I wanted to test out some of the ideas behind the message passing algorithm of Huang and Toyoizumi (arxiv version here). Huang and Toyoizumi use a Bethe approximation to develop a mean-field approximation for log Z. As usual, the self-consistent mean-field equations lead to a set of coupled equations for the expected magnetization, which are solved iteratively leading to the passing of information on local field strengths between nodes – the so-called message passing. To test the Huang and Toyoizumi algorithm I need to know the true value of log Z.
A standard, non mean-field, method for evaluation of the log-partition function is the Annealed Importance Sampling (AIS) algorithm of Salakhutdinov and Murray, who base their derivation on the generic AIS work of Neal (arxiv version). The AIS algorithm is an Monte Carlo based approach and samples from a series of RBMs that go from being completely decoupled (no visible to hidden node interactions) to the fully coupled RBM of interest.
I have pushed my implementations of the Huang and Toyoizumi message passing algorithm and the Salakhutdinov and Murray AIS algorithm to github. However, there is still the question of how do I test the implementations given that there is no simple closed form analytical expressions for log Z when we have visible to hidden node coupling? Fortunately, as the RBMs are of finite size, then for sufficiently small hidden and visible layers we can evaluate logZ ‘exactly’ via complete enumeration of all the states of the visible and hidden layers. I say ‘exactly’ as some numerical approximation can be required when combining terms in the partition function whose energies are on very different scales. I have also included in the github repository code to do the ‘exact’ evaluation.