Flowification: Everything is a normalizing flow
B. Máté*, S. Klein*, T. Golling, F. Fleuret
Neural Information Processing Systems (NeurIPS '22)
The two key characteristics of a normalizing flow (NF) are that it is invertible (in particular, dimension preserving) and that it monitors the amount by which it changes the likelihood of data points as samples are propagated along the network. Recently, multiple generalizations of NFs that relax these two conditions have been introduced (arXiv:2007.02731, arXiv:2002.07101).
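For reference, a standard NF with invertible map \(g\) and base density \(p_Z\) tracks this change of likelihood through the change-of-variables formula
\[
\log p_X(x) = \log p_Z\big(g(x)\big) + \log \left|\det \frac{\partial g(x)}{\partial x}\right|,
\]
where the second term is the likelihood contribution of the map \(g\).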
Neural networks (NNs), on the other hand, only perform a forward pass on the input: there is neither a notion of the inverse of a neural network nor of its likelihood contribution.
We argue that certain NN architectures can be enriched with a stochastic inverse pass and that their likelihood contribution can be monitored, so that they fall under the generalized notion of NFs mentioned above. We term this enrichment flowification.
We prove that neural networks containing only linear and convolutional layers and invertible activations such as LeakyReLU can be flowified, and we evaluate them in the generative setting on image datasets.
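As a toy illustration (not the paper's implementation), the likelihood contribution of an invertible activation such as LeakyReLU can already be tracked exactly, since its Jacobian is diagonal:

```python
import torch

def leaky_relu_flow(x, alpha=0.01):
    # Forward pass of an invertible LeakyReLU together with its likelihood
    # contribution. The Jacobian is diagonal with entries 1 (where x >= 0)
    # and alpha (where x < 0), so
    # log|det J| = (number of negative entries) * log(alpha).
    y = torch.where(x >= 0, x, alpha * x)
    log_det = (x < 0).sum(dim=-1) * torch.log(torch.tensor(alpha))
    return y, log_det

def leaky_relu_flow_inverse(y, alpha=0.01):
    # Exact (deterministic) inverse: negative outputs are rescaled by 1/alpha.
    return torch.where(y >= 0, y, y / alpha)
```

Linear and convolutional layers, which change the dimension of their input, require the stochastic inverse pass described in the paper; the sketch above only covers the activation case.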
Deformations of Boltzmann Distributions
B. Máté and F. Fleuret
ML for Physics Workshop @ NeurIPS 2022
Sampling from unnormalized densities has been widely studied due to its relevance for the sciences.
The problem can be summarized as follows. Given an energy function (unnormalized negative log-density) \(f: \mathbb{R}^n \rightarrow \mathbb{R}\), can we efficiently generate samples from the probability density \(p(x) = \frac{1}{Z}e^{-f(x)}\)? In particular, no samples are given; all we have is the ability to evaluate \(f\) at any candidate point. A popular strategy to attack this problem is to use a normalizing flow to parametrize a distribution \(q_\theta\) and to optimize the parameters \(\theta\) to minimize the reverse KL-divergence \(KL(q_\theta, p)\).
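This objective is tractable precisely because only the unnormalized density is needed: writing \(p(x) = \tfrac{1}{Z}e^{-f(x)}\),
\[
KL(q_\theta, p) = \mathbb{E}_{x \sim q_\theta}\big[\log q_\theta(x) - \log p(x)\big] = \mathbb{E}_{x \sim q_\theta}\big[\log q_\theta(x) + f(x)\big] + \log Z,
\]
so minimizing the expectation of \(\log q_\theta(x) + f(x)\) over samples drawn from the flow minimizes the reverse KL-divergence up to the constant \(\log Z\).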
We derive a family of energy functions \(f_t\) containing \(f_0\), the energy function defining the \(\phi^4\) lattice field theory on a two-dimensional lattice. These energy functions define a family of unnormalized densities \(p_t(x) =\tfrac{1}{Z_t}e^{-f_t(x)}\). We prove that sampling from any member of this family is equivalent to sampling from \(p_0\), in the sense that samples from \(p_t\) can easily be transformed into samples from \(p_0\). Moreover, we experimentally find a particular value \(t=\tau\) such that normalizing flows perform better at learning \(p_\tau\) than at learning \(p_0\).
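As a generic illustration of this kind of equivalence (the specific family \(f_t\) of the paper is not reproduced here), suppose \(f_t(x) = f_0(A_t x)\) for a known invertible linear map \(A_t\). Then the change-of-variables formula gives
\[
x \sim p_t(x) = \tfrac{1}{Z_t}\, e^{-f_0(A_t x)}
\quad\Longrightarrow\quad
y = A_t x \ \sim\ \tfrac{1}{Z_0}\, e^{-f_0(y)} = p_0(y),
\]
so a sample from \(p_t\) is turned into a sample from \(p_0\) by a single linear map.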