tf.contrib.distributions.VectorDiffeomixture

VectorDiffeomixture distribution.

Inherits From: Distribution

tf.contrib.distributions.VectorDiffeomixture(
    mix_loc, temperature, distribution, loc=None, scale=None, quadrature_size=8,
    quadrature_fn=tf.contrib.distributions.quadrature_scheme_softmaxnormal_quantiles,
    validate_args=False, allow_nan_stats=True, name='VectorDiffeomixture'
)

A vector diffeomixture (VDM) is a distribution parameterized by a convex combination of K component loc vectors, loc[k], k = 0,...,K-1, and K scale matrices scale[k], k = 0,..., K-1. It approximates the following compound distribution

p(x) = int p(x | z) p(z) dz,
where z is in the K-simplex, and
p(x | z) := p(x | loc=sum_k z[k] loc[k], scale=sum_k z[k] scale[k])

The integral int p(x | z) p(z) dz is approximated with a quadrature scheme adapted to the mixture density p(z). The N quadrature points z_{N, n} and weights w_{N, n} (which are non-negative and sum to 1) are chosen such that

q_N(x) := sum_{n=1}^N w_{N, n} p(x | z_{N, n}) --> p(x)

as N --> infinity.

Since q_N(x) is in fact a mixture (of N points), we may sample from q_N exactly. It is important to note that the VDM is defined as q_N above, and not p(x). Therefore, sampling and pdf may be implemented as exact (up to floating point error) methods.
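Because q_N is a finite mixture, its density is a weighted sum and a draw takes two steps: pick a grid point with probability w_{N, n}, then sample from p(x | z_{N, n}). The sketch below illustrates this for K = 2 with a scalar Normal conditional. The helper names (conditional_params, q_pdf, q_sample) and the hand-picked grid and weights are illustrative assumptions, not part of the library.

```python
import math
import random

def normal_pdf(x, loc, scale):
    """Density of a scalar Normal(loc, scale)."""
    u = (x - loc) / scale
    return math.exp(-0.5 * u * u) / (scale * math.sqrt(2.0 * math.pi))

def conditional_params(z, locs, scales):
    """Convex combination of the K component locs and scales for a simplex point z."""
    loc = sum(zk * lk for zk, lk in zip(z, locs))
    scale = sum(zk * sk for zk, sk in zip(z, scales))
    return loc, scale

def q_pdf(x, grid, weights, locs, scales):
    """q_N(x) = sum_n w_n p(x | z_n), evaluated as a plain weighted sum."""
    return sum(w * normal_pdf(x, *conditional_params(z, locs, scales))
               for z, w in zip(grid, weights))

def q_sample(rng, grid, weights, locs, scales):
    """Draw from q_N: choose a grid point by weight, then sample the conditional."""
    z = rng.choices(grid, weights=weights, k=1)[0]
    loc, scale = conditional_params(z, locs, scales)
    return rng.gauss(loc, scale)

# Hypothetical quadrature: three points on the 2-simplex, weights summing to 1.
grid = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8)]
weights = [0.25, 0.5, 0.25]
locs, scales = [0.0, 2.0], [1.1, 3.0]

rng = random.Random(0)
draws = [q_sample(rng, grid, weights, locs, scales) for _ in range(5)]
density_at_one = q_pdf(1.0, grid, weights, locs, scales)
```

Neither function needs any numerical integration at evaluation time, which is why the VDM's sampling and pdf can be exact with respect to q_N even though q_N only approximates p(x).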

A common choice for the conditional p(x | z) is a multivariate Normal.

The implemented marginal p(z) is the SoftmaxNormal, which is a K-1 dimensional Normal transformed by a SoftmaxCentered bijector, making it a density on the K-simplex. That is,

Z = SoftmaxCentered(X),
X = Normal(mix_loc / temperature, 1 / temperature)
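As a rough illustration of the two lines above, the following sketch draws one SoftmaxNormal sample in pure Python. It assumes that SoftmaxCentered appends a zero logit to the K-1 inputs and then applies a softmax; the function names are hypothetical and chosen for this example only.

```python
import math
import random

def softmax_centered(x):
    """Map K-1 reals to the K-simplex: append a zero logit, then apply softmax."""
    logits = list(x) + [0.0]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_softmax_normal(rng, mix_loc, temperature):
    """Draw Z = SoftmaxCentered(X), X ~ Normal(mix_loc / temperature, 1 / temperature)."""
    x = [rng.gauss(m / temperature, 1.0 / temperature) for m in mix_loc]
    return softmax_centered(x)

rng = random.Random(0)
# mix_loc has K - 1 = 2 entries, so the draw lies on the 3-simplex.
z = sample_softmax_normal(rng, mix_loc=[0.0, 1.0], temperature=1.0)
```

With a smaller temperature the Normal's scale 1 / temperature grows, the logits spread out, and the softmax output tends to sit closer to a vertex of the simplex.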

The default quadrature scheme chooses z_{N, n} as N midpoints of the quantiles of p(z) (generalized quantiles if K > 2).
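For K = 2 the quantile-midpoint idea can be sketched with the standard library alone, since Z[0] is then a sigmoid of a scalar Normal and the sigmoid is monotone. The choice of quantile levels and the equal weights 1 / N below are assumptions made for illustration; the library's scheme may differ in detail, and quantile_midpoint_grid is a hypothetical name.

```python
import math
from statistics import NormalDist

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def quantile_midpoint_grid(mix_loc, temperature, quadrature_size):
    """Return (grid, probs) for K = 2: midpoints between adjacent quantiles of Z[0]."""
    base = NormalDist(mu=mix_loc / temperature, sigma=1.0 / temperature)
    n = quadrature_size
    # N + 1 interior quantile levels, strictly between 0 and 1 (assumed spacing).
    levels = [(i + 1) / (n + 2) for i in range(n + 1)]
    # Sigmoid is monotone, so quantiles of Z[0] are sigmoids of Normal quantiles.
    edges = [sigmoid(base.inv_cdf(p)) for p in levels]
    mids = [0.5 * (a + b) for a, b in zip(edges[:-1], edges[1:])]
    grid = [(m, 1.0 - m) for m in mids]  # each grid point lies on the 2-simplex
    probs = [1.0 / n] * n                # equal weights, summing to 1
    return grid, probs

grid, probs = quantile_midpoint_grid(mix_loc=0.0, temperature=1.0, quadrature_size=8)
```

The returned pair mirrors the tuple(grid, probs) shape described for quadrature_fn, but only conceptually: a real quadrature_fn operates on Tensors with batch dimensions.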

See [Dillon and Langmore (2018)][1] for more details.

About `Vector` distributions in TensorFlow.

The VectorDiffeomixture is a non-standard distribution that has properties particularly useful in variational Bayesian methods.

Conditioned on a draw from the SoftmaxNormal, X|z is a vector whose components are linear combinations of affine transformations, thus is itself an affine transformation.
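The claim that a convex combination of affine transformations is again affine can be checked numerically. The sketch below uses diagonal scales and plain Python lists for brevity; the helper names and the particular values of z and eps are assumptions chosen for illustration.

```python
def affine(loc, diag_scale, eps):
    """Apply x = loc + diag(diag_scale) @ eps elementwise."""
    return [l + s * e for l, s, e in zip(loc, diag_scale, eps)]

def interpolate(z, values):
    """Convex combination sum_k z[k] * values[k] of equal-length vectors."""
    return [sum(zk * v[i] for zk, v in zip(z, values))
            for i in range(len(values[0]))]

d = 3
locs = [[0.0] * d, [2.0] * d]               # K = 2 component locs
diag_scales = [[1.1] * d, [2.5, 3.0, 3.5]]  # K = 2 diagonal scales
z = [0.3, 0.7]                              # a point on the 2-simplex
eps = [0.5, -1.0, 2.0]                      # stand-in for d iid base draws

# Route 1: apply each component's affine map, then mix the outputs with z.
combined_outputs = interpolate(
    z, [affine(l, s, eps) for l, s in zip(locs, diag_scales)])

# Route 2: mix the locs and scales with z first, then apply one affine map.
interpolated_map = affine(
    interpolate(z, locs), interpolate(z, diag_scales), eps)
```

Both routes give the same vector, so for each fixed z the sample is a single affine transformation of the base draws.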

Note: The marginals X_1|z, ..., X_d|z are not generally identical to some parameterization of the base distribution. This is because a sum of draws from the base distribution does not, in general, belong to the same family as that distribution.

About `Diffeomixture`s and reparameterization.

The VectorDiffeomixture is designed to be reparameterized, i.e., its parameters are only used to transform samples from a distribution which has no trainable parameters. This property is important because backprop stops at sources of stochasticity. That is, as long as the parameters are used after the underlying source of stochasticity, the computed gradient is accurate.

Reparameterization means that we can use gradient descent (via backprop) to optimize Monte Carlo objectives. Such objectives are finite-sample approximations of an expectation and arise throughout scientific computing.
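A minimal sketch of the idea, assuming a scalar location-scale family rather than a VDM: the noise eps is drawn from a parameter-free Normal(0, 1), the parameters only transform it, and the gradient of a Monte Carlo objective with respect to mu is obtained by differentiating through that transformation. The objective E[x^2] and the function name are chosen purely for illustration.

```python
import random

def pathwise_gradient(mu, sigma, eps_samples):
    """Monte Carlo estimate of d/dmu E[(mu + sigma * eps)^2], eps ~ Normal(0, 1)."""
    # x = mu + sigma * eps, so d(x^2)/dmu = 2 * x by the chain rule.
    return sum(2.0 * (mu + sigma * e) for e in eps_samples) / len(eps_samples)

rng = random.Random(0)
# The source of stochasticity has no trainable parameters.
eps_samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]

mu, sigma = 1.5, 0.5
grad_estimate = pathwise_gradient(mu, sigma, eps_samples)
exact_gradient = 2.0 * mu  # because E[x^2] = mu^2 + sigma^2
```

The estimate converges to the exact gradient as the number of samples grows, which is the property that makes such objectives trainable by backprop.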

Examples

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# Create two batches of VectorDiffeomixtures, one with mix_loc=[0.],
# another with mix_loc=[1]. In both cases, `K=2` and the affine
# transformations involve:
# k=0: loc=zeros(dims)  scale=LinearOperatorScaledIdentity
# k=1: loc=[2.]*dims    scale=LinOpDiag
dims = 5
vdm = tfd.VectorDiffeomixture(
    mix_loc=[[0.], [1]],
    temperature=[1.],
    distribution=tfd.Normal(loc=0., scale=1.),
    loc=[
        None,  # Equivalent to `np.zeros(dims, dtype=np.float32)`.
        np.float32([2.]*dims),
    ],
    scale=[
        tf.linalg.LinearOperatorScaledIdentity(
          num_rows=dims,
          multiplier=np.float32(1.1),
          is_positive_definite=True),
        tf.linalg.LinearOperatorDiag(
          diag=np.linspace(2.5, 3.5, dims, dtype=np.float32),
          is_positive_definite=True),
    ],
    validate_args=True)

References

[1]: Joshua Dillon and Ian Langmore. Quadrature Compound: An approximating family of distributions. arXiv preprint arXiv:1801.03080, 2018. https://arxiv.org/abs/1801.03080

Args
`mix_loc`	`float`-like `Tensor` with shape `[b1, ..., bB, K-1]`. In terms of samples, larger `mix_loc[..., k]` ==> `Z` is more likely to put more weight on its `kth` component.
`temperature`	`float`-like `Tensor`. Broadcastable with `mix_loc`. In terms of samples, smaller `temperature` means one component is more likely to dominate. I.e., smaller `temperature` makes the VDM look more like a standard mixture of `K` components.
`distribution`	`tf.Distribution`-like instance. Distribution from which `d` iid samples are used as input to the selected affine transformation. Must be a scalar-batch, scalar-event distribution. Typically `distribution.reparameterization_type = FULLY_REPARAMETERIZED` or it is a function of non-trainable parameters. WARNING: If you backprop through a VectorDiffeomixture sample and the `distribution` is not `FULLY_REPARAMETERIZED` yet is a function of trainable variables, then the gradient will be incorrect!
`loc`	Length-`K` list of `float`-type `Tensor`s. The `k`-th element represents the `shift` used for the `k`-th affine transformation. If the `k`-th item is `None`, `loc` is implicitly `0`. When specified, must have shape `[B1, ..., Bb, d]` where `b >= 0` and `d` is the event size.
`scale`	Length-`K` list of `LinearOperator`s. Each should be positive-definite and operate on a `d`-dimensional vector space. The `k`-th element represents the `scale` used for the `k`-th affine transformation. `LinearOperator`s must have shape `[B1, ..., Bb, d, d]`, `b >= 0`, i.e., each characterizes `b`-batches of `d x d` matrices.
`quadrature_size`	Python `int` scalar representing number of quadrature points. Larger `quadrature_size` means `q_N(x)` better approximates `p(x)`.
`quadrature_fn`	Python callable taking `normal_loc`, `normal_scale`, `quadrature_size`, `validate_args` and returning `tuple(grid, probs)` representing the SoftmaxNormal grid and corresponding normalized weights. Default value: `quadrature_scheme_softmaxnormal_quantiles`.
`validate_args`	Python `bool`, default `False`. When `True` distribution parameters are checked for validity despite possibly degrading runtime performance. When `False` invalid inputs may silently render incorrect outputs.
`allow_nan_stats`	Python `bool`, default `True`. When `True`, statistics (e.g., mean, mode, variance) use the value "`NaN`" to indicate the result is undefined. When `False`, an exception is raised if one or more of the statistic's batch members are undefined.
`name`	Python `str` name prefixed to Ops created by this class.

Raises
`ValueError`	if `not scale or len(scale) < 2`.
`ValueError`	if `len(loc) != len(scale)`.
`ValueError`	if `quadrature_grid_and_probs is not None` and `len(quadrature_grid_and_probs[0]) != len(quadrature_grid_and_probs[1])`.
`ValueError`	if `validate_args` and any `not scale.is_positive_definite`.
`TypeError`	if any `scale.dtype != scale[0].dtype`.
`TypeError`	if any `loc.dtype != scale[0].dtype`.
`NotImplementedError`	if `len(scale) != 2`.
`ValueError`	if `not distribution.is_scalar_batch`.
`ValueError`	if `not distribution.is_scalar_event`.

Attributes
`allow_nan_stats`	Python `bool` describing behavior when a stat is undefined. Stats return +/- infinity when it makes sense. E.g., the variance of a Cauchy distribution is infinity. However, sometimes the statistic is undefined, e.g., if a distribution's pdf does not achieve a maximum within the support of the distribution, the mode is undefined. If the mean is undefined, then by definition the variance is undefined. E.g. the mean for Student's T for df = 1 is undefined (no clear way to say it is either + or - infinity), so the variance = E[(X - mean)**2] is also undefined.
`batch_shape`	Shape of a single sample from a single event index as a `TensorShape`. May be partially defined or unknown. The batch dimensions are indexes into independent, non-identical parameterizations of this distribution.
`distribution`	Base scalar-event, scalar-batch distribution.
`dtype`	The `DType` of `Tensor`s handled by this `Distribution`.
`endpoint_affine`	Affine transformation for each of `K` components.
`event_shape`	Shape of a single sample from a single batch as a `TensorShape`. May be partially defined or unknown.
`grid`	Grid of mixing probabilities, one for each grid point.
`interpolated_affine`	Affine transformation for each convex combination of `K` components.
`mixture_distribution`	Distribution used to select a convex combination of affine transforms.
`name`	Name prepended to all ops created by this `Distribution`.
`parameters`	Dictionary of parameters used to instantiate this `Distribution`.
`reparameterization_type`	Describes how samples from the distribution are reparameterized. Currently this is one of the static instances `distributions.FULLY_REPARAMETERIZED` or `distributions.NOT_REPARAMETERIZED`.
`validate_args`	Python `bool` indicating possibly expensive checks are enabled.

Args
`value`	`float` or `double` `Tensor`.
`name`	Python `str` prepended to names of ops created by this function.

Args
`other`	`tfp.distributions.Distribution` instance.
`name`	Python `str` prepended to names of ops created by this function.

Args
`sample_shape`	`Tensor` or python list/tuple. Desired shape of a call to `sample()`.
`name`	name to prepend ops with.

Args
`sample_shape`	0D or 1D `int32` `Tensor`. Shape of the generated samples.
`seed`	Python integer seed for RNG
`name`	name to give to the op.

Methods

batch_shape_tensor

cdf

copy

covariance

cross_entropy

entropy

event_shape_tensor

is_scalar_batch

is_scalar_event

kl_divergence

log_cdf

log_prob

log_survival_function

mean

mode

param_shapes

param_static_shapes

prob

quantile

sample

stddev

survival_function

variance
