sklearn.manifold.trustworthiness

sklearn.manifold.trustworthiness(X, X_embedded, *, n_neighbors=5, metric='euclidean')

Expresses to what extent the local structure of the input data is retained in the embedding.

The trustworthiness is within [0, 1]. It is defined as

\[T(k) = 1 - \frac{2}{nk (2n - 3k - 1)} \sum^n_{i=1} \sum_{j \in \mathcal{N}_{i}^{k}} \max(0, (r(i, j) - k))\]

where n is the number of samples and, for each sample i, \(\mathcal{N}_{i}^{k}\) is the set of its k nearest neighbors in the output space, and every sample j is its \(r(i, j)\)-th nearest neighbor in the input space. In other words, any unexpected nearest neighbors in the output space are penalised in proportion to their rank in the input space.
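
The formula can be evaluated directly with NumPy. The following is a minimal sketch for illustration, not the library's implementation: it assumes Euclidean distances, no tied distances, and k < n / 2, and the name `trustworthiness_sketch` is hypothetical.

```python
import numpy as np


def trustworthiness_sketch(X, X_embedded, n_neighbors=5):
    """Evaluate T(k) from the formula above (hypothetical helper, Euclidean only)."""
    n = X.shape[0]
    k = n_neighbors
    rows = np.arange(n)[:, None]

    # Pairwise distances in the input space. Setting the diagonal to inf
    # pushes each sample to the end of its own neighbor ranking.
    dist_X = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist_X, np.inf)

    # rank_X[i, j] = r(i, j): 1-based rank of sample j among the
    # input-space neighbors of sample i.
    order_X = np.argsort(dist_X, axis=1)
    rank_X = np.empty((n, n), dtype=int)
    rank_X[rows, order_X] = np.arange(1, n + 1)

    # Indices of the k nearest neighbors of each sample in the output space.
    dist_emb = np.linalg.norm(
        X_embedded[:, None, :] - X_embedded[None, :, :], axis=-1
    )
    np.fill_diagonal(dist_emb, np.inf)
    neighbors_emb = np.argsort(dist_emb, axis=1)[:, :k]

    # Penalise output-space neighbors whose input-space rank exceeds k.
    penalty = np.maximum(0, rank_X[rows, neighbors_emb] - k).sum()
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty


rng = np.random.RandomState(0)
X_demo = rng.rand(30, 4)

# An embedding identical to the input keeps every neighborhood intact.
identity_score = trustworthiness_sketch(X_demo, X_demo, n_neighbors=5)

# An unrelated random embedding keeps little local structure.
random_score = trustworthiness_sketch(X_demo, rng.rand(30, 2), n_neighbors=5)
```

When the embedding equals the input, every output-space neighbor has input-space rank at most k, so the penalty sum is zero and the score is 1. The normalising factor is the largest penalty that can occur, which is what keeps the score inside [0, 1].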

References

“Neighborhood Preservation in Nonlinear Projection Methods: An Experimental Study” J. Venna, S. Kaski
“Learning a Parametric Embedding by Preserving Local Structure” L.J.P. van der Maaten

Parameters

X : ndarray of shape (n_samples, n_features) or (n_samples, n_samples)
    If the metric is 'precomputed', X must be a square distance matrix. Otherwise it contains a sample per row.
X_embedded : ndarray of shape (n_samples, n_components)
    Embedding of the training data in low-dimensional space.
n_neighbors : int, default=5
    Number of neighbors k that will be considered.
metric : str or callable, default='euclidean'
    Which metric to use for computing pairwise distances between samples from the original input space. If metric is 'precomputed', X must be a matrix of pairwise distances or squared distances. Otherwise, see the documentation of argument metric in sklearn.metrics.pairwise_distances for a list of available metrics.

    New in version 0.20.

Returns

trustworthiness : float
    Trustworthiness of the low-dimensional embedding.
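
A short usage sketch follows. The random data and the choice of PCA as the embedding method are assumptions made only for illustration; any array of shape (n_samples, n_components) can be passed as X_embedded.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness
from sklearn.metrics import pairwise_distances

# Illustrative data: 100 samples with 10 features.
rng = np.random.RandomState(0)
X = rng.rand(100, 10)

# Any embedding works here; PCA is used only as an example.
X_embedded = PCA(n_components=2).fit_transform(X)

# Default call: Euclidean distances are computed from X internally.
score = trustworthiness(X, X_embedded, n_neighbors=5)

# Equivalent call with a precomputed square distance matrix for the input space.
D = pairwise_distances(X, metric="euclidean")
score_precomputed = trustworthiness(
    D, X_embedded, n_neighbors=5, metric="precomputed"
)

# A different input-space metric changes the neighbor ranking and the score.
score_cosine = trustworthiness(X, X_embedded, n_neighbors=5, metric="cosine")

print(score, score_precomputed, score_cosine)
```

The metric argument only affects how neighbors are ranked in the input space. With metric='precomputed', X is read as the distance matrix itself, so its shape must be (n_samples, n_samples).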

© 2007–2020 The scikit-learn developers
Licensed under the 3-clause BSD License.
https://scikit-learn.org/0.24/modules/generated/sklearn.manifold.trustworthiness.html