# ValueDifferenceMetric

class imbalanced_ensemble.metrics.ValueDifferenceMetric(*, n_categories='auto', k=1, r=2)

Class implementing the Value Difference Metric.

This metric computes the distance between samples containing only categorical features. The distance between feature values of two samples is defined as:

$\delta(x, y) = \sum_{c=1}^{C} |p(c|x_{f}) - p(c|y_{f})|^{k} \ ,$

where $$x$$ and $$y$$ are two samples, $$f$$ is a given feature, $$C$$ is the number of classes, $$p(c|x_{f})$$ is the conditional probability that the output class is $$c$$ given that feature $$f$$ takes the value $$x_{f}$$, and $$k$$ is an exponent usually set to 1 or 2.

The distance for the feature vectors $$X$$ and $$Y$$ is subsequently defined as:

$\Delta(X, Y) = \sum_{f=1}^{F} \delta(X_{f}, Y_{f})^{r} \ ,$

where $$F$$ is the number of features and $$r$$ is an exponent usually set to 1 or 2.
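The two formulas can be sketched directly in NumPy. This is an illustration of the definitions only, not the library's implementation; the helper names `feature_distance` and `vector_distance` and the probability values are hypothetical.

```python
import numpy as np

def feature_distance(p_x, p_y, k=1):
    """delta(x, y): sum over classes of |p(c|x_f) - p(c|y_f)|**k."""
    p_x = np.asarray(p_x, dtype=float)
    p_y = np.asarray(p_y, dtype=float)
    return float(np.sum(np.abs(p_x - p_y) ** k))

def vector_distance(probas_x, probas_y, k=1, r=2):
    """Delta(X, Y): sum over features of delta(X_f, Y_f)**r."""
    return sum(
        feature_distance(p_x, p_y, k=k) ** r
        for p_x, p_y in zip(probas_x, probas_y)
    )

# Hypothetical class probabilities for two samples with two features
# and two classes: one probability vector per feature.
probas_x = [[0.75, 0.25], [0.5, 0.5]]
probas_y = [[0.25, 0.75], [0.25, 0.75]]

# Feature 1: (0.5 + 0.5)**2 = 1.0; feature 2: (0.25 + 0.25)**2 = 0.25.
distance = vector_distance(probas_x, probas_y, k=1, r=2)
print(round(distance, 2))  # 1.25
```

With `k=1` the inner sum is an L1 distance between the two class-probability vectors; `r=2` then squares each per-feature distance before summing.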

The definition of this distance was proposed in [1].

Read more in the User Guide.

Parameters
n_categories : "auto" or array-like of shape (n_features,), default="auto"

The number of unique categories per feature. If "auto", the number of categories is computed from X at fit. Otherwise, you can provide an array-like of these counts to avoid the computation. You can use the fitted attribute categories_ of the OrdinalEncoder to deduce these counts.

k : int, default=1

Exponent used to compute the distance between feature values.

r : int, default=2

Exponent used to compute the distance between the feature vectors.
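As a rough sketch of how the counts for n_categories could be deduced: the fitted categories_ attribute of an OrdinalEncoder is a list holding one array of category labels per feature, so the counts are the lengths of those arrays. The `categories` list below is a hand-written stand-in with hypothetical values, so that the snippet runs without fitting an encoder.

```python
import numpy as np

# Stand-in for the fitted `categories_` attribute of an OrdinalEncoder:
# one array of category labels per feature (hypothetical values).
categories = [
    np.array(["blue", "green", "red"]),
    np.array(["large", "small"]),
]

# One count per feature, in a form that could be passed as `n_categories`.
n_categories = np.array([len(cats) for cats in categories])
print(n_categories)  # [3 2]
```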

Attributes
n_categories_ : ndarray of shape (n_features,)

The number of categories per feature.

proba_per_class_ : list of ndarray of shape (n_categories, n_classes)

List of length n_features containing, for each feature, the conditional probability of each class given each category.
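A table of shape (n_categories, n_classes) like the ones stored in proba_per_class_ can be estimated from counts. The sketch below is one plausible way to do it for a single ordinal-encoded feature; whether the library computes it exactly this way is an assumption, and `class_proba_per_category` is a hypothetical helper.

```python
import numpy as np

def class_proba_per_category(x_col, y, n_categories):
    """Estimate p(class | category) for one ordinal-encoded feature."""
    classes, y_idx = np.unique(y, return_inverse=True)
    counts = np.zeros((n_categories, classes.size))
    # Count how often each (category, class) pair occurs.
    np.add.at(counts, (x_col, y_idx), 1)
    totals = counts.sum(axis=1, keepdims=True)
    # Categories that never occur keep a row of zeros.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# Hypothetical data: category 0 has classes (0, 0, 0, 1),
# category 1 has classes (1, 1, 1, 1, 0).
x_col = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=np.int32)
y = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0])

table = class_proba_per_category(x_col, y, n_categories=2)
print(table.shape)  # (2, 2)
```

Each row of `table` sums to 1 for a category that appears in the data: here row 0 is (0.75, 0.25) and row 1 is (0.2, 0.8).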

Notes

The input data X are expected to be encoded by an OrdinalEncoder, and the data type should be np.int32. If another data type is given, X will be converted to np.int32.
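To make the expected input format concrete, the sketch below produces integer codes for one categorical column with plain NumPy. It only mirrors an ordinal encoding for illustration and is not a replacement for OrdinalEncoder; the variable names are hypothetical.

```python
import numpy as np

# A single categorical column with string labels.
colors = np.array(["green", "red", "blue", "red"])

# np.unique sorts the labels and returns each sample's index into them,
# which mirrors an ordinal encoding of this one column.
labels, codes = np.unique(colors, return_inverse=True)

# Shape (n_samples, 1) with the integer dtype described in the note.
X_encoded = codes.astype(np.int32).reshape(-1, 1)
print(X_encoded.ravel())  # [1 2 0 2]
print(X_encoded.dtype)    # int32
```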

References

[1] Stanfill, Craig, and David Waltz. “Toward memory-based reasoning.” Communications of the ACM 29.12 (1986): 1213-1228.

Examples

>>> import numpy as np
>>> X = np.array(["green"] * 10 + ["red"] * 10 + ["blue"] * 10).reshape(-1, 1)
>>> y = [1] * 8 + [0] * 5 + [1] * 7 + [0] * 9 + [1]
>>> from sklearn.preprocessing import OrdinalEncoder
>>> encoder = OrdinalEncoder(dtype=np.int32)
>>> X_encoded = encoder.fit_transform(X)
>>> from imbalanced_ensemble.metrics.pairwise import ValueDifferenceMetric
>>> vdm = ValueDifferenceMetric().fit(X_encoded, y)
>>> pairwise_distance = vdm.pairwise(X_encoded)
>>> pairwise_distance.shape
(30, 30)
>>> X_test = np.array(["green", "red", "blue"]).reshape(-1, 1)
>>> X_test_encoded = encoder.transform(X_test)
>>> vdm.pairwise(X_test_encoded)
array([[ 0.  ,  0.04,  1.96],
       [ 0.04,  0.  ,  1.44],
       [ 1.96,  1.44,  0.  ]])
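The three off-diagonal values can be checked by hand from the definition. The sketch below assumes the class labels are `[1] * 8 + [0] * 5 + [1] * 7 + [0] * 9 + [1]`, which is consistent with the printed matrix, and uses the default exponents k=1 and r=2. The helpers `proba` and `vdm` are hypothetical and independent of the library.

```python
import numpy as np

# Class labels assumed for the 30 samples (10 green, 10 red, 10 blue).
y = np.array([1] * 8 + [0] * 5 + [1] * 7 + [0] * 9 + [1])
colors = np.array(["green"] * 10 + ["red"] * 10 + ["blue"] * 10)

def proba(color):
    """Return [p(class 0 | color), p(class 1 | color)]."""
    labels = y[colors == color]
    return np.array([np.mean(labels == 0), np.mean(labels == 1)])

def vdm(a, b, k=1, r=2):
    """Single-feature VDM: (sum_c |p(c|a) - p(c|b)|**k) ** r."""
    return np.sum(np.abs(proba(a) - proba(b)) ** k) ** r

# green: (0.2, 0.8), red: (0.3, 0.7), blue: (0.9, 0.1)
print(round(float(vdm("green", "red")), 2))   # 0.04
print(round(float(vdm("green", "blue")), 2))  # 1.96
print(round(float(vdm("red", "blue")), 2))    # 1.44
```

For example, green versus red gives |0.2 - 0.3| + |0.8 - 0.7| = 0.2, and 0.2 squared is 0.04.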


Methods

| Method | Description |
| --- | --- |
| fit(X, y) | Compute the necessary statistics from the training set. |
| get_params([deep]) | Get parameters for this estimator. |
| pairwise(X[, Y]) | Compute the VDM distance pairwise. |
| set_params(**params) | Set the parameters of this estimator. |

fit(X, y)

Compute the necessary statistics from the training set.

Parameters
X : ndarray of shape (n_samples, n_features), dtype=np.int32

The input data. The data are expected to be encoded with an OrdinalEncoder.

y : ndarray of shape (n_samples,)

The target.

Returns
self
get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.

pairwise(X, Y=None)

Compute the VDM distance pairwise.

Parameters
X : ndarray of shape (n_samples, n_features), dtype=np.int32

The input data. The data are expected to be encoded with an OrdinalEncoder.

Y : ndarray of shape (n_samples, n_features), dtype=np.int32, default=None

The input data. The data are expected to be encoded with an OrdinalEncoder. When Y is None, the distances are computed between the rows of X.

Returns
distance_matrix : ndarray of shape (n_samples, n_samples)

The VDM pairwise distance.
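A pairwise computation along these lines can be sketched with NumPy broadcasting. This is an assumption-laden illustration, not the library's code: `pairwise_vdm` is a hypothetical function, the probability table is made up, and in this sketch the result has one row per sample of X and one column per sample of Y.

```python
import numpy as np

def pairwise_vdm(X, proba_per_feature, Y=None, k=1, r=2):
    """Pairwise VDM between rows of X and rows of Y (or X itself if Y is None)."""
    Y = X if Y is None else Y
    distance = np.zeros((X.shape[0], Y.shape[0]))
    for f, table in enumerate(proba_per_feature):
        # Look up class probabilities for feature f: (n_samples, n_classes).
        p_x = table[X[:, f]]
        p_y = table[Y[:, f]]
        # Broadcast to (n_X, n_Y, n_classes), then sum over classes.
        delta = (np.abs(p_x[:, None, :] - p_y[None, :, :]) ** k).sum(axis=-1)
        distance += delta ** r
    return distance

# Hypothetical probability table for one feature with three categories.
proba_per_feature = [np.array([[0.75, 0.25], [0.25, 0.75], [0.5, 0.5]])]
X = np.array([[0], [1], [2]], dtype=np.int32)
Y = np.array([[0], [2]], dtype=np.int32)

print(pairwise_vdm(X, proba_per_feature).shape)     # (3, 3)
print(pairwise_vdm(X, proba_per_feature, Y).shape)  # (3, 2)
```

When `Y` is omitted the matrix is symmetric with zeros on the diagonal, since each sample is compared against itself.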

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.
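The `<component>__<parameter>` naming used by set_params can be illustrated with a toy class. The `Toy` class below is hypothetical and far simpler than a real estimator; it only shows how a double-underscore key is split into a component name and a parameter name.

```python
class Toy:
    """Hypothetical estimator-like object, not part of any library."""

    def __init__(self, k=1, inner=None):
        self.k = k
        self.inner = inner

    def set_params(self, **params):
        for key, value in params.items():
            # "inner__k" splits into component "inner" and parameter "k".
            name, delim, sub_key = key.partition("__")
            if delim:
                # Forward the remaining key to the nested component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self

outer = Toy(k=1, inner=Toy(k=1))
outer.set_params(k=2, inner__k=3)
print(outer.k, outer.inner.k)  # 2 3
```

Returning `self` from `set_params` is what allows calls to be chained, matching the "Returns" entry above.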