CondensedNearestNeighbour

class imbens.sampler.CondensedNearestNeighbour(*, sampling_strategy='auto', random_state=None, n_neighbors=None, n_seeds_S=1, n_jobs=None)

Undersample based on the condensed nearest neighbour method.

See also

EditedNearestNeighbours: Undersample by editing samples.
RepeatedEditedNearestNeighbours: Undersample by repeating the ENN algorithm.
AllKNN: Undersample using ENN with a varying number of neighbours.

Notes

The method is based on [1].

Supports multi-class resampling. A one-vs.-rest scheme is used when sampling a class as proposed in [1].
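
To make the rule from [1] concrete, here is a minimal pure-Python sketch for the two-class case. It is not the code imbens runs: the function names (`nearest_label`, `condense_majority`), the choice of the first majority sample as the seed, and the repeat-until-stable loop are assumptions made for illustration. All samples outside the class being reduced are kept, one seed from that class is added to the store, and a remaining sample is kept only if a 1-nearest-neighbour lookup on the store misclassifies it.

```python
from collections import Counter
from math import dist


def nearest_label(point, store):
    """Return the label of the stored sample closest to `point` (1-NN)."""
    return min(store, key=lambda item: dist(point, item[0]))[1]


def condense_majority(X, y, majority_label):
    """Illustrative condensing of one class; all other samples are kept.

    Assumes at least one sample carries `majority_label`.
    """
    # The store starts with every sample outside the class being reduced ...
    store = [(x, label) for x, label in zip(X, y) if label != majority_label]
    candidates = [x for x, label in zip(X, y) if label == majority_label]
    # ... plus one seed from that class (the first one, to stay deterministic).
    store.append((candidates.pop(0), majority_label))

    changed = True
    while changed:  # repeat passes until no candidate is moved to the store
        changed = False
        remaining = []
        for x in candidates:
            if nearest_label(x, store) != majority_label:
                store.append((x, majority_label))  # misclassified: keep it
                changed = True
            else:
                remaining.append(x)  # the store already classifies it correctly
        candidates = remaining

    return [x for x, _ in store], [label for _, label in store]


# Toy data: class 0 is the majority, class 1 the minority.
X = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.9, 0.0), (5.0, 0.0), (1.0, 0.0), (1.1, 0.0)]
y = [0, 0, 0, 0, 0, 1, 1]
X_res, y_res = condense_majority(X, y, majority_label=0)
print(Counter(y_res))  # Counter({0: 3, 1: 2})
```

The two majority points near the seed are dropped because the store already classifies them correctly, while the points close to (or beyond) the minority class are retained.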

References

[1] P. Hart, “The condensed nearest neighbor rule,” IEEE Transactions on Information Theory, vol. 14, no. 3, pp. 515-516, 1968.

Examples

>>> from collections import Counter
>>> # fetch_mldata was removed in scikit-learn 0.22, so this example
>>> # needs an older release or another data loader.
>>> from sklearn.datasets import fetch_mldata
>>> from imbens.sampler import CondensedNearestNeighbour
>>> pima = fetch_mldata('diabetes_scale')
>>> X, y = pima['data'], pima['target']
>>> print('Original dataset shape %s' % Counter(y))
Original dataset shape Counter({1: 500, -1: 268})
>>> cnn = CondensedNearestNeighbour(random_state=42)
>>> X_res, y_res = cnn.fit_resample(X, y)
>>> print('Resampled dataset shape %s' % Counter(y_res))
Resampled dataset shape Counter({-1: 268, 1: 227})

Methods

`fit`(X, y)	Check inputs and statistics of the sampler.
`fit_resample`(X, y, *[, sample_weight])	Resample the dataset.
`get_params`([deep])	Get parameters for this estimator.
`set_params`(**params)	Set the parameters of this estimator.

fit(X, y)

Check inputs and statistics of the sampler.

You should use fit_resample in all cases.

Parameters:

X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features): Data array.
y : array-like of shape (n_samples,): Target array.

Returns:

self : object: Return the instance itself.

fit_resample(X, y, *, sample_weight=None, **kwargs)

Resample the dataset.

Parameters:

X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features)

Matrix containing the data to be resampled.

y : array-like of shape (n_samples,)

Corresponding label for each sample in X.

sample_weight : array-like of shape (n_samples,), default=None

Corresponding weight for each sample in X.

If None, perform normal resampling and return (X_resampled, y_resampled).
If array-like, sample_weight is resampled along with X and y, and the function returns (X_resampled, y_resampled, sample_weight_resampled).

Returns:

X_resampled : {array-like, dataframe, sparse matrix} of shape (n_samples_new, n_features): The array containing the resampled data.
y_resampled : array-like of shape (n_samples_new,): The corresponding label of X_resampled.
sample_weight_resampled : array-like of shape (n_samples_new,), default=None: The corresponding weight of X_resampled. Returned only if the input sample_weight is not None.
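
Because the number of returned values depends on whether sample_weight is passed, callers have to unpack two or three items accordingly. The sketch below mimics that convention with a hypothetical stand-in, `resample_by_index`, which simply keeps the rows at the given indices; it is not part of imbens and does not reproduce the sampler's selection logic.

```python
def resample_by_index(X, y, keep, sample_weight=None):
    """Hypothetical stand-in that mimics the documented return convention."""
    X_res = [X[i] for i in keep]
    y_res = [y[i] for i in keep]
    if sample_weight is None:
        # No weights given: two return values.
        return X_res, y_res
    # Weights given: they are resampled with the same indices and returned too.
    w_res = [sample_weight[i] for i in keep]
    return X_res, y_res, w_res


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 0, 1]
w = [0.5, 1.0, 1.5, 2.0]
keep = [0, 3]  # indices a sampler might retain (chosen by hand here)

X_res, y_res = resample_by_index(X, y, keep)
X_res_w, y_res_w, w_res = resample_by_index(X, y, keep, sample_weight=w)
```

With a real sampler the same pattern would apply: unpack two values when no weights are passed and three when they are.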

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, return the parameters for this estimator and for contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.
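
The <component>__<parameter> naming used by get_params and set_params can be illustrated with a toy class. `ToyEstimator` below is hypothetical and written only for this sketch; it is not scikit-learn's or imbens' base class, and it only shows how a double-underscore key is routed to a nested object.

```python
class ToyEstimator:
    """Hypothetical minimal estimator used to illustrate nested parameters."""

    def __init__(self, **params):
        self.params = dict(params)

    def get_params(self, deep=True):
        out = dict(self.params)
        if deep:
            # Expose parameters of nested estimators as <component>__<parameter>.
            for name, value in self.params.items():
                if hasattr(value, "get_params"):
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        out[f"{name}__{sub_name}"] = sub_value
        return out

    def set_params(self, **params):
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if sep:
                # Nested key: forward the remainder to the named component.
                self.params[name].set_params(**{sub_key: value})
            else:
                self.params[name] = value
        return self  # returning self allows chained calls


inner = ToyEstimator(n_neighbors=1)
outer = ToyEstimator(sampler=inner, verbose=False)
outer.set_params(sampler__n_neighbors=3, verbose=True)
```

After the call, `outer.get_params()` lists both `verbose` and `sampler__n_neighbors`, while `outer.get_params(deep=False)` lists only the top-level names.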