ADASYN

class imbens.sampler.ADASYN(*, sampling_strategy='auto', random_state=None, n_neighbors=5, n_jobs=None)

Oversample using the Adaptive Synthetic (ADASYN) algorithm.

This method is similar to SMOTE, but it generates a different number of synthetic samples around each point, depending on an estimate of the local distribution of the class to be oversampled.
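To make the "different number of samples" idea concrete, the following sketch follows the allocation step described in the ADASYN paper [1]: each minority point gets a share of the synthetic-sample budget proportional to the fraction of its nearest neighbours that belong to another class. It is a pure-Python illustration, not the imbens implementation; the function name `adasyn_allocation`, the `beta` argument, and the toy data are assumptions made for this example.

```python
import math


def adasyn_allocation(X, y, minority_label, n_neighbors=5, beta=1.0):
    """Return {minority index: number of synthetic samples to generate}."""
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    n_min = len(minority_idx)
    n_maj = len(y) - n_min
    # Total synthetic-sample budget; beta=1.0 aims at a fully balanced set.
    total = int(round((n_maj - n_min) * beta))

    ratios = []
    for i in minority_idx:
        # The k nearest neighbours of X[i], excluding the point itself.
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )[:n_neighbors]
        # r_i: share of those neighbours that belong to another class.
        ratios.append(sum(y[j] != minority_label for j in neighbours) / n_neighbors)

    ratio_sum = sum(ratios)
    if ratio_sum == 0:
        # No minority point has other-class neighbours: nothing to adapt to.
        return {i: 0 for i in minority_idx}
    # Normalise r_i into a distribution and scale it by the budget.
    return {i: int(round(r / ratio_sum * total)) for i, r in zip(minority_idx, ratios)}


# Toy data (hypothetical): three clustered minority points and one minority
# point surrounded by the majority class.
X_demo = [(0, 0), (0, 1), (1, 0), (10, 10),
          (10, 11), (11, 10), (9, 10), (10, 9),
          (11, 11), (9, 9), (12, 12), (12, 10)]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
allocation = adasyn_allocation(X_demo, y_demo, minority_label=0, n_neighbors=2)
print(allocation)  # {0: 0, 1: 0, 2: 0, 3: 4}
```

In this toy case the whole budget goes to the minority point that sits among majority samples, whereas SMOTE would spread new samples evenly over all minority points.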

See also

SMOTE: Oversample using SMOTE.
SVMSMOTE: Oversample using the SVM-SMOTE variant.
BorderlineSMOTE: Oversample using the Borderline-SMOTE variant.

Notes

The implementation is based on [1].

Supports multi-class resampling. A one-vs.-rest scheme is used.
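A one-vs.-rest scheme can be pictured as follows: each class that needs oversampling is treated in turn as the target, with all remaining classes lumped together as "the rest". The sketch below is only an illustration. It assumes that every non-majority class is brought up to the size of the largest class, which is how `sampling_strategy='auto'` behaves in imbalanced-learn but is not stated on this page. The name `one_vs_rest_plan` is hypothetical.

```python
from collections import Counter


def one_vs_rest_plan(y):
    """Yield (label, n_to_generate, is_target) for each class to oversample."""
    counts = Counter(y)
    n_majority = max(counts.values())
    for label, n in sorted(counts.items()):
        if n == n_majority:
            continue  # the largest class is left untouched
        # Binary view of the problem: this class versus everything else.
        is_target = [yi == label for yi in y]
        yield label, n_majority - n, is_target


y_multi = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
plan = [(label, n_new) for label, n_new, _ in one_vs_rest_plan(y_multi)]
print(plan)  # [('b', 3), ('c', 5)]
```

Because ADASYN rounds per-point allocations, the final class counts may differ slightly from these targets, as the 904 versus 900 result in the Examples section shows.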

References

[1] He, Haibo, Yang Bai, Edwardo A. Garcia, and Shutao Li. “ADASYN: Adaptive synthetic sampling approach for imbalanced learning.” In IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pp. 1322-1328, 2008.

Examples

>>> from collections import Counter
>>> from sklearn.datasets import make_classification
>>> from imbens.sampler import ADASYN
>>> X, y = make_classification(n_classes=2, class_sep=2,
... weights=[0.1, 0.9], n_informative=3, n_redundant=1, flip_y=0,
... n_features=20, n_clusters_per_class=1, n_samples=1000,
... random_state=10)
>>> print('Original dataset shape %s' % Counter(y))
Original dataset shape Counter({1: 900, 0: 100})
>>> ada = ADASYN(random_state=42)
>>> X_res, y_res = ada.fit_resample(X, y)
>>> print('Resampled dataset shape %s' % Counter(y_res))
Resampled dataset shape Counter({0: 904, 1: 900})

Methods

`fit`(X, y)	Check inputs and statistics of the sampler.
`fit_resample`(X, y, *[, sample_weight])	Resample the dataset.
`get_params`([deep])	Get parameters for this estimator.
`set_params`(**params)	Set the parameters of this estimator.

fit(X, y)

Check inputs and statistics of the sampler.

You should use fit_resample in all cases.

Parameters:

X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features): Data array.
y : array-like of shape (n_samples,): Target array.

Returns:

self : object: Return the instance itself.

fit_resample(X, y, *, sample_weight=None, **kwargs)

Resample the dataset.

Parameters:

X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features)

Matrix containing the data which have to be sampled.

y : array-like of shape (n_samples,)

Corresponding label for each sample in X.

sample_weight : array-like of shape (n_samples,), default=None

Corresponding weight for each sample in X.

If None, perform normal resampling and return (X_resampled, y_resampled).
If array-like, the given sample_weight will be resampled along with X and y, and the resampled sample weights will be added to returns. The function will return (X_resampled, y_resampled, sample_weight_resampled).

Returns:

X_resampled : {array-like, dataframe, sparse matrix} of shape (n_samples_new, n_features): The array containing the resampled data.
y_resampled : array-like of shape (n_samples_new,): The corresponding label of X_resampled.
sample_weight_resampled : array-like of shape (n_samples_new,), default=None: The corresponding weight of X_resampled. Returned only if the input sample_weight is not None.
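Because fit_resample returns two values without sample_weight and three values with it, calling code that handles both cases may want to normalise the result. The helper below is a hypothetical convenience, not part of imbens. It only relies on the return shapes documented above.

```python
def unpack_resampled(result):
    """Normalise fit_resample output to (X_res, y_res, sample_weight_res).

    sample_weight_res is None when no sample_weight was passed in.
    """
    if len(result) == 3:
        return tuple(result)
    X_res, y_res = result
    return X_res, y_res, None


# Intended use (not executed here, requires imbens):
#   X_res, y_res, w_res = unpack_resampled(ada.fit_resample(X, y, sample_weight=w))
two = unpack_resampled(([[0.0], [1.0]], [0, 1]))
three = unpack_resampled(([[0.0], [1.0]], [0, 1], [0.5, 0.5]))
print(two[2], three[2])  # None [0.5, 0.5]
```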

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.
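The `<component>__<parameter>` naming used for nested objects can be illustrated with a small routing function. This is a simplified sketch of the convention, not the code that set_params runs. The function name `route_params` and the parameter names `verbose` and `sampler` are made up for the example.

```python
def route_params(params):
    """Split parameters into own ones and ones addressed to a component."""
    own, nested = {}, {}
    for name, value in params.items():
        # Split on the first double underscore only, so deeper nesting
        # (a__b__c) is passed on to component "a" as "b__c".
        component, sep, parameter = name.partition("__")
        if sep:
            nested.setdefault(component, {})[parameter] = value
        else:
            own[name] = value
    return own, nested


own, nested = route_params(
    {"verbose": True, "sampler__n_neighbors": 3, "sampler__random_state": 0}
)
print(own)     # {'verbose': True}
print(nested)  # {'sampler': {'n_neighbors': 3, 'random_state': 0}}
```

With a real pipeline whose sampling step is named `sampler`, the equivalent call would look like `pipeline.set_params(sampler__n_neighbors=3)`.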