make_imbalance

imbens.datasets.make_imbalance(X, y, *, sampling_strategy=None, random_state=None, verbose=False, **kwargs)[source]

Turns a dataset into an imbalanced dataset with a specific sampling strategy.

A simple toy dataset to visualize clustering and classification algorithms.

Read more in the User Guide.

Parameters:
X : {array-like, dataframe} of shape (n_samples, n_features)

Matrix containing the data to be imbalanced.

y : ndarray of shape (n_samples,)

Corresponding label for each sample in X.

sampling_strategy : dict or callable, default=None

Ratio to use for resampling the data set.

  • When dict, the keys correspond to the targeted classes. The values correspond to the desired number of samples for each targeted class.

  • When callable, a function that takes y and returns a dict. The keys correspond to the targeted classes. The values correspond to the desired number of samples for each targeted class.
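A callable strategy can be sketched as a plain function that takes y and returns a dict of the form described above. The helper name `shrink_minority` and its `fraction` parameter are hypothetical, chosen only for illustration; the sketch builds the dict and does not call make_imbalance itself.

```python
from collections import Counter


def shrink_minority(y, fraction=0.5):
    """Hypothetical strategy: target only the rarest class and keep
    `fraction` of its samples (at least one)."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)  # label with the fewest samples
    n_keep = max(1, int(counts[minority] * fraction))
    # Keys are the targeted classes, values the desired sample counts.
    return {minority: n_keep}


# Toy labels: classes 0 and 1 have 50 samples each, class 2 has 30.
y = [0] * 50 + [1] * 50 + [2] * 30
targets = shrink_minority(y, fraction=0.5)
print(targets)
```

A function of this shape could be supplied as `sampling_strategy=shrink_minority`; only class 2 appears in the returned dict, so it is the only class the strategy targets.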

random_state : int, RandomState instance or None, default=None

If int, random_state is the seed used by the random number generator. If a RandomState instance, random_state is the random number generator. If None, the random number generator is the RandomState instance used by np.random.

verbose : bool, default=False

Show information regarding the sampling.

**kwargs : dict

Dictionary of additional keyword arguments to pass to sampling_strategy.
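The forwarding described here can be pictured with a small stand-in. `resolve_strategy` and `cap_classes` are hypothetical names, not part of imbens; the sketch assumes the extra keyword arguments are handed to the callable together with y, and that a dict strategy is used as given.

```python
from collections import Counter


def cap_classes(y, cap):
    """Hypothetical strategy: limit every class to at most `cap` samples."""
    return {label: min(count, cap) for label, count in Counter(y).items()}


def resolve_strategy(sampling_strategy, y, **kwargs):
    """Illustrative stand-in (not imbens code) for turning a strategy into
    a {class: n_samples} dict."""
    if callable(sampling_strategy):
        # Extra keyword arguments are passed through to the callable.
        return sampling_strategy(y, **kwargs)
    # A dict already has the required form; copy it to avoid aliasing.
    return dict(sampling_strategy)


y = [0] * 50 + [1] * 20 + [2] * 5
from_callable = resolve_strategy(cap_classes, y, cap=10)
from_dict = resolve_strategy({0: 10}, y)
print(from_callable, from_dict)
```

Under that assumption, a call such as `make_imbalance(X, y, sampling_strategy=cap_classes, cap=10)` would reach `cap_classes` with `cap=10`.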

Returns:
X_resampled : {ndarray, dataframe} of shape (n_samples_new, n_features)

The array containing the imbalanced data.

y_resampled : ndarray of shape (n_samples_new,)

The corresponding label of X_resampled.

Notes

See Make a dataset class-imbalanced for an example.

Examples

>>> from collections import Counter
>>> from sklearn.datasets import load_iris
>>> from imbens.datasets import make_imbalance
>>> data = load_iris()
>>> X, y = data.data, data.target
>>> print(f'Distribution before imbalancing: {Counter(y)}')
Distribution before imbalancing: Counter({0: 50, 1: 50, 2: 50})
>>> X_res, y_res = make_imbalance(X, y,
...                               sampling_strategy={0: 10, 1: 20, 2: 30},
...                               random_state=42)
>>> print(f'Distribution after imbalancing: {Counter(y_res)}')
Distribution after imbalancing: Counter({2: 30, 1: 20, 0: 10})
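Conceptually, the result above is what random under-sampling to the requested per-class counts would produce. The following standard-library sketch illustrates that idea under the assumption that samples are dropped at random without replacement; `undersample_indices` is a hypothetical helper, not the imbens implementation, and it uses Python's `random.Random` rather than a NumPy RandomState.

```python
import random
from collections import Counter


def undersample_indices(y, targets, seed=None):
    """Illustrative only: choose row indices so that each targeted class
    keeps exactly targets[label] samples."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(y):
        by_class.setdefault(label, []).append(index)
    kept = []
    for label, indices in by_class.items():
        # Classes missing from `targets` are left untouched.
        n_keep = targets.get(label, len(indices))
        if n_keep > len(indices):
            raise ValueError(
                f"class {label!r} has only {len(indices)} samples"
            )
        kept.extend(rng.sample(indices, n_keep))  # without replacement
    return sorted(kept)


# Same label layout as the iris example: three classes of 50 samples.
y = [0] * 50 + [1] * 50 + [2] * 50
kept = undersample_indices(y, {0: 10, 1: 20, 2: 30}, seed=42)
y_res = [y[i] for i in kept]
print(Counter(y_res))
```

Requesting more samples than a class contains raises an error in this sketch, since under-sampling can only remove samples.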

Examples using imbens.datasets.make_imbalance

Classify class-imbalanced hand-written digits

Plot probabilities with different base classifiers

Classifier comparison

Classification with PyTorch Neural Network

Make a dataset class-imbalanced

Make digits dataset class-imbalanced