Usage of pipeline embedding samplers

An example of the Pipeline object (or the make_pipeline() helper function) working with a transformer (PCA) and a classifier (KNeighborsClassifier) from scikit-learn, together with resamplers (EditedNearestNeighbours, SMOTE).

# Adapted from imbalanced-learn
# Authors: Christos Aridas
#          Guillaume Lemaitre
# License: MIT

Let’s first create an imbalanced dataset and split it into two sets.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(
    weights=[0.3, 0.7],
    n_samples=5000,  # inferred: the 1250-sample test split reported below is 25% of 5000
)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
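
The stratify=y argument asks for the class ratio to be preserved in both splits. As a rough illustration of that idea only (this is not scikit-learn's implementation), here is a minimal pure-Python sketch. The helper name stratified_split and the 5000-sample, 30/70 label list are assumptions made for the example; they were chosen to be consistent with the support column of the classification report.

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction=0.25, seed=42):
    """Hypothetical helper: split indices so each class keeps its proportion."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Assumed toy labels: 5000 samples at a 30/70 class ratio.
labels = [0] * 1500 + [1] * 3500
train_idx, test_idx = stratified_split(labels)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts))
print("test: ", dict(test_counts))
```

Under these assumptions the test split holds 375 samples of class 0 and 875 of class 1, so both splits keep the 30/70 ratio.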

Now, we will create each individual step that we would later like to combine.

from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from imbalanced_ensemble.sampler.under_sampling import EditedNearestNeighbours
from imbalanced_ensemble.sampler.over_sampling import SMOTE

pca = PCA(n_components=2)
enn = EditedNearestNeighbours()
smote = SMOTE(random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)

Now, we can finally create a pipeline that specifies the order in which the transformers and samplers are executed before the data is passed to the final classifier.

from imbalanced_ensemble.pipeline import make_pipeline

model = make_pipeline(pca, enn, smote, knn)

We can now use the pipeline as a normal classifier: resampling happens when calling fit and is disabled when calling decision_function, predict_proba, or predict.

from sklearn.metrics import classification_report

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))


              precision    recall  f1-score   support

           0       0.99      0.99      0.99       375
           1       1.00      1.00      1.00       875

    accuracy                           0.99      1250
   macro avg       0.99      0.99      0.99      1250
weighted avg       0.99      0.99      0.99      1250
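
To make the fit/predict asymmetry concrete, the following is a minimal sketch of a pipeline that applies samplers only during fit. All class names here (TinyPipeline, Scale, DuplicateMinority, MajorityVote) are hypothetical toy stand-ins written for this illustration; this is not the implementation used by imbalanced-ensemble, and it only assumes that samplers expose fit_resample while transformers expose fit and transform.

```python
from collections import Counter


class Scale:
    """Toy transformer: multiplies every feature value by a constant."""

    def __init__(self, factor):
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[value * self.factor for value in row] for row in X]


class DuplicateMinority:
    """Toy over-sampler: repeats minority rows until the classes are balanced."""

    def __init__(self):
        self.calls = 0  # counts how often resampling was triggered

    def fit_resample(self, X, y):
        self.calls += 1
        counts = Counter(y)
        minority, n_min = min(counts.items(), key=lambda item: item[1])
        n_max = max(counts.values())
        rows = [row for row, label in zip(X, y) if label == minority]
        extra = [rows[i % len(rows)] for i in range(n_max - n_min)]
        return X + extra, y + [minority] * len(extra)


class MajorityVote:
    """Toy classifier: always predicts the most common training label."""

    def fit(self, X, y):
        self.n_seen_ = len(y)  # number of samples seen after resampling
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_] * len(X)


class TinyPipeline:
    """Toy pipeline: transformers always run, samplers run only in fit."""

    def __init__(self, *steps):
        self.steps = list(steps[:-1])
        self.estimator = steps[-1]

    def fit(self, X, y):
        for step in self.steps:
            if hasattr(step, "fit_resample"):
                X, y = step.fit_resample(X, y)  # changes the sample count
            else:
                X = step.fit(X, y).transform(X)
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps:
            if hasattr(step, "fit_resample"):
                continue  # samplers are skipped at prediction time
            X = step.transform(X)
        return self.estimator.predict(X)


X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 1, 1, 1]

model = TinyPipeline(Scale(2.0), DuplicateMinority(), MajorityVote())
model.fit(X_train, y_train)
sampler = model.steps[1]

y_pred = model.predict([[5.0], [6.0]])
print("samples seen by the classifier:", model.estimator.n_seen_)
print("resampling calls:", sampler.calls)
print("predictions returned:", len(y_pred))
```

In this sketch the classifier is trained on six samples (the four originals plus two duplicated minority rows), while predict returns exactly one label per input row because the sampler step is bypassed.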

