Usage of pipeline embedding samplers
An example of the Pipeline object (or the make_pipeline() helper function) combining scikit-learn estimators (the PCA transformer and the KNeighborsClassifier classifier) with resamplers (EditedNearestNeighbours and SMOTE).
```python
# Adapted from imbalanced-learn
# Authors: Christos Aridas
#          Guillaume Lemaitre
# License: MIT

print(__doc__)
```
Let’s first create an imbalanced dataset and split it into training and testing sets.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_classes=2,
    class_sep=1.25,
    weights=[0.3, 0.7],
    n_informative=3,
    n_redundant=1,
    flip_y=0,
    n_features=5,
    n_clusters_per_class=1,
    n_samples=5000,
    random_state=10,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
```
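To check the imbalance, one can count the labels. The sketch below regenerates the same dataset and split and uses `collections.Counter`; with `weights=[0.3, 0.7]` and `flip_y=0`, about 30% of the samples should belong to class 0 (the minority), and `stratify=y` keeps that ratio in both the training and the testing set.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same dataset and split as above.
X, y = make_classification(
    n_classes=2,
    class_sep=1.25,
    weights=[0.3, 0.7],
    n_informative=3,
    n_redundant=1,
    flip_y=0,
    n_features=5,
    n_clusters_per_class=1,
    n_samples=5000,
    random_state=10,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Class counts for the full dataset and for each split.
print("full: ", sorted(Counter(y).items()))
print("train:", sorted(Counter(y_train).items()))
print("test: ", sorted(Counter(y_test).items()))
```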
Next, we create the individual steps that we will later combine.
```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

from imbens.sampler import EditedNearestNeighbours
from imbens.sampler import SMOTE

pca = PCA(n_components=2)
enn = EditedNearestNeighbours()
smote = SMOTE(random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
```
We can now create a pipeline that specifies the order in which the transformers and samplers are applied before the data reach the final classifier.
```python
from imbens.pipeline import make_pipeline

model = make_pipeline(pca, enn, smote, knn)
```
The resulting pipeline can be used like any other classifier: resampling is applied when calling fit and is skipped when calling decision_function, predict_proba, or predict.
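The fit/predict asymmetry can be illustrated with a hypothetical, heavily simplified pipeline in plain Python. This is not how imbens implements its Pipeline; it only assumes the convention that samplers expose `fit_resample` while transformers expose `fit_transform`/`transform`. All class names (`ToyPipeline`, `Center`, `DuplicateMinority`, `NearestMean`) are made up for illustration and work on 1-D lists of numbers.

```python
from collections import Counter


class ToyPipeline:
    """Hypothetical sampler-aware pipeline (illustration only, not imbens code)."""

    def __init__(self, steps):
        self.steps = steps  # the last step is the final estimator

    def fit(self, X, y):
        for step in self.steps[:-1]:
            if hasattr(step, "fit_resample"):
                # Samplers may change the number of samples, so X and y are replaced together.
                X, y = step.fit_resample(X, y)
            else:
                X = step.fit_transform(X, y)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            if hasattr(step, "fit_resample"):
                continue  # samplers are skipped: one prediction per input sample
            X = step.transform(X)
        return self.steps[-1].predict(X)


class Center:
    """Toy transformer: subtracts the mean seen during fitting."""

    def fit_transform(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self.transform(X)

    def transform(self, X):
        return [x - self.mean_ for x in X]


class DuplicateMinority:
    """Toy sampler: repeats minority samples until both classes are equal (assumes 2 classes)."""

    def __init__(self):
        self.calls = 0

    def fit_resample(self, X, y):
        self.calls += 1
        counts = Counter(y)
        minority, majority = sorted(counts, key=counts.get)
        minority_X = [x for x, label in zip(X, y) if label == minority]
        extra = counts[majority] - counts[minority]
        X_new = list(X) + [minority_X[i % len(minority_X)] for i in range(extra)]
        y_new = list(y) + [minority] * extra
        return X_new, y_new


class NearestMean:
    """Toy classifier: predicts the class whose training mean is closest."""

    def fit(self, X, y):
        self.n_fit_samples_ = len(X)
        self.means_ = {}
        for label in set(y):
            values = [x for x, lab in zip(X, y) if lab == label]
            self.means_[label] = sum(values) / len(values)
        return self

    def predict(self, X):
        return [min(self.means_, key=lambda c: abs(x - self.means_[c])) for x in X]


X = [0.0, 1.0, 2.0, 3.0, 10.0, 12.0]
y = [0, 0, 0, 0, 1, 1]  # class 1 is the minority

sampler, clf = DuplicateMinority(), NearestMean()
pipe = ToyPipeline([Center(), sampler, clf]).fit(X, y)
preds = pipe.predict([1.5, 11.0, 2.5])
print(sampler.calls, clf.n_fit_samples_, preds)
```

This prints `1 8 [0, 1, 0]`: the sampler ran once, during fit, so the classifier was trained on 8 samples (6 original plus 2 duplicated minority samples), yet predict returns exactly one label per input because the sampler is skipped.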
```python
from sklearn.metrics import classification_report

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
```
```text
              precision    recall  f1-score   support

           0       0.99      0.99      0.99       375
           1       1.00      1.00      1.00       875

    accuracy                           0.99      1250
   macro avg       0.99      0.99      0.99      1250
weighted avg       0.99      0.99      0.99      1250
```