.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/basic/plot_basic_example.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_basic_plot_basic_example.py:


=========================================================
Train and predict with an ensemble classifier
=========================================================

This example shows the basic usage of an
:mod:`imbens.ensemble` classifier.

This example uses:

- :class:`imbens.ensemble.SelfPacedEnsembleClassifier`

.. GENERATED FROM PYTHON SOURCE LINES 13-18

.. code-block:: Python


    # Authors: Zhining Liu
    # License: MIT


.. GENERATED FROM PYTHON SOURCE LINES 19-37

.. code-block:: Python

    print(__doc__)

    # Import imbalanced-ensemble
    import imbens

    # Import utilities
    from collections import Counter
    import sklearn.metrics  # import the submodule explicitly; it is used below
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from imbens.ensemble.base import sort_dict_by_key

    # Import plot utilities
    import matplotlib.pyplot as plt
    from imbens.utils._plot import plot_2Dprojection_and_cardinality

    RANDOM_STATE = 42


.. GENERATED FROM PYTHON SOURCE LINES 38-41

Prepare & visualize the data
----------------------------
Make a toy 3-class imbalanced classification task.

.. GENERATED FROM PYTHON SOURCE LINES 41-67

.. code-block:: Python


    # Generate and split a synthetic dataset
    X, y = make_classification(
        n_classes=3,
        n_samples=2000,
        class_sep=2,
        weights=[0.1, 0.3, 0.6],
        n_informative=3,
        n_redundant=1,
        flip_y=0,
        n_features=20,
        n_clusters_per_class=2,
        random_state=RANDOM_STATE,
    )

    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=RANDOM_STATE
    )

    # Visualize the training dataset
    fig = plot_2Dprojection_and_cardinality(X_train, y_train, figsize=(8, 4))
    plt.show()

    # Print class distribution
    print('Training dataset distribution %s' % sort_dict_by_key(Counter(y_train)))
    print('Validation dataset distribution %s' % sort_dict_by_key(Counter(y_valid)))


.. image-sg:: /auto_examples/basic/images/sphx_glr_plot_basic_example_001.png
   :alt: Dataset (2D projection by KernelPCA), Class Distribution
   :srcset: /auto_examples/basic/images/sphx_glr_plot_basic_example_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Training dataset distribution {np.int64(0): 100, np.int64(1): 300, np.int64(2): 600}
    Validation dataset distribution {np.int64(0): 100, np.int64(1): 300, np.int64(2): 600}


.. GENERATED FROM PYTHON SOURCE LINES 68-71

Using ensemble classifiers in ``imbens``
-----------------------------------------------------
Take ``SelfPacedEnsembleClassifier`` as an example.

.. GENERATED FROM PYTHON SOURCE LINES 71-88

.. code-block:: Python


    # Initialize a SelfPacedEnsembleClassifier
    clf = imbens.ensemble.SelfPacedEnsembleClassifier(random_state=RANDOM_STATE)

    # Train the SelfPacedEnsembleClassifier
    clf.fit(X_train, y_train)

    # Make predictions
    y_pred_proba = clf.predict_proba(X_valid)
    y_pred = clf.predict(X_valid)

    # Evaluate
    balanced_acc_score = sklearn.metrics.balanced_accuracy_score(y_valid, y_pred)
    print(f'SPE: ensemble of {clf.n_estimators} {clf.estimator_}')
    print('Validation Balanced Accuracy: {:.3f}'.format(balanced_acc_score))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    SPE: ensemble of 50 DecisionTreeClassifier()
    Validation Balanced Accuracy: 0.980


.. GENERATED FROM PYTHON SOURCE LINES 89-92

Set the ensemble size
---------------------
(parameter ``n_estimators``: int)

.. GENERATED FROM PYTHON SOURCE LINES 92-107

.. code-block:: Python


    from imbens.ensemble import SelfPacedEnsembleClassifier as SPE
    from sklearn.metrics import balanced_accuracy_score

    clf = SPE(
        n_estimators=5,  # Set ensemble size to 5
        random_state=RANDOM_STATE,
    ).fit(X_train, y_train)

    # Evaluate
    balanced_acc_score = balanced_accuracy_score(y_valid, clf.predict(X_valid))
    print(f'SPE: ensemble of {clf.n_estimators} {clf.estimator_}')
    print('Validation Balanced Accuracy: {:.3f}'.format(balanced_acc_score))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    SPE: ensemble of 5 DecisionTreeClassifier()
    Validation Balanced Accuracy: 0.978


.. GENERATED FROM PYTHON SOURCE LINES 108-111

Use a different base estimator
------------------------------
(parameter ``estimator``: estimator object)

.. GENERATED FROM PYTHON SOURCE LINES 111-126

.. code-block:: Python


    from sklearn.svm import SVC

    clf = SPE(
        n_estimators=5,
        estimator=SVC(probability=True),  # Use SVM as the base estimator
        random_state=RANDOM_STATE,
    ).fit(X_train, y_train)

    # Evaluate
    balanced_acc_score = balanced_accuracy_score(y_valid, clf.predict(X_valid))
    print(f'SPE: ensemble of {clf.n_estimators} {clf.estimator_}')
    print('Validation Balanced Accuracy: {:.3f}'.format(balanced_acc_score))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    SPE: ensemble of 5 SVC(probability=True)
    Validation Balanced Accuracy: 0.972


.. GENERATED FROM PYTHON SOURCE LINES 127-130

Enable training log
-------------------
(``fit()`` parameter ``train_verbose``: bool, int or dict)

.. GENERATED FROM PYTHON SOURCE LINES 130-136

.. code-block:: Python


    clf = SPE(random_state=RANDOM_STATE).fit(
        X_train,
        y_train,
        train_verbose=True,  # Enable training log
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
    ┃             ┃                                                        ┃            Data: train             ┃
    ┃ #Estimators ┃                   Class Distribution                   ┃               Metric               ┃
    ┃             ┃                                                        ┃  acc    balanced_acc   weighted_f1 ┃
    ┣━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
    ┃      1      ┃ {np.int64(0): 100, np.int64(1): 100, np.int64(2): 100} ┃ 0.958      0.968          0.959    ┃
    ┃      5      ┃ {np.int64(0): 100, np.int64(1): 100, np.int64(2): 100} ┃ 1.000      1.000          1.000    ┃
    ┃      10     ┃ {np.int64(0): 100, np.int64(1): 100, np.int64(2): 100} ┃ 1.000      1.000          1.000    ┃
    ┃      15     ┃ {np.int64(0): 100, np.int64(1): 100, np.int64(2): 100} ┃ 0.999      0.997          0.999    ┃
    ┃      20     ┃ {np.int64(0): 100, np.int64(1): 100, np.int64(2): 100} ┃ 1.000      1.000          1.000    ┃
    ┃      25     ┃ {np.int64(0): 100, np.int64(1): 100, np.int64(2): 100} ┃ 1.000      1.000          1.000    ┃
    ┃      30     ┃ {np.int64(0): 100, np.int64(1): 100, np.int64(2): 100} ┃ 1.000      1.000          1.000    ┃
    ┃      35     ┃ {np.int64(0): 100, np.int64(1): 100, np.int64(2): 100} ┃ 1.000      1.000          1.000    ┃
    ┃      40     ┃ {np.int64(0): 100, np.int64(1): 100, np.int64(2): 100} ┃ 1.000      1.000          1.000    ┃
    ┃      45     ┃ {np.int64(0): 100, np.int64(1): 100, np.int64(2): 100} ┃ 1.000      1.000          1.000    ┃
    ┃      50     ┃ {np.int64(0): 100, np.int64(1): 100, np.int64(2): 100} ┃ 1.000      1.000          1.000    ┃
    ┣━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
    ┃    final    ┃ {np.int64(0): 100, np.int64(1): 100, np.int64(2): 100} ┃ 1.000      1.000          1.000    ┃
    ┗━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.104 seconds)


.. _sphx_glr_download_auto_examples_basic_plot_basic_example.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_basic_example.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_basic_example.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_basic_example.zip `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
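
The training log above lists 100 samples per class at every iteration and
reports ``balanced_acc`` next to plain accuracy. The following is a minimal,
standard-library sketch of both ideas, not the ``imbens`` implementation.
``random_undersample`` is a hypothetical helper that uses plain random
under-sampling as a stand-in; it does not reproduce the self-paced sample
selection of ``SelfPacedEnsembleClassifier``. ``balanced_accuracy`` is likewise
a hypothetical helper that computes the unweighted mean of per-class recall,
which is how ``sklearn.metrics.balanced_accuracy_score`` is defined with its
default arguments.

.. code-block:: Python

    import random
    from collections import Counter


    def random_undersample(labels, seed=0):
        """Return indices of a class-balanced subset (hypothetical helper).

        Every class is randomly reduced to the size of the smallest class.
        """
        rng = random.Random(seed)
        by_class = {}
        for idx, label in enumerate(labels):
            by_class.setdefault(label, []).append(idx)
        n_min = min(len(indices) for indices in by_class.values())
        selected = []
        for label in sorted(by_class):
            selected.extend(rng.sample(by_class[label], n_min))
        return sorted(selected)


    def balanced_accuracy(y_true, y_pred):
        """Unweighted mean of per-class recall (hypothetical helper)."""
        totals = Counter(y_true)
        hits = Counter()
        for t, p in zip(y_true, y_pred):
            if t == p:
                hits[t] += 1
        recalls = [hits[c] / totals[c] for c in totals]
        return sum(recalls) / len(recalls)


    # Same class counts as the training split above: 100 / 300 / 600
    y_toy = [0] * 100 + [1] * 300 + [2] * 600
    subset = random_undersample(y_toy, seed=42)
    subset_counts = dict(sorted(Counter(y_toy[i] for i in subset).items()))
    print(subset_counts)  # {0: 100, 1: 100, 2: 100}

    # A classifier that always predicts the majority class
    always_majority = [2] * len(y_toy)
    plain_acc = sum(t == p for t, p in zip(y_toy, always_majority)) / len(y_toy)
    print(round(plain_acc, 3))  # 0.6
    print(round(balanced_accuracy(y_toy, always_majority), 3))  # 0.333

On the 100/300/600 split, always predicting the majority class scores 0.6 in
plain accuracy but only 0.333 in balanced accuracy, which illustrates why
balanced accuracy is the more informative metric on imbalanced data.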