bqlearn.irlnl.IRLNL

class bqlearn.irlnl.IRLNL(base_estimator, final_estimator, *, transition_matrix='anchor', quantile=0.97, n_iter=100, noise_free_prior=0, n_jobs=None)

A reweighted classifier for learning with noisy labels [1].

For each class \(y\), the untrusted samples are reweighted given the predictions made by a classifier learned on untrusted data and a noise transition matrix:

\[\frac{\mathbb{P}(Y=y|X)}{\mathbb{P}(\tilde{Y}=y|X)} = \frac{\mathbb{P}(\tilde{Y}=y|X) - \mathbb{1}_{\tilde{Y}=y} \times \mathbb{P} (\tilde{Y}= y | Y\neq y ) - \mathbb{1}_{\tilde{Y}\neq y} \times \mathbb{P} (\tilde{Y}\neq y | Y =y )} {\left(1 - \mathbb{P}(\tilde{Y}= y | Y\neq y ) - \mathbb{P}(\tilde{Y}\neq y | Y =y )\right)\mathbb{P}(\tilde{Y}=y|X)}\]

Multiclass classification is supported through a one-vs-rest approach [2].
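As a rough illustration, the ratio above can be transcribed into NumPy for a single class \(y\). This is only a sketch: `importance_weights`, `p_noisy`, `is_y`, `rho_in` and `rho_out` are hypothetical names, and the way the library actually computes `sample_weight_` may differ (for instance in how weights are clipped or normalized).

```python
import numpy as np


def importance_weights(p_noisy, is_y, rho_in, rho_out):
    """Transcribe the reweighting ratio for one class y.

    p_noisy : P(Y~ = y | X) for each sample, from the noisy classifier.
    is_y    : boolean mask, True where the observed noisy label equals y.
    rho_in  : P(Y~ = y | Y != y), rate at which other classes flip into y.
    rho_out : P(Y~ != y | Y = y), rate at which y flips into other classes.
    """
    p_noisy = np.asarray(p_noisy, dtype=float)
    is_y = np.asarray(is_y, dtype=bool)
    # The indicator functions select which flip rate is subtracted.
    flip = np.where(is_y, rho_in, rho_out)
    return (p_noisy - flip) / ((1.0 - rho_in - rho_out) * p_noisy)


weights = importance_weights([0.8, 0.3], [True, False], rho_in=0.1, rho_out=0.2)
clean = importance_weights([0.8, 0.3], [True, False], rho_in=0.0, rho_out=0.0)
```

When both flip rates are zero, every weight equals 1, so the reweighted problem reduces to ordinary training on the observed labels.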

Parameters:
base_estimator : object

The classifier used to estimate the transition matrix and the noisy classification task. Support for probability prediction is required.

final_estimator : object

The final estimator which will be reweighted to handle label noise. Support for sample weighting is required.

transition_matrix : {‘iterative’, ‘anchor’, ‘gold’, ‘confusion’} or array-like of shape (n_classes, n_classes), default=‘anchor’

Algorithm to estimate the transition matrix. ‘gold’ and ‘confusion’ are only available on biquality data.

quantile : float, default=0.97

Quantile used to select the anchor points. Only used when transition_matrix=‘anchor’ or transition_matrix=‘iterative’.
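One common way to use such a quantile, shown here purely as an assumption about what ‘anchor’ means, is to pick for each class the sample whose noisy posterior for that class sits at the given quantile, and to read its posterior vector as one row of the transition matrix. The helper `anchor_transition_matrix` and the toy `proba` array are hypothetical; the library may proceed differently.

```python
import numpy as np


def anchor_transition_matrix(proba, quantile=0.97):
    """Estimate a transition matrix from noisy class posteriors.

    proba : array of shape (n_samples, n_classes), rows summing to 1.
    """
    proba = np.asarray(proba, dtype=float)
    n_classes = proba.shape[1]
    T = np.empty((n_classes, n_classes))
    for k in range(n_classes):
        # Posterior value at the requested quantile for class k.
        threshold = np.quantile(proba[:, k], quantile)
        # Anchor point: the sample whose posterior is closest to that value.
        anchor = np.argmin(np.abs(proba[:, k] - threshold))
        T[k] = proba[anchor]
    return T


rng = np.random.default_rng(0)
proba = rng.dirichlet(alpha=[1.0, 1.0, 1.0], size=200)
T = anchor_transition_matrix(proba, quantile=0.97)
```

Using a high quantile rather than the maximum makes the choice of anchor less sensitive to a few overconfident outliers.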

n_iter : int, default=100

Number of iterations used to compute the transition matrix. Only used when transition_matrix=‘iterative’.

noise_free_prior : float, default=0.0

Factor for the convex combination between the estimated transition matrix and the identity matrix, used to lower the condition number of the estimate. This is equivalent to adopting a more conservative noise-free prior.
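The effect of such a convex combination on the condition number can be checked numerically. The sketch below assumes the factor weights the identity matrix, so that the default of 0 leaves the estimate unchanged; `shrink_towards_identity` is a hypothetical helper, not part of the library.

```python
import numpy as np


def shrink_towards_identity(T, noise_free_prior):
    """Convex combination of T and the identity matrix."""
    T = np.asarray(T, dtype=float)
    eye = np.eye(T.shape[0])
    return (1.0 - noise_free_prior) * T + noise_free_prior * eye


# A noisy, poorly conditioned transition matrix (rows sum to 1).
T = np.array([[0.6, 0.4],
              [0.4, 0.6]])
T_shrunk = shrink_towards_identity(T, noise_free_prior=0.5)

cond_before = np.linalg.cond(T)
cond_after = np.linalg.cond(T_shrunk)
```

Each row of the result still sums to 1, and the lower condition number makes any later inversion of the matrix more stable.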

n_jobs : int, default=None

The number of jobs to use for the computation: the n_classes one-vs-rest problems are computed in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

Attributes:
final_estimator_ : classifier

The final fitted estimator.

transition_matrix_ : ndarray of shape (n_classes, n_classes)

Estimated transition matrix between trusted and untrusted labels.

sample_weight_ : ndarray of shape (n_samples, n_classes)

The weights of the examples computed during fit().

classes_ : ndarray of shape (n_classes,)

The class labels.

n_classes_ : int

The number of classes.

n_features_in_ : int

Number of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.

References

[1]
  T. Liu and D. Tao, “Classification with noisy labels by importance reweighting”, in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.

[2]
  R. Wang, T. Liu and D. Tao, “Multiclass Learning With Partially Corrupted Labels”, in IEEE Transactions on Neural Networks and Learning Systems, 2018.

Methods

decision_function(X)

Call predict of the regressor estimator.

fit(X, y[, sample_quality])

Fit the noisy classification model and the reweighted final classifier.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predict the classes of X.

predict_proba(X)

Predict probability for each possible outcome.

score(X, y[, sample_weight])

Return the mean accuracy on the given test data and labels.

set_params(**params)

Set the parameters of this estimator.

decision_function(X)

Call predict of the regressor estimator.

Parameters:
X : array-like, shape (n_samples, n_features)

The input samples.

Returns:
y : ndarray, shape (n_samples, n_classes)

The decision values for each sample and class.

fit(X, y, sample_quality=None, **fit_params)

Fit the noisy classification model and the reweighted final classifier.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

Training data.

y : array-like of shape (n_samples,) or (n_samples, n_targets)

Target values.

Returns:
self : object

Returns the instance itself.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)

Predict the classes of X.

Parameters:
X : array-like, shape (n_samples, n_features)

The input samples.

Returns:
y : ndarray, shape (n_samples,)

The predicted classes.

predict_proba(X)

Predict probability for each possible outcome.

Parameters:
X : array-like, shape (n_samples, n_features)

The input samples.

Returns:
p : array, shape (n_samples, n_classes)

The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
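The column ordering matters when mapping probabilities back to labels. A minimal sketch, using made-up values for `classes_` and for the output of `predict_proba`:

```python
import numpy as np

# Hypothetical fitted attribute and predict_proba output.
classes_ = np.array(["bird", "cat", "dog"])
proba = np.array([[0.1, 0.7, 0.2],
                  [0.5, 0.2, 0.3],
                  [0.2, 0.2, 0.6]])

# Column j of `proba` holds the probability of classes_[j], so the most
# likely label is found by indexing classes_ with the row-wise argmax.
predicted = classes_[np.argmax(proba, axis=1)]
```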

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.
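For single-label targets, the quantity returned can be reproduced by hand as a weighted fraction of correct predictions. The helper `mean_accuracy` below is a hypothetical stand-in operating on already computed predictions, not the method itself.

```python
import numpy as np


def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of correctly classified samples."""
    correct = np.asarray(y_true) == np.asarray(y_pred)
    return float(np.average(correct, weights=sample_weight))


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]

# Three of the four samples are correct.
unweighted = mean_accuracy(y_true, y_pred)
# The misclassified sample counts double, lowering the score.
weighted = mean_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])
```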

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
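The `<component>__<parameter>` convention can be illustrated with a plain scikit-learn Pipeline. This sketch does not involve IRLNL; the step names `scale` and `clf` are arbitrary choices made for the example.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# <component>__<parameter>: update C on the step named "clf".
pipe.set_params(clf__C=0.1)

# With deep=True, nested parameters are listed under the same naming scheme.
params = pipe.get_params(deep=True)
```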