`bqlearn.unbiased`.LossCorrection

class bqlearn.unbiased.LossCorrection(estimator, *, transition_matrix='anchor', quantile=0.97, n_iter=100, noise_free_prior=0, n_jobs=None)

A Classifier corrected with the method of unbiased estimators [1].

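It constructs a surrogate loss \(\tilde{L}\) from the loss of interest \(L\) whose expectation over the noisy label \(\tilde{y}\) equals the loss on the clean label \(y\), that is \(\mathbb{E}_{\tilde{y}}[\tilde{L}(f(x),\tilde{y})] = L(f(x),y)\). For binary labels \(y \in \{-1, +1\}\), the surrogate loss is: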

\[\tilde{L}(f(x),y) = \frac{(1-\mathbb{P}(\tilde{Y} = y \mid Y \neq y)) L(f(x), y) - \mathbb{P}(\tilde{Y} \neq y \mid Y = y) L(f(x), -y)}{1 - \mathbb{P}(\tilde{Y} = y \mid Y \neq y) - \mathbb{P}(\tilde{Y} \neq y \mid Y = y)}\]
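
As a minimal sketch of the surrogate loss for binary labels in {-1, +1}, the snippet below builds it from a logistic loss and checks numerically that its expectation under label noise recovers the clean loss. The names `unbiased_loss`, `expected_noisy_loss`, `rho_pos`, and `rho_neg` are illustrative and are not part of bqlearn; `rho_pos` stands for \(\mathbb{P}(\tilde{Y}=-1 \mid Y=+1)\) and `rho_neg` for \(\mathbb{P}(\tilde{Y}=+1 \mid Y=-1)\), and the sketch assumes `rho_pos + rho_neg < 1`.

```python
import math


def logistic_loss(score, y):
    """Logistic loss for a real-valued score and a label y in {-1, +1}."""
    return math.log1p(math.exp(-y * score))


def unbiased_loss(loss, score, y, rho_pos, rho_neg):
    """Surrogate loss evaluated at the (possibly noisy) label y.

    rho_pos = P(noisy = -1 | clean = +1), rho_neg = P(noisy = +1 | clean = -1).
    """
    rho_y = rho_pos if y == 1 else rho_neg      # P(noisy != y | clean == y)
    rho_other = rho_neg if y == 1 else rho_pos  # P(noisy == y | clean != y)
    denom = 1.0 - rho_pos - rho_neg
    return ((1.0 - rho_other) * loss(score, y) - rho_y * loss(score, -y)) / denom


def expected_noisy_loss(loss, score, y_clean, rho_pos, rho_neg):
    """Expectation of the surrogate loss over the noisy label, given the clean one."""
    rho_y = rho_pos if y_clean == 1 else rho_neg
    keep = unbiased_loss(loss, score, y_clean, rho_pos, rho_neg)
    flip = unbiased_loss(loss, score, -y_clean, rho_pos, rho_neg)
    return (1.0 - rho_y) * keep + rho_y * flip


# The expectation under noise matches the loss on the clean label.
gap = expected_noisy_loss(logistic_loss, 0.7, 1, 0.2, 0.3) - logistic_loss(0.7, 1)
print(abs(gap) < 1e-12)
```

Note that the surrogate loss itself can be negative for individual samples, since it subtracts a multiple of the loss on the flipped label.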

Multiclass classification is supported through a one-versus-rest approach.

Parameters:

estimator object, optional (default=None): The estimator which will be corrected to handle label noise. It must support negative sample weights. Support for probability prediction is also required by some transition matrix estimation methods.
transition_matrix {‘iterative’, ‘anchor’, ‘gold’, ‘confusion’} or array-like of shape (n_classes, n_classes), default=‘anchor’: Algorithm used to estimate the transition matrix. ‘gold’ and ‘confusion’ are only available on biquality data.
quantile float, default=0.97: Quantile used to select the anchor points. Only used when transition_matrix=‘anchor’ or transition_matrix=‘iterative’.
n_iter int, default=100: Number of iterations used to compute the transition matrix. Only used when transition_matrix=‘iterative’.
noise_free_prior float, default=0.0: Factor of the convex combination between the estimated transition matrix and the identity matrix, used to lower the condition number of the estimated transition matrix. This is equivalent to adopting a more conservative noise-free prior.
n_jobs int, default=None: The number of jobs to use for the computation: the n_classes one-vs-rest problems are computed in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
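
The effect of `noise_free_prior` can be illustrated with a small sketch. It assumes the factor weights the identity matrix, which is consistent with the default of 0 leaving the estimate unchanged; the helper names `shrink_towards_identity` and `det2` are hypothetical and do not come from bqlearn.

```python
def shrink_towards_identity(matrix, noise_free_prior):
    """Convex combination (1 - p) * T + p * I of a square matrix T with the identity."""
    n = len(matrix)
    return [
        [
            (1.0 - noise_free_prior) * matrix[i][j]
            + (noise_free_prior if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def det2(m):
    """Determinant of a 2x2 matrix, used here as a rough proxy for invertibility."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# A nearly singular 2x2 transition matrix (heavy symmetric noise).
estimated = [[0.55, 0.45], [0.45, 0.55]]
regularized = shrink_towards_identity(estimated, 0.5)

# Rows still sum to 1, and the determinant moves away from zero.
print(regularized)
print(det2(estimated), det2(regularized))
```

Because both the estimate and the identity are row-stochastic, the combination remains a valid transition matrix while being easier to invert.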

Attributes:

estimator_ classifier: The fitted estimator.
transition_matrix_ ndarray of shape (n_classes, n_classes): Estimated transition matrix between trusted and untrusted labels.
classes_ ndarray of shape (n_classes,): The class labels.
n_classes_ int: The number of classes.
n_features_in_ int: Number of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.
feature_names_in_ ndarray of shape (n_features_in_,): Names of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.
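
To give an intuition for the `quantile` parameter used with transition_matrix=‘anchor’, here is a rough sketch of a common anchor-point heuristic. It is an assumption about the general technique, not a description of bqlearn's implementation: for each class, the sample whose predicted noisy probability for that class sits at the chosen quantile is taken as the anchor, and its predicted probabilities become the corresponding row of the matrix. The function name `anchor_transition_matrix` and the row convention (row i holds the noisy label distribution for clean class i) are hypothetical.

```python
def anchor_transition_matrix(probas, quantile=0.97):
    """Estimate a transition matrix from predicted noisy class probabilities.

    probas is a list of rows, one per sample, each holding one probability
    per class. Row k of the result is the probability row of the anchor
    sample selected for class k.
    """
    n_samples = len(probas)
    n_classes = len(probas[0])
    matrix = []
    for k in range(n_classes):
        # Rank samples by their predicted probability for class k.
        order = sorted(range(n_samples), key=lambda i: probas[i][k])
        # Pick the sample sitting at the requested quantile of that ranking.
        anchor = order[int(quantile * (n_samples - 1))]
        matrix.append(list(probas[anchor]))
    return matrix


probas = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
print(anchor_transition_matrix(probas, quantile=1.0))
```

Using a quantile slightly below 1 rather than the maximum makes the selection less sensitive to a few overconfident outliers.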

References

[1] N. Natarajan, I. S. Dhillon, P. Ravikumar, and A. Tewari, “Learning with Noisy Labels”, NeurIPS, 2013.

Methods

`decision_function`(X)	Call predict of the regressor estimator.
`fit`(X, y[, sample_quality])	Fit the noisy transition matrix and the corrected classifier.
`get_params`([deep])	Get parameters for this estimator.
`predict`(X)	Predict the classes of X.
`predict_proba`(X)	Predict probability for each possible outcome.
`score`(X, y[, sample_weight])	Return the mean accuracy on the given test data and labels.
`set_params`(**params)	Set the parameters of this estimator.

decision_function(X)

Call predict of the regressor estimator.

Parameters:

X array-like, shape (n_samples, n_features): The input samples.

Returns:

y ndarray, shape (n_samples, n_classes): The predicted classes.

fit(X, y, sample_quality=None, **fit_params)

Fit the noisy transition matrix and the corrected classifier.

Parameters:

X {array-like, sparse matrix} of shape (n_samples, n_features): Training data.
y array-like of shape (n_samples,) or (n_samples, n_targets): Target values.

Returns:

self object: Returns the instance itself.
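
The requirement that the wrapped estimator accept negative sample weights can be understood through the following sketch, which is an assumption about one way to minimize the surrogate loss and not necessarily how `fit` is implemented. Each noisy binary sample is duplicated: one copy keeps the observed label with a positive weight, the other carries the flipped label with a negative weight. The function `reweighted_dataset` and the noise rates `rho_pos` (flip rate of clean +1 labels) and `rho_neg` (flip rate of clean -1 labels) are hypothetical names, and the sketch assumes `rho_pos + rho_neg < 1`.

```python
def reweighted_dataset(X, y_noisy, rho_pos, rho_neg):
    """Duplicate each sample with both labels and surrogate-loss weights.

    Labels are expected in {-1, +1}. Minimizing the weighted loss of the
    original estimator on the returned data amounts to minimizing the
    surrogate loss on the noisy data.
    """
    denom = 1.0 - rho_pos - rho_neg
    X_out, y_out, w_out = [], [], []
    for x, y in zip(X, y_noisy):
        rho_y = rho_pos if y == 1 else rho_neg      # flip rate of class y
        rho_other = rho_neg if y == 1 else rho_pos  # flip rate of class -y
        # Copy carrying the observed label, with a positive weight.
        X_out.append(x)
        y_out.append(y)
        w_out.append((1.0 - rho_other) / denom)
        # Copy carrying the flipped label, with a negative weight.
        X_out.append(x)
        y_out.append(-y)
        w_out.append(-rho_y / denom)
    return X_out, y_out, w_out


X_aug, y_aug, w_aug = reweighted_dataset([[0.0], [1.0]], [1, -1], 0.2, 0.3)
print(y_aug)
print(w_aug)
```

The two weights attached to each original sample sum to 1, so the total weight of the augmented data equals the number of original samples.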

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params dict: Parameter names mapped to their values.

predict(X)

Predict the classes of X.

Parameters:

X array-like, shape (n_samples, n_features): The input samples.

Returns:

y ndarray, shape (n_samples,): The predicted classes.

predict_proba(X)

Predict probability for each possible outcome.

Parameters:

X array-like, shape (n_samples, n_features): The input samples.

Returns:

p array, shape (n_samples, n_classes): The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:

X array-like of shape (n_samples, n_features): Test samples.
y array-like of shape (n_samples,) or (n_samples, n_outputs): True labels for X.
sample_weight array-like of shape (n_samples,), default=None: Sample weights.

Returns:

score float: Mean accuracy of self.predict(X) w.r.t. y.
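
For reference, the mean accuracy reported by `score` can be sketched as a weighted fraction of correct predictions. The helper `mean_accuracy` below is a hypothetical stand-in written for illustration; it takes predictions directly instead of calling `self.predict(X)`.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of predictions that match the true labels."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    # Sum the weights of the correctly classified samples.
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


print(mean_accuracy([0, 1, 1, 0], [0, 1, 0, 0]))
print(mean_accuracy([0, 1, 1, 0], [0, 1, 0, 0], sample_weight=[1, 1, 2, 0]))
```

With uniform weights this reduces to the plain proportion of correct predictions.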

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params dict: Estimator parameters.

Returns:

self estimator instance: Estimator instance.
