bqlearn.unbiased.LossCorrection
- class bqlearn.unbiased.LossCorrection(estimator, *, transition_matrix='anchor', quantile=0.97, n_iter=100, noise_free_prior=0, n_jobs=None)
A classifier corrected with the method of unbiased estimators [1].
It constructs a surrogate loss \(\tilde{L}\) from the loss of interest \(L\) such that \(\mathbb{E}_{\tilde{y}}[\tilde{L}(f(x),\tilde{y})] = L(f(x),y)\).
\[\tilde{L}(f(x),y) = \frac{(1-\mathbb{P}(\tilde{Y}=y \mid Y\neq y))\,L(f(x), y) - \mathbb{P}(\tilde{Y}\neq y \mid Y=y)\,L(f(x), -y)}{1 - \mathbb{P}(\tilde{Y}=y \mid Y\neq y) - \mathbb{P}(\tilde{Y}\neq y \mid Y=y)}\]
Multiclass classification is supported through a one-versus-rest approach.
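The surrogate loss above can be checked numerically in the binary case. The sketch below is an illustration only, not the library's implementation: the helper names (`logistic_loss`, `unbiased_loss`, `expected_noisy_loss`) and the flip rates `rho_pos` and `rho_neg` are assumptions introduced for this example, with labels taken in {-1, +1}.

```python
import math


def logistic_loss(score, y):
    """Logistic loss for a real-valued score and a label y in {-1, +1}."""
    return math.log1p(math.exp(-y * score))


def unbiased_loss(loss, score, y, rho_pos, rho_neg):
    """Surrogate loss for a (possibly noisy) label y in {-1, +1}.

    rho_pos = P(noisy = -1 | true = +1), rho_neg = P(noisy = +1 | true = -1).
    Both names are hypothetical and only used in this sketch.
    """
    flip_into_y = rho_neg if y == 1 else rho_pos    # P(noisy = y | true != y)
    flip_out_of_y = rho_pos if y == 1 else rho_neg  # P(noisy != y | true = y)
    numerator = (1 - flip_into_y) * loss(score, y) - flip_out_of_y * loss(score, -y)
    return numerator / (1 - flip_into_y - flip_out_of_y)


def expected_noisy_loss(loss, score, y_true, rho_pos, rho_neg):
    """Expectation of the surrogate loss over the noisy label, given the true label."""
    flip = rho_pos if y_true == 1 else rho_neg
    kept = unbiased_loss(loss, score, y_true, rho_pos, rho_neg)
    flipped = unbiased_loss(loss, score, -y_true, rho_pos, rho_neg)
    return (1 - flip) * kept + flip * flipped


# The expectation under label noise matches the clean loss.
clean = logistic_loss(0.7, 1)
noisy_expectation = expected_noisy_loss(logistic_loss, 0.7, 1, rho_pos=0.2, rho_neg=0.1)
print(abs(clean - noisy_expectation) < 1e-12)
```

Note that the term in `loss(score, -y)` carries a negative coefficient, so the surrogate loss can take negative values for a single sample. This is consistent with the requirement that the wrapped estimator accept negative sample weights.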
- Parameters:
- estimator : object
The estimator which will be corrected to handle label noise. It must support negative sample weights. Support for probability prediction is also required for certain methods of transition matrix estimation.
- transition_matrix : {'iterative', 'anchor', 'gold', 'confusion'} or array-like of shape (n_classes, n_classes), default='anchor'
Algorithm to estimate the transition matrix. 'gold' and 'confusion' are only available on biquality data.
- quantile : float, default=0.97
Quantile used to select the anchor points. Only used when transition_matrix='anchor' or transition_matrix='iterative'.
- n_iter : int, default=100
Number of iterations to compute the transition matrix. Only used when transition_matrix='iterative'.
- noise_free_prior : float, default=0.0
Factor for the convex combination between the estimated transition matrix and the identity matrix, used to lower the condition number of the estimated transition matrix. This is equivalent to taking a more conservative noise-free prior.
- n_jobs : int, default=None
The number of jobs to use for the computation: the n_classes one-vs-rest problems are computed in parallel.
None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
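The effect of noise_free_prior can be illustrated on a 2x2 matrix. This is a minimal sketch under an assumption: the factor is taken to weight the identity matrix, so that 0 leaves the estimate unchanged, which matches the default but is not confirmed by the text above. The helper names are hypothetical.

```python
def shrink_towards_identity(transition_matrix, noise_free_prior):
    """Convex combination of a square matrix with the identity matrix.

    Assumed form: (1 - noise_free_prior) * T + noise_free_prior * I.
    """
    n = len(transition_matrix)
    return [
        [
            (1 - noise_free_prior) * transition_matrix[i][j]
            + noise_free_prior * (1.0 if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def det2(matrix):
    """Determinant of a 2x2 matrix."""
    return matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]


# A heavily noisy, nearly singular transition matrix (rows sum to 1).
T = [[0.6, 0.4], [0.45, 0.55]]
T_shrunk = shrink_towards_identity(T, 0.5)

print(det2(T), det2(T_shrunk))
```

For a 2x2 row-stochastic matrix the determinant equals one minus the sum of the two flip probabilities, which is the denominator of the binary surrogate loss. Moving the matrix towards the identity pushes that denominator away from zero, so the corrected loss is less sensitive to estimation errors.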
- Attributes:
- estimator_ : classifier
The fitted estimator.
- transition_matrix_ : ndarray of shape (n_classes, n_classes)
Estimated transition matrix between trusted and untrusted labels.
- classes_ : ndarray of shape (n_classes,)
The class labels.
- n_classes_ : int
The number of classes.
- n_features_in_ : int
Number of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.
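To give an intuition for how a transition matrix can be estimated from anchor points with a quantile, here is a sketch of a common anchor-point heuristic. It is an assumption about the general technique, not a description of this library's code: for each class, the sample whose predicted noisy-label probability for that class sits at the chosen quantile is treated as an anchor, and its probability vector becomes the corresponding row of the matrix. The function name and the nearest-rank rule are hypothetical.

```python
def anchor_transition_matrix(noisy_probas, quantile=0.97):
    """Estimate a transition matrix from noisy-label probabilities.

    noisy_probas: list of rows, one per sample, each a list of class
    probabilities predicted by a model trained on the noisy labels.
    """
    n_samples = len(noisy_probas)
    n_classes = len(noisy_probas[0])
    # Nearest-rank position of the requested quantile (assumed rule).
    rank = min(n_samples - 1, int(quantile * (n_samples - 1)))
    matrix = []
    for k in range(n_classes):
        # Sort sample indices by their probability for class k.
        order = sorted(range(n_samples), key=lambda i: noisy_probas[i][k])
        anchor = order[rank]
        # The anchor's probability vector is used as row k.
        matrix.append(list(noisy_probas[anchor]))
    return matrix


probas = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]
print(anchor_transition_matrix(probas, quantile=1.0))
print(anchor_transition_matrix(probas, quantile=0.97))
```

Using a quantile slightly below 1 rather than the maximum makes the choice of anchor less sensitive to a single overconfident prediction.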
References
[1] N. Natarajan, I. S. Dhillon, P. Ravikumar, and A. Tewari, “Learning with Noisy Labels”, NeurIPS, 2013.
Methods
decision_function(X): Call predict of the regressor estimator.
fit(X, y[, sample_quality]): Fit the noisy transition matrix and the corrected classifier.
get_params([deep]): Get parameters for this estimator.
predict(X): Predict the classes of X.
predict_proba(X): Predict probability for each possible outcome.
score(X, y[, sample_weight]): Return the mean accuracy on the given test data and labels.
set_params(**params): Set the parameters of this estimator.
- decision_function(X)
Call predict of the regressor estimator.
- Parameters:
- X : array-like, shape (n_samples, n_features)
The input samples.
- Returns:
- y : ndarray, shape (n_samples, n_classes)
The predicted classes.
- fit(X, y, sample_quality=None, **fit_params)
Fit the noisy transition matrix and the corrected classifier.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
Training data.
- y : array-like of shape (n_samples,) or (n_samples, n_targets)
Target values.
- Returns:
- self : object
Returns the instance itself.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- predict(X)
Predict the classes of X.
- Parameters:
- X : array-like, shape (n_samples, n_features)
The input samples.
- Returns:
- y : ndarray, shape (n_samples,)
The predicted classes.
- predict_proba(X)
Predict probability for each possible outcome.
- Parameters:
- X : array-like, shape (n_samples, n_features)
The input samples.
- Returns:
- p : array, shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
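Because the columns of the returned array follow the order of classes_, a probability row can be mapped back to a label by indexing classes_ with the position of the largest value. The sketch below assumes a simple argmax rule, which may differ from what predict does internally; the class names and the helper are made up for illustration.

```python
def labels_from_probabilities(probabilities, classes):
    """Map each probability row to the class at the position of its maximum."""
    labels = []
    for row in probabilities:
        best_column = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[best_column])
    return labels


classes_ = ["bird", "cat", "dog"]          # hypothetical class labels
proba = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]  # hypothetical predict_proba output
predicted = labels_from_probabilities(proba, classes_)
print(predicted)
```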
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy, a harsh metric because it requires the full label set of each sample to be predicted correctly.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
Mean accuracy of self.predict(X) w.r.t. y.
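As a rough illustration of what a (weighted) mean accuracy computes, the following sketch compares predictions to true labels and normalises by the total weight. It is a plain-Python approximation written for this page, not the code path used by score; the function name is hypothetical.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of correct predictions, optionally weighted per sample."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    # Sum the weights of the samples that were predicted correctly.
    hits = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return hits / sum(sample_weight)


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
print(mean_accuracy(y_true, y_pred))
print(mean_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1]))
```

In the second call the misclassified sample carries twice the weight of the others, so the score drops below the unweighted value.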
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
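The <component>__<parameter> naming can be understood as a routing rule: the part before the double underscore names the nested object and the rest names one of its parameters. The toy function below only illustrates that convention and is not how set_params is implemented; the keys `C` and `max_iter` are hypothetical parameters of a wrapped estimator.

```python
def split_nested_params(params):
    """Separate top-level parameters from <component>__<parameter> ones."""
    own, nested = {}, {}
    for key, value in params.items():
        component, delimiter, sub_key = key.partition("__")
        if delimiter:
            # Route the value to the named component.
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"n_iter": 50, "estimator__C": 0.1, "estimator__max_iter": 200}
)
print(own)
print(nested)
```

With this convention, a call such as set_params(n_iter=50, estimator__C=0.1) would be expected to update n_iter on the wrapper and C on the wrapped estimator, assuming that estimator exposes a parameter named C.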