bqlearn.density_ratio.IKMM
- class bqlearn.density_ratio.IKMM(estimator, *, n_estimators=10, exploit_iterative_learning=True, window=1, kernel='rbf', kernel_params={}, B=1000, epsilon=None, max_iter=1000, tol=1e-06, batch_size=None, n_jobs=None)
An Iterative KMM Density Ratio Biquality Classifier.
An Iterative DR Biquality Classifier using Ensemble [1] Kernel Mean Matching [2] to reweigh untrusted examples [3].
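For orientation, Kernel Mean Matching as formulated in [2] can be written as the quadratic program below. This is a sketch of the textbook formulation and not a statement about this implementation's internals. The symbols n_u and n_t (numbers of untrusted and trusted samples), x^u, x^t, K and kappa are notation assumed here for illustration, while B and epsilon play the roles of the parameters of the same name.

```latex
\min_{\beta \in \mathbb{R}^{n_u}} \;
  \frac{1}{2}\,\beta^\top K \beta - \kappa^\top \beta
\quad \text{s.t.} \quad
  0 \le \beta_i \le B,
\qquad
  \Bigl| \sum_{i=1}^{n_u} \beta_i - n_u \Bigr| \le n_u\,\epsilon,
\\[4pt]
\text{where} \quad
  K_{ij} = k\bigl(x^u_i, x^u_j\bigr),
\qquad
  \kappa_i = \frac{n_u}{n_t} \sum_{j=1}^{n_t} k\bigl(x^u_i, x^t_j\bigr).
```

The solution beta gives one weight per untrusted sample. B caps each individual weight, and epsilon limits how far the average weight may drift from 1.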
- Parameters:
- estimator: object
The estimator from which the IDR classifier is built. Support for sample weighting and probability prediction is required.
- n_estimators: int, default=10
Number of estimators trained on reweighted samples.
- exploit_iterative_learning: boolean, default=True
If the estimator supports iterative learning with warm_start, exploit it by computing new weights for every epoch when fitting estimator.
- window: int, default=1
Number of previous losses used to compute sample weights.
- kernel: str or callable, default="rbf"
Kernel mapping used internally. This parameter is directly passed to pairwise_kernels. If kernel is a string, it must be one of the metrics in pairwise.PAIRWISE_KERNEL_FUNCTIONS. If kernel is "precomputed", X is assumed to be a kernel matrix. Alternatively, if kernel is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two rows from X as input and return the corresponding kernel value as a single number. This means that callables from sklearn.metrics.pairwise are not allowed, as they operate on matrices, not single samples. Use the string identifying the kernel instead.
- kernel_params: dict, optional (default={})
Additional kernel parameters.
- B: float, optional (default=1000)
Bounding weights parameter: the upper bound on the estimated sample weights.
- epsilon: float, optional (default=None)
Constraint parameter. If None, epsilon is set to (np.sqrt(n_samples_untrusted) - 1) / np.sqrt(n_samples_untrusted).
- max_iter: int, default=1000
Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations.
- tol: float, default=1e-6
Termination criteria dictating the absolute and relative error on the primal residual, dual residual and duality gap.
- batch_size: int or float, default=None
Size of minibatches for batched Kernel Mean Matching. An int value represents an absolute number of untrusted samples used per batch. A float value represents the fraction of untrusted samples used per batch. When set to None, all untrusted samples are used in one batch.
- n_jobs: int, default=None
The number of jobs to use for the computation. This parallelizes the density ratio estimation procedures on all samples. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
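The following minimal sketch shows how the epsilon default and the two batch_size conventions could be resolved into concrete numbers. It assumes the default epsilon is the usual KMM choice (sqrt(n) - 1) / sqrt(n). Both helper names are hypothetical, and the rounding of fractional batch sizes is an assumption rather than the library's documented behavior.

```python
import math


def default_epsilon(n_samples_untrusted):
    """Assumed default: (sqrt(n) - 1) / sqrt(n) for n untrusted samples."""
    root = math.sqrt(n_samples_untrusted)
    return (root - 1) / root


def resolve_batch_size(batch_size, n_samples_untrusted):
    """Hypothetical helper: turn batch_size into an absolute sample count."""
    if batch_size is None:
        # None: use all untrusted samples in a single batch.
        return n_samples_untrusted
    if isinstance(batch_size, float):
        # float: fraction of the untrusted samples (rounding up is an assumption).
        return max(1, math.ceil(batch_size * n_samples_untrusted))
    # int: absolute number of samples, capped at what is available.
    return min(batch_size, n_samples_untrusted)


eps = default_epsilon(100)
sizes = [resolve_batch_size(b, 100) for b in (None, 0.25, 32)]
```

Under this assumed default, epsilon approaches 1 as the number of untrusted samples grows, so the constraint on the mean weight loosens for larger datasets.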
- Attributes:
- estimator_: classifier
The fitted estimator.
- classes_: ndarray of shape (n_classes,)
The class labels.
- n_classes_: int
The number of classes.
References
[1] Miao, Y., Farahat, A. and Kamel, M., "Ensemble Kernel Mean Matching", 2015
[2] Huang, J., Smola, A., Gretton, A., Borgwardt, K. M. and Schölkopf, B., "Correcting Sample Selection Bias by Unlabeled Data", 2006
[3] Fang, T., Lu, N., Niu, G. and Sugiyama, M., "Rethinking importance weighting for deep learning under distribution shift", NeurIPS 2020
Methods
decision_function(X): Call decision function of the final_estimator.
fit(X, y[, sample_quality]): Fit the reweighted model.
get_params([deep]): Get parameters for this estimator.
predict(X): Predict the classes of X.
predict_log_proba(X): Predict log probability for each possible outcome.
predict_proba(X): Predict probability for each possible outcome.
score(X, y[, sample_weight]): Return the mean accuracy on the given test data and labels.
set_params(**params): Set the parameters of this estimator.
- decision_function(X)
Call decision function of the final_estimator.
- Parameters:
- X: array-like, shape (n_samples, n_features)
The input samples.
- Returns:
- y: ndarray, shape (n_samples,)
The decision function values for the input samples.
- fit(X, y, sample_quality=None)
Fit the reweighted model.
- Parameters:
- X: {array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrix can be CSC, CSR, COO, DOK, or LIL. COO, DOK, and LIL are converted to CSR.
- y: array-like of shape (n_samples,)
The target labels.
- sample_quality: array-like, shape (n_samples,), default=None
Sample qualities.
- Returns:
- self: object
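As an illustration of the inputs fit expects, the sketch below stacks a small trusted set and a larger untrusted set and builds the matching sample_quality vector. The toy data is hypothetical, and the encoding (1 for trusted, 0 for untrusted) is an assumption that should be checked against the library's user guide.

```python
# Hypothetical toy data: two trusted rows followed by three untrusted rows.
X_trusted = [[0.0, 1.0], [1.0, 0.0]]
y_trusted = [0, 1]
X_untrusted = [[0.1, 0.9], [0.9, 0.2], [0.5, 0.5]]
y_untrusted = [0, 1, 1]

# Stack both sets so that every row of X has a label and a quality entry.
X = X_trusted + X_untrusted
y = y_trusted + y_untrusted

# Assumed encoding: 1 marks a trusted sample, 0 an untrusted one.
sample_quality = [1] * len(X_trusted) + [0] * len(X_untrusted)

# With an IKMM instance named clf (not constructed here), the call would be:
# clf.fit(X, y, sample_quality=sample_quality)
```

All three inputs must have the same length, since each sample needs exactly one label and one quality value.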
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params: dict
Parameter names mapped to their values.
- predict(X)
Predict the classes of X.
- Parameters:
- X: array-like, shape (n_samples, n_features)
The input samples.
- Returns:
- y: ndarray, shape (n_samples,)
The predicted classes.
- predict_log_proba(X)
Predict log probability for each possible outcome.
- Parameters:
- X: array-like, shape (n_samples, n_features)
The input samples.
- Returns:
- log_p: array, shape (n_samples, n_classes)
Array with log prediction probabilities.
- predict_proba(X)
Predict probability for each possible outcome.
- Parameters:
- X: array-like, shape (n_samples, n_features)
The input samples.
- Returns:
- p: array, shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
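Because the probability columns follow the order of classes_, a probability row can be mapped back to a label by indexing classes_ with the position of the largest entry. The sketch below illustrates that relationship with made-up values. It assumes, as is conventional but not confirmed for this estimator, that predict returns the most probable class and that predict_log_proba is the element-wise logarithm of predict_proba.

```python
import math

# Hypothetical fitted class labels and predict_proba output for two samples.
classes_ = ["cat", "dog", "fish"]
proba = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
]


def labels_from_proba(proba, classes):
    """Map each probability row to the label of its largest column."""
    labels = []
    for row in proba:
        best_column = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[best_column])
    return labels


predicted = labels_from_proba(proba, classes_)

# Element-wise logarithm, mirroring what predict_log_proba is assumed to return.
log_proba = [[math.log(p) for p in row] for row in proba]
```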
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- X: array-like of shape (n_samples, n_features)
Test samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight: array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score: float
Mean accuracy of self.predict(X) w.r.t. y.
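The mean accuracy returned by score can be reproduced by hand from predictions and true labels. The sketch below is a plain-Python illustration of weighted mean accuracy, where mean_accuracy is a hypothetical helper and not part of the library.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of samples whose prediction matches the true label."""
    if sample_weight is None:
        # Without weights, every sample counts equally.
        sample_weight = [1.0] * len(y_true)
    correct = sum(
        w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p
    )
    return correct / sum(sample_weight)


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]

unweighted = mean_accuracy(y_true, y_pred)
weighted = mean_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])
```

Giving the misclassified third sample a weight of 2 lowers the score, because its error now counts for a larger share of the total weight.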
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params: dict
Estimator parameters.
- Returns:
- self: estimator instance
Estimator instance.
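The <component>__<parameter> naming can be illustrated by splitting each key on its first double underscore. The sketch below is a simplified stand-in for that routing, not the library's code. group_params is a hypothetical helper, and estimator__C assumes the wrapped estimator exposes a parameter named C.

```python
def group_params(params):
    """Separate top-level parameters from nested <component>__<parameter> ones."""
    own, nested = {}, {}
    for name, value in params.items():
        # Split on the first double underscore only.
        component, separator, sub_name = name.partition("__")
        if separator:
            nested.setdefault(component, {})[sub_name] = value
        else:
            own[name] = value
    return own, nested


# n_estimators belongs to the classifier itself; estimator__C would be
# forwarded to the wrapped estimator (assuming it has a parameter C).
own, nested = group_params({"n_estimators": 20, "estimator__C": 0.5})
```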