bqlearn.model_selection.make_biquality_cv

bqlearn.model_selection.make_biquality_cv(X, sample_quality, cv=None, *, y=None, groups=None)

Utility function for building a biquality cross-validator.

In the Biquality Data setup, cross-validators behave like usual cross-validators, except that untrusted samples should be removed from the generated test sets.

At the moment, this cross-validator is built with PredefinedSplit, and untrusted samples are removed from every test set generated by the provided cv. Because PredefinedSplit records a single test fold per sample, each sample should be assigned to at most one test set; otherwise a warning is issued.
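The mechanism described above can be sketched in plain Python. This is a minimal illustration, not bqlearn's implementation: the helper names biquality_test_fold and iter_splits are hypothetical, and the sketch assumes sample_quality uses 1 for trusted and 0 for untrusted samples. It mimics the convention of scikit-learn's PredefinedSplit, where a fold value of -1 keeps a sample out of every test set.

```python
import warnings


def biquality_test_fold(splits, sample_quality):
    """Build a PredefinedSplit-style fold vector from (train, test) index pairs.

    Hypothetical helper, not part of bqlearn. Entry i holds the index of the
    test fold containing sample i, or -1 if sample i is never tested.
    """
    test_fold = [-1] * len(sample_quality)
    for fold_id, (_, test_idx) in enumerate(splits):
        for i in test_idx:
            if test_fold[i] != -1:
                # Only one slot per sample: a second test assignment is dropped.
                warnings.warn(f"sample {i} appears in more than one test set")
                continue
            test_fold[i] = fold_id
    # Assumption: quality 1 means trusted, 0 means untrusted.
    return [f if q == 1 else -1 for f, q in zip(test_fold, sample_quality)]


def iter_splits(test_fold):
    """Yield (train, test) index lists the way a predefined split would."""
    for fold_id in sorted(set(test_fold) - {-1}):
        test = [i for i, f in enumerate(test_fold) if f == fold_id]
        train = [i for i, f in enumerate(test_fold) if f != fold_id]
        yield train, test


# Three contiguous folds over six samples; samples 1 and 4 are untrusted.
sample_quality = [1, 0, 1, 1, 0, 1]
base_splits = [
    ([2, 3, 4, 5], [0, 1]),
    ([0, 1, 4, 5], [2, 3]),
    ([0, 1, 2, 3], [4, 5]),
]
test_fold = biquality_test_fold(base_splits, sample_quality)
splits = list(iter_splits(test_fold))
```

In this sketch, the untrusted samples 1 and 4 still appear in the training indices of every split; they are only excluded from the test sets.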

Parameters:
X : array-like of shape (n_samples, n_features)

The samples.

sample_quality : array-like of shape (n_samples,)

The sample quality.

cv : int, cross-validation generator or an iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

- None, to use the default 5-fold cross-validation,
- an integer, to specify the number of folds,
- a CV splitter,
- an iterable that generates (train, test) splits as arrays of indices.

For integer/None inputs, if y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

y : array-like of shape (n_samples,), default=None

The target variable.

groups : array-like of shape (n_samples,), default=None

Group labels for the samples, used while splitting the dataset into train/test sets.
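The rule for integer/None values of cv reads like the one documented for scikit-learn's check_cv utility. The sketch below shows how check_cv resolves such inputs; whether make_biquality_cv delegates to check_cv, and with classifier=True, is an assumption made for illustration only.

```python
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

# Toy targets: a binary target and a continuous one.
y_class = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
y_cont = [0.1, 0.5, 0.9, 1.3, 1.7, 2.1, 2.5, 2.9, 3.3, 3.7]

# Assumption: classifier=True, so discrete targets trigger stratification.
cv_default = check_cv(None, y_class, classifier=True)  # None -> 5 folds
cv_int = check_cv(3, y_class, classifier=True)         # integer -> 3 folds
cv_cont = check_cv(3, y_cont, classifier=True)         # continuous target

print(type(cv_default).__name__, cv_default.get_n_splits())
print(type(cv_int).__name__, cv_int.get_n_splits())
print(type(cv_cont).__name__, cv_cont.get_n_splits())
```

With a binary target the resolved splitter is stratified, while the continuous target falls back to plain KFold.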

Returns:
biquality_cv : a cross-validator instance.

The return value is a cross-validator which generates the train/test splits via the split method.
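As a rough illustration of how such a cross-validator is consumed, the sketch below iterates over a scikit-learn PredefinedSplit used as a stand-in for the returned object. The toy data and the test_fold vector are made up for the example; -1 marks samples (here, the untrusted ones) that never reach a test set.

```python
import numpy as np
from sklearn.model_selection import PredefinedSplit

# Toy data: six samples with two features each.
X = np.arange(12).reshape(6, 2)

# Stand-in for the returned cross-validator (assumption, for illustration):
# samples 1 and 4 carry -1 and therefore never appear in a test set.
biquality_cv = PredefinedSplit(test_fold=[0, -1, 1, 1, -1, 2])

n_splits = biquality_cv.get_n_splits()

# Each iteration yields a pair of integer index arrays (train, test).
folds = [(train.tolist(), test.tolist()) for train, test in biquality_cv.split(X)]
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Any object exposing split and get_n_splits in this way can be passed as the cv argument of scikit-learn utilities such as cross_val_score or GridSearchCV.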