Zero Stratified KFolds#

class abil.zero_stratified_kfold.UpsampledZeroStratifiedKFold(n_splits=3)#

Bases: object

Custom cross-validation generator with upsampling of zero instances for stratified folds.

get_n_splits(X, y, groups=None)#
split(X, y, groups=None)#

Generate train-test splits with upsampling of the minority class in the training data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Feature matrix.

  • y (array-like, shape (n_samples,)) – Target variable.

  • groups (array-like, optional) – Group labels for the samples, used for group-based splitting. Not used in this method.

Yields:
  • train_indices (np.ndarray) – Indices for the training set with upsampled minority class.

  • test_indices (np.ndarray) – Indices for the test set.

Notes

  • Converts y into a binary variable (1 for non-zero values, 0 otherwise) for stratified sampling.

  • Upsamples the minority class in the training set to match the size of the majority class.

  • Uses StratifiedKFold for generating splits based on the binary target variable.

class abil.zero_stratified_kfold.ZeroStratifiedKFold(n_splits=3)#

Bases: object

Custom cross-validation generator to handle zero-inflated targets with stratification.

get_n_splits(X, y, groups=None)#
split(X, y, groups=None)#

Generate train-test splits with upsampling of the minority class in the training data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Feature matrix.

  • y (array-like, shape (n_samples,)) – Target variable.

  • groups (array-like, optional) – Group labels for the samples, used for group-based splitting. Not used in this method.

Yields:
  • train_indices (np.ndarray) – Indices for the training set with upsampled minority class.

  • test_indices (np.ndarray) – Indices for the test set.

Notes

  • Converts y into a binary variable (1 for non-zero values, 0 otherwise) for stratified sampling.

  • Upsamples the minority class in the training set to match the size of the majority class.

  • Uses StratifiedKFold for generating splits based on the binary target variable.