Utility Functions

abil.utils.abbreviate_species(species_name)

Abbreviate a species name by shortening the first word to its initial.

Parameters:

species_name (str) – Full species name.

Returns:

Abbreviated species name.

Return type:

str
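
As an illustration of the abbreviation rule described above, here is a minimal standalone sketch. The function name `abbreviate_species_sketch` is hypothetical, and the exact output format (a trailing period after the initial, a single space before the rest) is an assumption, not a confirmed detail of `abil.utils.abbreviate_species`.

```python
def abbreviate_species_sketch(species_name: str) -> str:
    """Shorten the first word of a species name to its initial.

    Hypothetical stand-in for abil.utils.abbreviate_species; the trailing
    period and single-space separator are assumptions.
    """
    words = species_name.split()
    if len(words) < 2:
        # Nothing to abbreviate for empty or single-word names.
        return species_name
    return f"{words[0][0]}. {' '.join(words[1:])}"


print(abbreviate_species_sketch("Emiliania huxleyi"))  # E. huxleyi
```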

abil.utils.do_nothing(x)

Apply no transformation to the input values.

Parameters:

x (array-like) – Input values.

Returns:

y – Non-transformed values.

Return type:

array-like

abil.utils.example_data(y_name, n_samples=100, n_features=5, noise=0.1, train_to_predict_ratio=0.7, zero_to_non_zero_ratio=0.5, random_state=59)

Generate training and prediction datasets with ['lat', 'lon', 'depth', 'time'] indices. Includes zeros in the target and allows upsampling of zero values.

Parameters:
  • y_name (str) – Name of the target variable.

  • n_samples (int) – Total number of samples to generate (training + prediction).

  • n_features (int) – Number of features for the dataset.

  • noise (float) – Noise level for the regression data.

  • train_to_predict_ratio (float) – Ratio of training to prediction data.

  • zero_to_non_zero_ratio (float) – Ratio of zero to non-zero target values after upsampling.

  • random_state (int) – Random seed for reproducibility.

Returns:
  • X_train (pd.DataFrame) – Training feature dataset with MultiIndex.

  • X_predict (pd.DataFrame) – Prediction feature dataset with MultiIndex.

  • y (pd.Series) – Target variable for training dataset.

abil.utils.find_optimal_threshold(model, X, y_test)

Finds the optimal probability threshold for binary classification using the ROC curve and Youden’s Index.

Parameters:
  • model (sklearn classifier) – A fitted binary classification model.

  • X (array-like of shape (n_samples, n_features)) – Input features for the test or validation set.

  • y_test (array-like of shape (n_samples,)) – True binary labels for the test or validation set.

Returns:

optimal_threshold – The optimal probability threshold for classifying a sample as present.

Return type:

float
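
Youden's index is J = sensitivity + specificity - 1, which equals TPR - FPR at each point on the ROC curve; the optimal threshold is the one that maximizes J. The dependency-free sketch below illustrates that selection rule. It is not the library's implementation: the name `youden_threshold` is hypothetical, and it takes predicted probabilities directly instead of a fitted model and `X`.

```python
def youden_threshold(y_true, y_score):
    """Return the score threshold that maximizes Youden's J = TPR - FPR.

    y_true: list of 0/1 labels; y_score: predicted probabilities of class 1.
    A sample is classified positive when its score >= threshold.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("y_true must contain both classes")

    best_j, best_threshold = -1.0, None
    # Every distinct score is a candidate threshold, highest first.
    for threshold in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
        j = tp / n_pos - fp / n_neg
        if j > best_j:  # ties keep the higher threshold
            best_j, best_threshold = j, threshold
    return best_threshold


labels = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.05, 0.10, 0.30, 0.35, 0.40, 0.60, 0.80, 0.90]
print(youden_threshold(labels, scores))  # 0.6
```

In this toy data, thresholds 0.6 and 0.35 both reach J = 0.75; the sketch keeps the higher one because it only replaces the best candidate on a strict improvement.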

abil.utils.inverse_weighting(values)

Compute inverse weighting for a list of values.

Parameters:

values (list of float) – Input values.

Returns:

Normalized inverse weights.

Return type:

list of float
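
One common way to compute normalized inverse weights is to take the reciprocal of each value and divide by the sum of reciprocals, so smaller values receive larger weights and the weights sum to 1. The sketch below assumes that scheme; the name `inverse_weights` is hypothetical, and whether `abil.utils.inverse_weighting` normalizes in exactly this way is an assumption.

```python
def inverse_weights(values):
    """Return weights proportional to 1/value, normalized to sum to 1.

    Assumes every value is strictly positive; zeros would need special handling.
    """
    inverses = [1.0 / v for v in values]
    total = sum(inverses)
    return [inv / total for inv in inverses]


weights = inverse_weights([1.0, 2.0, 4.0])
print(weights)  # approximately [4/7, 2/7, 1/7]
```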

abil.utils.is_xgboost_model(model)

Recursively check if the model is an XGBoost model, even if it’s wrapped in a Pipeline or TransformedTargetRegressor. Uses getattr to check for XGBoost-specific attributes.
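
The recursive, `getattr`-based check described above might look roughly like the following sketch. The function name `looks_like_xgboost` and the `Fake*` classes are hypothetical stand-ins so the example runs without scikit-learn or xgboost installed. The attribute names it probes (`get_booster` on XGBoost's scikit-learn estimators, `regressor`/`regressor_` on `TransformedTargetRegressor`, `steps` on `Pipeline`) exist in those libraries, but which attributes the actual function inspects is an assumption.

```python
def looks_like_xgboost(model):
    """Duck-typed, recursive check for an XGBoost estimator inside wrappers."""
    # XGBoost's scikit-learn estimators expose a get_booster() method.
    if callable(getattr(model, "get_booster", None)):
        return True
    # TransformedTargetRegressor keeps the wrapped estimator in `regressor_`
    # once fitted and in `regressor` before fitting.
    for attr in ("regressor_", "regressor"):
        inner = getattr(model, attr, None)
        if inner is not None:
            return looks_like_xgboost(inner)
    # Pipeline stores (name, estimator) pairs in `steps`; check the final step.
    steps = getattr(model, "steps", None)
    if steps:
        return looks_like_xgboost(steps[-1][1])
    return False


# Hypothetical stand-ins that mimic only the attributes probed above.
class FakeXGB:
    def get_booster(self):
        return object()


class FakePipeline:
    def __init__(self, steps):
        self.steps = steps


class FakeTTR:
    def __init__(self, regressor):
        self.regressor = regressor


wrapped = FakeTTR(FakePipeline([("scale", object()), ("model", FakeXGB())]))
print(looks_like_xgboost(wrapped))                               # True
print(looks_like_xgboost(FakePipeline([("model", object())])))   # False
```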

abil.utils.merge_obs_env(obs_path='../data/gridded_abundances.csv', env_path='../data/env_data.nc', env_vars=None, out_path='../data/obs_env.csv')

Merge observational and environmental datasets based on spatial and temporal indices.

Parameters:
  • obs_path (str, default="../data/gridded_abundances.csv") – Path to observational data CSV.

  • env_path (str, default="../data/env_data.nc") – Path to environmental data NetCDF file.

  • env_vars (list of str, optional) – List of environmental variables to include in the merge.

  • out_path (str, default="../data/obs_env.csv") – Path to save the merged dataset.

Return type:

None

abil.utils.upsample(d, target, ratio=10)

Upsample zero and non-zero observations in the dataset to balance classes.

Parameters:
  • d (pd.DataFrame) – Input dataframe.

  • target (str) – Target column for upsampling.

  • ratio (int, default=10) – Ratio of zeros to non-zero samples after upsampling.

Returns:

ix – Upsampled dataframe.

Return type:

pd.DataFrame
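
To make the `ratio` parameter concrete, the sketch below shows one possible reading of it: zero-target rows are resampled with replacement until there are `ratio` zeros for every non-zero row. This is an assumption about the behavior, not the library's code. The name `upsample_zeros`, the `seed` argument, and the use of a list of dicts in place of a DataFrame are all hypothetical choices made so the example needs no pandas.

```python
import random


def upsample_zeros(rows, target, ratio=10, seed=0):
    """Resample zero-target rows with replacement to ratio zeros per non-zero.

    rows: list of dicts standing in for DataFrame rows (an assumption made so
    the sketch needs no pandas). Non-zero rows are kept unchanged.
    """
    rng = random.Random(seed)
    zeros = [r for r in rows if r[target] == 0]
    non_zeros = [r for r in rows if r[target] != 0]
    if not zeros or not non_zeros:
        return list(rows)  # nothing to balance
    n_zeros = len(non_zeros) * ratio
    sampled_zeros = rng.choices(zeros, k=n_zeros)
    return non_zeros + sampled_zeros


rows = [{"abundance": 0}, {"abundance": 0}, {"abundance": 3}, {"abundance": 7}]
balanced = upsample_zeros(rows, "abundance", ratio=10)
print(len(balanced))  # 22: 2 non-zero rows plus 20 resampled zero rows
```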

abil.utils.weighted_quantile(x, weights, q=0.5)

Computes the weighted quantile(s) of a dataset.

Parameters:
  • x (array-like of shape (n_samples,)) – The data for which to compute the quantile(s).

  • weights (array-like of shape (n_samples,)) – The weights corresponding to each data point in x.

  • q (float or array-like of floats, default=0.5) – The quantile(s) to compute. Must be between 0 and 1. If an array is provided, the function returns the weighted quantile for each value in q.

Returns:

result – The weighted quantile(s) corresponding to the input q. If q is a single float, the result is a single value. If q is array-like, the result is a list of quantiles.

Return type:

float or list of floats
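
Several definitions of a weighted quantile exist, and they differ mainly in how they interpolate between data points. The sketch below uses the simplest one as an assumption: sort the data, accumulate the weights, and return the smallest value whose cumulative weight reaches the fraction q of the total. The name `weighted_quantile_sketch` is hypothetical, and `abil.utils.weighted_quantile` may interpolate differently.

```python
def weighted_quantile_sketch(x, weights, q=0.5):
    """Weighted quantile(s) without interpolation.

    Returns the smallest value in x whose cumulative weight is at least
    q times the total weight. Accepts a single q or a list/tuple of q values.
    """
    if len(x) != len(weights):
        raise ValueError("x and weights must have the same length")
    pairs = sorted(zip(x, weights))  # sort data, keeping weights aligned
    total = sum(w for _, w in pairs)

    def one_quantile(level):
        if not 0.0 <= level <= 1.0:
            raise ValueError("q must be between 0 and 1")
        cumulative = 0.0
        for value, weight in pairs:
            cumulative += weight
            if cumulative >= level * total:
                return value
        return pairs[-1][0]  # guard against floating-point shortfall

    if isinstance(q, (list, tuple)):
        return [one_quantile(level) for level in q]
    return one_quantile(q)


print(weighted_quantile_sketch([1, 2, 3, 4], [1, 1, 1, 5]))                # 4
print(weighted_quantile_sketch([1, 2, 3, 4], [1, 1, 1, 5], [0.25, 0.75]))  # [2, 4]
```

The heavy weight on the last point pulls the weighted median to 4, whereas equal weights would give 2 under the same rule.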

abil.utils.xgboost_get_n_estimators(model)

Recursively extract the n_estimators parameter from an XGBoost model, even if it’s wrapped in a Pipeline or TransformedTargetRegressor.
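
A minimal sketch of such a recursive lookup follows. The name `find_n_estimators` and the `Fake*` classes are hypothetical stand-ins so the example runs without scikit-learn or xgboost; the traversal order and the `None` fallback are assumptions, not confirmed behavior of `abil.utils.xgboost_get_n_estimators`.

```python
def find_n_estimators(model):
    """Recursively look for an `n_estimators` attribute inside wrappers.

    Returns None when no wrapped estimator defines it.
    """
    if hasattr(model, "n_estimators"):
        return model.n_estimators
    # Unwrap TransformedTargetRegressor-style objects (fitted, then unfitted).
    for attr in ("regressor_", "regressor"):
        inner = getattr(model, attr, None)
        if inner is not None:
            return find_n_estimators(inner)
    # Unwrap Pipeline-style objects by descending into the final step.
    steps = getattr(model, "steps", None)
    if steps:
        return find_n_estimators(steps[-1][1])
    return None


class FakeBooster:  # hypothetical stand-in for an XGBoost regressor
    n_estimators = 250


class FakeWrapper:  # hypothetical stand-in for TransformedTargetRegressor
    def __init__(self, regressor):
        self.regressor_ = regressor


print(find_n_estimators(FakeWrapper(FakeBooster())))  # 250
print(find_n_estimators(object()))                    # None
```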