Predict#

abil.predict.export_prediction(ensemble_config, m, target, target_no_space, X_predict, X_train, y_train, cv, model_out, n_threads=8)#

Exports model predictions to a NetCDF file.

Parameters:
  • m (object) – The trained model used for predictions.

  • target (str) – The name of the target variable.

  • target_no_space (str) – The target variable name with spaces replaced by underscores.

  • X_predict (pd.DataFrame of shape (n_points, n_features)) – Features to predict on (e.g., environmental data), where n_points is the total number of prediction points after flattening the grid to one dimension (e.g., 31881600 for a full 180x360x41x12 grid).

  • model_out (str) – Path where the predictions should be saved.

  • n_threads (int, optional, default=8) – The number of threads to use for parallel prediction.
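
The grid size quoted for X_predict can be checked with a few lines of standard-library Python. This is only a sketch: the axis names (lat, lon, depth, time) and the tiny example coordinates are assumptions made for illustration, and the functions n_points and grid_rows are hypothetical helpers, not part of abil.

```python
from itertools import product
from math import prod

# Grid dimensions quoted in the docs. The axis names are assumed for illustration.
FULL_GRID = {"lat": 180, "lon": 360, "depth": 41, "time": 12}


def n_points(grid_shape):
    """Number of rows X_predict needs to cover every grid cell exactly once."""
    return prod(grid_shape.values())


def grid_rows(coords):
    """Flatten coordinate axes into one tuple per grid cell (last axis varies fastest)."""
    return list(product(*coords.values()))


full_size = n_points(FULL_GRID)

# Tiny hypothetical grid so the flattening is cheap to inspect.
small_coords = {
    "lat": [-0.5, 0.5],
    "lon": [10.5, 11.5, 12.5],
    "depth": [0],
    "time": [1, 2],
}
rows = grid_rows(small_coords)
```

Each tuple in rows corresponds to one prediction point. In practice the rows would be paired with feature columns in a pandas DataFrame before being passed as X_predict.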

abil.predict.load_model_and_scores(path_out, ensemble_config, n, target)#

Loads a trained model and scoring information, and calculates the mean absolute error (MAE) for the prediction.

Parameters:
  • path_out (str) – Path to the output folder containing model and scoring information.

  • ensemble_config (dict) – Dictionary containing configuration details for the ensemble models, including the model names and whether the ensemble is a regressor or a classifier.

  • n (int) – Index of the model to load in the ensemble.

  • target (str) – The target for which predictions are made (used to load target-specific files).

Returns:

A tuple containing the model and the mean absolute error (MAE) score.

Return type:

tuple

Raises:

ValueError – If both regressor and classifier are set to False, or if a classifier is used when only regressors are supported.
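
The two error conditions described above can be sketched as a small validation step. The helper name check_ensemble_config is hypothetical, and the actual checks inside load_model_and_scores may be ordered or worded differently.

```python
def check_ensemble_config(ensemble_config):
    """Raise ValueError for the two unsupported configurations described in the docs.

    Hypothetical helper for illustration only; it is not part of abil.
    """
    regressor = bool(ensemble_config.get("regressor", False))
    classifier = bool(ensemble_config.get("classifier", False))

    # Case 1: neither model type is enabled.
    if not regressor and not classifier:
        raise ValueError("Both 'regressor' and 'classifier' are False; enable a regressor.")

    # Case 2: a classifier is requested, but only regressors are supported.
    if classifier:
        raise ValueError("Classifiers are not supported; only regressors are.")


# A regressor-only configuration passes without raising.
check_ensemble_config({"regressor": True, "classifier": False, "m1": "rf"})
```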

class abil.predict.predict(X_train, y, X_predict, model_config, n_jobs=1)#

Bases: object

Predict outcomes using an ensemble of regression models and export the predictions to a NetCDF file.

Parameters:
  • X_train (pd.DataFrame of shape (n_samples, n_features)) – Training features used for model fitting.

  • y (pd.Series of shape (n_samples,) or (n_samples, n_outputs)) – Target values used for model fitting.

  • X_predict (pd.DataFrame of shape (n_points, n_features)) – Features to predict on, where n_points represents the total number of prediction points (e.g., 31881600 for a full 180x360x41x12 grid).

  • model_config (dict) –

    Dictionary containing model configuration parameters, including:
    • seed (int) – Random seed for reproducibility.

    • root (str) – Path to the root folder.

    • path_out (str) – Directory where predictions are saved.

    • target (str) – File name of the target list.

    • verbose (int) – Verbosity level (0-3).

    • n_threads (int) – Number of threads to use for parallel processing.

    • cv (int) – Number of cross-validation folds.

    • ensemble_config (dict) – Configuration for the ensemble setup, containing:
      • classifier (bool) – Whether to train a classification model.

      • regressor (bool) – Whether to train a regression model.

      • m{n} (str) – Model names (e.g., “m1: ‘rf’”, “m2: ‘xgb’”).

  • n_jobs (int, optional, default=1) – Number of threads to use for parallel processing.
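
As an illustration of the structure described above, a model_config dictionary might look like the following. All values (paths, file name, seed, fold count) are placeholders chosen for this sketch, and model_names is a hypothetical helper that collects the m{n} entries in numeric order.

```python
# Placeholder values throughout; only the key names come from the parameter list.
model_config = {
    "seed": 1,
    "root": "./",
    "path_out": "ModelOutput/",
    "target": "targets.csv",
    "verbose": 1,
    "n_threads": 2,
    "cv": 3,
    "ensemble_config": {
        "classifier": False,  # classification is not implemented
        "regressor": True,
        "m1": "rf",
        "m2": "xgb",
    },
}


def model_names(ensemble_config):
    """Return the values of the m1, m2, ... keys, ordered by their numeric suffix."""
    keys = [k for k in ensemble_config if k.startswith("m") and k[1:].isdigit()]
    return [ensemble_config[k] for k in sorted(keys, key=lambda k: int(k[1:]))]


names = model_names(model_config["ensemble_config"])
```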

path_out#

Path where predictions and model outputs are saved.

Type:

str

target#

Name of the target variable.

Type:

str

target_no_space#

Target variable name with spaces replaced by underscores.

Type:

str
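
The relationship between target and target_no_space can be sketched in one line. The helper name and the example string are hypothetical, and whether abil also normalizes other whitespace is not stated here.

```python
def to_target_no_space(target):
    """Replace each space with an underscore, as described for target_no_space."""
    return target.replace(" ", "_")


example = to_target_no_space("my target variable")
```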

verbose#

Verbosity level for logging.

Type:

int

n_jobs#

Number of parallel threads used for prediction and cross-validation.

Type:

int

make_prediction()#

Fit models in the ensemble and generate predictions.

Predictions are exported to NetCDF files. If the ensemble contains multiple models, predictions are made for each individual model and the ensemble.

Return type:

None

Notes

  • Individual model predictions and ensemble predictions are saved separately.

  • Performance metrics (e.g., cross-validation scores) are saved for the ensemble.

  • Only regression models are supported; classification is not implemented.
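
A minimal usage sketch, based only on the class signature and method documented on this page, could look like this. The wrapper run_prediction is hypothetical, and the inputs are assumed to be prepared as described in the parameter list (pandas objects and a model_config dictionary).

```python
def run_prediction(X_train, y, X_predict, model_config, n_jobs=1):
    """Fit the ensemble and export predictions; returns None, like make_prediction()."""
    # Imported lazily so this sketch can be defined even where abil is not installed.
    from abil.predict import predict

    model = predict(X_train, y, X_predict, model_config, n_jobs=n_jobs)
    # Per the docs, predictions are written to NetCDF rather than returned.
    model.make_prediction()
```

Because make_prediction() returns None, results are read back from the files written under the configured output path rather than from a return value.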