Predict
- abil.predict.export_prediction(ensemble_config, m, target, target_no_space, X_predict, X_train, y_train, cv, model_out, n_threads=8)
Exports model predictions to a NetCDF file.
- Parameters:
m (object) – The trained model used for predictions.
target (str) – The name of the target variable.
target_no_space (str) – The target variable name with spaces replaced by underscores.
X_predict (pd.DataFrame of shape (n_points, n_features)) – Features to predict on (e.g., environmental data), where n_points is the total number of prediction points after flattening the grid to one dimension (e.g., 31881600 for a full 180x360x41x12 grid).
model_out (str) – Path where the predictions should be saved.
n_threads (int, optional, default=8) – The number of threads to use for parallel prediction.
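To make the n_points figure concrete, here is a minimal sketch that checks the size of the full grid and builds a much smaller flattened grid with the same layout. The axis names (lat, lon, depth, time) and the temperature column are assumptions for illustration only; the documentation above does not state which index or column names abil expects.

```python
import numpy as np
import pandas as pd

# Size of the full grid quoted above: 180 x 360 x 41 x 12.
full_shape = (180, 360, 41, 12)
n_points_full = int(np.prod(full_shape))

# A much smaller grid with the same four axes, to keep the example cheap.
# Axis names are hypothetical; abil may expect different names.
lat = np.linspace(-60.0, 60.0, 5)    # 5 latitudes
lon = np.linspace(-120.0, 120.0, 4)  # 4 longitudes
depth = np.array([0.0, 10.0, 20.0])  # 3 depth levels
time = np.arange(1, 3)               # 2 time steps

# Cartesian product of the axes: one row per grid cell.
index = pd.MultiIndex.from_product(
    [lat, lon, depth, time], names=["lat", "lon", "depth", "time"]
)

# A single hypothetical feature column filled with random values.
rng = np.random.default_rng(0)
X_predict = pd.DataFrame(
    {"temperature": rng.normal(size=len(index))}, index=index
)

print(n_points_full)    # number of rows for the full grid
print(X_predict.shape)  # (n_points, n_features) for the small grid
```

Here n_points for the small grid is 5 * 4 * 3 * 2 = 120, and the same product over the full axes gives the 31881600 rows mentioned above.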
- abil.predict.load_model_and_scores(path_out, ensemble_config, n, target)
Loads a trained model and scoring information, and calculates the mean absolute error (MAE) for the prediction.
- Parameters:
path_out (str) – Path to the output folder containing model and scoring information.
ensemble_config (dict) – Dictionary containing configuration details for the ensemble models, including the model names and the regressor and classifier flags.
n (int) – Index of the model to load in the ensemble.
target (str) – The target for which predictions are made (used to load target-specific files).
- Returns:
A tuple containing the model and the mean absolute error (MAE) score.
- Return type:
tuple
- Raises:
ValueError – If both regressor and classifier are set to False, or if a classifier is used when only regressors are supported.
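The two documented failure cases and the MAE score can be sketched as follows. Both functions are hypothetical helpers written for illustration, not the library's implementation; they assume the ensemble_config keys regressor and classifier are booleans, as described in the model_config documentation for the predict class.

```python
import numpy as np


def check_ensemble_config(ensemble_config):
    """Hypothetical check mirroring the documented ValueError cases."""
    regressor = ensemble_config.get("regressor", False)
    classifier = ensemble_config.get("classifier", False)
    if not regressor and not classifier:
        raise ValueError("Both regressor and classifier are set to False.")
    if classifier:
        raise ValueError("Classifiers are not supported; only regressors are.")


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error: the average of |y_true - y_pred|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# A regressor-only configuration passes the check without raising.
check_ensemble_config({"regressor": True, "classifier": False, "m1": "rf"})

# Absolute errors are 0.5, 0.0 and 1.0, so the mean is 0.5.
mae = mean_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
print(mae)
```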
- class abil.predict.predict(X_train, y, X_predict, model_config, n_jobs=1)
Bases:
object
Predict outcomes using an ensemble of regression models and export the predictions to a NetCDF file.
- Parameters:
X_train (pd.DataFrame of shape (n_samples, n_features)) – Training features used for model fitting.
y (pd.Series of shape (n_samples,) or (n_samples, n_outputs)) – Target values used for model fitting.
X_predict (pd.DataFrame of shape (n_points, n_features)) – Features to predict on, where n_points represents the total number of prediction points (e.g., 31881600 for a full 180x360x41x12 grid).
model_config (dict) – Dictionary containing model configuration parameters, including:
- seed (int) – Random seed for reproducibility.
- root (str) – Path to the root folder.
- path_out (str) – Directory where predictions are saved.
- target (str) – File name of the target list.
- verbose (int) – Verbosity level (0-3).
- n_threads (int) – Number of threads to use for parallel processing.
- cv (int) – Number of cross-validation folds.
- ensemble_config (dict) – Configuration for the ensemble setup, containing:
  - classifier (bool) – Whether to train a classification model.
  - regressor (bool) – Whether to train a regression model.
  - m{n} (str) – Model names (e.g., m1: 'rf', m2: 'xgb').
n_jobs (int, optional, default=1) – Number of threads to use for parallel processing.
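A model_config dictionary with the keys listed above might look like the sketch below. All values (paths, file name, seed, fold count) are placeholders chosen for illustration, not defaults taken from abil. The last lines show one way to collect the m{n} model names in numeric order; this is an assumption about how such keys could be read, not the library's own parsing.

```python
import re

# Hypothetical values; only the key names come from the documentation above.
model_config = {
    "seed": 1,
    "root": "./",
    "path_out": "./output/",
    "target": "targets.csv",
    "verbose": 1,
    "n_threads": 2,
    "cv": 3,
    "ensemble_config": {
        "classifier": False,
        "regressor": True,
        "m1": "rf",
        "m2": "xgb",
    },
}

ensemble_config = model_config["ensemble_config"]

# Pick out keys of the form m1, m2, ... and sort them by their number,
# so that m10 would come after m2 rather than after m1.
model_keys = sorted(
    (k for k in ensemble_config if re.fullmatch(r"m\d+", k)),
    key=lambda k: int(k[1:]),
)
model_names = [ensemble_config[k] for k in model_keys]
print(model_names)
```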
- path_out
Path where predictions and model outputs are saved.
- Type:
str
- target
Name of the target variable.
- Type:
str
- target_no_space
Target variable name with spaces replaced by underscores.
- Type:
str
- verbose
Verbosity level for logging.
- Type:
int
- n_jobs
Number of parallel threads used for prediction and cross-validation.
- Type:
int
- make_prediction()
Fit models in the ensemble and generate predictions.
Predictions are exported to NetCDF files. If the ensemble contains multiple models, predictions are made for each individual model and the ensemble.
- Return type:
None
Notes
Individual model predictions and ensemble predictions are saved separately.
Performance metrics (e.g., cross-validation scores) are saved for the ensemble.
Only regression models are supported; classification is not implemented.