Post#
- class abil.post.post(X_train, y_train, X_predict, model_config, statistic='mean', datatype=None)#
Bases:
object
Post-processing of SDM (species distribution model) output.
- cwm(variable)#
Calculate community weighted mean values for a given parameter.
- Parameters:
variable (str) – The variable used to estimate the community weighted mean.
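Examples
A minimal sketch (assumes X_train, y_train, X_predict, and model_config have already been prepared; 'diameter' is a placeholder variable name):
>>> m = post(X_train, y_train, X_predict, model_config)
>>> m.cwm('diameter')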
- def_groups(dict)#
Define groups of species based on a provided dictionary.
- Parameters:
dict (dict) – A dictionary where keys represent group names, and values are lists of species or column names to be grouped under each key.
Notes
The method renames columns in self.d based on the provided dictionary and then sums their values to create grouped columns.
The resulting grouped data is concatenated to the original self.d.
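Examples
A minimal sketch (the group and species names are placeholders):
>>> groups = {'coccolithophores': ['Emiliania huxleyi', 'Gephyrocapsa oceanica'],
...           'diatoms': ['Thalassiosira pseudonana']}
>>> m.def_groups(groups)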
- diversity()#
Estimates Shannon diversity using scikit-bio.
- estimate_applicability(targets=None)#
Estimate the area of applicability for the data using a strategy similar to Meyer & Pebesma (2022).
This calculates the importance-weighted feature distances from test to train points, and then defines the “applicable” test sites as those closer than some threshold distance.
A value of 0 indicates the point is within the Area of Applicability, while a value of 1 indicates the point is outside the Area of Applicability. Note: if using pseudo-absences in y_train and X_train, mask out where y_train = 0 to calculate the AOA for the original dataset.
- Parameters:
targets (np.array of str, optional) – An np.array of target variable names for which to estimate the area of applicability. If None, the default targets from self.targets are used (default is None).
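Examples
A minimal sketch (assumes m is an initialized post instance; 'species_A' is a placeholder target name):
>>> import numpy as np
>>> m.estimate_applicability()
>>> m.estimate_applicability(targets=np.array(['species_A']))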
- estimate_carbon(variable)#
Estimate carbon content for each target based on a specified variable.
This method calculates the carbon content for each target by scaling the data in self.d with the values of the specified variable from the traits DataFrame. The results are stored back in self.d.
- Parameters:
variable (str) – The name of the column in the traits DataFrame containing the carbon content values to be used for scaling the target data.
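Examples
A minimal sketch ('pg_carbon_per_cell' is a placeholder column name in the traits DataFrame):
>>> m.estimate_carbon('pg_carbon_per_cell')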
- export_csv(file_name)#
Export the processed dataset to a csv file.
This method saves the processed dataset (self.d) to a csv file in the location defined by self.path_out, with optional metadata such as author and description.
- Parameters:
file_name (str) – The name of the csv file (without extension).
Notes
The export location is defined in the model_config.yml file and is stored in self.path_out.
Missing directories in the export path are created if necessary.
The file is saved with a suffix that includes the pi value (e.g., _PI50.csv).
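Examples
A minimal sketch (the file name is illustrative; the file is written to self.path_out):
>>> m.export_csv('predicted_abundances')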
- export_ds(file_name, author=None, description=None)#
Export the processed dataset to a NetCDF file.
This method saves the processed dataset (self.d) to a NetCDF file in the location defined by self.path_out, with optional metadata such as author and description.
- Parameters:
file_name (str) – The name of the NetCDF file (without extension).
author (str, optional) – The name of the author to include in NetCDF metadata (default is None).
description (str, optional) – A description or title to include in the NetCDF metadata (default is None).
Notes
The export location is defined in the model_config.yml file and is stored in self.path_out.
The method sets metadata attributes such as conventions, creator name, and units for latitude, longitude, and depth.
Missing directories in the export path are created if necessary.
The file is saved with a suffix that includes the pi value (e.g., _PI50.nc).
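Examples
A minimal sketch (the file name, author, and description are illustrative):
>>> m.export_ds('predicted_abundances', author='A. Researcher', description='Ensemble SDM predictions')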
- export_model_config()#
Export the model_config dictionary to a YAML file in self.path_out.
- Raises:
Exception – If an error occurs during the directory creation or file writing process, an exception is caught and an error message is printed.
Notes
The YAML file is saved as “model_config.yml” in the self.path_out directory.
- class integration(parent, resolution_lat=1.0, resolution_lon=1.0, depth_w=5, vol_conversion=1, magnitude_conversion=1, molar_mass=1, rate=False)#
Bases:
object
- calculate_volume()#
Calculate the volume for each cell and add it as a new field to the dataset.
Examples
>>> m = post(model_config)
>>> integ = m.Integration(m, resolution_lat=1.0, resolution_lon=1.0, depth_w=5,
...                       vol_conversion=1, magnitude_conversion=1e-21,
...                       molar_mass=12.01, rate=True)
>>> print("Volume calculated:", integ.ds['volume'].values)
- integrate_total(variable='total', monthly=False, subset_depth=None)#
Estimates global integrated values for a single target. Returns the depth integrated annual total.
- Parameters:
variable (str) – The field to be integrated. Default is ‘total’ from PIC or POC Abil output.
monthly (bool) – Whether or not to calculate a monthly average value instead of an annual total. Default is False.
subset_depth (float) – Depth in meters from the surface to which the integral should be calculated (e.g., 100 for the top 100 m integral). Default is None.
Examples
>>> m = post(model_config)
>>> integ = m.Integration(m, resolution_lat=1.0, resolution_lon=1.0, depth_w=5,
...                       vol_conversion=1, magnitude_conversion=1e-21,
...                       molar_mass=12.01, rate=True)
>>> result = integ.integrate_total(variable='Calcification')
>>> print("Final integrated total:", result.values)
- integrated_totals(targets=None, monthly=False, subset_depth=None, export=True, model='ens')#
Estimates global integrated values for all targets.
Considers latitude and depth bin size.
- Parameters:
targets (np.array of str, optional) – An np.array of target variable names to include in the integration. If None, the default targets from self.targets are used (default is None).
monthly (bool) – Whether or not to calculate a monthly average value instead of an annual total. Default is False.
subset_depth (float) – Depth in meters from the surface to which the integral should be calculated (e.g., 100 for the top 100 m integral). Default is None.
export (bool) – Whether or not to export the integrated totals as a .csv file. Default is True.
model (str) – The model version to be integrated. Default is “ens”. Other options include {“rf”, “xgb”, “knn”}.
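Examples
A minimal sketch (assumes integ is an integration instance created as in the example above):
>>> integ.integrated_totals(subset_depth=100, monthly=False, model='ens')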
- merge_env()#
Merge model output with environmental data.
This method aligns and merges the predicted values (model output) with the existing environmental dataset stored in self.d. The merged data replaces self.d.
- Return type:
None
- merge_obs(file_name, targets=None)#
Merge model output with observational data and calculate residuals.
This function integrates model predictions with observational data based on spatial and temporal indices, calculates residuals, and exports the merged dataset.
- Parameters:
file_name (str) – The base name of the output file to save the merged dataset.
targets (an np.array of str, optional) – An np.array of target variable names to include in the merge. If None, the default targets from self.targets are used (default is None).
Notes
The function matches the observational data with model predictions based on the indices [‘lat’, ‘lon’, ‘depth’, ‘time’].
Residuals are calculated as observed - predicted for each target variable.
Columns included in the output are the original targets, their modeled values (suffixed with _mod), and their residuals (suffixed with _resid).
The merged dataset is saved as a CSV file with a suffix _PI followed by the pi value, appended to the output file name.
Observational data is loaded from the path defined in self.model_config[‘training’].
- Raises:
FileNotFoundError – If the observational dataset file cannot be found at the specified location.
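Examples
A minimal sketch (the file name and target names are placeholders):
>>> import numpy as np
>>> m.merge_obs('predictions_with_obs', targets=np.array(['species_A', 'species_B']))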
- merge_parameters()#
Merges model parameters for multiple models as specified in the model configuration.
Notes
The method operates by iterating over each model in the ensemble configuration, collecting model parameters using merge_parameters_single_model, and saving the results to a CSV file in the “posts/parameters” directory.
- merge_parameters_single_model(model)#
Merges and saves model parameters for a single model.
This method extracts the hyperparameters of a specified model (e.g., “rf”, “xgb”, “knn”) from serialized files stored as pickle objects. The method supports different model types, including regression (“reg”), classification (“clf”), and ensemble (“zir”) models. The extracted parameters are stored in a DataFrame and then saved to a CSV file.
The function also handles the creation of the necessary directories to save the resulting CSV file if they do not already exist.
- Parameters:
model (str) – The name of the model for which parameters are being merged. Expected models include “rf” (Random Forest), “xgb” (XGBoost), and “knn” (K-Nearest Neighbors).
- Raises:
ValueError – If the model configuration includes classifiers but not regressors, an error is raised since classifiers are not supported for parameter merging.
Notes
The method processes the parameters of each model for regression and ensemble models, extracting hyperparameters such as n_estimators, max_depth, learning_rate, and others. The parameters for each target are aggregated into a DataFrame and saved as a CSV file in the “posts/parameters” directory.
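Examples
A minimal sketch (assumes m is an initialized post instance):
>>> m.merge_parameters_single_model('rf')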
- merge_performance()#
Merges the performance data of multiple models as specified in the model configuration.
Notes
The function relies on the merge_performance_single_model method to merge individual model performance data, and this is done for each model in the list, including the ensemble model.
- merge_performance_single_model(model)#
Merges performance metrics for a single model and saves the results to a CSV file.
- Parameters:
model (str) – The name of the model for which performance metrics are being calculated and merged. The model’s performance data is expected to be stored in a pickle file in the “scoring” directory under the model name and target name.
- Return type:
None
- Raises:
ValueError – If the model configuration includes a classifier but not a regressor, an error is raised since classifiers are not supported for performance merging.
Notes
- The method calculates several performance metrics for each target column in the dataset:
R2: Coefficient of determination.
RMSE: Root Mean Squared Error.
MAE: Mean Absolute Error.
rRMSE: Relative Root Mean Squared Error.
rMAE: Relative Mean Absolute Error.
The performance metrics for each target are aggregated into a DataFrame, which is then saved as a CSV file in the “posts/performance” directory for the specified model.
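Examples
A minimal sketch (assumes m is an initialized post instance):
>>> m.merge_performance_single_model('xgb')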
- process_resampled_runs()#
Compute the mean, standard deviation, and the 2.5th and 97.5th percentiles of the target rows.
Notes
Useful when running resampled targets of the same initial target. Mean is estimated based on the target list defined in model_config.
- total()#
Sum target rows to estimate total.
Notes
Useful for estimating total species abundances, or the variable sum if targets are continuous. Total is estimated based on the target list defined in model_config.