Pseudo-absence generation
- abil.pseudo_generation.generate_pseudo_absences(merged_df, missing_rows, env_vars, species_cols, absence_ratio=1, aoa_threshold=0.99, min_presence=100, allow_replacement=True)
Generate pseudo-absences for each species at a specified ratio to presences.
Pseudo-absences are sampled from rows without observations that fall outside the area of applicability estimated from each species’ observed rows.
area_of_applicability uses 0 for inside AOA and 1 for outside AOA, so this function samples candidates where the AOA mask equals 1.
- Parameters:
merged_df (pandas.DataFrame) – Merged observation and environmental data.
missing_rows (pandas.DataFrame) – Environmental rows without observations that can be sampled as pseudo-absence candidates.
env_vars (list of str) – Environmental variable names. Coordinate variables named time, depth, lat, and lon are excluded from the AOA feature set.
species_cols (list of str) – Species column names.
absence_ratio (float, default=1) – Number of pseudo-absences to sample relative to the number of presences.
aoa_threshold (float or str, default=0.99) – Threshold passed to abil.analyze.area_of_applicability().
min_presence (int, default=100) – Minimum number of presence records required to generate pseudo-absences for a species.
allow_replacement (bool, default=True) – If True, sample outside-AOA candidate rows with replacement when there are fewer available candidate rows than requested by absence_ratio. This allows absence_ratio=1 to produce one pseudo-absence per presence even when outside-AOA candidates are scarce. If False, the number of pseudo-absences is capped by the number of unique outside-AOA candidate rows.
- Returns:
merged_df plus sampled pseudo-absence rows. If no pseudo-absences can be generated, a copy of merged_df is returned.
- Return type:
pandas.DataFrame
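
The interaction between absence_ratio, min_presence, and allow_replacement described above can be sketched as follows. This is a minimal illustration, not the abil implementation: sample_outside_aoa, its seed argument, the toy temperature column, and the rounding of n_presence * absence_ratio are all assumptions made for the example. Only the 0/1 mask convention and the parameter semantics come from the documentation above.

```python
import numpy as np
import pandas as pd


def sample_outside_aoa(candidates, aoa_mask, n_presence, absence_ratio=1,
                       min_presence=100, allow_replacement=True, seed=0):
    """Hypothetical helper illustrating the documented sampling rules only."""
    # Species with too few presence records get no pseudo-absences.
    if n_presence < min_presence:
        return candidates.iloc[0:0]

    # Keep only candidate rows flagged 1, i.e. outside the AOA.
    outside = candidates[np.asarray(aoa_mask) == 1]
    if outside.empty:
        return outside

    # Assumption: the requested count is the rounded product.
    n_requested = int(round(n_presence * absence_ratio))
    if n_requested <= len(outside):
        return outside.sample(n=n_requested, replace=False, random_state=seed)
    if allow_replacement:
        # Fewer candidates than requested: reuse rows to reach the target.
        return outside.sample(n=n_requested, replace=True, random_state=seed)
    # Without replacement, the count is capped by the unique candidates.
    return outside.copy()


candidates = pd.DataFrame({"temperature": [1.0, 2.0, 3.0, 4.0, 5.0]})
aoa_mask = [0, 1, 1, 0, 1]  # 1 = outside AOA, 0 = inside AOA

with_repl = sample_outside_aoa(candidates, aoa_mask, n_presence=120)
capped = sample_outside_aoa(candidates, aoa_mask, n_presence=120,
                            allow_replacement=False)
skipped = sample_outside_aoa(candidates, aoa_mask, n_presence=50)
print(len(with_repl), len(capped), len(skipped))  # 120 3 0
```

With only three outside-AOA candidates and 120 presences, sampling with replacement reaches the full 120 rows by repeating candidates, whereas disabling replacement caps the result at the three unique rows. A species with 50 presences falls below min_presence=100 and receives none.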