Pseudo-absence generation#
In this example we use a small synthetic dataset to show pseudo-absence generation in environmental space. Presences are restricted to warmer, lower-silicate conditions, and pseudo-absences are sampled outside the area of applicability.
Running the example#
Before running the Python script we need to import the required packages and define the synthetic dataset. The candidate rows span temperature from 0 to 25 and silicate from 0.1 to 3. Observed presences are then selected from rows with temperature above 5 and silicate below 1.
Loading dependencies#
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from abil.pseudo_generation import generate_pseudo_absences
Generating synthetic data#
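species = "synthetic_species"
env_vars = ["temperature", "silicate"]
rng = np.random.default_rng(42)

# Candidate rows: 1200 random points in environmental space.
missing_rows = pd.DataFrame(
    {
        "temperature": rng.uniform(0, 25, 1200),
        "silicate": rng.uniform(0.1, 3, 1200),
    }
)

# Presences can only occur in warmer, lower-silicate conditions.
presence_pool = missing_rows[
    (missing_rows["temperature"] > 5) & (missing_rows["silicate"] < 1)
]

# Draw 150 observed presences and label them with 1.
merged_df = presence_pool.sample(n=150, random_state=42).copy()
merged_df[species] = 1

# Remove the observed presences from the candidate rows.
missing_rows = missing_rows.drop(index=merged_df.index)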
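Before moving on, it can help to confirm that the synthetic data behaves as described. The following is a minimal standalone sketch that rebuilds the same frames with only NumPy and pandas and checks them. The names candidates, pool, presences and remaining are illustrative and are not part of the example script.

```python
import numpy as np
import pandas as pd

# Rebuild the synthetic data with the same seed and ranges as the example.
rng = np.random.default_rng(42)
candidates = pd.DataFrame(
    {
        "temperature": rng.uniform(0, 25, 1200),
        "silicate": rng.uniform(0.1, 3, 1200),
    }
)
pool = candidates[(candidates["temperature"] > 5) & (candidates["silicate"] < 1)]
presences = pool.sample(n=150, random_state=42).copy()
remaining = candidates.drop(index=presences.index)

# Every presence lies in the warmer, lower-silicate region.
assert (presences["temperature"] > 5).all()
assert (presences["silicate"] < 1).all()

# Presences and remaining candidate rows do not overlap.
assert presences.index.intersection(remaining.index).empty
assert len(remaining) == len(candidates) - len(presences)
```

If any assertion fails, the selection thresholds or the drop step no longer match the description above.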
Generating pseudo-absences#
Next we call generate_pseudo_absences.
With absence_ratio=1, the function targets one pseudo-absence for each observed presence.
If fewer candidate rows lie outside the area of applicability than the number of pseudo-absences requested, rows are sampled with replacement by default.
augmented = generate_pseudo_absences(
    merged_df,
    missing_rows,
    env_vars,
    [species],
    absence_ratio=1,
    min_presence=50,
)
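The fallback to sampling with replacement can be illustrated with plain pandas. This is a sketch of the behaviour described above, not the implementation used by generate_pseudo_absences. The helper sample_rows and its parameters are hypothetical.

```python
import pandas as pd


def sample_rows(candidates: pd.DataFrame, n_target: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: draw n_target rows, with replacement only if needed."""
    # Replacement is required only when there are too few rows to draw from.
    replace = len(candidates) < n_target
    return candidates.sample(n=n_target, replace=replace, random_state=seed)


# Three candidate rows, but five pseudo-absences requested.
few = pd.DataFrame(
    {"temperature": [1.0, 2.0, 3.0], "silicate": [2.5, 2.0, 2.8]}
)
drawn = sample_rows(few, n_target=5)

print(len(drawn))                      # 5
print(drawn.index.duplicated().any())  # True: some rows were drawn more than once
```

Duplicated rows carry no new environmental information, so a large share of duplicates suggests the candidate pool is too small for the requested ratio.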
Plotting#
Now that we have pseudo-absences we can plot them in environmental space:
# Pseudo-absences are the rows labelled 0 in the species column.
pseudo_absences = augmented[augmented[species] == 0]

fig, ax = plt.subplots(figsize=(7, 5))

# Background: all remaining candidate rows.
ax.scatter(
    missing_rows["temperature"],
    missing_rows["silicate"],
    alpha=0.15,
    s=12,
    label="Candidate rows",
)

# Observed presences.
ax.scatter(
    merged_df["temperature"],
    merged_df["silicate"],
    marker="x",
    s=28,
    label="Observed presences",
)

# Generated pseudo-absences.
ax.scatter(
    pseudo_absences["temperature"],
    pseudo_absences["silicate"],
    alpha=0.85,
    s=45,
    label="Pseudo-absences",
)

ax.set_xlabel("temperature")
ax.set_ylabel("silicate")
ax.set_title("Pseudo-absences from synthetic data")
ax.legend()
fig.tight_layout()
plt.savefig("pseudo_absence.png", dpi=300, bbox_inches="tight", facecolor="white")
plt.show()
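Alongside the plot, a numeric check of the ratio of pseudo-absences to presences can be useful. The sketch below assumes, as in the script above, that presences are labelled 1 and pseudo-absences 0 in the species column. The helper realised_ratio and the toy frame are hypothetical and stand in for the augmented output.

```python
import pandas as pd


def realised_ratio(df: pd.DataFrame, column: str) -> float:
    """Hypothetical helper: pseudo-absences (0) per presence (1) in `column`."""
    n_presence = int((df[column] == 1).sum())
    n_absence = int((df[column] == 0).sum())
    if n_presence == 0:
        raise ValueError("no presences found in column")
    return n_absence / n_presence


# Toy stand-in for the augmented frame: three presences, three pseudo-absences.
toy = pd.DataFrame({"synthetic_species": [1, 1, 1, 0, 0, 0]})
print(realised_ratio(toy, "synthetic_species"))  # 1.0
```

Calling realised_ratio(augmented, species) on the real output would show how close the result is to the requested absence_ratio.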