Research Section

Ingestion methodology

Wildlife rehabilitation centers record their patients in incompatible spreadsheets, with different column names, date formats, species conventions, and outcome vocabularies. This page documents the conceptual pipeline by which such files could be normalized into the WildlifeStats schema, and demonstrates it on a set of sample files. It is a methodology demonstration, not a live service: there is no upload, and the page never processes files of your own.

  1. Schema inference. Detect the delimiter and read each file's header row to discover its columns, without assuming a fixed layout.
  2. Field mapping. Map each source column to a canonical WildlifeStats field (source identifier, species, admission date, location, state, admission reason, outcome) using a candidate-name dictionary.
  3. Species name normalization. Resolve free-text common names ("Red-tailed Hawk", "Mexican free-tailed bat") to the dataset's archetype vocabulary and infer the taxonomic class.
  4. Date and outcome standardization. Convert heterogeneous date formats to a single year-month grain and map outcome phrases ("died in care", "released to wild") to the canonical outcome set.
  5. K-suppression on aggregation. When normalized records are aggregated and published, any group below the suppression threshold is withheld, as on the rest of this site.
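
Steps 1 and 2 can be sketched as follows. This is a minimal illustration in Python, not the site's actual implementation: the function names, the set of candidate delimiters, and the entries of the `CANDIDATES` dictionary are all assumptions made for the example. The delimiter is guessed by counting candidate characters in the header row, and each header is matched against the dictionary after lowercasing and collapsing underscores and whitespace.

```python
import csv

# Hypothetical candidate-name dictionary: canonical field -> known header spellings.
# The real dictionary is not shown on this page; these entries are illustrative.
CANDIDATES = {
    "source_id": ["id", "case number", "patient id"],
    "species": ["species", "common name", "animal"],
    "admission_date": ["admission date", "date admitted", "intake date"],
    "outcome": ["outcome", "disposition", "final status"],
}


def detect_delimiter(header_line, candidates=",;\t|"):
    """Pick the candidate delimiter that occurs most often in the header row."""
    counts = {d: header_line.count(d) for d in candidates}
    best = max(counts, key=counts.get)
    # If no candidate occurs at all, fall back to a comma.
    return best if counts[best] > 0 else ","


def infer_schema(text):
    """Return (delimiter, column names) read from the file's first row."""
    header_line = text.splitlines()[0]
    delimiter = detect_delimiter(header_line)
    header = next(csv.reader([header_line], delimiter=delimiter))
    return delimiter, [col.strip() for col in header]


def normalize_header(name):
    """Lowercase a header and collapse underscores and repeated whitespace."""
    return " ".join(name.lower().replace("_", " ").split())


def map_fields(columns):
    """Map source columns to canonical fields; unknown columns are left out."""
    lookup = {alias: canon for canon, aliases in CANDIDATES.items() for alias in aliases}
    mapping = {}
    for col in columns:
        canon = lookup.get(normalize_header(col))
        # Keep only the first source column that claims a canonical field.
        if canon and canon not in mapping.values():
            mapping[col] = canon
    return mapping


sample = "Case Number;Common Name;Date_Admitted;Disposition;Notes\n"
delimiter, columns = infer_schema(sample)
print(delimiter, map_fields(columns))
```

Columns with no dictionary match, such as `Notes` in the sample above, are simply absent from the mapping, so a reviewer can see which source columns were ignored.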

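Steps 3 to 5 operate on values rather than columns. The sketch below, again in Python and under stated assumptions, shows one way they could fit together. The archetype labels, the canonical outcome labels, the list of date formats, and the threshold `K = 5` are hypothetical placeholders; this page does not state the real vocabularies or the real suppression threshold.

```python
from collections import Counter
from datetime import datetime

# Hypothetical archetype vocabulary: common name -> (archetype, taxonomic class).
SPECIES = {
    "red-tailed hawk": ("hawk", "Aves"),
    "mexican free-tailed bat": ("bat", "Mammalia"),
}

# Hypothetical canonical outcome labels.
OUTCOMES = {
    "died in care": "died",
    "released to wild": "released",
}

# Assumed source formats, tried in order. An ambiguous date such as 03/04/2021
# is read month-first here; a real pipeline would need a per-source hint.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y"]

K = 5  # hypothetical suppression threshold


def squash(raw):
    """Lowercase and collapse whitespace so lookups ignore case and spacing."""
    return " ".join(raw.lower().split())


def normalize_species(raw):
    """Return (archetype, class) for a common name, or None if unrecognized."""
    return SPECIES.get(squash(raw))


def to_year_month(raw):
    """Convert a date string to 'YYYY-MM', or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m")
        except ValueError:
            continue
    return None


def to_outcome(raw):
    """Map an outcome phrase to its canonical label, defaulting to 'unknown'."""
    return OUTCOMES.get(squash(raw), "unknown")


def aggregate(records, k=K):
    """Count records per (species, month, outcome); withhold groups below k."""
    counts = Counter((r["species"], r["month"], r["outcome"]) for r in records)
    return {group: n for group, n in counts.items() if n >= k}
```

Reducing dates to a year-month grain before counting means the suppression check runs on the same groups that would be published, so a group smaller than `k` never appears in the output at all.
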
Worked examples

Each sample below is a real, committed file, and each uses a different schema. The left column shows the raw file; the right column shows the normalized output that the pipeline produces in your browser.
