DAGI Georeferencing protocol

Context

Georeferencing enriches the geographical information of a collection specimen with modern coordinates and a margin of error. Nowadays, this can usually be done directly in the field with a GPS. For most older specimens, however, the geographical description is purely textual (e.g. right side of the road between this place and that place) and often sparse (e.g. only a country). Concepts, country names and borders have also changed over time. On the other hand, standardised vocabularies such as the Darwin Core terms impose strict rules and definitions on each data term. For instance, Pays de Vaud is not a valid value for the Darwin Core attribute country.

In the process of database standardisation and cleaning, the use of controlled vocabulary is essential, but it is often a significant constraint compared with an institution’s day-to-day habits and with the usual treatment of historical specimens. Countries that no longer exist are still being recorded in the attribute dwc:country, although Darwin Core recommends a controlled vocabulary for this term and an ISO 3166-1-alpha-2 code for the associated dwc:countryCode.

The Data Aggregator DAGI uses the OpenCage Geocoding API of OpenCage GmbH (https://opencagedata.com/) to encode and enrich specific attributes of the institution’s records. For the encoding to be successful, the values of the source attributes must follow the definitions of those attributes; otherwise, inconsistent or “wrong” data might be inserted.

This protocol aims to provide best practices for georeferencing natural history collection data, with a view to being compatible with the DAGI geographical encoding.

1) Preliminary assessment

Mermaid Decision Graph
graph TD;
  Start[📜📍 Specimen label Location data] --> |Step 1| Step1[1.1. Transcribe verbatim Location data];
  Step1 --> |Step 2| Step2[2.1. Enrich standardised textual Location data];
  %% Decision Graph in Step 3
  Step2 --> |Step 3| HasCoord[Coordinates present?];
  HasCoord -->|no| GeoPoss[Georeferencing possible?];
  HasCoord -->|yes| TransVerbCoo[Transcribe/convert/document/enrich];
  GeoPoss --> |no| NoGeoref[Don't georeference];
  GeoPoss -->|⬇️ click here ⬇️| Link3[3.1 Possible?];
  GeoPoss --> |yes| GeorefPert[Pertinent?];
  GeorefPert --> |no| NoGeoref;
  GeorefPert -->|⬇️ click here ⬇️| Link4[3.2 Pertinent?];
  GeorefPert --> |yes| GeorefDoable[Do-able?];
  GeorefDoable --> |no| NoGeoref;
  GeorefDoable --> |yes| Georef[Georeference/document/enrich];
  TransVerbCoo -->|⬇️ click here ⬇️| Link1[1.2. Transcribe verbatim coordinates data];
  Georef -->|⬇️ click here ⬇️| Link2[2.2. Enrich standardised coordinates data];
  NoGeoref -->|⬇️ click here ⬇️| Link5[4. Do not georeference];
  %% Define a "common style" for several blocks
  classDef bgColor fill:#FA5E97,stroke:#333,stroke-width:2px,rx:10px,ry:20px;
  %% Apply the style to all blocks that must share the same colour
  class Step1,Step2,Link1,Link2,Link3,Link4,Link5 bgColor;
  %% Apply Colors
  style Start fill:#FFFFFF,stroke:#333,stroke-width:4px,font-weight:bold,font-size:50px;
  style HasCoord fill:#ffcc00,stroke:#333,stroke-width:2px;
  style GeoPoss fill:#ff6666,stroke:#333,stroke-width:2px;
  style TransVerbCoo fill:#66ccff,stroke:#333,stroke-width:2px;
  style NoGeoref fill:#ff3333,stroke:#000,stroke-width:2px,color:white;
  style GeorefPert fill:#ff9900,stroke:#333,stroke-width:2px;
  style GeorefDoable fill:#66cc99,stroke:#333,stroke-width:2px;
  style Georef fill:#33cc33,stroke:#333,stroke-width:2px;
  %% Define the hyperlink
  click Link1 "/en/geo-protocole#12-transcribe-verbatim-coordinates-data"
  click Link2 "/en/geo-protocole#22-enrich-standardised-coordinates-data"
  click Link3 "/en/geo-protocole#31-possible"
  click Link4 "/en/geo-protocole#32-pertinent"
  click Link5 "/en/geo-protocole#4-do-not-georeference"
  click Step1 "/en/geo-protocole#11-transcribe-verbatim-location-data"
  click Step2 "/en/geo-protocole#21-enrich-standardised-textual-location-data"

1.1) Transcribe verbatim Location data

Transcribe the content of the specimen label concerning the location in the attribute verbatimLocality.

Document information about the method used or difficulty encountered in the attribute locationRemarks.


1.2) Transcribe verbatim coordinates data

Transcribe the coordinates present on the specimen label in the attribute verbatimCoordinates and associated attributes.

Convert the verbatim coordinates into the corresponding modern coordinate system and record the result in the coordinate attributes.

Document information about the method used or difficulty encountered in the attribute georeferenceRemarks.
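As an illustration of the conversion step, the sketch below turns label coordinates written in degrees, minutes and seconds into decimal degrees. The function name `dms_to_decimal` and the accepted notation are assumptions made for this example; real labels use many other notations and older reference systems, which this sketch does not handle.

```python
import re

# Hypothetical pattern: degrees, optional minutes, optional seconds, hemisphere.
# Example of accepted input: 46°57'03.9"N
_DMS = re.compile(
    r"""^\s*(?P<deg>\d{1,3})\s*°\s*
        (?:(?P<min>\d{1,2})\s*['′]\s*)?
        (?:(?P<sec>\d{1,2}(?:\.\d+)?)\s*["″]\s*)?
        (?P<hem>[NSEW])\s*$""",
    re.VERBOSE | re.IGNORECASE,
)


def dms_to_decimal(text: str) -> float:
    """Convert one degrees/minutes/seconds value to signed decimal degrees."""
    match = _DMS.match(text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees = int(match["deg"])
    minutes = int(match["min"] or 0)
    seconds = float(match["sec"] or 0.0)
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative in decimal notation.
    if match["hem"].upper() in "SW":
        value = -value
    return round(value, 6)


record = {"verbatimCoordinates": "46°57'03.9\"N 7°26'19.1\"E"}
lat_text, lon_text = record["verbatimCoordinates"].split()
record["decimalLatitude"] = dms_to_decimal(lat_text)
record["decimalLongitude"] = dms_to_decimal(lon_text)
```

The verbatim string stays untouched in verbatimCoordinates; only the converted values go into the coordinate attributes, and the method used can be noted in georeferenceRemarks.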


2) Standardised data entry

2.1) Enrich standardised textual Location data

Enrich the corresponding information based on the verbatimLocality (and the coordinates if present) in a standardised and controlled way.

Document information about the method used or difficulty encountered in the attribute locationRemarks.


2.2) Enrich standardised coordinates data

Enrich

Document


In-house practices among Swiss natural history collections

Pseudo-ISO codes

To deal with historical locations that do not fit the standard, self-made in-house codes help to filter data in databases before publishing them and/or sharing them outside the institution. Here are some examples that would facilitate collaboration between Swiss institutions:

| Pseudo-ISO code | Definition | Example |
|---|---|---|
| AA | Covering several countries | Yugoslavia |
| ZZ | Overlapping several countries | Prussia |
| XX | Unknown, whole world | empty |
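The filtering mentioned above could look like the following minimal sketch. It assumes records are plain dictionaries with a countryCode key; the function name `split_for_publication` is hypothetical, and whether flagged records are held back or published as they are remains an institutional decision.

```python
# In-house pseudo-ISO codes from the table above (not part of ISO 3166-1).
PSEUDO_ISO_CODES = {"AA", "ZZ", "XX"}


def split_for_publication(records):
    """Separate records with a real country code from those with a pseudo code."""
    publishable, flagged = [], []
    for record in records:
        code = (record.get("countryCode") or "").strip().upper()
        if code in PSEUDO_ISO_CODES:
            flagged.append(record)
        else:
            publishable.append(record)
    return publishable, flagged


records = [
    {"catalogNumber": "A1", "countryCode": "CH"},
    {"catalogNumber": "A2", "countryCode": "AA"},  # e.g. Yugoslavia
    {"catalogNumber": "A3", "countryCode": "zz"},  # e.g. Prussia
]
publishable, flagged = split_for_publication(records)
```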

Notation: added elements in verbatim attributes

To remove ambiguity and distinguish original elements from added or interpreted ones, a clear symbol can be used, such as square brackets [ ].

| verbatimLocality | Comment |
|---|---|
| [a word] | This word was difficult to read on the label. The content of the square brackets is the interpretation of what may have been written. |
| […] | Illegible word present. |
| Pte Grave [La Petite Grave] | Repetition of the verbatim text with disambiguation of the abbreviation. |
| O’stmdg [Ostermundigen] | Same as above. |
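A consistent bracket notation also makes the added elements machine-readable. The sketch below separates the text outside the brackets from the bracketed elements; the function name `split_verbatim` is an assumption for this example, and the code does not try to tell an uncertain reading from an expanded abbreviation.

```python
import re

# Matches one pair of square brackets without nested brackets inside.
BRACKETED = re.compile(r"\[([^\[\]]*)\]")


def split_verbatim(value: str):
    """Return (text outside brackets, list of bracketed elements)."""
    additions = [item.strip() for item in BRACKETED.findall(value)]
    outside = BRACKETED.sub("", value)
    outside = re.sub(r"\s{2,}", " ", outside).strip()
    return outside, additions


original, added = split_verbatim("Pte Grave [La Petite Grave]")
```

For the example above, `original` holds the label text and `added` holds the disambiguation, so either part can be searched or exported separately.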

3) Georeferencing approach

3.1) Possible

  • Does georeferencing add information about the specimen beyond the existing textual information?
  • Are the coordinates provided by georeferencing useful in practice when they are accompanied by an uncertainty of several hundred kilometres? (See example G00304771)

Recommendation
Only georeference ‘large’ historical localities (e.g. countries, regions) when georeferencing provides greater accuracy than the information provided in the ad hoc attributes (country, stateProvince).

3.2) Pertinent

Before any step of the procedure, assess whether georeferencing historical localities is pertinent in terms of:

  • Human resources
  • Time resources
  • Basic data (readability, completeness, unambiguity)

3.3) Coordinate assignment

Cross-border entities

Recommendation
Fill in the relevant attributes as accurately as possible before proceeding with georeferencing. Do not submit the coordinates if you do not want any of the countries involved to be returned by the encoding.

| Protocol step | Attribute | Mountain massifs | Lakes | National parks | Mountain ranges |
|---|---|---|---|---|---|
| Step 1.1: verbatim Location data | verbatimLocality | Mont Blanc | Lac Léman | Parc national des Écrins | Forêt Noire |
| Step 2.1: standardised textual Location data | continent | Europe | Europe | Europe | Europe |
| | higherGeography | Europe \| Mont Blanc | Europe \| Lake Geneva | Europe \| France \| Parc National des Ecrins | Europe \| Germany \| Baden-Wuerttemberg \| Schwarzwald |
| | waterBody | | Lake Geneva | | |
| | country | | | France | Germany |
| | stateProvince | | | | Baden-Wuerttemberg |

Step 2.2, standardised coordinates data (georeferenceRemarks, all four examples): Standardised coordinates data should be left empty or else the "official" administrative entity might be attributed during encoding.

3.4) DAGI Geographical encoding

Geographical encoding standardises and enriches the geographical data associated with a specimen. In DAGI, it consists of two parts: Geo Forward, based on the imported values of the attributes country and continent, and Geo Reverse, based on the imported values of the coordinate attributes (decimalLatitude and decimalLongitude, swissCoordinates_lv95_x and swissCoordinates_lv95_y, swissCoordinates_lv03_x and swissCoordinates_lv03_y).

Both Geo Forward and Geo Reverse use the OpenCage Geocoding API. A query is built from the attributes mentioned above, and the elements of the result are split into the corresponding encoded attributes.

In addition to these two parts, coordinates imported in one or both of the Swiss coordinate system attributes are converted into WGS84 (decimalLatitude and decimalLongitude) and into the other Swiss system if it was previously empty.
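To make the Swiss conversion concrete, the sketch below uses the approximate formulas published by swisstopo, which are accurate to roughly one metre. This is an assumption for illustration only: the conversion method actually used by DAGI is not described here. The parameter names `east` and `north` are chosen to avoid presuming which of the `_x` and `_y` attributes holds which axis.

```python
def lv03_to_lv95(east: float, north: float):
    """Approximate shift from CH1903/LV03 to CH1903+/LV95 (metre-level accuracy)."""
    return east + 2_000_000.0, north + 1_000_000.0


def lv95_to_wgs84(east: float, north: float):
    """Approximate CH1903+/LV95 to WGS84 conversion (swisstopo approximation)."""
    # Auxiliary values relative to the reference point in Bern, in units of 1000 km.
    y = (east - 2_600_000.0) / 1_000_000.0
    x = (north - 1_200_000.0) / 1_000_000.0
    # Longitude and latitude in units of 10000 arc seconds.
    lon = (2.6779094 + 4.728982 * y + 0.791484 * y * x
           + 0.1306 * y * x**2 - 0.0436 * y**3)
    lat = (16.9023892 + 3.238272 * x - 0.270978 * y**2
           - 0.002528 * x**2 - 0.0447 * y**2 * x - 0.0140 * x**3)
    # Convert to decimal degrees.
    return lat * 100 / 36, lon * 100 / 36


# Reference point in Bern, given in LV03.
east95, north95 = lv03_to_lv95(600_000, 200_000)
decimal_latitude, decimal_longitude = lv95_to_wgs84(east95, north95)
```

For precise work, an official transformation service or a geodetic library is preferable to these approximations.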

Geo Forward

Attributes needed: continent OR country
Attributes encoded: continent, country, countryCode

| Import example | Imported values | Encoded values |
|---|---|---|
| Example 1 | continent: Europe; country: Suisse | continent: Europe; country: Switzerland; countryCode: CH |
| Example 2 | country: Schweiz | continent: Europe; country: Switzerland; countryCode: CH |
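A forward lookup of this kind can be sketched against the public OpenCage endpoint. How DAGI actually assembles its query is not documented here, so the query string, the function names `build_forward_url` and `encode_forward`, and the mapping of response components to attributes are assumptions. The code builds the request URL and maps a response's `components` object without performing any network call.

```python
from urllib.parse import urlencode

OPENCAGE_ENDPOINT = "https://api.opencagedata.com/geocode/v1/json"


def build_forward_url(record: dict, api_key: str) -> str:
    """Build a forward geocoding URL from country and/or continent."""
    parts = [record.get("country"), record.get("continent")]
    query = ", ".join(part for part in parts if part)
    if not query:
        raise ValueError("Geo Forward needs continent or country")
    params = {"q": query, "key": api_key, "language": "en",
              "limit": 1, "no_annotations": 1}
    return OPENCAGE_ENDPOINT + "?" + urlencode(params)


def encode_forward(components: dict) -> dict:
    """Map the 'components' of one OpenCage result to the encoded attributes."""
    code = components.get("country_code")  # returned in lower case
    return {
        "continent": components.get("continent"),
        "country": components.get("country"),
        "countryCode": code.upper() if code else None,
    }


url = build_forward_url({"continent": "Europe", "country": "Suisse"}, "YOUR-KEY")
# Components as they might appear in a response for Switzerland (sample data).
encoded = encode_forward(
    {"continent": "Europe", "country": "Switzerland", "country_code": "ch"}
)
```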

Geo Reverse

Attributes needed: decimalLatitude and decimalLongitude
Attributes encoded: continent, country, countryCode, stateProvince and municipality (if countryCode = CH, then also swissCoordinatesLv95_x, swissCoordinatesLv95_y, swissCoordinatesLv03_x and swissCoordinatesLv03_y)

The first step of Geo Reverse is the conversion of coordinates in Switzerland; the WGS84 coordinates are then used in the query to the OpenCage API.
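The reverse query step can be sketched in the same spirit. The OpenCage API accepts a "latitude,longitude" pair as the query; the choice of response components used for stateProvince and municipality below is an assumption for this example, not DAGI's documented mapping, and the function names are hypothetical.

```python
from urllib.parse import urlencode


def build_reverse_url(lat: float, lon: float, api_key: str) -> str:
    """Build a reverse geocoding URL from WGS84 decimal coordinates."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")
    params = {"q": f"{lat},{lon}", "key": api_key,
              "language": "en", "no_annotations": 1}
    return "https://api.opencagedata.com/geocode/v1/json?" + urlencode(params)


def encode_reverse(components: dict) -> dict:
    """Map the 'components' of one OpenCage result to the encoded attributes."""
    # Assumption: the first settlement-like component stands for the municipality.
    municipality = next(
        (components[key] for key in ("city", "town", "village", "municipality")
         if key in components),
        None,
    )
    code = components.get("country_code")
    return {
        "continent": components.get("continent"),
        "country": components.get("country"),
        "countryCode": code.upper() if code else None,
        "stateProvince": components.get("state"),
        "municipality": municipality,
    }


url = build_reverse_url(46.95108, 7.43864, "YOUR-KEY")
```

Note that nothing in such a request carries the uncertainty of the coordinates: a point placed at the centre of a large region still returns a single municipality.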

Coordinates conversion

⚠️ When importing multiple coordinate systems for the same record, ensure that they all correspond to the same location. DAGI does not compare attributes to detect inconsistencies.

The coordinate attributes are ranked by priority:

  1. WGS84
  2. CH1903+/LV95
  3. CH1903/LV03

When importing coordinates, it is better practice to import one set of coordinates per specimen (in any one of the three systems supported by DAGI) than to convert before importing and then import multiple systems for the same record. The table below illustrates what happens when 1, 2 or 3 different locations are imported among the available coordinate terms.

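| Imported WGS84 | Imported LV95 | Imported LV03 | Encoded WGS84 | Encoded LV95 | Encoded LV03 | Consequence |
|---|---|---|---|---|---|---|
| X | | | 🟰 | ✅ | ✅ | 1 location ✔️ |
| | Y | | ✅ | 🟰 | ✅ | 1 location ✔️ |
| | | Z | ✅ | ✅ | 🟰 | 1 location ✔️ |
| X | Y | | 🟰 | ❌ | ✅ | 2 locations ⚠️ |
| X | | Z | 🟰 | ✅ | ❌ | 2 locations ⚠️ |
| | Y | Z | ✅ | 🟰 | ❌ | 2 locations ⚠️ |
| X | Y | Z | 🟰 | ❌ | ❌ | 3 locations ⚠️⚠️ |

Legend:
LV95 = CH1903+/LV95; LV03 = CH1903/LV03
coordinates X ≠ coordinates Y ≠ coordinates Z (these represent three different locations)
🟰 = original value used for conversion, ✅ = converted from 🟰 value, ❌ = unconverted original value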
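The priority ranking can be expressed as a small function. This is a sketch of the behaviour described above, not DAGI's code: it assumes that the highest-ranked imported system is the source, that empty systems are filled by conversion from it, and that other imported values are left as they are. The name `resolve_coordinate_source` is hypothetical.

```python
# Priority order of the coordinate systems, highest first.
PRIORITY = ("WGS84", "LV95", "LV03")


def resolve_coordinate_source(imported: dict):
    """Describe, per system, what happens to the imported coordinates."""
    present = [system for system in PRIORITY if imported.get(system) is not None]
    if not present:
        return None  # nothing to convert
    source = present[0]
    outcome = {}
    for system in PRIORITY:
        if system == source:
            outcome[system] = "source"
        elif system in present:
            outcome[system] = "kept as imported"
        else:
            outcome[system] = "converted from source"
    return outcome


result = resolve_coordinate_source(
    {"WGS84": (46.95, 7.44), "LV95": None, "LV03": (600_000, 200_000)}
)
```

Any record for which more than one system is reported as "source" or "kept as imported" deserves a manual check that the values describe the same place.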

Risks with the geographical encoding of DAGI

Attributing coordinates to an imprecise location

Because the whole set of textual attributes, from continent to municipality, is encoded from the coordinates without taking an uncertainty radius into account, attributes finer than the actual precision of the specimen location can receive wrong values.

What can be done to avoid this:

  • Compare the imported and encoded data in DAGI
  • Target and correct the errors of capture in the original database
  • Remove the coordinates of the less precise records, or import and publish them without encoding
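The first check in the list above, comparing imported and encoded data, can be sketched as a simple attribute-by-attribute comparison. It assumes both versions of a record are available as dictionaries, for instance from an export; the function name `encoding_differences` and the default attribute list are assumptions for this example.

```python
def encoding_differences(imported: dict, encoded: dict,
                         attributes=("continent", "country", "countryCode",
                                     "stateProvince", "municipality")):
    """List attributes whose value was added or changed by the encoding."""
    differences = {}
    for name in attributes:
        before, after = imported.get(name), encoded.get(name)
        if before == after:
            continue
        status = "added" if before in (None, "") else "changed"
        differences[name] = (status, before, after)
    return differences


diff = encoding_differences(
    {"continent": "Europe", "country": "Suisse"},
    {"continent": "Europe", "country": "Switzerland", "countryCode": "CH"},
)
```

Attributes reported as "added" at municipality or stateProvince level are the ones to review first when the original locality is only known at country or regional level.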

Uncertainty estimation

Currently, the geographical encoding in DAGI does not take the uncertainty radius into consideration.

4) Do not georeference

Where georeferencing does not provide the desired added value, this can be expressed in the data through the countryCode attribute, leaving all lower-level textual location attributes and the coordinate attributes empty:

  • Supranational entity (precision grouping several countries): countryCode = AA
  • International entity (precision overlapping several countries): countryCode = ZZ
  • Unknown location: countryCode = XX
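Applied to a record, the rule above could look like the following sketch. The list of attributes to empty is an assumption and should be extended with the institution's own attributes (for example the Swiss coordinate attributes); the function name `mark_not_georeferenced` is hypothetical. Verbatim attributes are deliberately left untouched.

```python
PSEUDO_ISO_CODES = {"AA", "ZZ", "XX"}

# Assumed set of lower-level and coordinate attributes to leave empty.
ATTRIBUTES_TO_EMPTY = (
    "stateProvince", "municipality",
    "decimalLatitude", "decimalLongitude",
)


def mark_not_georeferenced(record: dict, pseudo_code: str) -> dict:
    """Return a copy of the record flagged with a pseudo-ISO code."""
    if pseudo_code not in PSEUDO_ISO_CODES:
        raise ValueError(f"not a pseudo-ISO code: {pseudo_code!r}")
    cleaned = dict(record)  # the original record is not modified
    cleaned["countryCode"] = pseudo_code
    for name in ATTRIBUTES_TO_EMPTY:
        cleaned[name] = None
    return cleaned


flagged = mark_not_georeferenced(
    {"verbatimLocality": "Prussia", "decimalLatitude": 52.5, "decimalLongitude": 13.4},
    "ZZ",
)
```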

Documentation

Reference protocols

Historical maps and reference material

Online coordinates conversion

Helpful files

Source for this page

  • Tschudin P., Burckhardt D., Monnerat C., Sanchez A., Burri F., Jutzi M. & Gonseth Y. 2014. Recommandations pour la saisie de données de spécimens en collections, Ver. 2.0. Neuchâtel : GBIF Swiss Node, 12 pp. (available upon request)

Authors: Collaboration between the CJBG, Naturéum, Info fauna and GBIF.ch