DAGI Georeferencing protocol
Context
Georeferencing is the enrichment of the geographical information of a collection specimen with modern coordinates and a margin of error. Nowadays, this can usually be done directly in the field with a GPS. However, for most older specimens, the geographical description is only textual (e.g. right side of the road between this place and this place) and often scarce (e.g. this country). Concepts have also evolved over time, as have country names and borders. On the other hand, the development of standardised terms such as the Darwin Core terms implies strict rules and definitions for each data term. For instance, the value Pays de Vaud could never be written in the attribute country of Darwin Core.
In the process of database standardisation and cleaning, the use of controlled vocabulary is essential, but it is often an important constraint compared to the day-to-day habits of an institution and the usual treatment of historical specimens. Countries that no longer exist are still being recorded in the attribute dwc:country, while Darwin Core recommends a controlled vocabulary for this term and the ISO 3166-1-alpha-2 standard for the associated dwc:countryCode.
The Data Aggregator DAGI uses the OpenCage Geocoding API of OpenCage GmbH (https://opencagedata.com/) to encode and enrich specific attributes of the institution’s records. For the encoding to be successful, the values of the source attributes must follow the definitions of the attributes, or else inconsistencies and “wrong” data might be inserted.
This protocol aims to provide the best practices in georeferencing natural history collection data, with the perspective of being compatible with the DAGI geographical encoding.
1) Preliminary assessment
1.1) Transcribe verbatim Location data
Transcribe the content of the specimen label concerning the location in the attribute verbatimLocality.
Document information about the method used or difficulty encountered in the attribute locationRemarks.
1.2) Transcribe verbatim coordinates data
Transcribe the coordinates present on the specimen label in the attribute verbatimCoordinates and associated attributes.
Convert the verbatim coordinates into their corresponding modern coordinate system in the coordinate attributes.
Document information about the method used or difficulty encountered in the attribute georeferenceRemarks.
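As an illustration of the conversion step, the sketch below turns a verbatim degrees/minutes/seconds value into decimal degrees. The function name and the accepted label formats are assumptions made for this example. The result stays in the datum of the original label, so the datum still has to be documented separately.

```python
import re

# Matches values such as 46°57'08.66"N or 33°52'S (minutes and seconds optional).
DMS_PATTERN = re.compile(
    r"""^\s*(?P<deg>\d{1,3})\s*°\s*
        (?:(?P<min>\d{1,2})\s*['′]\s*)?
        (?:(?P<sec>\d{1,2}(?:\.\d+)?)\s*["″]\s*)?
        (?P<hem>[NSEW])\s*$""",
    re.VERBOSE | re.IGNORECASE,
)


def dms_to_decimal(text: str) -> float:
    """Convert one degrees/minutes/seconds value to signed decimal degrees."""
    match = DMS_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Unrecognised coordinate format: {text!r}")
    degrees = int(match["deg"])
    minutes = int(match["min"] or 0)
    seconds = float(match["sec"] or 0.0)
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"Minutes or seconds out of range: {text!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    return -value if match["hem"].upper() in "SW" else value
```

The verbatim string itself stays untouched in verbatimCoordinates; only the converted value goes into the decimal attributes.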
2) Standardized data entry
2.1) Enrich standardised textual Location data
Enrich the corresponding information based on the verbatimLocality (and the coordinates if present) in a standardised and controlled way.
Document information about the method used or difficulty encountered in the attribute locationRemarks.
2.2) Enrich standardised coordinates data
Enrich the following attributes:
- decimalLatitude
- decimalLongitude
- geodeticDatum
- swissCoordinatesLv95_E = Swiss easting (E) coordinate of the CH1903+/LV95 system
- swissCoordinatesLv95_N = Swiss northing (N) coordinate of the CH1903+/LV95 system
- swissCoordinatesLv03_E = Swiss easting coordinate of the CH1903/LV03 system
- swissCoordinatesLv03_N = Swiss northing coordinate of the CH1903/LV03 system
- coordinateUncertaintyInMeters
- coordinatePrecision
Document information about the method used or difficulty encountered in the attribute georeferenceRemarks.
In-house practices among Swiss natural history collections
pseudo ISO Codes
To deal with historical locations that do not fit the standard, the in-house use of self-made codes helps to filter data in databases before publishing them and/or sharing them outside of the institution. Here are some examples that would facilitate the collaboration between Swiss institutions:
Pseudo-ISO codes | Definition | Example |
---|---|---|
AA | Covering several countries | Yugoslavia |
ZZ | Overlapping several countries | Prussia |
XX | Unknown, Whole world | empty |
Notation: added elements in verbatim attributes
In order to remove ambiguity and make a distinction between original and added/interpreted elements, a clear symbol can be used, such as the square brackets [ ].
verbatimLocality | Comment |
---|---|
[a word] | This word was difficult to read on the label. The content of the square brackets is the interpretation of what may have been written. |
[…] | illegible word present |
Pte Grave [La Petite Grave] | Repetition of the verbatim text with disambiguation of the abbreviation. |
O’stmdg [Ostermundigen] | ibid. |
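If the square-bracket notation is applied consistently, bracketed elements can be separated from the rest of the text automatically. The following is a minimal sketch; the function name is hypothetical and nested brackets are not handled. It cannot tell an uncertain reading such as [a word] from an added disambiguation, so that distinction still needs a human reader.

```python
import re

# A bracketed element: anything between [ and ] without nested brackets.
BRACKETED = re.compile(r"\[([^\[\]]*)\]")


def split_verbatim(value: str) -> tuple[str, list[str]]:
    """Return the text outside square brackets and the list of bracketed elements."""
    bracketed = [element.strip() for element in BRACKETED.findall(value)]
    outside = BRACKETED.sub(" ", value)
    # Collapse the whitespace left behind by the removed brackets.
    outside = " ".join(outside.split())
    return outside, bracketed
```

For example, `split_verbatim("Pte Grave [La Petite Grave]")` separates the label text `Pte Grave` from the added disambiguation `La Petite Grave`.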
3) Georeferencing approach
3.1) Possible
- Does the georeferencing add more information to the specimen than the already existing textual information?
- Are the coordinates provided by georeferencing useful in practice when they are accompanied by an uncertainty of several hundred kilometres? (See example G00304771)
Recommendation
Only georeference ‘large’ historical localities (e.g. countries, regions) when georeferencing provides greater accuracy than the information provided in the ad hoc attributes (country, stateProvince).
3.2) Pertinent
The pertinence of georeferencing historical localities must be assessed before any step of the procedure, in terms of:
- Human resources
- Time resources
- Basic data (readability, completeness, unambiguity)
3.3) Coordinate assignment
Cross-border entities
Recommendation
Fill in the relevant attributes as accurately as possible before proceeding with georeferencing. Do not submit the coordinates if you do not want any of the countries involved to be returned by the encoding.
Protocol step | Attribute | Mountain massifs | Lakes | National parks | Mountain ranges |
---|---|---|---|---|---|
Step 1.1 verbatim Location data | verbatimLocality | Mont Blanc | Lac Léman | Parc national des Écrins | Forêt Noire |
Step 2.1 standardised textual Location data | continent | Europe | Europe | Europe | Europe |
| higherGeography | Europe \| Mont Blanc | Europe \| Lake Geneva | Europe \| France \| Parc National des Ecrins | Europe \| Germany \| Baden-Wuerttemberg \| Schwarzwald |
| waterBody | | Lake Geneva | | |
| country | | | France | Germany |
| stateProvince | | | | Baden-Wuerttemberg |
Step 2.2 standardised coordinates data | georeferenceRemarks | Standardised coordinates data should be left empty, or else the "official" administrative entity might be attributed during encoding. | | | |
3.4) DAGI Geographical encoding
Geographical encoding standardises and enriches the geographical data associated with a specimen. In DAGI, the geographical encoding is composed of two parts. One is the Geo Forward (based on the imported information in the attributes country and continent) and the other one is the Geo Reverse (based on the imported information in the coordinates attributes: decimalLatitude and decimalLongitude, swissCoordinates_lv95_x and swissCoordinates_lv95_y, swissCoordinates_lv03_x and swissCoordinates_lv03_y).
Geo Forward and Geo Reverse encoding use the OpenCage Geocoding API. With the attributes mentioned above, a query is built and the elements of the result are split into the corresponding encoded attributes.
In addition to these two parts, coordinates provided in one or two of the available Swiss system attributes are converted into WGS84 (decimalLatitude and decimalLongitude) and into the other Swiss system if it was previously empty.
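To give an idea of what such a conversion involves, here is a minimal sketch based on the approximate formulas published by swisstopo, which are accurate to roughly one metre. It is not known whether DAGI uses these formulas or a rigorous transformation. The function names are hypothetical, and the constant LV03 to LV95 shift is also an approximation.

```python
def lv03_to_lv95(y: float, x: float) -> tuple[float, float]:
    """Shift LV03 (y, x) to LV95 (E, N) using the constant offsets (approximation)."""
    return y + 2_000_000.0, x + 1_000_000.0


def lv95_to_wgs84(east: float, north: float) -> tuple[float, float]:
    """Approximate conversion of LV95 (E, N) to WGS84 (latitude, longitude) in degrees."""
    # Auxiliary values relative to the projection centre in Bern, in units of 1000 km.
    y = (east - 2_600_000.0) / 1_000_000.0
    x = (north - 1_200_000.0) / 1_000_000.0
    lon = (2.6779094 + 4.728982 * y + 0.791484 * y * x
           + 0.1306 * y * x**2 - 0.0436 * y**3)
    lat = (16.9023892 + 3.238272 * x - 0.270978 * y**2
           - 0.002528 * x**2 - 0.0447 * y**2 * x - 0.0140 * x**3)
    # Intermediate results are in units of 10000 arc seconds; convert to degrees.
    return lat * 100.0 / 36.0, lon * 100.0 / 36.0
```

With the projection centre (E = 2600000, N = 1200000) the sketch returns approximately 46.95108 for latitude and 7.43864 for longitude.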
Geo Forward
Attributes needed: continent OR country
Attributes encoded: continent, country, countryCode
Import examples | Imported attribute | Imported value | Encoded attribute | Encoded value |
---|---|---|---|---|
Example 1 | continent | Europe | continent | Europe |
| country | Suisse | country | Switzerland |
| | | countryCode | CH |
Example 2 | country | Schweiz | | |
| | | continent | Europe |
| | | country | Switzerland |
| | | countryCode | CH |
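The Geo Forward step can be pictured with the sketch below. It builds a forward query for the OpenCage endpoint `https://api.opencagedata.com/geocode/v1/json` and maps the `components` of a result to the three encoded attributes. How DAGI actually assembles the query is not documented here, so the function names, the order of the query parts and the `limit` and `language` settings are assumptions.

```python
from urllib.parse import urlencode

OPENCAGE_ENDPOINT = "https://api.opencagedata.com/geocode/v1/json"


def build_forward_url(record: dict, api_key: str) -> str:
    """Build a forward geocoding URL from country and/or continent (assumed query layout)."""
    parts = [record.get("country"), record.get("continent")]
    query = ", ".join(part for part in parts if part)
    if not query:
        raise ValueError("Geo Forward needs a continent or a country")
    params = {"q": query, "key": api_key, "limit": 1, "language": "en"}
    return f"{OPENCAGE_ENDPOINT}?{urlencode(params)}"


def encode_forward(components: dict) -> dict:
    """Map the 'components' object of an OpenCage result to the encoded attributes."""
    code = components.get("country_code")
    return {
        "continent": components.get("continent"),
        "country": components.get("country"),
        # OpenCage returns the country code in lower case.
        "countryCode": code.upper() if code else None,
    }
```

Applied to Example 1, a components object containing `continent`, `country` and `country_code` for Switzerland yields Europe, Switzerland and CH.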
Geo Reverse
Attributes needed: decimalLatitude and decimalLongitude
Attributes encoded: continent, country, countryCode, stateProvince and municipality (if countryCode = CH, then also swissCoordinatesLv95_x, swissCoordinatesLv95_y, swissCoordinatesLv03_x and swissCoordinatesLv03_y)
The first step of Geo Reverse is the conversion of Swiss coordinates, if any. Afterwards, the WGS84 coordinates are used in the query to the OpenCage API.
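A reverse query to OpenCage passes the latitude and longitude as the `q` parameter. The sketch below shows one possible way to build that query and to read the encoded attributes from the `components` of a result. The function names and the list of keys tried for municipality are assumptions, not the documented behaviour of DAGI.

```python
from urllib.parse import urlencode

OPENCAGE_ENDPOINT = "https://api.opencagedata.com/geocode/v1/json"
# Assumed order of OpenCage component keys tried for the municipality attribute.
MUNICIPALITY_KEYS = ("municipality", "city", "town", "village")


def build_reverse_url(latitude: float, longitude: float, api_key: str) -> str:
    """Build a reverse geocoding URL from WGS84 decimal coordinates."""
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        raise ValueError("Coordinates outside the valid WGS84 range")
    params = {"q": f"{latitude},{longitude}", "key": api_key}
    return f"{OPENCAGE_ENDPOINT}?{urlencode(params)}"


def encode_reverse(components: dict) -> dict:
    """Map the 'components' object of an OpenCage result to the encoded attributes."""
    code = components.get("country_code")
    municipality = next(
        (components[key] for key in MUNICIPALITY_KEYS if components.get(key)), None
    )
    return {
        "continent": components.get("continent"),
        "country": components.get("country"),
        "countryCode": code.upper() if code else None,
        "stateProvince": components.get("state"),
        "municipality": municipality,
    }
```

Note that nothing in this mapping looks at the uncertainty of the input coordinates: every level down to municipality is filled from a single point.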
Coordinates conversion
The coordinate attributes are ranked by priority, from highest to lowest:
- WGS84
- CH1903+/LV95
- CH1903/LV03
When importing coordinates, it is better practice to import one set of coordinates per specimen (in any one of the three systems supported by DAGI) rather than converting before import and importing multiple systems for the same record. The table below illustrates what happens when one, two or three different locations are imported among the available coordinate terms.
Imported WGS84 | Imported LV95 | Imported LV03 | Encoded WGS84 | Encoded LV95 | Encoded LV03 | Consequence |
---|---|---|---|---|---|---|
X | | | 🟰 | ✅ | ✅ | 1 location ✔️ |
| Y | | ✅ | 🟰 | ✅ | |
| | Z | ✅ | ✅ | 🟰 | |
X | Y | | 🟰 | ❌ | ✅ | 2 locations ⚠️ |
X | | Z | 🟰 | ✅ | ❌ | |
| Y | Z | ✅ | 🟰 | ❌ | |
X | Y | Z | 🟰 | ❌ | ❌ | 3 locations ⚠️⚠️ |

Legend: LV95 = CH1903+/LV95; LV03 = CH1903/LV03; coordinates X ≠ coordinates Y ≠ coordinates Z (these represent three different locations); 🟰 = original value used for conversion; ✅ = converted from the 🟰 value; ❌ = unconverted original value.
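The priority rule behind this table can be expressed in a few lines. The sketch below is only a model of the behaviour described above, with hypothetical names: the highest-ranked imported system is the source, empty systems are converted from it, and any other imported system keeps its original value.

```python
PRIORITY = ("WGS84", "LV95", "LV03")


def resolve_conversion(imported: set[str]) -> dict[str, str]:
    """Return, for each coordinate system, its role after encoding.

    'source'    = original value used for conversion
    'converted' = value computed from the source
    'kept'      = imported value left as-is, possibly a different location
    """
    present = [system for system in PRIORITY if system in imported]
    if not present:
        return {}
    source = present[0]
    roles = {}
    for system in PRIORITY:
        if system == source:
            roles[system] = "source"
        elif system in imported:
            roles[system] = "kept"
        else:
            roles[system] = "converted"
    return roles
```

Any record for which the result contains a `kept` value may end up holding two or three different locations, which is why importing a single system per specimen is recommended.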
Risks with the geographical encoding of DAGI
Attributing coordinates to an imprecise location
The whole set of textual information, from continent to municipality, is encoded from the coordinates without taking an uncertainty radius into account. This leads to erroneous data, because values are attributed to elements that are smaller than the actual precision of the specimen location.
What can be done to avoid this:
- Compare the imported and encoded data in DAGI
- Target and correct the errors of capture in the original database
- Remove the coordinates of the less precise records, or import and publish them without encoding
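The comparison of imported and encoded data can be partly automated. The following minimal sketch assumes that both versions of a record are available as dictionaries keyed by attribute name; the function name is hypothetical.

```python
def diff_encoding(imported: dict, encoded: dict) -> dict:
    """List the attributes whose encoded value differs from the imported one.

    Returns {attribute: (imported_value, encoded_value)}; empty strings and
    missing attributes are both treated as empty.
    """
    changes = {}
    for attribute, new_value in encoded.items():
        old_value = imported.get(attribute)
        if (old_value or None) != (new_value or None):
            changes[attribute] = (old_value, new_value)
    return changes
```

A record whose imported locality is only known at country level but whose diff shows a newly filled municipality is a typical candidate for review.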
Uncertainty estimation
Currently, the geographical encoding in DAGI does not take the uncertainty radius into consideration.
4) Do not georeference
In cases where georeferencing does not provide the desired added value, this can be expressed in the data through the countryCode attribute, while leaving all lower-level textual location attributes and the coordinates attributes empty:
- Supranational entity (precision grouping several countries): countryCode = AA
- International entity (precision overlapping several countries): countryCode = ZZ
- Unknown location: countryCode = XX
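A simple consistency check can flag records that carry one of these pseudo codes but still have lower-level attributes or coordinates filled in. The sketch below is an illustration only: the function name is hypothetical and the list of attributes expected to be empty is an assumption that each institution should adapt.

```python
PSEUDO_CODES = {"AA": "supranational", "ZZ": "international", "XX": "unknown"}
# Attributes expected to stay empty with a pseudo code (assumed selection).
MUST_BE_EMPTY = (
    "stateProvince",
    "municipality",
    "decimalLatitude",
    "decimalLongitude",
)


def check_not_georeferenced(record: dict) -> list[str]:
    """Return the attributes that should be empty but are filled in."""
    if record.get("countryCode") not in PSEUDO_CODES:
        return []
    return [name for name in MUST_BE_EMPTY if record.get(name) not in (None, "")]
```

An empty list means the record is consistent with the rule above; any returned attribute name points to a value that would otherwise be sent to the encoding.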
Documentation
Reference protocols
- Chapman AD & Wieczorek JR (2020) Georeferencing Best Practices. Copenhagen: GBIF Secretariat. https://doi.org/10.15468/doc-gg7h-s853. Available online: https://docs.gbif.org/georeferencing-best-practices/1.0/en/georeferencing-best-practices.en.pdf
- Zermoglio PF, Chapman AD, Wieczorek JR, Luna MC & Bloom DA (2020) Georeferencing Quick Reference Guide. Copenhagen: GBIF Secretariat. https://doi.org/10.35035/e09p-h128. Available online: https://docs.gbif.org/georeferencing-quick-reference-guide/1.0/en/georeferencing-quick-reference-guide.en.pdf
Historical maps and reference material
- map.geo.admin.ch
- the Getty Thesaurus of Geographic Names (TGN)
- Wikipedia and Wikidata
- OpenStreetMap and Nominatim
- GeoNames.org
- Marine Regions
- World Historical Gazetteer
- Natural Earth Data
- World Geographical Scheme for Recording Plant Distributions (WGSRPD)
Online coordinates conversion
- NAVREF (Swisstopo) → Swiss national coordinate systems (MN03, MN95), global coordinates (WGS84 decimal, WGS84 DMS)
- The World Coordinate Converter (TWCC) → all coordinate systems
Helpful files
Source for this page
- Tschudin P., Burckhardt D., Monnerat C., Sanchez A., Burri F., Jutzi M. & Gonseth Y. 2014. Recommandations pour la saisie de données de spécimens en collections, Ver. 2.0. Neuchâtel : GBIF Swiss Node, 12 pp. (available upon request)
Authors: Collaboration between the CJBG, Naturéum, Info fauna and GBIF.ch