GBIF Issues & Flags
Publishers share datasets, but also manage data quality. GBIF provides access to biodiversity data, but also flags suspicious or missing content. Users work with the data, but also clean it and remove unsuitable records. Each plays an important role in managing and improving data quality.
What are GBIF issues and flags?
The GBIF network publishes datasets and integrates them into a common access system, where users can retrieve data through shared search and download services. While indexing the raw data, GBIF adds issues and flags to records that show common data quality problems.
The occurrence search interface does not currently let you exclude all records with a particular issue. You can select an issue and press the reverse button to filter out the records flagged with it, but the result contains only records flagged with other issues, not records that have no issues at all. GBIF is working to improve this.
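Until the interface supports this, one workaround is to filter records after retrieving them. The sketch below assumes each record is a dict with an `issues` list holding upper-case issue names as returned by the GBIF occurrence API (e.g. `ZERO_COORDINATE`); the helper names `exclude_issues` and `issue_free` are hypothetical.

```python
from collections.abc import Iterable


def exclude_issues(records: Iterable[dict], unwanted: set) -> list:
    """Keep records that carry none of the unwanted issue flags.

    Assumes each record is a dict whose "issues" key holds a list of issue
    names; a record without that key is treated as having no issues.
    """
    return [r for r in records if set(r.get("issues", [])).isdisjoint(unwanted)]


def issue_free(records: Iterable[dict]) -> list:
    """Keep only records that have no issue flags at all."""
    return [r for r in records if not r.get("issues")]


records = [
    {"key": 1, "issues": ["ZERO_COORDINATE"]},
    {"key": 2, "issues": ["COORDINATE_ROUNDED"]},
    {"key": 3, "issues": []},
]

print([r["key"] for r in exclude_issues(records, {"ZERO_COORDINATE"})])  # [2, 3]
print([r["key"] for r in issue_free(records)])  # [3]
```

Unlike the reverse button, `exclude_issues` keeps issue-free records as well. In tabular downloads the issues may be stored as one delimited string per record, which would need splitting into a list first (check the delimiter used in your download).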
Individual occurrence pages show remarks explaining what happened to each value during interpretation:
- Excluded means the original data could not be interpreted, so it is left out of the interpreted fields.
- Altered means the original data was modified during interpretation before being indexed on GBIF.org.
- Inferred means the original value was empty, so the indexed value was inferred from other information in the record.
The following table lists some common issues and explains how to fix them.
Issue & flag | Action to take |
---|---|
Country derived from coordinates | Fill in the countryCode and country columns with the country where the occurrence was recorded, using the official ISO 3166-1 alpha-2 code for countryCode. |
Recorded date invalid | Provide valid dates in the eventDate, year, month and day columns, following ISO 8601-1:2019 (YYYY-MM-DD). |
Basis of record invalid | Enter a valid value in the basisOfRecord column that reflects the nature of the record, taken from the controlled vocabulary at https://rs.gbif.org/vocabulary/dwc/basis_of_record.xml |
Country coordinate mismatch | Make sure the coordinates (decimalLatitude, decimalLongitude) fall inside the indicated country (i.e. country and countryCode). country and countryCode must match each other, with countryCode following the official ISO 3166-1 alpha-2 standard. |
Zero coordinate | Leave decimalLatitude and decimalLongitude blank if the coordinates are missing. Do not enter "0" in both columns unless the record really was made at 0°, 0° (Null Island https://en.wikipedia.org/wiki/Null_Island). |
Coordinate invalid | Make sure coordinates are valid decimal numbers: decimalLatitude must lie between -90 and 90 and decimalLongitude between -180 and 180, inclusive. Values in verbatimCoordinates must also be valid coordinates in decimal degrees, degrees decimal minutes, or degrees minutes seconds. |
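The two coordinate rows of the table can be checked before publishing. A minimal sketch, assuming coordinates arrive as text from a spreadsheet or CSV; the function name and messages are hypothetical and do not reproduce GBIF's own interpretation (verbatim formats such as degrees minutes seconds are simply reported as non-numeric here).

```python
import math


def check_coordinates(lat_text: str, lon_text: str) -> list:
    """Return a list of problems found in latitude/longitude text values.

    Two blank values mean "no coordinates", which is the recommended way to
    express missing values (rather than 0/0), so nothing is reported.
    """
    if not lat_text.strip() and not lon_text.strip():
        return []
    try:
        lat = float(lat_text)
        lon = float(lon_text)
    except ValueError:
        return ["coordinate not numeric"]
    if not (math.isfinite(lat) and math.isfinite(lon)):
        return ["coordinate not numeric"]  # rejects "nan" and "inf"
    problems = []
    if lat == 0 and lon == 0:
        problems.append("zero coordinate")
    if not -90 <= lat <= 90:
        problems.append("latitude out of range")
    if not -180 <= lon <= 180:
        problems.append("longitude out of range")
    return problems


print(check_coordinates("0", "0"))        # ['zero coordinate']
print(check_coordinates("95.2", "10"))    # ['latitude out of range']
print(check_coordinates("-33.9", "151.2"))  # []
```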
Definitions
More than 50 issues and flags have been defined to deal with common data quality problems. The following section lists all of them with a clearer description of each. It is intended as a placeholder until more formal documentation can be written.
Geospatial Issues
Zero coordinate (geospatial) example
Coordinates are exactly 0/0, which often indicates that a missing coordinate was entered as zero rather than left blank.
Terms: dwc:decimalLatitude, dwc:decimalLongitude
Country coordinate mismatch (geospatial) example
The interpreted occurrence coordinates fall outside of the indicated country.
Terms: dwc:countryCode, dwc:country, dwc:decimalLatitude, dwc:decimalLongitude
Coordinate invalid (geospatial) example
A coordinate value is given in some form, but GBIF is unable to interpret it. Possible reasons include coordinates that are out of range (latitude beyond ±90 or longitude beyond ±180) and text values that cannot be interpreted.
Terms: dwc:decimalLatitude, dwc:decimalLongitude, dwc:verbatimCoordinates, dwc:verbatimLatitude, dwc:verbatimLongitude
Coordinate out of range (geospatial) example
The supplied coordinates lie outside the valid range for decimal latitude (-90 to 90) or decimal longitude (-180 to 180).
Terms: dwc:decimalLatitude, dwc:decimalLongitude, dwc:verbatimCoordinates, dwc:verbatimLatitude, dwc:verbatimLongitude
The four issues above are excluded by default when you filter for records with coordinates, unless you tick the check box to include them.
Geodetic datum assumed WGS84 (geospatial) example
If the datum is null, data interpretation assumes the record coordinates are in WGS84.
Terms: dwc:geodeticDatum
Geodetic datum invalid (geospatial) example
The geodetic datum could not be interpreted, because the supplied term cannot be matched against the vocabulary of known values.
Terms: dwc:geodeticDatum
Country mismatch (geospatial) example
The interpreted country and country code contradict each other.
Terms: dwc:countryCode, dwc:country
Country derived from coordinates (geospatial) example
If the country and country code are not supplied or cannot be matched to known values, data interpretation derives their content from the decimal coordinates through a lookup service.
Terms: dwc:countryCode, dwc:country, dwc:decimalLatitude, dwc:decimalLongitude
Country invalid (geospatial) example
The country or countryCode given cannot be matched to the vocabulary for country names.
Terms: dwc:country
Continent invalid (geospatial) example
The continent given cannot be matched to the vocabulary for continent names.
Terms: dwc:continent
Coordinate rounded (geospatial) example
During data interpretation the original coordinates are rounded to 6 decimal places (about 0.1 m precision).
Terms: dwc:decimalLatitude, dwc:decimalLongitude
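To relate a number of decimal places to ground distance, a short calculation helps. This sketch assumes a mean length of about 111,320 m per degree of latitude and shrinks longitude by the cosine of the latitude; it approximates the arithmetic only and is not GBIF's rounding code.

```python
import math

# Approximate length of one degree of latitude, in metres (assumed mean value).
METRES_PER_DEGREE_LAT = 111_320


def rounding_error_m(decimals: int, latitude: float = 0.0) -> tuple:
    """Worst-case ground shift (metres) caused by rounding to `decimals` places.

    Rounding moves a value by at most half a unit in the last kept place.
    A degree of longitude gets shorter with cos(latitude).
    """
    max_shift_deg = 0.5 * 10 ** -decimals
    north_south = max_shift_deg * METRES_PER_DEGREE_LAT
    east_west = north_south * math.cos(math.radians(latitude))
    return north_south, east_west


lat, lon = 52.5163112, 13.3777041
print(round(lat, 6), round(lon, 6))  # 52.516311 13.377704
ns, ew = rounding_error_m(6, latitude=lat)
print(f"{ns:.3f} m north-south, {ew:.3f} m east-west")
```

With 6 decimals the worst-case shift is roughly 0.056 m north-south, and one full step in the sixth decimal place corresponds to about 0.11 m of latitude.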
Coordinate reprojected (geospatial) example
The original coordinates were successfully reprojected from a different geodetic datum to WGS84.
Terms: dwc:geodeticDatum
Coordinate reprojection suspicious (geospatial) example
The coordinates were successfully reprojected using the provided datum, but the reprojection shifted them by more than 0.1 decimal degrees.
Terms: dwc:geodeticDatum, dwc:decimalLatitude, dwc:decimalLongitude
Coordinate reprojection failed (geospatial) example
The given decimal latitude and longitude could not be reprojected to WGS84 based on the provided datum.
Terms: dwc:geodeticDatum, dwc:decimalLatitude, dwc:decimalLongitude
Coordinate uncertainty meters invalid (geospatial) example
The value given for Coordinate uncertainty in meters, indicating the radius of uncertainty around the given decimal coordinates, is not a valid number, or lies outside a plausible range.
Terms: dwc:coordinateUncertaintyInMeters
Coordinate precision invalid (geospatial) example
The coordinate precision is invalid or very unlikely: the value is not a decimal number as expected, or it is unusually low or high for a margin of uncertainty.
Terms: dwc:coordinatePrecision
Presumed negated longitude (geospatial) example
The supplied longitude value places the coordinates outside of the indicated country. Negating the longitude value would result in a country match.
Terms: dwc:decimalLongitude
Presumed negated latitude (geospatial) example
The supplied latitude value places the coordinates outside of the indicated country. Negating the latitude value would result in a country match.
Terms: dwc:decimalLatitude
Presumed swapped coordinate (geospatial) example
Coordinates seem to be swapped when testing against the interpreted country.
Terms: dwc:decimalLatitude, dwc:decimalLongitude, dwc:country
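The idea behind the negated and swapped coordinate checks can be sketched by trying each simple correction and seeing which one lands inside the indicated country. This is a hypothetical illustration: the rectangular bounding box for Australia is a deliberately rough assumption, the order of the candidates is assumed, and a real check would use country polygons.

```python
from typing import Optional

# Hypothetical, deliberately rough bounding box (lat_min, lat_max, lon_min, lon_max).
# A real check would test against country polygons, not rectangles.
BOUNDING_BOXES = {"AU": (-44.0, -10.0, 112.0, 154.0)}


def inside(country: str, lat: float, lon: float) -> bool:
    lat_min, lat_max, lon_min, lon_max = BOUNDING_BOXES[country]
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


# Candidate corrections, tried in this (assumed) order.
CANDIDATES = [
    ("presumed negated latitude", lambda lat, lon: (-lat, lon)),
    ("presumed negated longitude", lambda lat, lon: (lat, -lon)),
    ("presumed swapped coordinate", lambda lat, lon: (lon, lat)),
]


def diagnose(country: str, lat: float, lon: float) -> Optional[str]:
    """Name the first simple correction that moves the point into `country`."""
    if inside(country, lat, lon):
        return None  # coordinates already agree with the country
    for label, fix in CANDIDATES:
        if inside(country, *fix(lat, lon)):
            return label
    return "country coordinate mismatch"


print(diagnose("AU", 33.87, 151.21))    # presumed negated latitude
print(diagnose("AU", -33.87, -151.21))  # presumed negated longitude
print(diagnose("AU", 151.21, -33.87))   # presumed swapped coordinate
```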
Depth min max swapped (geospatial) example
The values for minimum and maximum depth appear to be swapped.
Terms: dwc:minimumDepthInMeters, dwc:maximumDepthInMeters
Depth non numeric (geospatial) example
The values for minimum and maximum depth are non-numeric values and cannot be interpreted.
Terms: dwc:minimumDepthInMeters, dwc:maximumDepthInMeters
Depth unlikely (geospatial) example
The values for minimum and maximum depth are negative or greater than 11,000 m (roughly the depth of the Mariana Trench).
Terms: dwc:minimumDepthInMeters, dwc:maximumDepthInMeters
Depth not metric (geospatial) example
The supplied depth is not given in metric units, for example feet are used instead of meters.
Terms: dwc:minimumDepthInMeters, dwc:maximumDepthInMeters
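The depth checks above can be approximated on the publisher side. A minimal sketch, assuming depths arrive as text; the 11,000 m limit comes from the "Depth unlikely" definition, while the function names and messages are hypothetical. Non-metric depths cannot be detected from a bare number, so a value such as "50 ft" only shows up here as non-numeric.

```python
import math
from typing import Optional

MAX_PLAUSIBLE_DEPTH_M = 11_000  # threshold from the "Depth unlikely" definition


def parse_depth(text: str) -> Optional[float]:
    """Return the depth as a finite float, or None if it is non-numeric."""
    try:
        value = float(text)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def check_depth(min_text: str, max_text: str) -> list:
    """Return problems for minimumDepthInMeters/maximumDepthInMeters text."""
    problems = []
    values = {}
    for name, text in (("min", min_text), ("max", max_text)):
        if not text.strip():
            continue  # a blank value just means depth was not recorded
        value = parse_depth(text)
        if value is None:
            problems.append(f"{name} depth non numeric")  # e.g. "50 ft"
        else:
            values[name] = value
    for name, value in values.items():
        if value < 0 or value > MAX_PLAUSIBLE_DEPTH_M:
            problems.append(f"{name} depth unlikely")
    if "min" in values and "max" in values and values["min"] > values["max"]:
        problems.append("depth min max swapped")
    return problems


print(check_depth("200", "50"))  # ['depth min max swapped']
```

The same pattern would apply to the elevation fields, with different limits.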
Elevation non numeric (geospatial) example
The values for minimum and maximum elevation are non-numeric values and cannot be interpreted.
Terms: dwc:minimumElevationInMeters, dwc:maximumElevationInMeters
Elevation min max swapped (geospatial) example
The values for minimum and maximum elevation appear to be swapped.
Terms: dwc:minimumElevationInMeters, dwc:maximumElevationInMeters
Elevation not metric (geospatial) example
The supplied elevation is not given in metric units, for example feet are used instead of meters.
Terms: dwc:minimumElevationInMeters, dwc:maximumElevationInMeters
As of the writing of this post, no occurrence records on GBIF are flagged with the following geospatial issues.
Elevation unlikely (geospatial) example
The values for minimum and maximum elevation are above the troposphere (17,000 m) or below the depth of the Mariana Trench (-11,000 m).
Terms: dwc:minimumElevationInMeters, dwc:maximumElevationInMeters
Continent country mismatch (geospatial) example
The interpreted continent and country do not match up.
Terms: dwc:continent, dwc:countryCode, dwc:country
Continent derived from coordinates (geospatial) example
If no value is supplied for the continent or if the values cannot be matched against a known vocabulary, data interpretation derives the continent from the decimal coordinates.
Terms: dwc:continent, dwc:decimalLatitude, dwc:decimalLongitude
Taxonomic Issues
Taxon match higherrank (taxonomic) example
The record can be matched to the GBIF taxonomic backbone at a higher rank, but not with the scientific name given.
Terms: dwc:scientificName, dwc:kingdom, dwc:phylum, dwc:class, dwc:order, dwc:family, dwc:genus, dwc:subgenus, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:taxonRank
Reasons include:
- The name is new, and not available in the taxonomic datasets yet
- The name is missing in the backbone’s taxonomic sources for other reasons
- Formatting or spelling of the scientific name caused interpretation errors
Taxon match none (taxonomic) example
The record cannot be matched to the taxonomic backbone, either because there was no match at all or because there were several matches with too little information to tell them apart (homonyms).
Terms: dwc:scientificName, dwc:kingdom, dwc:phylum, dwc:class, dwc:order, dwc:family, dwc:genus, dwc:subgenus, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:taxonRank
Taxon match fuzzy (taxonomic) example
Matching to the taxonomic backbone can only be done using a fuzzy, non-exact match.
Terms: dwc:scientificName, dwc:kingdom, dwc:phylum, dwc:class, dwc:order, dwc:family, dwc:genus, dwc:subgenus, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:taxonRank
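The difference between exact, fuzzy, higher-rank and failed matches can be illustrated with Python's standard `difflib`. This is a swapped-in technique for illustration only, not GBIF's name matcher: the four-name "backbone", the 0.85 similarity cutoff and the genus fallback are all assumptions.

```python
import difflib

# Hypothetical miniature "backbone"; GBIF's real backbone and matcher differ.
BACKBONE_NAMES = ["Puma concolor", "Panthera leo", "Panthera onca", "Lynx lynx"]


def match_name(name: str, cutoff: float = 0.85) -> tuple:
    """Return (match type, matched name) for a supplied scientific name."""
    if name in BACKBONE_NAMES:
        return "exact", name
    close = difflib.get_close_matches(name, BACKBONE_NAMES, n=1, cutoff=cutoff)
    if close:
        return "fuzzy", close[0]
    # Fall back to the genus (first word) as a stand-in for a higher-rank match.
    words = name.split()
    if words and any(n.split()[0] == words[0] for n in BACKBONE_NAMES):
        return "higherrank", words[0]
    return "none", ""


print(match_name("Puma concolour"))   # ('fuzzy', 'Puma concolor')
print(match_name("Panthera tigris"))  # ('higherrank', 'Panthera')
print(match_name("Xus yus"))          # ('none', '')
```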
Date Issues
Recorded date invalid (date) example
The recording date given cannot be interpreted because it is invalid.
Terms: dwc:eventDate, dwc:year, dwc:month, dwc:day
Reasons include:
- A non-existing date (e.g. “1995-04-34”)
- Missing date parts (e.g. an event date without a year).
- The date format does not follow the ISO 8601 standard (YYYY-MM-DD)
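The reasons listed above can be checked with a strict pattern plus a calendar test. A minimal sketch, assuming eventDate holds a single date as YYYY, YYYY-MM or YYYY-MM-DD; date ranges and times, which Darwin Core also allows, are out of scope, and the function name is hypothetical.

```python
import re
from datetime import date

# YYYY, YYYY-MM or YYYY-MM-DD with ASCII digits only.
ISO_DATE = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def check_event_date(text: str) -> str:
    """Return "valid" or the reason a single-date eventDate is invalid."""
    m = ISO_DATE.fullmatch(text.strip())
    if m is None:
        return "not ISO 8601 (YYYY-MM-DD)"
    year, month, day = (int(g) if g else None for g in m.groups())
    try:
        # Missing month/day become 1 only so the parts that exist can be tested;
        # a supplied "00" is kept as 0 and therefore rejected.
        date(year, 1 if month is None else month, 1 if day is None else day)
    except ValueError:
        return "non-existing date"
    return "valid"


print(check_event_date("1995-04-34"))  # non-existing date
print(check_event_date("15/04/1995"))  # not ISO 8601 (YYYY-MM-DD)
print(check_event_date("1995-04"))     # valid
```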
Recorded date mismatch (date) example
The recording date given in the eventDate string contradicts the individual year, month and day values.
Terms: dwc:eventDate, dwc:year, dwc:month, dwc:day
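A consistency check between eventDate and the separate date columns can be sketched as follows. It assumes eventDate is already a valid, complete YYYY-MM-DD string and that blank year, month or day columns are passed as `None`; the function name is hypothetical.

```python
from datetime import date
from typing import Optional


def date_parts_conflict(event_date: str, year: Optional[int],
                        month: Optional[int], day: Optional[int]) -> bool:
    """True if a complete YYYY-MM-DD eventDate contradicts any supplied part.

    Parts given as None (blank columns) are ignored.
    """
    d = date.fromisoformat(event_date)
    supplied = {"year": year, "month": month, "day": day}
    actual = {"year": d.year, "month": d.month, "day": d.day}
    return any(v is not None and v != actual[k] for k, v in supplied.items())


print(date_parts_conflict("1995-04-05", 1995, 5, 4))  # True
```

The usage line shows a common cause of this issue: day and month entered in the wrong order in one of the two places.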
Identified date unlikely (date) example
The identification date is in the future or before Linnean times (1700).
Terms: dwc:dateIdentified
Recorded Date Unlikely (date) example
The recording date is highly unlikely, falling either into the future or representing a very old date before 1600 that predates modern taxonomy.
Terms: dwc:eventDate, dwc:year, dwc:month, dwc:day
Multimedia date invalid (date) example
The creation date given cannot be interpreted because it is invalid.
Terms: dc:created
Reasons include:
- A non-existing date (e.g. “1995-04-34”)
- Missing date parts (e.g. a date without a year).
- The date format does not follow the ISO 8601 standard (YYYY-MM-DD)
Identified date invalid (date) example
The identification date given cannot be interpreted because it is invalid.
Terms: dwc:dateIdentified
Reasons include:
- A non-existing date (e.g. “1995-04-34”)
- Missing date parts (e.g. without year).
- The date format does not follow the ISO 8601 standard (YYYY-MM-DD)
Modified date invalid (date) example
The modified date given is wholly or partially invalid.
Terms: dc:modified
Reasons include:
- A non-existing date (e.g. “1995-04-34”)
- Missing date parts (e.g. without year).
- The date format does not follow the ISO 8601 standard (YYYY-MM-DD)
Modified date unlikely (date) example
The modified date given is in the future or predates the Unix epoch (1970).
Terms: dc:modified
Georeferenced date invalid (date) example
The georeference date given cannot be interpreted because it is invalid.
Terms: dwc:georeferencedDate
Reasons include:
- A non-existing date (e.g. “1995-04-34”)
- Missing date parts (e.g. without year).
- The date format does not follow the ISO 8601 standard (YYYY-MM-DD)
Georeferenced date unlikely (date) example
The georeference date given is in the future or before Linnean times (1700).
Terms: dwc:georeferencedDate
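The "unlikely date" checks in this section share one shape: a date is suspicious if it lies in the future or before a cut-off year. A minimal sketch using the cut-off years stated in the definitions above; whether GBIF treats the boundary year as inclusive is an assumption here, and the function and dictionary names are hypothetical.

```python
from datetime import date
from typing import Optional

# Earliest plausible years, taken from the definitions in this section.
EARLIEST_YEAR = {
    "recorded": 1600,       # dwc:eventDate
    "identified": 1700,     # dwc:dateIdentified
    "georeferenced": 1700,  # dwc:georeferencedDate
    "modified": 1970,       # dc:modified (Unix epoch)
}


def date_unlikely(kind: str, d: date, today: Optional[date] = None) -> bool:
    """True if `d` lies in the future or before the earliest plausible year."""
    today = today or date.today()
    return d > today or d.year < EARLIEST_YEAR[kind]


print(date_unlikely("identified", date(1650, 1, 1)))  # True
print(date_unlikely("recorded", date(1650, 1, 1)))    # False
```

Passing a fixed `today` makes the future-date test reproducible.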
Vocabulary Issues
Basis of record invalid (vocabulary) example
The given basis of record is impossible to interpret or very different from the recommended vocabulary: http://rs.gbif.org/vocabulary/dwc/basis_of_record.xml
Terms: dwc:basisOfRecord
Type status invalid (vocabulary) example
The given type status is impossible to interpret or very different from the recommended vocabulary: https://rs.gbif.org/vocabulary/gbif/type_status.xml
Terms: dwc:typeStatus
Occurrence status unparsable (vocabulary) example
The given occurrenceStatus value cannot be interpreted; it does not match any of the known vocabulary values that indicate the presence or absence of a species at the collection or observation event.
Terms: dwc:occurrenceStatus
Other Issues
Individual count invalid (individual count) example
Individual count value not parsable into a positive integer.
Terms: dwc:individualCount
Individual count conflicts with occurrence status (individual count) example
The values given for the individual count and for the status of the occurrence (present/absent) contradict each other (e.g. the count is 0 but the status says “present”).
Terms: dwc:individualCount, dwc:occurrenceStatus
Occurrence status inferred from individual count (occurrence status) example
The present/absent status of the occurrence was inferred from the individual count value because no status value was supplied explicitly. An individual count of 0 is interpreted as “absent” and a value greater than 0 as “present”.
Terms: dwc:individualCount, dwc:occurrenceStatus
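The three count and status rules above can be combined into one interpretation step. A minimal sketch with hypothetical names and plain-English issue labels; treating 0 as a valid count (it signals absence) and negative counts as invalid are assumptions, and only "present" and "absent" are compared for conflicts.

```python
from typing import Optional


def interpret_status(count_text: str, status: Optional[str]) -> tuple:
    """Return (interpreted occurrence status, list of issue labels)."""
    issues = []
    count = None
    if count_text.strip():
        try:
            count = int(count_text)
        except ValueError:
            count = None
        # Assumption: 0 is allowed (it signals absence); negatives are invalid.
        if count is None or count < 0:
            issues.append("individual count invalid")
            count = None
    if status and status.strip():
        status = status.strip().lower()
        if count is not None and status in ("present", "absent"):
            expected = "absent" if count == 0 else "present"
            if status != expected:
                issues.append("individual count conflicts with occurrence status")
        return status, issues
    if count is not None:
        # No status supplied: infer it from the count (0 -> absent, >0 -> present).
        issues.append("occurrence status inferred from individual count")
        return ("absent" if count == 0 else "present"), issues
    return None, issues


print(interpret_status("0", "present"))
# ('present', ['individual count conflicts with occurrence status'])
print(interpret_status("5", None))
# ('present', ['occurrence status inferred from individual count'])
```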
References URI invalid (uri) example
The references URL cannot be resolved, and may be malformed or contain invalid characters. If there is more than one URL, the values have to be separated by a pipe symbol “|”.
Terms: dc:references
Multimedia URI invalid (uri) example
The multimedia URL cannot be resolved, and may be malformed or contain invalid characters. If there is more than one URL, the values have to be separated by a pipe symbol “|”.
Terms: dwc:associatedMedia
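Because several URLs must be separated by a pipe symbol, a common mistake is using commas or spaces instead. A minimal syntax-only sketch: it splits on "|" and flags entries without an http(s) scheme or host, or containing spaces. Whether a URL actually resolves would need a network request, which is not attempted here; the function name and the http(s)-only rule are assumptions.

```python
from urllib.parse import urlparse


def invalid_urls(field: str) -> list:
    """Split a pipe-separated URL field and return entries that look malformed."""
    bad = []
    for raw in field.split("|"):
        url = raw.strip()
        if not url:
            continue  # tolerate empty entries such as a trailing "|"
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc or " " in url:
            bad.append(url)
    return bad


print(invalid_urls("https://example.org/a.jpg | https://example.org/b.jpg"))  # []
print(invalid_urls("www.example.org/img.png"))  # ['www.example.org/img.png']
```

A comma-separated list is reported as a single malformed entry, since the space after the comma ends up inside one "URL".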
Interpretation error (interpretation) example
An error occurred during interpretation, leaving the record interpretation incomplete.
Terms: GBIF interpretation