Publishers share datasets, but also manage data quality. GBIF provides access to biodiversity data, but also flags suspicious or missing content. Users use data, but also clean and remove records. Each plays an important role in managing and improving data quality.

What are GBIF issues and flags?

The GBIF network publishes datasets and integrates them into a common access system, where users can retrieve data through common search and download services. While indexing the raw data, GBIF adds issues and flags to records with common data quality problems.

The occurrence search interface does not currently allow you to exclude all records with a particular issue. You can select an issue and hit the reverse button to filter out the records you are not interested in. However, reversing still returns only the other flagged occurrences, not issue-free records. GBIF is working to improve this.
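Until the search interface supports this, one workaround is to filter a download locally. The following is a minimal sketch, not an official GBIF tool. It assumes each downloaded record exposes an issue field holding semicolon-separated issue codes; the field name, the separator, the helper names and the sample rows are assumptions made for illustration.

```python
def parse_issues(issue_field):
    """Split a semicolon-separated issue string into a set of codes."""
    return {code.strip() for code in issue_field.split(";") if code.strip()}


def without_issues(records, unwanted):
    """Keep only records that carry none of the unwanted issue codes."""
    unwanted = set(unwanted)
    return [r for r in records if not (parse_issues(r.get("issue", "")) & unwanted)]


def issue_free(records):
    """Keep only records that carry no issue code at all."""
    return [r for r in records if not parse_issues(r.get("issue", ""))]


# Hypothetical sample rows; a real download contains many more columns.
records = [
    {"gbifID": "1", "issue": ""},
    {"gbifID": "2", "issue": "ZERO_COORDINATE;COUNTRY_COORDINATE_MISMATCH"},
    {"gbifID": "3", "issue": "COORDINATE_ROUNDED"},
]

kept = without_issues(records, ["ZERO_COORDINATE"])
clean = issue_free(records)
```

Here kept holds every record not flagged with the unwanted issue, while clean holds only the records with no flags at all, which is the distinction the reverse button cannot make.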

Remarks are shown on the individual occurrence pages to explain what happened during interpretation:

  1. Excluded means the original data couldn’t be interpreted, so it is excluded from the interpreted fields.
  2. Altered means the original data was modified during interpretation in order to be indexed in GBIF.org.
  3. Inferred means the original data was empty, so the indexed value was inferred from other information in the record.

The following table highlights some common issues and explains how to fix them.

Issue & flag | Action to take
Country derived from coordinates | Fill in the columns countryCode and country with the country where the record was registered, using the official ISO 3166-1-alpha-2 country code.
Recorded date invalid | Use valid dates in the columns eventDate, year, month and day, following the ISO 8601-1:2019 format (YYYY-MM-DD).
Basis of record invalid | Use a valid value in the column basisOfRecord, according to the nature of the record. Follow the controlled vocabulary at https://rs.gbif.org/vocabulary/dwc/basis_of_record.xml
Country coordinate mismatch | Make sure the coordinates (decimalLatitude, decimalLongitude) fall inside the indicated country (i.e. country, countryCode). The country and countryCode must match and follow the official ISO 3166-1-alpha-2 standard.
Zero coordinate | Leave decimalLatitude and decimalLongitude blank if the coordinates are missing. Don't use "0" in both columns unless the record really is located there (Null Island https://en.wikipedia.org/wiki/Null_Island).
Coordinate invalid | Make sure coordinates are valid numeric decimal values. Legal values for decimalLatitude lie between -90 and 90, inclusive; legal values for decimalLongitude lie between -180 and 180, inclusive. Also, verbatimCoordinates must contain valid coordinates in decimal degrees, degrees decimal minutes, or degrees minutes seconds.



Definitions

More than 50 issues and flags have been created to deal with common data quality problems. The following long section compiles all of them and offers a clearer description of each one. This section is intended to serve as a placeholder until more formal documentation can be written.

Geospatial Issues


Zero coordinate (geospatial) example
Coordinates are exactly 0/0, which often indicates that the real coordinates are missing (null).
Terms: dwc:decimalLatitude, dwc:decimalLongitude

Country coordinate mismatch (geospatial) example
The interpreted occurrence coordinates fall outside of the indicated country.
Terms: dwc:countryCode, dwc:country, dwc:decimalLatitude, dwc:decimalLongitude

Coordinate invalid (geospatial) example
A coordinate value is given in some form, but GBIF is unable to interpret it. Possible reasons include coordinates that fall out of range (latitude outside -90 to 90, or longitude outside -180 to 180) and text values that cannot be interpreted.
Terms: dwc:decimalLatitude, dwc:decimalLongitude, dwc:verbatimCoordinates, dwc:verbatimLatitude, dwc:verbatimLongitude

Coordinate out of range (geospatial) example
The supplied coordinates lie outside of the range for decimal lat/lon values (-90 to 90, -180 to 180).
Terms: dwc:decimalLatitude, dwc:decimalLongitude, dwc:verbatimCoordinates, dwc:verbatimLatitude, dwc:verbatimLongitude

Records with these 4 issues are removed by default when you filter for records including coordinates, unless you click the check box shown with that filter.
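Three of the four checks above can be approximated before publishing. The sketch below is an illustration under stated assumptions, not GBIF's interpretation code: the function name and labels are hypothetical, and out-of-range values get their own label here even though they may also count as invalid. The fourth check, country coordinate mismatch, needs country boundaries and is not covered.

```python
def check_coordinates(lat_text, lon_text):
    """Return a label for a latitude/longitude pair supplied as text."""
    try:
        lat = float(lat_text)
        lon = float(lon_text)
    except (TypeError, ValueError):
        # Text (or a missing value) that is not a decimal number.
        return "coordinate invalid"
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return "coordinate out of range"
    if lat == 0 and lon == 0:
        return "zero coordinate"
    return "ok"
```

For example, check_coordinates("95.2", "10") reports an out-of-range latitude, and check_coordinates("0", "0") reports a zero coordinate.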


Geodetic datum assumed WGS84 (geospatial) example
If the datum is null, data interpretation assumes the record coordinates are in WGS84.
Terms: dwc:geodeticDatum

Geodetic datum invalid (geospatial) example
The geodetic datum could not be interpreted, because the supplied term cannot be matched against the vocabulary of known values.
Terms: dwc:geodeticDatum

Country mismatch (geospatial) example
The interpreted country and country code contradict each other.
Terms: dwc:countryCode, dwc:country

Country derived from coordinates (geospatial) example
If the country and country code are not supplied or cannot be matched to known values, data interpretation derives their content from the decimal coordinates through a lookup service.
Terms: dwc:countryCode, dwc:country, dwc:decimalLatitude, dwc:decimalLongitude

Country invalid (geospatial) example
The country or countryCode given cannot be matched to the vocabulary for country names.
Terms: dwc:country

Continent invalid (geospatial) example
The continent given cannot be matched to the vocabulary for continent names.
Terms: dwc:continent

Coordinate rounded (geospatial) example
In the data interpretation the original coordinates are rounded to 6 decimals (~1m precision).
Terms: dwc:decimalLatitude, dwc:decimalLongitude

Coordinate reprojected (geospatial) example
The original coordinates were successfully reprojected from a different geodetic datum to WGS84.
Terms: dwc:geodeticDatum

Coordinate reprojection suspicious (geospatial) example
The coordinates were reprojected successfully according to the provided datum, but the resulting datum shift is larger than 0.1 decimal degrees.
Terms: dwc:geodeticDatum, dwc:decimalLatitude, dwc:decimalLongitude
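The 0.1 degree threshold can be illustrated with a small comparison of the coordinates before and after reprojection. This is only a sketch: the text does not say how the shift is measured, so comparing latitude and longitude separately is an assumption, as are the function and parameter names.

```python
def datum_shift_suspicious(original, reprojected, threshold=0.1):
    """True if latitude or longitude moved by more than `threshold` degrees.

    Both arguments are (latitude, longitude) tuples in decimal degrees.
    """
    d_lat = abs(reprojected[0] - original[0])
    d_lon = abs(reprojected[1] - original[1])
    return d_lat > threshold or d_lon > threshold
```

A shift of a few ten-thousandths of a degree passes, while a shift of a quarter of a degree in either direction is reported as suspicious.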

Coordinate reprojection failed (geospatial) example
The given decimal latitude and longitude could not be reprojected to WGS84 based on the provided datum.
Terms: dwc:geodeticDatum, dwc:decimalLatitude, dwc:decimalLongitude

Coordinate uncertainty meters invalid (geospatial) example
The value given for Coordinate uncertainty in meters, indicating the radius of uncertainty around the given decimal coordinates, is not a valid number, or lies outside a plausible range.
Terms: dwc:coordinateUncertaintyInMeters

Coordinate precision invalid (geospatial) example
Indicates an invalid or very unlikely coordinate precision. The value is not a decimal number as expected, or it is unusually low or high for a margin of uncertainty.
Terms: dwc:coordinatePrecision

Presumed negated longitude (geospatial) example
The supplied longitude value places the coordinates outside of the indicated country. Negating the longitude value would result in a country match.
Terms: dwc:decimalLongitude

Presumed negated latitude (geospatial) example
The supplied latitude value places the coordinates outside of the indicated country. Negating the latitude value would result in a country match.
Terms: dwc:decimalLatitude

Presumed swapped coordinate (geospatial) example
Coordinates seem to be swapped when testing against the interpreted country.
Terms: dwc:decimalLatitude, dwc:decimalLongitude, dwc:country
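The three presumed flags above share one idea: apply a simple transformation and test whether the point then falls inside the indicated country. The sketch below illustrates that idea with a rectangular bounding box standing in for real country boundaries. The box values, the function names and the order in which the transformations are tried are all assumptions for illustration.

```python
def in_bbox(lat, lon, bbox):
    """bbox = (min_lat, max_lat, min_lon, max_lon), a crude stand-in for a country."""
    min_lat, max_lat, min_lon, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def presumed_fix(lat, lon, bbox):
    """Return which simple transformation, if any, moves the point into bbox."""
    if in_bbox(lat, lon, bbox):
        return "no fix needed"
    candidates = [
        ("presumed negated longitude", (lat, -lon)),
        ("presumed negated latitude", (-lat, lon)),
        ("presumed swapped coordinate", (lon, lat)),
    ]
    for label, (new_lat, new_lon) in candidates:
        if in_bbox(new_lat, new_lon, bbox):
            return label
    return "no simple fix"


# Rough, hypothetical bounding box for Brazil (not an authoritative boundary).
BRAZIL = (-34.0, 5.5, -74.0, -34.0)
```

With this box, a point entered as latitude -15.8 and longitude 47.9 is reported as a presumed negated longitude, because flipping the sign of the longitude places it inside the box.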

Depth min max swapped (geospatial) example
The values for minimum and maximum depth appear to be swapped.
Terms: dwc:minimumDepthInMeters, dwc:maximumDepthInMeters

Depth non numeric (geospatial) example
The values for minimum and maximum depth are non-numeric values and cannot be interpreted.
Terms: dwc:minimumDepthInMeters, dwc:maximumDepthInMeters

Depth unlikely (geospatial) example
The values for minimum and maximum depth are negative or higher than 11000 (Mariana Trench depth in meters).
Terms: dwc:minimumDepthInMeters, dwc:maximumDepthInMeters

Depth not metric (geospatial) example
Set if supplied depth is not given in the metric system, for example using feet instead of meters.
Terms: dwc:minimumDepthInMeters, dwc:maximumDepthInMeters
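The depth checks above can be sketched as one function. It is a simplified illustration, not GBIF's implementation: the function name and labels are hypothetical, and a value with a unit suffix such as "30 ft" is simply reported as non-numeric, because this sketch does not try to recognise units.

```python
MAX_DEPTH_M = 11000  # Mariana Trench depth in meters, as quoted above


def depth_flags(min_depth_text, max_depth_text):
    """Return the set of depth flags raised by a minimum/maximum depth pair."""
    try:
        min_depth = float(min_depth_text)
        max_depth = float(max_depth_text)
    except (TypeError, ValueError):
        return {"depth non numeric"}
    flags = set()
    if min_depth > max_depth:
        flags.add("depth min max swapped")
    if any(d < 0 or d > MAX_DEPTH_M for d in (min_depth, max_depth)):
        flags.add("depth unlikely")
    return flags
```

A pair can raise more than one flag: a minimum of 12000 and a maximum of 100 is both swapped and unlikely.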

Elevation non numeric (geospatial) example
The values for minimum and maximum elevation are non-numeric values and cannot be interpreted.
Terms: dwc:minimumElevationInMeters, dwc:maximumElevationInMeters

Elevation min max swapped (geospatial) example
The values for minimum and maximum elevation appear to be swapped.
Terms: dwc:minimumElevationInMeters, dwc:maximumElevationInMeters

Elevation not metric (geospatial) example
Set if supplied elevation is not given in the metric system, for example using feet instead of meters.
Terms: dwc:minimumElevationInMeters, dwc:maximumElevationInMeters


As of the writing of this post, no occurrence records on GBIF are flagged with the following geospatial issues.

Elevation unlikely (geospatial) example
The values for minimum and maximum elevation are above the troposphere (17000 m) or below the Mariana Trench (11000 m).
Terms: dwc:minimumElevationInMeters, dwc:maximumElevationInMeters

Continent country mismatch (geospatial) example
The interpreted continent and country do not match up.
Terms: dwc:continent, dwc:countryCode, dwc:country

Continent derived from coordinates (geospatial) example
If no value is supplied for the continent or if the values cannot be matched against a known vocabulary, data interpretation derives the continent from the decimal coordinates.
Terms: dwc:continent, dwc:decimalLatitude, dwc:decimalLongitude


Taxonomic Issues


Taxon match higherrank (taxonomic) example
The record can be matched to the GBIF taxonomic backbone at a higher rank, but not with the scientific name given.
Terms: dwc:scientificName, dwc:kingdom, dwc:phylum, dwc:class, dwc:order, dwc:family, dwc:genus, dwc:subgenus, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:taxonRank

Reasons include:
- The name is new, and not available in the taxonomic datasets yet
- The name is missing in the backbone’s taxonomic sources for other reasons
- Formatting or spelling of the scientific name caused interpretation errors

Taxon match none (taxonomic) example
Matching to the taxonomic backbone cannot be done because there was either no match at all or several matches with too little information to tell them apart (homonyms).
Terms: dwc:scientificName, dwc:kingdom, dwc:phylum, dwc:class, dwc:order, dwc:family, dwc:genus, dwc:subgenus, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:taxonRank

Taxon match fuzzy (taxonomic) example
Matching to the taxonomic backbone can only be done using a fuzzy, non-exact match.
Terms: dwc:scientificName, dwc:kingdom, dwc:phylum, dwc:class, dwc:order, dwc:family, dwc:genus, dwc:subgenus, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:taxonRank


Date Issues


Recorded date invalid (date) example
The recording date given cannot be interpreted because it is invalid.
Terms: dwc:eventDate, dwc:year, dwc:month, dwc:day

Reasons include:
- A non-existent date (e.g. “1995-04-34”).
- Missing date parts (e.g. an event date without a year).
- A date format that does not follow the ISO 8601 standard (YYYY-MM-DD).
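The reasons listed above can be checked before publishing. The sketch below only handles complete dates written as YYYY-MM-DD, so it will reject any other date form. The function name is hypothetical and this is not GBIF's date parser.

```python
import re
from datetime import date

ISO_DAY = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def is_valid_iso_date(text):
    """True if text is a real calendar date written as YYYY-MM-DD."""
    match = ISO_DAY.fullmatch(text or "")
    if match is None:
        return False  # wrong format or missing date parts
    year, month, day = map(int, match.groups())
    try:
        date(year, month, day)  # raises ValueError for dates such as April 34
    except ValueError:
        return False
    return True
```

The example from the list, “1995-04-34”, has the right format but fails the calendar check, so is_valid_iso_date returns False for it.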

Recorded date mismatch (date) example
The recording date specified as the eventDate string and the individual year, month and day values contradict each other.
Terms: dwc:eventDate, dwc:year, dwc:month, dwc:day
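A mismatch of this kind can be detected by comparing the parts of eventDate with whichever of year, month and day are supplied. The sketch below assumes eventDate is already a well-formed YYYY-MM-DD (or shorter, such as YYYY-MM) string and will raise an error otherwise. The function name is hypothetical.

```python
def recorded_date_mismatch(event_date, year=None, month=None, day=None):
    """True if a supplied year, month or day contradicts the eventDate string."""
    parts = [int(p) for p in event_date.split("-")]
    supplied = [year, month, day]
    # zip stops at the shorter list, so a partial eventDate such as
    # "1995-04" is only compared against year and month.
    return any(
        given is not None and int(given) != parsed
        for given, parsed in zip(supplied, parts)
    )
```

Fields left empty (None) are ignored, so only values that were actually supplied can cause a mismatch.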

Identified date unlikely (date) example
The identification date is in the future or before Linnean times (1700).
Terms: dwc:dateIdentified

Recorded date unlikely (date) example
The recording date is highly unlikely, falling either into the future or representing a very old date before 1600 that predates modern taxonomy.
Terms: dwc:eventDate, dwc:year, dwc:month, dwc:day
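The "unlikely" date checks follow one pattern: the date is either in the future or earlier than a cutoff year. The sketch below takes the cutoff as a parameter, using the years quoted in the text (1600 for the recording date, 1700 for the identification date). The function name and the optional today parameter are assumptions added to keep the example testable.

```python
from datetime import date


def date_unlikely(value, earliest_year, today=None):
    """True if value lies in the future or before 1 January of earliest_year."""
    today = today or date.today()
    return value > today or value.year < earliest_year


# Cutoff years as quoted in the text.
RECORDED_CUTOFF = 1600
IDENTIFIED_CUTOFF = 1700
```

For instance, date_unlikely(date(1650, 5, 1), IDENTIFIED_CUTOFF) is True, while the same date checked against RECORDED_CUTOFF is not flagged.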

Multimedia date invalid (date) example
The creation date given cannot be interpreted because it is invalid.
Terms: dc:created

Reasons include:
- A non-existent date (e.g. “1995-04-34”).
- Missing date parts (e.g. a creation date without a year).
- A date format that does not follow the ISO 8601 standard (YYYY-MM-DD).

Identified date invalid (date) example
The identification date given cannot be interpreted because it is invalid.
Terms: dwc:dateIdentified

Reasons include:
- A non-existent date (e.g. “1995-04-34”).
- Missing date parts (e.g. without a year).
- A date format that does not follow the ISO 8601 standard (YYYY-MM-DD).

Modified date invalid (date) example
A (partially) invalid modified date is given.
Terms: dc:modified

Reasons include:
- A non-existent date (e.g. “1995-04-34”).
- Missing date parts (e.g. without a year).
- A date format that does not follow the ISO 8601 standard (YYYY-MM-DD).

Modified date unlikely (date) example
The modified date given is in the future or predates Unix time (1970).
Terms: dc:modified

Georeferenced date invalid (date) example
The georeference date given cannot be interpreted because it is invalid.
Terms: dwc:georeferencedDate

Reasons include:
- A non-existent date (e.g. “1995-04-34”).
- Missing date parts (e.g. without a year).
- A date format that does not follow the ISO 8601 standard (YYYY-MM-DD).

Georeferenced date unlikely (date) example
The georeference date given is in the future or before Linnean times (1700).
Terms: dwc:georeferencedDate


Vocabulary Issues


Basis of record invalid (vocabulary) example
The given basis of record is impossible to interpret or very different from the recommended vocabulary: http://rs.gbif.org/vocabulary/dwc/basis_of_record.xml
Terms: dwc:basisOfRecord

Type status invalid (vocabulary) example
The given type status is impossible to interpret or very different from the recommended vocabulary: https://rs.gbif.org/vocabulary/gbif/type_status.xml
Terms: dwc:typeStatus

Occurrence status unparsable (vocabulary) example
The given occurrenceStatus value cannot be interpreted; it does not match any of the known (vocabulary) values that indicate the presence or absence of a species at a collection or observation event.
Terms: dwc:occurrenceStatus


Other Issues


Individual count invalid (individual count) example
The individual count value cannot be parsed into a positive integer.
Terms: dwc:individualCount

Individual count conflicts with occurrence status (individual count) example
The values given for the individual count and for the status of the occurrence (present/absent) contradict each other (e.g. the count is 0 but the status says “present”).
Terms: dwc:individualCount, dwc:occurrenceStatus

Occurrence status inferred from individual count (occurrence status) example
The present/absent status of the occurrence was inferred from the individual count value because no status value was supplied explicitly. An individual count of 0 is interpreted as status=“absent”, a value > 0 as “present”.
Terms: dwc:individualCount, dwc:occurrenceStatus
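The inference rule and the conflict check described in the two entries above can be sketched as follows. The function names are hypothetical. Treating a count above 0 with status "absent" as a conflict is an assumption, since the text only gives the opposite case as an example.

```python
def infer_occurrence_status(individual_count):
    """Infer present/absent from a count when no status was supplied."""
    if individual_count < 0:
        raise ValueError("individual count must not be negative")
    return "absent" if individual_count == 0 else "present"


def count_conflicts_with_status(individual_count, occurrence_status):
    """True if the count and the supplied status contradict each other."""
    supplied = occurrence_status.strip().lower()
    return infer_occurrence_status(individual_count) != supplied
```

A count of 0 with the status “present”, the example from the text, is reported as a conflict.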

References URI invalid (uri) example
The references URL cannot be resolved, and may be malformed or contain invalid characters. If there is more than one URL, the values have to be separated by a pipe symbol “|”.
Terms: dc:references

Multimedia URI invalid (uri) example
The multimedia URL cannot be resolved, and may be malformed or contain invalid characters. If there is more than one URL, the values have to be separated by a pipe symbol “|”.
Terms: dwc:associatedMedia
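The pipe-separated convention and a basic syntax check can be sketched as follows. This only inspects the form of each value and makes no network request, so it cannot tell whether a URL actually resolves. The function names are hypothetical, and accepting only http and https is an assumption.

```python
from urllib.parse import urlparse


def split_uris(field):
    """Split a pipe-separated field into individual, trimmed URI strings."""
    return [part.strip() for part in field.split("|") if part.strip()]


def looks_like_url(uri):
    """Syntactic check only: http(s) scheme, a host, and no whitespace."""
    if any(ch.isspace() for ch in uri):
        return False
    try:
        parsed = urlparse(uri)
    except ValueError:
        return False  # e.g. a malformed bracketed host
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Hypothetical field value with two media links.
media = "https://example.org/a.jpg | https://example.org/b.jpg"
checked = [(uri, looks_like_url(uri)) for uri in split_uris(media)]
```

Values separated by anything other than a pipe, such as a space or comma, stay in one piece after splitting and are then likely to fail the check.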

Interpretation error (interpretation) example
An error occurred during interpretation, leaving the record interpretation incomplete.
Terms: GBIF interpretation