GBIF gets records from around 1600 (and counting) occurrence datasets. Not all of the data-providers have the time or resources to make sure their datasets are free from errors.

As a service to the biodiversity user community, GBIF flags records that might contain an obvious issue. Some flags are not easy to understand, and most users are probably not aware that they exist or filter them all out by default by leaving the “include records where coordinates are flagged as suspcious” unchecked. For most projects, leaving this box unchecked is probably the right decision.

For other projects (lacking data), it might make sense to still include some data points with suspicious values. But which flags might be worth keeping? Hopefully, knowing a little bit more about the data quality flags will help you decide which flags might be worth keeping. That is what this blog post is about.

dog

issues vs flags

type count 1 GEODETIC_DATUM_ASSUMED_WGS84 2212 2 COORDINATE_ROUNDED 1401 3 COUNTRY_DERIVED_FROM_COORDINATES 924 4 TAXON_MATCH_HIGHERRANK 913 5 RECORDED_DATE_INVALID 858 6 TAXON_MATCH_FUZZY 808 7 IDENTIFIED_DATE_UNLIKELY 569 8 RECORDED_DATE_MISMATCH 562 9 COORDINATE_UNCERTAINTY_METERS_INVALID 557 10 INTERPRETATION_ERROR 544 11 GEODETIC_DATUM_INVALID 475 12 COUNTRY_INVALID 453 13 ELEVATION_MIN_MAX_SWAPPED 441 14 BASIS_OF_RECORD_INVALID 394 15 PRESUMED_NEGATED_LATITUDE 391 16 PRESUMED_NEGATED_LONGITUDE 389 17 TAXON_MATCH_NONE 345 18 COORDINATE_REPROJECTED 290 19 RECORDED_DATE_UNLIKELY 290 20 DEPTH_MIN_MAX_SWAPPED 285 21 MULTIMEDIA_URI_INVALID 276 22 COORDINATE_PRECISION_INVALID 269 23 DEPTH_NON_NUMERIC 236 24 INDIVIDUAL_COUNT_INVALID 199 25 ELEVATION_NON_NUMERIC 194 26 PRESUMED_SWAPPED_COORDINATE 179 27 REFERENCES_URI_INVALID 124 28 DEPTH_UNLIKELY 109 29 MODIFIED_DATE_UNLIKELY 92 30 ELEVATION_NOT_METRIC 53 31 DEPTH_NOT_METRIC 49 32 COUNTRY_MISMATCH 43 33 MULTIMEDIA_DATE_INVALID 16 34 COORDINATE_REPROJECTION_SUSPICIOUS 14 35 COORDINATE_REPROJECTION_FAILED 10

There are two types of flags. But somewhat confusingly not all flags that appear on the list about are excluded when one leaves the “include records where coordinates are flagged as suspcious” unchecked.

[“COORDINATE_ROUNDED”,“GEODETIC_DATUM_INVALID”,“GEODETIC_DATUM_ASSUMED_WGS84”,“COORDINATE_UNCERTAINTY_METERS_INVALID”,“TAXON_MATCH_NONE”,“INDIVIDUAL_COUNT_INVALID”] [“COORDINATE_ROUNDED”,“GEODETIC_DATUM_ASSUMED_WGS84”,“COUNTRY_INVALID”,“COUNTRY_DERIVED_FROM_COORDINATES”,“RECORDED_DATE_INVALID”,“TAXON_MATCH_FUZZY”

Data quality flags

Coordinates Rounded

GBIF has rounded the coordinates because the original has too many decimal places.

Since 6 decimal places (0.000001) has an accuracy of around 50-100 mm, and modern GPS accuracy is around 5 meters, it makes sense for GBIF to round any coordinates with more than ? decimal places.

Zero Coordinate

The occurrence record is located at (0.00°,0.00°).

A zero only in both the decimallatitude and decimallongitude column will trigger this flag. A coordinate does not have to be exactly 0 to trigger the flag. A record with both lat-lon very close to zero will also trigger the flag. A point at (0.002383137,-0.003287347) will trigger the flag, since 0.001° is around 100 meters.

Geodetic datum assumed WGS84

The projection of the lat-lon points (the geodetic datum) was not filled in. WGS84 is the standard projection for GPS and other applications. GBIF has assumed the lat-lon points are WGS84.

A geodetic datum is a coordinate system used to locate places on the Earth. There are projections other than WGS84. Here is a table of some commonly used

ED50

AGD84, GDA94

  geodeticdatum
count
           WGS84
            null
       EPSG:4326
           GDA94
 EPSG:4326 WGS84
        AGD84/66
     WGS84/GDA94
not recorded (for… EPSG:4326 WGS 84 ED50 EPSG:6326 unknown GDA NO DISPONIBLE not recorded NAD27 World Geodetic Sy… North American Da… AGD84 AGD66
521141650 372089969 68553025 18287348 7560832 4316640 3526792 3026393 2742297 1673518 1542555 1335579 1264836 1045417 1013757 934040 858413 727007 534959 528809

Below I plot all the zero-coordinate flagged records in GBIF. We can see not all flagged records are exactly zero. Orange dots represent the GBIF maps view (which are rounded)