GBIF checklist datasets and data gaps
A checklist dataset is a catch-all term describing any dataset that contains primarily a list of taxonomic names. The lines between a checklist dataset and an occurrence dataset can be blurry.
GBIF classifies at least six types of datasets as checklists:
- National (or regional) lists of species
- Taxonomic lists of species
- Species descriptions
- Checklists made up of other checklists (GBIF Backbone Taxonomy and Catalogue of Life)
- Checklists with occurrences
- Checklists made from occurrences
The first two are probably what most people imagine when they think of a checklist dataset. GBIF has published a guide on best practices for making checklist datasets.
The GBIF Backbone Taxonomy is a checklist dataset of names that GBIF uses to classify the species found in all the other datasets. The backbone is in fact assembled from a large group of other selected checklist datasets, including large parts of the Catalogue of Life.
The GBIF Backbone Taxonomy is a list of more than 2 million names used to match and group the records on GBIF.
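To make the matching step concrete, here is a minimal sketch of how a single name could be looked up against the backbone through the public GBIF API species match endpoint (`/v1/species/match`). The helper names `build_match_url` and `match_name` are my own, chosen for illustration, and this is not necessarily how the analysis in this post was done.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_API = "https://api.gbif.org/v1"


def build_match_url(name, kingdom=None):
    """Build the URL that asks the backbone to match one scientific name."""
    params = {"name": name}
    if kingdom is not None:
        params["kingdom"] = kingdom  # optional hint to help separate homonyms
    return f"{GBIF_API}/species/match?{urlencode(params)}"


def match_name(name, kingdom=None, timeout=30):
    """Fetch the match result as a dict (hypothetical helper, needs network)."""
    with urlopen(build_match_url(name, kingdom), timeout=timeout) as response:
        return json.load(response)


# Example, left commented out because it needs network access:
# result = match_name("Puma concolor")
# result.get("usageKey")   -> backbone key of the matched name, if any
# result.get("matchType")  -> how the name was matched
```

Records whose names resolve to the same backbone key can then be grouped together, which is what makes counting "species with occurrences" possible at all.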
Not every name in the GBIF backbone has occurrences, and names with occurrences are not distributed equally across groups. One way to tell whether the GBIF network is collecting occurrences for the named taxonomic groups is to compare the species names in the backbone with the species names that have occurrences.
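The comparison described above comes down to a per-group percentage. A minimal sketch, assuming the backbone is available as (species key, group) pairs and occurrence counts as a dictionary; the function name `coverage_by_group` and the toy data are invented for illustration and are not real GBIF keys or counts.

```python
from collections import defaultdict


def coverage_by_group(backbone, occurrence_counts):
    """Return {group: percent of backbone species with at least 1 occurrence}.

    backbone: iterable of (species_key, group) pairs.
    occurrence_counts: dict mapping species_key to a number of occurrences;
    species missing from the dict are treated as having none.
    """
    total = defaultdict(int)
    covered = defaultdict(int)
    for species_key, group in backbone:
        total[group] += 1
        if occurrence_counts.get(species_key, 0) > 0:
            covered[group] += 1
    return {group: 100.0 * covered[group] / total[group] for group in total}


# Toy data, invented for illustration only.
backbone = [(1, "Birds"), (2, "Birds"), (3, "Insects"),
            (4, "Insects"), (5, "Insects"), (6, "Insects")]
occurrence_counts = {1: 120, 2: 3, 3: 7, 4: 0}

print(coverage_by_group(backbone, occurrence_counts))
# {'Birds': 100.0, 'Insects': 25.0}
```

A species with a count of zero and a species absent from the counts are treated the same way here, since both mean "named but not observed".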
How much of the GBIF backbone is covered by occurrences?
Here I plot the percentage of the GBIF backbone with at least 1 occurrence for popular groups.
We see that the GBIF network has successfully collected occurrences for 94% of birds but only 51% of named insects.
Beetles, moths, butterflies, ants, and flies make up the majority of the missing insects, with only around 50% of the named species in each group having occurrences. Dragonflies (78%) do not contribute greatly to the missing insects.
Flowering plants (Magnoliopsida) are a large and well-studied group of plants. The herbaria contributing to GBIF are most likely responsible for the 86% coverage seen in this group.
Other small groups with fairly large animals (mammals, reptiles, amphibians) are all well covered.
Bacteria have 89% coverage, but this is probably due to the lack of named bacterial records within GBIF rather than to thorough sampling.
Using checklists to look for regional data gaps
To explore whether checklist datasets are useful for identifying regional data gaps, I looked at checklists from two regions: West Europe and West Africa. These regions differ in both data coverage and checklist supply.
West Europe† checklisted species
The percentage of checklisted species with at least 1 occurrence for popular groups.
West Africa† checklisted species
Fewer checklisted species in West Africa
Group | West Europe | West Africa |
---|---|---|
Insects | 53,000 | 800 |
Fungi | 29,000 | 50 |
Flies | 16,500 | 190 |
Spiders | 1,600 | 5 |
Mammals | 210 | 240 |
Birds | 780 | 600 |
Dragonflies | 108 | 160 |
Amphibians | 95 | 60 |
Here we see that while the percentages seem quite good, the number of checklisted species in West Africa is in some cases orders of magnitude lower than in West Europe. A few West African groups (mammals and dragonflies) do have more checklisted species than West Europe, and birds are not far behind (600 versus 780).
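To put a number on "orders of magnitude", the ratios can be worked out directly from the table above. This is a small sketch using only the table values; the variable and function names are my own.

```python
import math

# Checklisted species counts copied from the table above:
# group -> (West Europe, West Africa)
checklisted = {
    "Insects": (53_000, 800),
    "Fungi": (29_000, 50),
    "Flies": (16_500, 190),
    "Spiders": (1_600, 5),
    "Mammals": (210, 240),
    "Birds": (780, 600),
    "Dragonflies": (108, 160),
    "Amphibians": (95, 60),
}


def europe_to_africa_ratio(group):
    """How many times more checklisted species West Europe has for a group."""
    west_europe, west_africa = checklisted[group]
    return west_europe / west_africa


for group in checklisted:
    ratio = europe_to_africa_ratio(group)
    # log10 of the ratio is the gap expressed in orders of magnitude;
    # a negative value means West Africa has the larger count.
    print(f"{group:<12} ratio {ratio:8.2f}   log10 {math.log10(ratio):+.2f}")
```

Fungi (580 times) and spiders (320 times) differ by more than two orders of magnitude, while insects and flies differ by a little under two.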
Checklist supply is much lower in West Africa
West Europe has many more large national publishers than West Africa.
Most checklisted species in West Africa come from small checklists published by Plazi and Biodiversity Data Journal. The Benin LFS publisher seems to be the only semi-large national publisher in West Africa. Most likely, the number of species with occurrences in West Africa is greater than the number of species in checklists.