A checklist dataset is a catch-all term describing any dataset that contains primarily a list of taxonomic names. The lines between a checklist dataset and an occurrence dataset can be blurry.

GBIF classifies at least 6 types of datasets as checklists.

- National (or regional) lists of species
- Taxonomic lists of species
- Species descriptions
- Checklists made up of other checklists (e.g. the GBIF Backbone Taxonomy and the Catalogue of Life)
- Checklists with occurrences
- Checklists made from occurrences

The first two are probably what most people imagine when they think of a checklist dataset. GBIF has published a guide on best practices for making checklist datasets.

The GBIF Backbone Taxonomy is a checklist dataset of names that GBIF uses to classify the species recorded across its many datasets. It is in fact assembled from many other selected checklist datasets, including large parts of the Catalogue of Life.

The GBIF backbone taxonomy is a list of >2 million names used to match and group the records on GBIF.

Not every name in the GBIF backbone has occurrences, and names with occurrences are not distributed evenly across groups. One way to tell whether the GBIF network is collecting occurrences for taxa that have at least been named is to compare the species names in the backbone with the species names that have occurrences.
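A minimal sketch of this comparison in Python, assuming the backbone species names and the names with occurrences are already available as plain lists of strings (the names below are hypothetical placeholders, not real GBIF data). Set intersection gives the covered names and set difference gives the names without any occurrence:

```python
def backbone_coverage(backbone_names, occurrence_names):
    """Return the fraction of backbone names with >= 1 occurrence and the sorted names with none."""
    backbone = set(backbone_names)
    covered = backbone & set(occurrence_names)  # backbone names that have occurrences
    missing = sorted(backbone - covered)        # backbone names without any occurrence
    fraction = len(covered) / len(backbone) if backbone else 0.0
    return fraction, missing


# Hypothetical placeholder names, not real GBIF data.
backbone = ["Species a", "Species b", "Species c", "Species d"]
# Names seen in occurrence records; "Species x" is not in the backbone.
with_occurrences = ["Species a", "Species c", "Species x"]

fraction, missing = backbone_coverage(backbone, with_occurrences)
print(f"{fraction:.0%} covered; missing: {missing}")
# prints: 50% covered; missing: ['Species b', 'Species d']
```

Names that appear in occurrence records but not in the backbone (`Species x` here) are ignored, since coverage is measured against the backbone.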

How much of the GBIF backbone is covered by occurrences?

Here I plot the percentage of the GBIF backbone with at least 1 occurrence for popular groups.

_{common name dictionary: [csv](/post/2019-04-23-gbif-checklist-datasets-and-data-gaps_files/commonNameDictionary.csv)}
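The per-group percentages can be sketched by counting the species in each group and how many of them have occurrences. The species, groups, and the `COMMON_NAMES` mapping below are hypothetical stand-ins for the common name dictionary; this is an illustration of the calculation, not the code behind the plot.

```python
from collections import defaultdict

# Hypothetical common name dictionary: taxonomic group -> popular name.
COMMON_NAMES = {"Aves": "birds", "Insecta": "insects"}

# Hypothetical backbone rows: (species name, taxonomic group); species assumed unique.
backbone = [
    ("Bird sp. 1", "Aves"),
    ("Bird sp. 2", "Aves"),
    ("Insect sp. 1", "Insecta"),
    ("Insect sp. 2", "Insecta"),
    ("Insect sp. 3", "Insecta"),
    ("Insect sp. 4", "Insecta"),
]
# Hypothetical set of species names with at least one occurrence.
with_occurrences = {"Bird sp. 1", "Bird sp. 2", "Insect sp. 1"}


def coverage_by_group(rows, occurrence_names, common_names):
    """Percentage of species in each group that have at least one occurrence."""
    totals = defaultdict(int)
    covered = defaultdict(int)
    for species, group in rows:
        label = common_names.get(group, group)  # fall back to the scientific group name
        totals[label] += 1
        if species in occurrence_names:
            covered[label] += 1
    return {label: 100 * covered[label] / totals[label] for label in totals}


print(coverage_by_group(backbone, with_occurrences, COMMON_NAMES))
# prints: {'birds': 100.0, 'insects': 25.0}
```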

We see that the GBIF network has successfully collected occurrences for 94% of birds but only 51% of named insects.

Beetles, moths, butterflies, ants, and flies make up the majority of the missing insects; each of these groups has only around 50% of its named species with occurrences. Dragonflies (78%) contribute comparatively little to the missing insects.

Flowering plants (Magnoliopsida) are a large and well-studied group of plants. The herbaria contributing to GBIF are likely responsible for the 86% coverage seen in this group.

Other smaller groups of fairly large animals (mammals, reptiles, amphibians) are all well covered.

Bacteria have 89% coverage, but this is probably due to the small number of named bacteria within GBIF rather than thorough sampling.

Using checklists to look for regional data gaps

To explore whether checklist datasets are useful for identifying regional data gaps, I decided to look at checklists from two regions: West Europe and West Africa. These regions differ in both data coverage and checklist supply.

West Europe† checklisted species

The percentage of checklisted species with at least 1 occurrence for popular groups.

_{† Austria, Belgium, Denmark, Finland, France, Germany, Iceland, Ireland, Lithuania, Netherlands, Norway, Sweden, Switzerland}
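A regional list of checklisted species has to be pooled from several national checklists. A minimal sketch, assuming each country's checklists are already reduced to a set of species names (the species here are hypothetical): taking the union counts a species listed in several countries only once before coverage is computed.

```python
# Hypothetical species names on each country's checklists (countries from the footnote above).
national_checklists = {
    "Austria": {"Species a", "Species b"},
    "Belgium": {"Species b", "Species c"},
    "Denmark": {"Species c", "Species d"},
}
# Hypothetical species with at least one occurrence in the region.
regional_occurrences = {"Species a", "Species c", "Species e"}


def regional_coverage(checklists, occurrence_names):
    """Pool national checklists into one regional list; return (size, % with occurrences)."""
    regional = set().union(*checklists.values())  # a species on several lists counts once
    if not regional:
        return 0, 0.0
    covered = regional & set(occurrence_names)
    return len(regional), 100 * len(covered) / len(regional)


n_species, pct = regional_coverage(national_checklists, regional_occurrences)
print(n_species, pct)  # prints: 4 50.0
```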

West Africa† checklisted species

_{† Benin, Ghana, Liberia, Mali, Mauritania, Niger, Nigeria, Senegal, Togo}

Fewer checklisted species in West Africa

| Group | West Europe | West Africa |
|---|---|---|
| Insects | 53,000 | 800 |
| Fungi | 29,000 | 50 |
| Flies | 16,500 | 190 |
| Spiders | 1,600 | 5 |
| Mammals | 210 | 240 |
| Birds | 780 | 600 |
| Dragonflies | 108 | 160 |
| Amphibians | 95 | 60 |

Here we see that while the percentages seem quite good, the number of checklisted species in West Africa is in some cases orders of magnitude smaller than in West Europe. Some West African groups (mammals and dragonflies) do, however, have more checklisted species than West Europe.
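To put "orders of magnitude" into numbers, the sketch below takes the approximate counts from the table above and computes the base-10 logarithm of the West Europe to West Africa ratio for each group. A value of 2 means roughly a hundred times more checklisted species in West Europe; a negative value means West Africa has more.

```python
import math

# Approximate checklisted species counts (West Europe, West Africa) from the table above.
COUNTS = {
    "Insects": (53_000, 800),
    "Fungi": (29_000, 50),
    "Flies": (16_500, 190),
    "Spiders": (1_600, 5),
    "Mammals": (210, 240),
    "Birds": (780, 600),
    "Dragonflies": (108, 160),
    "Amphibians": (95, 60),
}


def orders_of_magnitude(europe, africa):
    """log10 of the West Europe / West Africa ratio; negative means West Africa has more."""
    return math.log10(europe / africa)


for group, (europe, africa) in COUNTS.items():
    print(f"{group:<12}{orders_of_magnitude(europe, africa):+.1f}")

# Groups in which West Africa has more checklisted species than West Europe.
more_in_africa = [group for group, (europe, africa) in COUNTS.items() if africa > europe]
print(more_in_africa)  # prints: ['Mammals', 'Dragonflies']
```

With these counts, fungi come out at about +2.8 (roughly 580 times as many checklisted species in West Europe) and spiders at about +2.5, while mammals and dragonflies are the only negative values.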

Checklist supply is much lower in West Africa

West Europe has many more large national publishers than West Africa.

Most checklisted species in West Africa come from small checklists published by Plazi and Biodiversity Data Journal. The Benin LFS publisher seems to be the only semi-large national publisher in West Africa. Most likely, the number of species with occurrences in West Africa is greater than the number of species in its checklists.
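One way to test that guess would be to compare, for a given region, the species names with occurrences against the names on its checklists. A minimal sketch with hypothetical names; a real check would need the regional occurrence and checklist names from GBIF:

```python
def checklist_gap(checklist_names, occurrence_names):
    """Species recorded in occurrences but absent from every checklist of the region."""
    return sorted(set(occurrence_names) - set(checklist_names))


# Hypothetical names for one region, not real GBIF data.
checklisted = {"Species a", "Species b"}
with_occurrences = {"Species a", "Species c", "Species d", "Species e"}

gap = checklist_gap(checklisted, with_occurrences)
print(len(with_occurrences) > len(checklisted), gap)
# prints: True ['Species c', 'Species d', 'Species e']
```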