[Interactive map: total checklist names per country. Legend: Big, 15-300K total names; Medium, 5-15K total names; Small, 0-5K total names.]
Here I plot the total number of names in checklists published on GBIF that are linked to a single country. A checklist dataset is a term for any dataset that primarily contains a list of taxonomic names. National species checklists are lists of species recorded from a country, usually through some organized effort. GBIF has published a guide on best practices for making national checklist datasets, which advises making national checklists as big as possible.
A checklist dataset is a catch-all term for any dataset that primarily contains a list of taxonomic names. The line between a checklist dataset and an occurrence dataset can be blurry. GBIF classifies at least 6 types of datasets as checklists: (1) national (or regional) lists of species, (2) taxonomic lists of species, (3) species descriptions, (4) checklists made up of other checklists, such as the GBIF backbone taxonomy and Catalogue of Life, (5) checklists with occurrences, and (6) checklists made from occurrences. The first two are probably what most people imagine when they think of a checklist dataset.
As I mentioned in my previous post, a lot more sequence-based data has been made available on GBIF this past year. MGnify alone published 295 datasets, for a total of 13,285,109 occurrences. Even though most of these occurrences are Bacteria or Chromista, more than a million of them are animals and more than 300,000 are plants. So even if you are not interested in bacteria, chances are that you will encounter sequence-based data on GBIF.
GBIF is trying to make it easier to share sequence-based data. In fact, this past year alone, we worked with UNITE to integrate species hypotheses for fungi and with EMBL-EBI to publish 295 metagenomics datasets. Unfortunately, documentation has not kept pace. Although we now have an FAQ on the topic, I thought a blog post with some advice and examples would be useful. Note that this blog post is not intended to be documentation.
Gridded datasets are now flagged on the GBIF registry
This update builds on work from a previous blog post. Gridded datasets are, broadly, datasets that have low coordinate precision because of rasterized sampling or rounding. This can be a data quality issue because a user might assume an occurrence record is more precise than it actually is.
Current statistics: 572 datasets are currently flagged as gridded or rasterized on the registry.
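To give a feel for what "gridded" means in practice, here is a minimal Python sketch of one possible heuristic: if most gaps between the sorted unique latitudes (and longitudes) of a dataset are identical, the points probably sit on a regular grid. This is an illustration only, not the method GBIF uses to flag datasets. The function names `grid_spacing_share` and `looks_gridded`, the 4-decimal rounding, and the 0.8 threshold are all assumptions made for the example.

```python
from collections import Counter


def grid_spacing_share(values, decimals=4):
    """Return (most common gap, share of gaps equal to it) for sorted unique values.

    Hypothetical helper: rounding to `decimals` places is an assumption used to
    absorb floating-point noise before comparing gaps.
    """
    unique = sorted(set(round(v, decimals) for v in values))
    if len(unique) < 3:
        # Too few distinct coordinates to say anything about spacing.
        return None, 0.0
    gaps = [round(b - a, decimals) for a, b in zip(unique, unique[1:])]
    gap, count = Counter(gaps).most_common(1)[0]
    return gap, count / len(gaps)


def looks_gridded(lats, lons, min_share=0.8):
    """Guess whether points lie on a regular grid (threshold is an assumption)."""
    _, lat_share = grid_spacing_share(lats)
    _, lon_share = grid_spacing_share(lons)
    return lat_share >= min_share and lon_share >= min_share


# Coordinates rounded to a 0.1 degree grid versus irregular coordinates.
grid_lats = [50.0, 50.1, 50.2, 50.3, 50.0, 50.1]
grid_lons = [4.0, 4.1, 4.2, 4.3, 4.1, 4.2]
free_lats = [50.0123, 50.4571, 51.2098, 51.3333, 52.9001]
free_lons = [4.0456, 4.9812, 5.1203, 5.7777, 6.0342]

print(looks_gridded(grid_lats, grid_lons))
print(looks_gridded(free_lats, free_lons))
```

A real dataset would need more care, for example handling missing cells in the grid, projected coordinate systems, and datasets that mix gridded and non-gridded records.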