Checklist publishing on GBIF - some explanations on taxonID, scientificNameID, taxonConceptID, acceptedNameUsageID, nameAccordingTo
When data publishers publish checklists, they will use a a Darwin Core Archive Taxon Core
. And although the taxon core terms are already described here, what exactly to put in which field can sometimes be confusing. And there is a lot to read, like here https://github.com/tdwg/tnc/issues/1
Here I am sharing a summary of an email conversation we had with some data publishers on Helpdesk concerning some of the Taxon Core fields. I hope this can be useful for others.
General considerations
The dwc:taxonID field contains the identifier used to establish relationships between entries within a checklist. The dwc:taxonID field can be understood as a “nameUsageID”, that is, an identifier for a name as used in a specific context, here: the context of the checklist (it isn’t called like that for historical reasons but this would match the concept better). This means the dwc:acceptedNameUsageID field must refer to identifiers used in the dwc:taxonID field. For example, to signify that Caretta nasuta is a synonym of Caretta caretta you could have a checklist like this:
taxonID | scientificName | acceptedNameUsageID | taxonomicStatus |
---|---|---|---|
ID1 | Caretta caretta (Linnaeus, 1758) | ACCEPTED | |
ID2 | Caretta nasuta Rafinesque, 1814 | ID1 | SYNONYM |
Other fields such as dwc:parentNameUsageID link to dwc:taxonIDs.
However, this doesn’t have to be the case for the scientificNameID and taxonConceptID fields
The dwc:scientificNameID is meant to identify the name alone and ideally uses an identifier from a nomenclator such as IPNI or ZooBank, but can also be a name identifier specific for the dataset. Our system doesn’t really interpret this field. [Edit 2023-11-09: the scientificNameID is now intepreted in specific contexts, if you want to know more, please read: https://github.com/gbif/pipelines/issues/217]
The dwc:taxonConceptID on the other hand is meant to identify the taxon concept regardless of which name is being used. AvibaseIDs (like D3A260BCA65503C6
for https://avibase.ca/D3A260BC) are a perfect fit for example. Even though we do not make use of them much, they could be used to identify identical concepts.
NB: The values in the dwc:taxonID field for the GBIF Backbone taxonomy checklist correspond to the GBIF taxon keys used in species pages (like https://www.gbif.org/species/2433753
) and to filter for GBIF occurrences (like https://www.gbif.org/occurrence/search?taxon_key=2433753
). For any other checklist, the dwc:taxonID are likely to be different and unrelated to GBIF taxon keys. You can use any identifier in your own checklist.
Relationship between those fields in two specific cases: circumscriptions and misapplied names
For capturing possible circumscriptions:
For those of us who need a little reminder:
In biological taxonomy, circumscription is the content of a taxon, that is, the delimitation of which subordinate taxa are parts of that taxon. If we determine that species X, Y, and Z belong in Genus A, and species T, U, V, and W belong in Genus B, those are our circumscriptions of those two genera. Another systematist might determine that T, U, V, W, X, Y, and Z all belong in genus A. Agreement on circumscriptions is not governed by the Codes of Zoological or Botanical Nomenclature, and must be reached by scientific consensus.
See: https://en.wikipedia.org/wiki/Circumscription_(taxonomy)
How this would be translated into a checklist:
- each entry in the checklist dataset will have its own dwc:taxonID
- multiple entries of the accepted name corresponding to the different circumscriptions would have the same dwc:scientificNameID
- but each would have a distinct dwc:taxonConceptID
- use dwc:nameAccordingTo to refer to the source of the circumscription.
In practice, how that this look like?
Let’s try to apply this to the genus Cortinarius (Pers.) Gray. According to Liimatainen, K., Kim, J.T., Pokorny, L. et al. (https://doi.org/10.1007/s13225-022-00499-9), the family Cortinariaceae should be split into ten genera. Following that assessment, the concept associated with the name Cortinarius became more narrow as many species have been moved to different genera like Calonarius callochrous (Pers.) Niskanen & Liimat., 2022.
So if I want to model this idea in a checklist, I could do as following (here I am using the identifier from Index Fungorum as scientificNameID):
taxonID | scientificName | dwc:nameAccordingTo | scientificNameID | dwc:taxonConceptID |
---|---|---|---|---|
ID1 | Cortinarius (Pers.) Gray | urn:lsid:indexfungorum.org:names:17391 | conceptID-1 | |
ID2 | Cortinarius (Pers.) Gray | Niskanen & Liimat., 2022 | urn:lsid:indexfungorum.org:names:17391 | conceptID-2 |
Note that we could also include the sensu/sec information in the dwc:scientificName. So the scientific name for the ID2 entry could also be Cortinarius (Pers.) Gray. sec. Liimatainen, K., Kim, J.T., Pokorny, L. et al.
or Cortinarius (Pers.) Gray. sensu Liimatainen, K., Kim, J.T., Pokorny, L. et al.
.
We could do something similar for Tyrannausorus rex which had many different circumscriptions over the years as many specimens originally identified as separate species were later thought to be juvenile T. rex. For example, Holtz identified Aublysodon mirandus as a young T. rex in 2004.
taxonID | scientificName | dwc:nameAccordingTo | scientificNameID | dwc:taxonConceptID |
---|---|---|---|---|
ID1 | Tyranosaurus rex Osborn, 1905 | Rauhut 2003 | https://paleobiodb.org/classic/checkTaxonInfo?taxon_no=54833 |
conceptID-16 |
ID2 | Tyranosaurus rex Osborn, 1905 | Holtz 2004 | https://paleobiodb.org/classic/checkTaxonInfo?taxon_no=54833 |
conceptID-18 |
For misapplied names:
A misapplied name is when the name of one taxon is erroneously applied to a different taxon.
(From: https://doi.org/10.7560/740440-015)
How this would be translated into a checklist:
- the entry for the misapplied name would have a unique dwc:taxonID,
- the dwc:taxonomicStatus would be
misapplied
, - the dwc:nameAccordingTo would be used to specify the source being the author of the misapplication of the name,
- the field dwc:acceptedNameUsageID would contain the ID of the currently accepted name in the dataset for the actual taxonomic entity corresponding to the correct identification.
For example:
taxonID | scientificName | dwc:nameAccordingTo | dwc:taxonomicStatus | dwc: acceptedNameUsageID |
---|---|---|---|---|
ID1 | Ficus exasperata Vahl | Vahl. (1805). In: Enum. Pl. 2: 197. | ACCEPTED | |
ID2 | Ficus exasperata | De Wildeman & Durand in Ann. Mus. Congo Belge, B, Bot., ser. 2, 1: 54. 1899 | MISAPPLIED | ID1 |
I hope this post is useful. If you would like us to cover more checklist cases and examples, please leave us a comment letting us know what you would like to see.