Sharing images, sounds and videos on GBIF
This blog post covers the publication of multimedia on GBIF. However, it is not intended to be documentation. For more information, please check the references below.
NB: GBIF does not host original multimedia files and there is no way to upload pictures to the platform. For more information, please read the "How to publish" sections below.
Media displayed on the GBIF portal
Let’s say that you are looking for pictures of otters, or perhaps the call of a sea eagle.
You can filter GBIF occurrences by Media Type. Once you find what you are looking for, you can see and/or hear it directly on the GBIF portal.
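The same Media Type filter can presumably be applied when querying occurrences programmatically. The sketch below only builds a search URL and does not send any request. The endpoint and the `mediaType` and `limit` parameter names are assumptions that should be checked against the GBIF API documentation listed in the references.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names; verify them against the GBIF API documentation.
OCCURRENCE_SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"


def build_media_search_url(media_type, limit=20):
    """Build a search URL restricted to occurrences with the given media type."""
    query = urlencode({"mediaType": media_type, "limit": limit})
    return f"{OCCURRENCE_SEARCH_URL}?{query}"


# Example: a URL for occurrences that have at least one sound recording.
sound_url = build_media_search_url("Sound", limit=5)
```

The media type values used here (StillImage, Sound, MovingImage) are the ones discussed later in this post.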
Images
GBIF’s searching interface has a gallery, which displays the first image available for each occurrence selected.
You can also view the images for a particular occurrence by simply clicking on it. See, for example, this occurrence with two images from the Norwegian Species Observation Service.
All the images belonging to a single dataset are visible on its page. The same goes for all the images belonging to datasets published under the same organization: see this example.
Audio and Video
Sounds and videos can be directly played from a given occurrence page through a media player.
For example, by clicking on this occurrence from Xeno-canto - Bird sounds from around the world, you can listen to a kookaburra call.
You can also take a look at some videos showing the 3D reconstruction of a sand flea specimen by clicking on this occurrence from the Museum of Comparative Zoology, Harvard University.
Multimedia supported
What exactly can you make available through GBIF?
Media types
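In theory, you could share any type of media defined by the Dublin Core Metadata Initiative. The recommended terms are Collection, Dataset, Event, Image, InteractiveResource, MovingImage, PhysicalObject, Service, Software, Sound, StillImage and Text.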
In practice, it is a little bit more complicated.
The system infers the media type from the file format by using MIME types (these types correspond to the recommended media types cited above). If the type identified is text/html, the media is interpreted as a reference link instead of an associated media.
In other words, GBIF currently integrates only images (StillImage), sounds (Sound), video (MovingImage) or collections of the three. For all the other media types, users will have to click on the links given with each occurrence.
For more information about media types, you can check the references at the end of this post.
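To make the inference described above more concrete, here is a minimal sketch of the idea. It is not how GBIF does it: GBIF relies on Apache Tika, whereas this example swaps in Python's standard `mimetypes` module, which only looks at the file extension. The mapping table and the `classify_media` helper are hypothetical names introduced for illustration.

```python
import mimetypes
from urllib.parse import urlparse

# Assumed mapping from the top-level MIME type to the media types named above.
MEDIA_TYPE_BY_MIME_PREFIX = {
    "image": "StillImage",
    "audio": "Sound",
    "video": "MovingImage",
}


def classify_media(url):
    """Return a media type, "reference link", or None when nothing can be guessed."""
    path = urlparse(url).path
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        return None  # no recognizable extension in the URL
    if mime_type == "text/html":
        return "reference link"  # a web page, not a media file
    return MEDIA_TYPE_BY_MIME_PREFIX.get(mime_type.split("/")[0])
```

With this sketch, a direct link ending in .jpg is classified as StillImage, while a link to an .html viewer page is treated as a reference link.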
Formats
I explained in the previous paragraph that GBIF integrates only images, sound and videos. But which formats are supported?
In practice, any format that can be interpreted by Apache Tika is supported. This should include the formats in this IANA Media Types list (media types were formerly known as MIME types).
How to publish media in a Darwin Core Archive
As mentioned at the beginning of this post, GBIF doesn’t host original multimedia files. This means that you cannot upload pictures or audio files directly to GBIF. They must be hosted on another system. What you should provide is a URL or URI for each media file you wish to make available.
When sharing your URLs and URIs, keep the following points in mind:
- The URL provided must be a direct link to the file. For example: https://ipt.gbif.org/media/UAIC1008871_X.jpg.
- Images embedded on web pages like https://ipt.gbif.org/media/viewer/UAIC1008871_X.html won’t work, but such pages can be provided in addition to the direct link.
- The file extension doesn’t always have to be specified in the URL (see for example the URLs provided with this occurrence from The Hemiptera collection (EH) of the Muséum national d’Histoire naturelle).
- GBIF resizes images for thumbnails, so you should provide the best resolution possible.
For the next two sections, I assume that you are somewhat familiar with Darwin Core Archives and IPTs.
Simplest method: dwc:associatedMedia
The simplest way to share your media is to use the associatedMedia field. Since this term belongs to the Darwin Core Occurrence, you don’t need to create a second file for multimedia. In other words, you can have both your occurrences and images, sounds or videos in the same file.
This field can handle one or several URLs separated by a pipe symbol (|).
For example:
https://ipt.gbif.org/media/UAIC1008871_X.jpg
https://ipt.gbif.org/media/UAIC1008871_X.jpg | https://ipt.gbif.org/media/UAIC1052169_Pheidole_obtusospinosa_65mm_3x_compedit_lg.jpg
However, this method doesn’t allow you to attach any metadata to the media (no title, no license, no author, etc.), so it is not ideal. It also requires that the URL has a common file extension like .jpg, .jpeg, .png or .tiff.
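As a rough illustration of how a pipe-separated associatedMedia value can be handled, here is a minimal sketch. The helper names are hypothetical, and the extension list simply repeats the examples given above rather than an exhaustive list of what GBIF accepts.

```python
from urllib.parse import urlparse

# Extensions cited above as examples; assumed here, not an official list.
COMMON_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tiff")


def split_associated_media(value):
    """Split an associatedMedia value on pipes and trim surrounding whitespace."""
    return [url.strip() for url in value.split("|") if url.strip()]


def has_common_extension(url):
    """Check whether the URL path ends with one of the common extensions."""
    return urlparse(url).path.lower().endswith(COMMON_EXTENSIONS)


value = (
    "https://ipt.gbif.org/media/UAIC1008871_X.jpg | "
    "https://ipt.gbif.org/media/UAIC1052169_Pheidole_obtusospinosa_65mm_3x_compedit_lg.jpg"
)
urls = split_associated_media(value)
```

Running a check like `has_common_extension` on each URL before publishing can help spot links that might not be picked up as media.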
Extensions: Simple Multimedia and Audubon Media Description
The better way to share your images, sounds or videos is to use extension files.
“Extension” files support the exchange of additional, described classes of data that relate to the core data type (Occurrence or Taxon). An extension record points to a record in the core data file. (Definition from the Darwin Core Archive - How-to wiki.)
GBIF currently supports two types of extensions:
- Simple multimedia
- Audubon Media Description (partial support for now)
Both of these extensions will allow you to share detailed information about your media such as creator, description, license, etc. However, the Audubon Media Description is far more exhaustive.
Whether you decide to use one extension or the other, you need to generate a file containing:
- an occurrenceID field (referring to the occurrence or specimen concerned),
- unique identifiers (dcterms:identifier),
- links to the media (dcterms:source or accessURI),
- etc.
This file should be mapped with the proper terms and integrated in the Darwin Core Archive.
For more information about the terms available in each extension, please check the references.
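To give an idea of what such an extension file could look like, here is a minimal sketch that writes a tab-separated multimedia file with Python's `csv` module. The column selection, the occurrenceID value and the creator name are assumptions made for illustration. In this sketch the identifier column holds the direct link to the file; which term should carry the link depends on the extension you choose, so check its definition before mapping.

```python
import csv
import io

# Assumed column selection; the full list of terms is in each extension's definition.
FIELDNAMES = ["occurrenceID", "identifier", "type", "format", "creator", "license"]


def write_multimedia_rows(rows, handle):
    """Write multimedia records as tab-separated text with a header row."""
    writer = csv.DictWriter(
        handle, fieldnames=FIELDNAMES, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {
        "occurrenceID": "occ-001",  # hypothetical; must match an ID in the core file
        "identifier": "https://ipt.gbif.org/media/UAIC1008871_X.jpg",
        "type": "StillImage",
        "format": "image/jpeg",
        "creator": "Jane Doe",  # hypothetical author
        "license": "https://creativecommons.org/licenses/by-nc/4.0/",
    }
]

buffer = io.StringIO()
write_multimedia_rows(rows, buffer)
multimedia_tsv = buffer.getvalue()
```

In a real workflow, you would write to a file instead of an in-memory buffer and then map its columns in the IPT.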
Examples
Here are a few datasets using different methods to share their media. Don’t hesitate to check out their Darwin Core Archive to see how it looks.
- This macroinvertebrate deep-sea dataset uses the dwc:associatedMedia field (get Darwin Core Archive here).
- The cnidarians collection (IK) of the Muséum national d’Histoire naturelle uses the Simple Multimedia extension (get Darwin Core Archive here).
- A great example of the use of the Audubon Media Description is this Xeno-canto dataset (get Darwin Core Archive here).
Edit: Checklists can also include multimedia extensions, see for example: Radiolaria taxa in the Norwegian Sea and Arctic Ocean (get Darwin Core Archive here).
Edit: See how GBIF displays the multimedia mapping. Example of two specimens of Stephanohelia gracilis from The cnidarians collection (IK) of the Muséum national d’Histoire naturelle. Here are two JPEG images, but the comments show possible alternatives for other formats and types of media.
How to publish media outside of Darwin Core Archives
As you might know, you can publish resources on GBIF using alternatives to Darwin Core Archives.
See, for example, these two systems: BioCASe and Symbiota.
I have never set up or used either of these systems, so I am not the best person to advise on this, but I can try to give some links.
The only piece of documentation I found concerning the mapping of media fields between ABCD standards (used by BioCASe) and Darwin Core Terms comes from this blog post from 2014:
In ABCD 2.06 we use the unit MultiMediaObject subelements instead. Here there are distinct file and webpage URLs (FileURI, ProductURI), the description (Comment), the license (License/Text, TermsOfUseStatements) and also an indication of the mime type (Format).
If you have found better documentation, please leave a comment below.
Symbiota documents how to submit and upload images on any Symbiota portal here. To make the images accessible from GBIF, you simply need to follow these instructions. As far as I know, Symbiota doesn’t support sounds or videos at the moment.
Please don’t hesitate to mention and link to other systems, and their documentation, that can be used to share multimedia files through GBIF.
Choose a license
GBIF doesn’t give any official recommendation for setting a license on your multimedia files. The license fields are essentially free text. However, I would strongly encourage you to provide your licenses in a machine-readable format.
For example: https://creativecommons.org/licenses/by-nc/4.0/
All the occurrences on GBIF have one of the three following licenses:
- CC0, for data made available for any use without any restrictions
- CC BY, for data made available for any use with appropriate attribution
- CC BY-NC, for data made available for any non-commercial use with appropriate attribution
Although your multimedia licenses don’t have to match your occurrence licenses, you could consider choosing one of them.
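If you want to check that the license values in your multimedia file are machine-readable URLs rather than free text, a rough heuristic could look like the sketch below. The function name and the accepted host names and path prefixes are assumptions. The check only recognizes Creative Commons URLs and is not an official GBIF validation rule.

```python
from urllib.parse import urlparse

# Assumed host names and path prefixes for Creative Commons license URLs.
CC_HOSTS = ("creativecommons.org", "www.creativecommons.org")
CC_PATH_PREFIXES = ("/licenses/", "/publicdomain/")


def is_machine_readable_license(value):
    """Return True when the value looks like a Creative Commons license URL."""
    parsed = urlparse(value.strip())
    return (
        parsed.scheme in ("http", "https")
        and parsed.netloc.lower() in CC_HOSTS
        and parsed.path.startswith(CC_PATH_PREFIXES)
    )
```

A free-text value such as "CC BY-NC" would be rejected by this check, while the URL given as an example above would pass.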
NB: This information might change in the near future and I will try to update this post accordingly or make a new post for multimedia licenses specifically.
Where to host images and other media
Most publishers host their own multimedia files but some use third party platforms such as flickr.com.
I advise against using iNaturalist.org as a way to host the images for your dataset. Since the iNaturalist portal makes its Research-grade Observations available on GBIF, this would create duplicate occurrences.
If you are publishing a dataset through an IPT, you could consider hosting your multimedia files on the same server. You can store your images in a media folder and share them with Apache (see this example). If you are not publishing with your own IPT, don’t hesitate to contact your IPT administrator.
Don’t hesitate to leave a comment if you have any questions or suggestions.
References
- Blog post from 2014
- Presentation from Matthew Blissett at TDWG 2018
- Dublin Core Metadata Initiative - term type
- TDWG - Dublin Core - term type
- IANA - Media Types (formerly known as MIME types)
- Apache Tika - MIME types
- GBIF API Media Types
- Darwin Core Archives - How-to Guide
- IPT manual
- dwc:associatedMedia
- Simple Multimedia extension
- Audubon Media Description extension
- Audubon Core
- ABCD standard - TDWG
- Publishing to GBIF from a Symbiota portal
- Quick guide to publishing data through GBIF.org
- Images on an IPT server