The Dutch Data Prize

Every two years, the Dutch Data Prize is awarded to an individual or a team that makes research data FAIR. To compete for the prize, you can nominate a dataset produced by yourself or by another individual or research group. Nominations are restricted to research that was led or primarily conducted by research-performing organisations in the Netherlands.

The Data Prize will again be awarded in three categories:

Social sciences and humanities
Natural and engineering sciences
Life sciences and health

The winners of the Dutch Data Prize receive €3,500 towards making their dataset(s) more FAIR.

The Dutch Data Prize is a valuable recognition of researchers’ contributions to their own field and to the principles of FAIR data. The prize money is intended to help make data more FAIR and to encourage data reuse. The winners can use the money, for example, to organise a symposium or make their data more accessible online. See here which datasets have won the award since 2010.

As many as 50 datasets in the categories Social Sciences & Humanities (SSH), Life Sciences & Health (LSH) and Natural & Engineering Sciences (NES) were submitted to have a shot at winning the Dutch Data Prize 2022 and a cash prize of €3,500. But only one can win in each category. An independent jury has selected three contenders per category from the 50 datasets that were submitted. The nominated datasets will be introduced in the morning programme. Who will be the winner per category? Join the award ceremony at 16:00 and find out!

Life Sciences and Health

DNA barcodes for fungal identification
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA422523
Duong Vu and Gerard Verkley
Westerdijk Fungal Biodiversity Institute (KNAW)

DNA barcoding is a global initiative that aims to streamline species identification through simple DNA sequence markers. The Westerdijk Institute holds the largest microbial fungal resource in the world. In its DNA barcoding project, barcodes of two loci, ITS and LSU, were generated for all (ca. 100k) strains in the WI-CBS collection for fungal species identification. A large subset (ca. 24k) of the DNA barcodes has been deposited in GenBank, marking an unprecedented data release in global fungal barcoding efforts to date. The associated article is one of the most cited articles in Studies in Mycology published since 2019.

Paired Omics Data Platform

https://pairedomicsdata.bioinformatics.nl

Stefan Verhoeven, Michelle Schorn, Marnix H. Medema, Pieter C. Dorrestein, Justin J.J. van der Hooft
Wageningen University

The Paired Omics Data Platform is a community-based initiative that standardizes links between genomic and metabolomics data in a computer-readable format to further the field of natural products discovery. The goals are to link molecules to their producers, find large-scale genome-metabolome associations, use genomic data to assist in the structural elucidation of molecules, and provide a centralized database for paired datasets.

Xeno-canto

www.xeno-canto.org

Bob Planqué and Willem-Pier Vellinga
Stichting Xeno-canto voor Natuurgeluiden

Xeno-canto (XC) is an online database that provides access to sound recordings of wildlife from around the world. The recordings are shared by a growing worldwide community of thousands of recordists, amateurs and professionals alike. The aim of Xeno-canto is to eventually represent the sounds of all animals: all taxa, to subspecies level, with their complete repertoire, all geographic variability, and all stages of development. The Xeno-canto website was launched in 2005 and initially focussed on Neotropical birds. Its scope is gradually widening, first geographically and recently also taxonomically, with the adoption of Orthoptera in September 2022.

Social Sciences and Humanities

YOUth (cohort) study
https://www.uu.nl/en/research/youth-cohort-study
Coosje Veldkamp and Ron Scholten
Utrecht University

The YOUth study is a large-scale, longitudinal cohort following thousands of babies and children in their development from gestation until early adulthood. YOUth collects a vast amount of FAIR data through a variety of research techniques, including questionnaires, 3D ultrasounds, EEG, MRI, eye-tracking, computer tasks, and biomaterials (see also our profile paper https://www.sciencedirect.com/science/article/pii/S1878929320301183).

EXCEPTIUS Dataset
https://dataverse.nl/dataverse/EXCEPT
Dr Clara Egger
University of Groningen

Building on a combination of automated and human coding of legal sources, the EXCEPTIUS dataset identifies and classifies COVID-19 containment measures taken daily, including at the subnational levels of government when relevant, in 24 countries of the European Economic Area from 30 January 2020 until 30 April 2021. It predominantly focuses on measures related to democratic governance, human rights and daily liberties, international cooperation, and public administration. For each measure, the dataset identifies the authorities who adopted it, its geographical coverage, the groups targeted, and the sanction associated with non-compliance. The EXCEPTIUS dataset contains the different versions of the data (v.1 and v.2) as well as the corpus of sources used to identify the measures. Version 3.0 is currently being developed and will include a stringency index for all measures, as well as additional countries from the European Economic Area.

Film, theatre, and cultural industries in The Netherlands - Linked Open Datasets from the 16th century to today
https://lod.uba.uva.nl/Cinema-Context/Cinema-Context
Leon van Wissen
University of Amsterdam

This set of three linked, open datasets opens up historic data on arts, theatre and cinema. Developed and maintained at the University of Amsterdam, the databases can now be queried in novel ways that were not possible with their previous unlinked iterations. The ONSTAGE dataset encompasses all of Amsterdam’s public theatre during the period 1637-1772. ECARTICO is a comprehensive collection of biographical data on painters, engravers, printers, booksellers and other artists of the Low Countries in the 16th and 17th centuries. Cinema Context is an online film encyclopaedia with more than 100,000 film screenings in all Dutch cinemas since 1895.

Natural and Engineering Sciences

FutureStreams
https://doi.org/10.24416/UU01-T7TVTQ
Niko Wanders, Joyce Bosmans and Valerio Barbarossa
Utrecht University

FutureStreams provides streamflow and water temperature information at the global scale at high spatial resolution for current conditions and under climate change. It provides relevant information for scientists in the fields of water quantity and quality as well as ecology and energy science. To increase accessibility and reuse, derived ecological indicators have been included in the dataset alongside the original hydrological data. The data has been created using open-source models by a multi-disciplinary team of scientists. FutureStreams provides a crucial starting point for large-scale assessments of the implications of changes in streamflow and water temperature for society and freshwater ecosystems.

Materials In Paintings (MIP): An interdisciplinary dataset for perception, art history, and computer vision
https://materialsinpaintings.tudelft.nl/
https://doi.org/10.4121/13679200.v1
Mitchell Van Zuijlen, Hubert Lin, Kavita Bala, Sylvia Pont, Maarten Wijntjes
TU Delft

This dataset captures the painterly depictions of materials to enable the study of the depiction and perception of materials through the artists' eye. The 19k paintings in the dataset are annotated with 200k+ bounding boxes, from which polygon segments were automatically extracted. Each bounding box was assigned a coarse label (e.g., fabric) and a fine-grained label (e.g., velvety, silky).

AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds
https://doi.org/10.7910/DVN/OVHAW8
Süleyman Er
DIFFER

The aqueous solubility of compounds has a key role in various domains of the natural sciences. To develop robust data-driven prediction models, it is essential that the underlying experimental calibration data is of high fidelity and quality. Existing solubility datasets vary in chemical space, measurement methods, and experimental conditions, and also in their non-standard representations, size, and accessibility. To overcome these obstacles, AqSolDB was created as a large reference database of aqueous solubility. Its creators merged nine different solubility datasets, curated the merged data, standardized and validated the compound representations, marked the data with reliability labels, and provided 2D descriptors of compounds.

The Dutch Data Prize committee can be found here

FAIR Data Day, 29 November 2022
