The full report is available to download on F1000Research.
This gap analysis report is the third in a series which has examined gaps in data standards. The first version of the report examined gaps in agriculture and food data (Pesce, Kayumbi, Tennison, Mey, and Zervas: 2016). A second version (Pesce, Tennison, Dodds and Zervas: 2017) examined the situation in the area of data standards for weather data (and closely related geospatial data), and particularly focused on weather data for use in farm management services.
This third version focuses on data standardisation gaps in two specific use cases that aggregate land data and nutrition data around indicators: the Land Portal and the Global Nutrition Report.
The report starts with a review of the relevant types of data for these use cases, then illustrates similarities between the two projects and similar standardisation gaps, and then moves to more specific challenges for the two individual projects.
The Land Portal (LP) gathers information from a broad range of land-related data and information providers. It is organised and visualised in ways that are intuitive and usable for researchers, private sector actors and policy makers at global and local levels. The information provided can strengthen research, advocacy, and policy making efforts by enabling a better understanding of land governance issues affecting various countries and regions.
The Global Nutrition Report (GNR) is a comprehensive narrative on global and country-level nutrition. The Report is produced annually and aggregates a wealth of nutrition and nutrition-related data from a wide range of sources. This data underpins the report itself and is also used to produce a range of supplementary materials, including country, regional, and sub-regional profiles and data visualisation tools.
The main conclusions drawn from the report are as follows.
The two use cases present many similarities. They both aggregate data from secondary sources, already partly normalised by global agencies; they both aggregate data around specific indicators; they aggregate from datasets with a similar structure (indicator, country, year, value).
The identified gaps in data standardisation are very similar. The names of countries and regions in data sources are either not standardised or standardised according to different conventions; the names of the variables do not follow any convention; indicators are represented by strings, and both their names and their measurement methods may change over the years.
With reference to our data standard assessment criteria, the few standards used by the data sources (country naming conventions, value ranges, units of measurement) in both cases are not open and not very usable. The situation is different when it comes to the way the two projects re-publish the data: the GNR normalises values around some conventions, while the LP re-publishes everything according to Linked Data principles, and uses published vocabularies.
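To make the shared structure and the country-naming gap concrete, the following is a minimal Python sketch, not the method used by either project. It assumes rows shaped as (indicator, country, year, value) and maps inconsistent country names onto ISO 3166-1 alpha-3 codes through a small hand-made alias table. The indicator names, the numeric values, the alias table and the function names are all hypothetical and serve only as illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Observation(NamedTuple):
    """One row in the shared (indicator, country, year, value) structure."""
    indicator: str
    country: str
    year: int
    value: float


# Hypothetical alias table: lower-cased name variants -> ISO 3166-1 alpha-3 code.
# A real aggregator would need a much larger, maintained mapping.
COUNTRY_ALIASES = {
    "tanzania": "TZA",
    "united republic of tanzania": "TZA",
    "cote d'ivoire": "CIV",
    "côte d'ivoire": "CIV",
    "ivory coast": "CIV",
    "kenya": "KEN",
}


def normalise_country(name: str) -> Optional[str]:
    """Return the alpha-3 code for a country name, or None if it is unknown."""
    # Lower-case and collapse runs of whitespace before the lookup.
    key = " ".join(name.strip().lower().split())
    return COUNTRY_ALIASES.get(key)


def harmonise(rows: List[Observation]) -> Tuple[List[Observation], List[Observation]]:
    """Split rows into harmonised rows and rows whose country could not be matched."""
    harmonised, unmatched = [], []
    for row in rows:
        code = normalise_country(row.country)
        if code is None:
            unmatched.append(row)  # keep for manual review rather than guessing
        else:
            harmonised.append(row._replace(country=code))
    return harmonised, unmatched


# Placeholder rows: indicator names and values are made up for illustration.
rows = [
    Observation("example_nutrition_indicator", "United Republic of Tanzania", 2015, 1.0),
    Observation("example_nutrition_indicator", "Tanzania", 2016, 2.0),
    Observation("example_land_indicator", "  Ivory  Coast ", 2015, 3.0),
    Observation("example_land_indicator", "Atlantis", 2015, 4.0),
]

harmonised, unmatched = harmonise(rows)
```

In this sketch, unmatched names are collected for review instead of being dropped or guessed. The same lookup approach could in principle be applied to indicator names, although changes in measurement methods over the years cannot be resolved by renaming alone.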