How to check your OCDS data validates
This post in our technical series explains how and when to use different tools to check your OCDS data.
The Open Contracting Data Standard specifies how to represent contracting processes as data. You’ve worked hard on your data. Finally, you’ve got a nice and clean first JSON file on your screen and you want to check that the data is compliant. While implementing the standard, it’s useful to be able to check your data frequently.
There are now two public open-source tools available, each designed for different use cases: one is the OCDS Validator provided by the Open Contracting Partnership (OCP); the other is the jOCDS Validator provided by Development Gateway.
This blog post will help you assess when to use each.
Data Quality Assessment: OCP’s OCDS Validator
What is it?
The OCDS Validator is provided by the Open Contracting Partnership. It’s designed to help with data quality, for example, to check if the date format is correct. This is relevant particularly in the early stages of working with a dataset, either as a publisher or a data user, or when improving an existing publication. It’s among the tools that OCP shares to encourage the publication of good quality open data on contracting, and to encourage its use.
OCP’s OCDS Validator is available as a hosted validator, as a Python command-line utility and library, and as a self-hosted web app.
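To make the date format example concrete, here is a minimal sketch of such a check in Python. It assumes that OCDS dates follow the JSON Schema `date-time` format (a full date and time with a timezone, as in RFC 3339). The function name `is_valid_ocds_date` and the simplified pattern are illustrative only; this is not the code the OCDS Validator itself runs.

```python
import re
from datetime import datetime

# Simplified RFC 3339 date-time pattern: date, "T", time, optional fraction,
# then either "Z" or a numeric offset. It does not range-check the offset.
_DATE_TIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)


def is_valid_ocds_date(value):
    """Return True if value looks like a date-time with a timezone (hypothetical helper)."""
    if not isinstance(value, str) or not _DATE_TIME.match(value):
        return False
    try:
        # The first 19 characters are "YYYY-MM-DDTHH:MM:SS"; parsing them
        # rejects impossible dates such as 30 February.
        datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True
```

A date-only value such as `2016-01-01` fails this check, while `2016-01-01T00:00:00Z` passes. Date-only values are a common issue when data is exported from a spreadsheet.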
What does it do?
The OCDS Validator has a web interface that accepts both spreadsheet and JSON serialisations of OCDS data, converting between the formats as necessary. It validates the data against the OCDS schema, performs a growing number of quality checks, and immediately gives the user feedback on where any issues lie in the data. It checks for deprecated fields and displays non-schema fields, helping users to spot potential typos and encouraging them to map their data to OCDS fields. The OCDS Validator is updated with the latest schema versions as they are released, and automatically fetches and applies extensions to the schema when they are specified in the data.
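The idea behind reporting non-schema fields can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the OCDS Validator's implementation: `find_additional_fields` and `TOY_SCHEMA` are made-up names, the schema is a tiny hand-written fragment rather than the real OCDS schema, and the walk ignores `$ref`, `patternProperties` and extensions.

```python
def find_additional_fields(data, schema, path=""):
    """Return the paths of fields that appear in data but not in schema."""
    extra = []
    if isinstance(data, dict):
        if "properties" not in schema:
            return extra  # nothing declared at this level, so nothing to compare
        properties = schema["properties"]
        for key, value in data.items():
            child_path = f"{path}/{key}"
            if key not in properties:
                extra.append(child_path)
            else:
                extra.extend(find_additional_fields(value, properties[key], child_path))
    elif isinstance(data, list):
        item_schema = schema.get("items", {})
        for index, item in enumerate(data):
            extra.extend(find_additional_fields(item, item_schema, f"{path}/{index}"))
    return extra


# A toy schema fragment for illustration only.
TOY_SCHEMA = {
    "properties": {
        "ocid": {"type": "string"},
        "tender": {
            "properties": {
                "title": {"type": "string"},
                "items": {
                    "type": "array",
                    "items": {"properties": {"id": {}, "description": {}}},
                },
            }
        },
    }
}

# A made-up release containing two misspelled field names.
release = {
    "ocid": "ocds-a1b2c3-0001",
    "tender": {
        "title": "Road repairs",
        "tittle": "Road repairs",
        "items": [{"id": "1", "desciption": "Asphalt"}],
    },
}

print(find_additional_fields(release, TOY_SCHEMA))
# ['/tender/tittle', '/tender/items/0/desciption']
```

Both reported paths are typos of real field names, which is exactly the kind of issue that a list of non-schema fields helps to surface.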
When would you use it?
The OCDS Validator has been designed with two main uses in mind: support when preparing to publish data, and assessment of existing publications, either by the publisher or by a potential user.
When preparing to publish data, it is useful to be able to iterate quickly, trying out changes to a system and assessing the results. The feedback provided is detailed and designed to be read by a user who understands both the data and OCDS. The OCDS Validator presents sample data alongside any issues and summarizes basic insights into your data, because only you, as the publisher, can judge whether the data is correct.
When you’re already publishing data, it’s useful to be able to check the end product to ensure that nothing’s gone awry over time. The OCDS Validator is designed to allow users to quickly test samples of their data using the same process as they used during development.
As a data user, it’s useful to know the validity, quality, and scope of a dataset before using it. The validator helps users by reporting validation issues and providing basic insights into the dataset.
When wouldn’t you use it?
The OCDS Validator isn’t designed to be used to continuously monitor the quality of a data feed. In some cases, such monitoring is best implemented internally to the publishing system, with tests for each of its components as well as for the system as a whole. The community of OCDS publishers and the OCDS Helpdesk can provide guidance on this.
The code behind the OCDS Validator may nonetheless be helpful for this use case and is available on GitHub under an open-source license. A command-line interface is included for users who wish to run the validator locally, integrate it with their own systems, or use it as part of their own software.
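As a rough illustration of a test that lives inside the publishing system, the sketch below checks that one component always emits the fields a release needs. Everything here is hypothetical: `build_release` stands in for a component of your own system, and `REQUIRED_FIELDS` reflects our reading of the required fields in the OCDS 1.1 release schema, so check it against the schema version you publish.

```python
import unittest

# Assumed required release fields; verify against your schema version.
REQUIRED_FIELDS = ("ocid", "id", "date", "tag", "initiationType")


def build_release(row):
    """Hypothetical component: map one internal database row to an OCDS release."""
    return {
        "ocid": f"ocds-a1b2c3-{row['contract_no']}",
        "id": f"{row['contract_no']}-1",
        "date": row["updated_at"],
        "tag": ["tender"],
        "initiationType": "tender",
    }


def missing_required_fields(release):
    """Return the required field names that are absent from the release."""
    return [field for field in REQUIRED_FIELDS if field not in release]


class BuildReleaseTest(unittest.TestCase):
    def test_release_has_required_fields(self):
        row = {"contract_no": "0001", "updated_at": "2016-01-01T00:00:00Z"}
        self.assertEqual(missing_required_fields(build_release(row)), [])
```

Saved alongside the publishing code, a test like this can be run with `python -m unittest` on every change, so a missing field is caught before the data is published rather than after.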
Continuous Monitoring: Development Gateway’s jOCDS Validator
Development Gateway’s jOCDS Validator has many of the same features as the OCDS Validator, with some important distinctions that make it more suitable for certain applications, particularly continuous validation as part of a publication system. It was initially developed to validate a large dataset from the Vietnam Public Procurement Authority that could not be validated efficiently by the tools available at the time.
The jOCDS Validator is available as a Java library, command-line utility and self-hosted web app.
What is it?
jOCDS is a Java library that can be run as a standalone application or used as a library by other Java developers. It provides validation of OCDS data, and scales to millions of records.
When would you use it?
The jOCDS Validator is designed for monitoring the validation status of an existing OCDS feed, as part of a pre-publication check, or as part of a civil society monitoring system. It’s designed to run without human interaction and can be connected to any desired alerting system. Since it is Java-based and developed as a module, it can make life much easier for Java application developers who need to validate OCDS data. Non-Java developers can use it too, through the API.
Because it’s been designed to scale, this validator is useful to those who work with large datasets and want to check their validity before loading them into another tool for analysis.
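The pattern of running unattended and reporting to an alerting system can be sketched independently of any particular validator. The Python below is illustrative only and does not use the jOCDS Validator's API: `check_feed` and its three parameters (`fetch`, `validate`, `alert`) are hypothetical names for a function that returns the feed as JSON text, a function that returns a list of error messages, and a hook into whatever alerting system you use.

```python
import json


def check_feed(fetch, validate, alert):
    """Run one unattended validation pass; call alert(message) if anything is wrong."""
    try:
        package = json.loads(fetch())
    except (OSError, ValueError) as exc:
        # Covers I/O failures and malformed JSON (JSONDecodeError is a ValueError).
        alert(f"feed could not be read: {exc}")
        return False
    errors = validate(package)
    if errors:
        alert(f"{len(errors)} validation error(s), first: {errors[0]}")
        return False
    return True


# Stand-ins for a real feed, validator and alerting hook.
alerts = []
ok = check_feed(
    fetch=lambda: '{"releases": []}',
    validate=lambda package: [] if package.get("releases") else ["package has no releases"],
    alert=alerts.append,
)
```

In practice a scheduler such as cron would call a function like this at regular intervals, with `validate` wrapping the validator of your choice and `alert` sending an email or chat message.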
When wouldn’t you use it?
When you’re just starting out with OCDS and trying to learn about it, or when you’re trying to understand the quality of some data that you need to assess, the jOCDS Validator won’t be the most helpful tool. Because it has to be downloaded, installed, and run from the command line, it’s not the ideal choice for non-programmers.