Creating a data standard for infrastructure transparency: building it
This blog is the second in a series by Open Data Services evaluating the lessons from developing a data standard for infrastructure. Open Data Services has worked with us and CoST — The Infrastructure Transparency Initiative to develop the Open Contracting for Infrastructure Data Standard, an open standard for the publication of joined-up data about infrastructure projects and contracts. The first part of this series is Creating a data standard for infrastructure transparency: laying the foundations.
In the first part of this series we made the case for OC4IDS — a standard for publishing joined-up data about infrastructure projects. We explained how the standard builds on the CoST Infrastructure Data Standard and the Open Contracting Data Standard.
In this blog post, we’re looking at how we built OC4IDS — how we structured the standard, how we thought more broadly about reuse, and the tools and resources we’ve built to support implementation.
Building the schema
To determine the right data model, we drew on the findings from the research phase, the lessons learned from the development and implementation of OCDS, and our experience supporting the development, implementation and use of open data standards.
Our aim was to turn the disclosure requirements from the CoST IDS into a schema for publishing open data on infrastructure projects and contracts, one that can be joined up with detailed contracting data published in OCDS.
Let’s zoom in and look at an example of how OC4IDS adds structure to one of the disclosure requirements from the CoST IDS and how this makes publishing and using data easier.
The CoST IDS requires the disclosure of the Contract Firm(s) involved in an infrastructure project, but it provides no further guidance on the meaning of this requirement, on how to structure the data, or on how to interpret published data.
OC4IDS represents the firms involved in an infrastructure project using the Organization building block, made up of several fields.
For each field in OC4IDS, the schema provides:
- A title and description, so that publishers know what data to provide and users know how to interpret the data, e.g. “the party’s role(s) in the project, using the open partyRole codelist”
- A type and format, so that publishers know how to format the data and machines know how to process it, e.g. an array of strings
- Optionally, an associated codelist, which limits the possible values of the field so that data from different publishers is comparable, and provides titles and descriptions for each possible value, e.g. the partyRole codelist
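To make the three elements above concrete, here is a minimal sketch of how a field definition might look and how a tool could use it to check a value. It is written as a Python dictionary in the style of JSON Schema. The "codelist" and "openCodelist" keys, the field title, the three sample codes and the check_roles helper are all assumptions made for illustration, so refer to the published schema and codelists for the real definitions. Because the description calls the partyRole codelist open, the sketch flags unknown codes as notes rather than errors.

```python
# Illustrative field definition in JSON Schema style.
# Assumption: the key names and the sample codes below are for illustration
# only and are not copied from the published OC4IDS schema.
ROLES_FIELD = {
    "title": "Party roles",
    "description": "The party's role(s) in the project, using the open partyRole codelist.",
    "type": "array",
    "items": {"type": "string"},
    "codelist": "partyRole.csv",
    "openCodelist": True,
}

# Hypothetical subset of codelist entries.
PARTY_ROLE_CODES = {"publicAuthority", "supplier", "funder"}


def check_roles(value, field=ROLES_FIELD, codes=PARTY_ROLE_CODES):
    """Return a list of human-readable problems found in a roles value."""
    if not isinstance(value, list):
        return ["roles must be an array"]
    problems = []
    for item in value:
        if not isinstance(item, str):
            problems.append(f"{item!r} is not a string")
        elif item not in codes:
            # An open codelist allows extra codes, so only flag them as notes.
            level = "note" if field["openCodelist"] else "error"
            problems.append(f"{level}: {item!r} is not in {field['codelist']}")
    return problems
```

Calling check_roles(["supplier"]) returns an empty list, while passing a bare string instead of an array reports a type problem.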
As well as making data on infrastructure projects easier to publish and use, OC4IDS also opens up new opportunities for using data.
For example, rather than just including the names of the firms associated with an infrastructure project, OC4IDS also encourages publishers to provide organization identifiers.
Publishing identifiers for organizations makes it possible for users to identify where the same organization appears under different names, and to connect with other data sources on beneficial ownership, corporate filings, and more. This is important for many types of analysis, including identifying corruption, measuring competition and understanding the market.
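As a rough sketch of the first use described above, the snippet below groups party entries by identifier and reports any organization that appears under more than one name. The sample records, the identifier values and the helper names are made up for illustration, and the "scheme" plus "id" shape of the identifier is an assumption based on how organization identifiers are commonly structured.

```python
from collections import defaultdict

# Made-up party entries, as they might appear across two projects.
parties = [
    {"name": "Acme Construction Ltd",
     "identifier": {"scheme": "GB-COH", "id": "01234567"}},
    {"name": "ACME Construction Limited",
     "identifier": {"scheme": "GB-COH", "id": "01234567"}},
    {"name": "Bridgeworks SA",
     "identifier": {"scheme": "XX-EXAMPLE", "id": "998"}},
]


def names_by_identifier(parties):
    """Map each (scheme, id) pair to the set of names published with it."""
    grouped = defaultdict(set)
    for party in parties:
        identifier = party.get("identifier") or {}
        scheme, org_id = identifier.get("scheme"), identifier.get("id")
        if scheme and org_id:  # skip parties published without an identifier
            grouped[(scheme, org_id)].add(party["name"])
    return dict(grouped)


def name_variants(parties):
    """Keep only identifiers that appear under more than one name."""
    return {
        key: sorted(names)
        for key, names in names_by_identifier(parties).items()
        if len(names) > 1
    }


variants = name_variants(parties)
```

Matching on names alone would treat the first two entries as different firms. The shared identifier is what lets a user link them, and the same key can then be used to look the organization up in other datasets.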
We followed this process for all of the disclosure requirements in the CoST IDS, resulting in a schema that defines the structure, format and meaning of more than 200 individual data elements.
You can read more about the structure of the OC4IDS schema in the Getting Started documentation.
Building the documentation
We know from experience that good documentation is key to the successful adoption of a data standard, so alongside the schema and codelists we developed a comprehensive documentation site.
The site includes introductory materials for new users, reference tables for the schema and codelists, an interactive schema browser, and guidance for publishers and users.
Based on the research phase of the project, we developed guidance on how to include project identifiers in contracting data, along with step-by-step guides on how to publish data from an infrastructure transparency portal and how to use data from procurement systems for infrastructure monitoring.
We also created a fully completed and annotated worked example, which publishers and users can use as a reference to supplement the schema and codelists.
Finally, whilst OC4IDS can be used by anyone who wants to publish data on infrastructure projects, we documented two mappings for the benefit of CoST’s member programs:
- A mapping to the CoST IDS, which describes how to use OC4IDS to meet each of the disclosure requirements in the CoST IDS.
- A mapping to OCDS, which describes how existing OCDS data can be used to populate some of the fields in OC4IDS and thus meet some of the requirements in the CoST IDS.
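To give a flavour of what the OCDS mapping involves, here is a minimal sketch that copies a few values from an OCDS-style release into a project-level summary of a contracting process. The summarize_release helper, the sample release and the shape of the output are illustrative assumptions rather than the documented mapping, which should be consulted for the actual field paths.

```python
def summarize_release(release):
    """Build a hypothetical contracting process summary from an OCDS-style release."""
    suppliers = []
    for award in release.get("awards", []):
        for supplier in award.get("suppliers", []):
            if supplier not in suppliers:  # avoid repeating a supplier
                suppliers.append(supplier)
    return {
        "id": release["ocid"],
        "summary": {
            "ocid": release["ocid"],
            "title": release.get("tender", {}).get("title"),
            "suppliers": suppliers,
        },
    }


# Made-up release, trimmed to the fields used in this sketch.
release = {
    "ocid": "ocds-a1b2c3-0001",
    "tender": {"title": "Road resurfacing works"},
    "awards": [
        {"suppliers": [{"id": "GB-COH-01234567", "name": "Acme Construction Ltd"}]},
        {"suppliers": [{"id": "GB-COH-01234567", "name": "Acme Construction Ltd"}]},
    ],
}

process = summarize_release(release)
```

Keeping the ocid in the summary is what allows a user to follow the link from the project back to the detailed contracting data.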
Connecting the services: helpdesk, resources, the Data Review Tool and training materials
From our work supporting publishers of open data, we know that there can often be a gap between a commitment to publish and the technical reality of implementing a standard.
Recognizing this, we’ve worked with OCP, CoST and Centro De Desarrollo Sostenible to set up a number of support services.
The bilingual global helpdesk (English and Spanish) is a free service for anyone interested in publishing or using OC4IDS data, through which we provide advice and support at each stage of the implementation process.
Support can include helping potential publishers to scope the options for OC4IDS implementation in their context, advising on mapping existing data sources to OC4IDS, and providing feedback on the quality of published data.
Alongside our activities supporting publishers, we’ve also developed templates and resources to guide them through key stages of the implementation process. These include a scoping template and a field-level mapping template.
A key resource for both implementers and the helpdesk itself is the Data Review Tool, a self-service tool that provides feedback on the quality of OC4IDS data. Implementers can use the tool to get feedback as they work towards publication. The helpdesk also uses the tool to run checks on data shared by publishers.
The Data Review Tool is based on CoVE, which we created to power data review tools for OCDS, 360Giving and the Beneficial Ownership Data Standard. By reusing the core technology from CoVE, we were able to quickly adapt and deploy a fully featured Data Review Tool for OC4IDS without the time and cost involved in starting from scratch.
The final piece of the adoption support package is training. Through the helpdesk we’ve delivered training sessions, workshops and webinars on OC4IDS implementation to CoST members and other implementers. Along the way, we’ve created a library of reusable training resources, including slide decks and interactive workshop activities.
Part III of the series looks at what we’ve learnt from working with implementers to put the standard to use, and the challenges and opportunities for infrastructure procurement transparency.