Getting your data together: routes towards an OCDS API

This blog is out of date. Check out our new and improved guidance on system architectures for publishing OCDS data.

Feedback on implementations of the Open Contracting Data Standard keeps raising the same topics. We’d like to pull them together and share them in a series on the technical implementation of the standard. This first post looks at different options for your API architecture. If you are working with OCDS, you might also want to look at our process for updating it.

The Open Contracting Data Standard (OCDS) provides a common schema that can be used to publish data on all stages of the contracting process in a range of different data file formats (CSV, Excel, JSON).

In the ideal scenario:

Individual releases and records for each contracting process should be available at unique persistent URLs;
Bulk downloads in CSV (and, if appropriate, Excel) format should be available covering set periods of contracting;
Users should be able to easily locate the collections of releases and records they want.
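
As a rough illustration of what "unique persistent URLs" could look like, here is a minimal sketch of a URL scheme. The base URL, the path layout and the function names are all assumptions made for this example; OCDS does not prescribe them.

```python
from urllib.parse import quote

BASE_URL = "https://example.org/ocds"  # hypothetical publisher base URL


def release_url(ocid: str, release_id: str) -> str:
    """Persistent URL for one release of a contracting process."""
    return f"{BASE_URL}/releases/{quote(ocid, safe='')}/{quote(release_id, safe='')}.json"


def record_url(ocid: str) -> str:
    """Persistent URL for the record of a contracting process."""
    return f"{BASE_URL}/records/{quote(ocid, safe='')}.json"


def bulk_url(year: int, month: int, fmt: str = "csv") -> str:
    """URL for a bulk download covering one calendar month."""
    return f"{BASE_URL}/bulk/{year:04d}-{month:02d}.{fmt}"
```

Deriving each URL only from identifiers that never change (the ocid and the release id) is what keeps it stable when the underlying system is replaced.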

Although this can be achieved by writing individual files to a web-accessible file system, most of the publishers we see are opting for a database and API approach.

We are currently consulting on a simple API specification for OCDS, but in this blog post I want to reflect on the different approaches that can be taken in the architecture of an API.

Inside-out: getting data to the web

Publishing OCDS data usually involves a conversion step, where data is mapped from the internal data model used by the publisher to the common OCDS schema. This can happen at a number of different points. Below we outline three approaches we’ve observed.

(1) Direct publication from live systems

In this scenario each originating system publishes data directly in OCDS format via API. Data is not stored according to OCDS but is converted via the API when an API call is received.
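
To make the idea of converting at request time concrete, here is a minimal sketch. The internal table, its column names, the placeholder ocid prefix and the handler function are all hypothetical; a real system would query its own database and map far more fields.

```python
# Hypothetical rows as they might be held in a live procurement system.
INTERNAL_TENDERS = {
    "T-001": {
        "title": "Road repairs",
        "agency": "Ministry of Works",
        "budget": 250000.0,
        "currency": "USD",
        "updated": "2016-03-01T09:00:00Z",
    },
}

OCID_PREFIX = "ocds-xxxxxx"  # placeholder, not a registered prefix


def to_release(internal_id, row):
    """Map one internal row onto a small subset of OCDS release fields."""
    return {
        "ocid": f"{OCID_PREFIX}-{internal_id}",
        "id": f"{internal_id}-{row['updated']}",
        "date": row["updated"],
        "tag": ["tender"],
        "initiationType": "tender",
        "buyer": {"name": row["agency"]},
        "tender": {
            "id": internal_id,
            "title": row["title"],
            "value": {"amount": row["budget"], "currency": row["currency"]},
        },
    }


def handle_get_release(internal_id):
    """Stand-in for an API handler: read live data, convert, respond."""
    row = INTERNAL_TENDERS.get(internal_id)
    if row is None:
        return 404, {"error": "not found"}
    # Nothing is stored in OCDS form; conversion happens on every call.
    return 200, to_release(internal_id, row)
```

Because every call reads from the live data, each request adds load to the originating system, which is the trade-off this approach carries.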

[Figure: direct publication from live systems, with key]

Pros	Cons
Fewer systems to maintain	Where data originates in multiple systems, multiple APIs must be maintained and 3rd party systems may have to make calls to multiple APIs
	Complex or high volume API calls may place additional load on live system

(2) Separate OCDS datastore, pull and convert

In this scenario a middleware system sits between live systems and the internet-facing API. An automated process pulls data from live systems to the middleware system, which performs the conversion to OCDS and maintains a datastore in OCDS format.
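
A minimal sketch of this pattern might look like the following. It uses an in-memory SQLite database as the OCDS datastore; the live-system query, the field names and the placeholder ocid prefix are assumptions made for illustration.

```python
import json
import sqlite3


def fetch_from_live_system():
    """Stand-in for a query against a live system (hypothetical data)."""
    return [
        {"ref": "T-001", "title": "Road repairs", "updated": "2016-03-01T09:00:00Z"},
        {"ref": "T-002", "title": "School desks", "updated": "2016-03-02T14:30:00Z"},
    ]


def convert(row):
    """Map an internal row onto a small subset of OCDS release fields."""
    return {
        "ocid": f"ocds-xxxxxx-{row['ref']}",  # placeholder prefix
        "id": f"{row['ref']}-{row['updated']}",
        "date": row["updated"],
        "tag": ["tender"],
        "tender": {"id": row["ref"], "title": row["title"]},
    }


def open_datastore():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE releases (id TEXT PRIMARY KEY, ocid TEXT, body TEXT)")
    return db


def pull_and_convert(db):
    """One run of the scheduled job; returns how many new releases were stored."""
    stored = 0
    for row in fetch_from_live_system():
        release = convert(row)
        seen = db.execute(
            "SELECT 1 FROM releases WHERE id = ?", (release["id"],)
        ).fetchone()
        if seen:
            continue  # already pulled on an earlier run
        db.execute(
            "INSERT INTO releases VALUES (?, ?, ?)",
            (release["id"], release["ocid"], json.dumps(release)),
        )
        stored += 1
    db.commit()
    return stored


def api_get_releases(db, ocid):
    """What the public API serves: reads only from the OCDS datastore."""
    rows = db.execute("SELECT body FROM releases WHERE ocid = ? ORDER BY id", (ocid,))
    return [json.loads(body) for (body,) in rows]
```

The public API never touches the live system here; only the scheduled job does, and it can be run at quiet times.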

[Figure: separate OCDS datastore, pull and convert, with key]

Pros	Cons
Modular approach	Additional system to maintain
Single API to maintain and for 3rd party systems to call
Complex or high volume API calls do not place additional load on live system
Possible to share and re-use open source code for providing the datastore and API

A similar approach has been adopted by European Dynamics to support OCDS output from a new e-procurement system for the Zambian Public Procurement Agency. The key difference is that data is pushed from the live e-procurement systems rather than pulled, while conversion still takes place at the middleware layer.

(2.5) Separate OCDS datastore, convert and push

This scenario can be viewed as a combination of the two previous scenarios. Live systems perform the conversion of data to OCDS format and push this to a middleware system, which maintains an OCDS format datastore and an internet-facing API.
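
On the middleware side, the job in this scenario shrinks to receiving releases that are already in OCDS format, checking them, and storing them. A minimal sketch, assuming a hypothetical `Middleware` class and a deliberately small field check rather than full schema validation:

```python
# Assumed minimal check for this sketch; a real system would validate
# each pushed release against the full OCDS release schema.
REQUIRED_FIELDS = ("ocid", "id", "date", "tag")


class Middleware:
    def __init__(self):
        self.releases = {}  # keyed by (ocid, release id)

    def receive(self, release):
        """Accept a release pushed by a live system; return (accepted, message)."""
        missing = [field for field in REQUIRED_FIELDS if field not in release]
        if missing:
            return False, "missing fields: " + ", ".join(missing)
        key = (release["ocid"], release["id"])
        if key in self.releases:
            return False, "duplicate release"
        self.releases[key] = release
        return True, "stored"

    def get_releases(self, ocid):
        """What the public API serves for one contracting process."""
        found = [r for (o, _), r in self.releases.items() if o == ocid]
        return sorted(found, key=lambda r: r["date"])
```

The middleware needs no knowledge of any internal data model, which is why it is simpler than in scenario (2); the cost is that each live system has to maintain its own conversion.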

[Figure: separate OCDS datastore, convert and push, with key]

Pros	Cons
Modular approach	Additional system to maintain
Single API to maintain and for 3rd party systems to call	Where data originates in multiple systems, multiple OCDS conversions must be maintained
Complex or high volume API calls do not place additional load on live system
Possible to share and re-use open source code for providing the datastore and API
Middleware complexity reduced over scenario (2)

A similar approach has been adopted by the OpenProcurement system, developed in Ukraine and used as the basis for the Prozorro platform, which uses OCDS building blocks as the foundation for live systems data models, easing the conversion process. It should be noted that OCDS is not a framework for building an e-procurement system; however, mapping against OCDS can help ensure e-procurement systems are capturing relevant data for disclosure.

(3) Separate OCDS datastore, manual import

In this scenario a middleware system sits between live systems and the internet-facing API.

Data is manually exported from live systems for upload to the middleware system which performs conversion to OCDS and maintains a datastore in OCDS format.
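
As an illustration of the import step, here is a minimal sketch that converts an uploaded CSV export and reports rows it could not convert. The column names (`ref`, `title`, `date`) and the placeholder ocid prefix are assumptions; a real export would have its own layout.

```python
import csv
import io


def import_export_file(file_obj):
    """Convert an uploaded CSV export; return (releases, errors)."""
    releases, errors = [], []
    reader = csv.DictReader(file_obj)
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        ref = (row.get("ref") or "").strip()
        title = (row.get("title") or "").strip()
        date = (row.get("date") or "").strip()
        if not ref or not date:
            # Report the problem instead of silently dropping the row.
            errors.append(f"line {line_no}: missing ref or date")
            continue
        releases.append({
            "ocid": f"ocds-xxxxxx-{ref}",  # placeholder prefix
            "id": f"{ref}-{date}",
            "date": date,
            "tag": ["tender"],
            "tender": {"id": ref, "title": title},
        })
    return releases, errors


# Example upload with one good row and one broken row.
SAMPLE_EXPORT = io.StringIO(
    "ref,title,date\n"
    "T-001,Road repairs,2016-03-01T09:00:00Z\n"
    ",Missing ref,2016-03-02T09:00:00Z\n"
)
```

Returning the errors alongside the converted releases gives whoever performs the upload a chance to correct the export, which matters when a manual step is part of the pipeline.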

There’s a well-documented example of this approach from the work Development Gateway have been carrying out in Vietnam.

Pros	Cons
Modular approach	Additional system to maintain
Single API to maintain and for 3rd party systems to call	Manual export/import process introduces potential for failure
Complex or high volume API calls do not place additional load on live system

Things to think about

Search endpoints. Your API may do more than just provide individual releases and records. You may want to provide endpoints which can be used to fetch all the contracting processes involving a particular product type, a particular supplier, or a particular procuring agency. Consider whether these endpoints will provide only JSON, or whether they can also provide custom exports of CSV and Excel data for users who are more familiar with spreadsheets.
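
A search endpoint of this kind could be sketched as below: one function filters releases by supplier or procuring entity, and another turns the result into CSV. The sample data, function names and CSV columns are assumptions for illustration, and only a few fields are flattened.

```python
import csv
import io

# Hypothetical releases held in the OCDS datastore.
RELEASES = [
    {
        "ocid": "ocds-xxxxxx-T-001",
        "id": "T-001-award",
        "tender": {"title": "Road repairs",
                   "procuringEntity": {"name": "Ministry of Works"}},
        "awards": [{"suppliers": [{"name": "Acme Ltd"}]}],
    },
    {
        "ocid": "ocds-xxxxxx-T-002",
        "id": "T-002-tender",
        "tender": {"title": "School desks",
                   "procuringEntity": {"name": "Ministry of Education"}},
    },
]


def search(releases, supplier=None, procuring_entity=None):
    """Return releases matching every filter that was supplied."""
    results = []
    for release in releases:
        tender = release.get("tender", {})
        if procuring_entity and tender.get("procuringEntity", {}).get("name") != procuring_entity:
            continue
        if supplier:
            names = {s.get("name")
                     for award in release.get("awards", [])
                     for s in award.get("suppliers", [])}
            if supplier not in names:
                continue
        results.append(release)
    return results


def to_csv(releases):
    """Flatten a few fields of each release into CSV for spreadsheet users."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ocid", "release_id", "tender_title"])
    for release in releases:
        writer.writerow([release["ocid"], release["id"],
                         release.get("tender", {}).get("title", "")])
    return buffer.getvalue()
```

The same `search` result can feed either a JSON response or `to_csv`, so offering both formats need not mean maintaining two query paths.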

Documents. OCDS is not just about metadata and data on contracting processes; it is also about disclosure of documents. In many cases we’ve found that where systems link out to documents on external platforms, link rot can quickly set in. The best systems will ensure that documents are archived, and kept available permanently.

Generating records. OCDS has the idea of releases (snapshot information about a contracting process), and records (summary of the current state of the process, and links to what has gone before). Every time there is a new release in your system, you will need to update the corresponding record. This could be done at the time data is converted, or could be a separate process, triggered on each update.
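
The sketch below shows the general shape of rebuilding a record from its releases. It is a simplification: the overlay in `merge` is not the full OCDS merging routine (which, among other things, matches array items by their ids), the compiled release id is an assumed convention, and `release_url` is a hypothetical callback supplied by the caller. It also assumes dates are ISO 8601 strings in one consistent format, so that sorting them as text gives chronological order.

```python
def merge(base, update):
    """Recursively overlay `update` on `base`; later values win.

    Simplified: lists are replaced wholesale rather than merged by id.
    """
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def build_record(releases, release_url):
    """Rebuild the record for one contracting process from all its releases."""
    if not releases:
        raise ValueError("a record needs at least one release")
    ordered = sorted(releases, key=lambda r: r["date"])
    compiled = {}
    for release in ordered:
        compiled = merge(compiled, release)
    compiled["tag"] = ["compiled"]
    # Assumed id convention for this sketch.
    compiled["id"] = f"{ordered[0]['ocid']}-{ordered[-1]['date']}"
    return {
        "ocid": ordered[0]["ocid"],
        # Link to each release rather than embedding it.
        "releases": [
            {"url": release_url(r), "date": r["date"], "tag": r["tag"]}
            for r in ordered
        ],
        "compiledRelease": compiled,
    }
```

Calling `build_record` whenever a release is added for an ocid corresponds to the "triggered on each update" option; calling it inside the conversion job corresponds to doing it at conversion time.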

Generating bulk exports. Periodic exports of your data are very useful to researchers, analysts and other users. Think about how data will be segmented across bulk files, particularly if you are bulk exporting records.
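
One possible way to segment releases is by calendar month, as in the sketch below. The file naming, the function names and the monthly granularity are assumptions; the package fields shown are a small subset of what a release package carries. Records are harder to segment because one record can span many months, so a rule such as "file under the month of the first release" would need to be chosen and documented.

```python
from collections import defaultdict


def segment_by_month(releases):
    """Group releases by the "YYYY-MM" part of their ISO 8601 date."""
    groups = defaultdict(list)
    for release in releases:
        groups[release["date"][:7]].append(release)
    return dict(groups)


def build_packages(releases, base_uri, publisher_name, published_date):
    """Return {file name: release package} with one package per month."""
    packages = {}
    for month, group in sorted(segment_by_month(releases).items()):
        file_name = f"releases-{month}.json"
        packages[file_name] = {
            "uri": f"{base_uri}/{file_name}",
            "publishedDate": published_date,
            "publisher": {"name": publisher_name},
            "releases": sorted(group, key=lambda r: r["date"]),
        }
    return packages
```

Fixed periods keep file names predictable, so users can fetch only the months they are missing rather than re-downloading everything.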