The territory and the map: storing and sharing open contracting data
This post in our technical series explains how to approach data modelling for storing, and sharing, open contracting data.
A number of the questions we get asked by implementers of the Open Contracting Data Standard (OCDS) relate to the use of OCDS for the collection and storage of data inside their own systems.
It’s important in these cases to note that the primary use-case for OCDS is the sharing and exchange of data on contracting. It provides a map that represents a contracting process, but in creating that map, we have to abstract at times from all the granular detail of contracting processes as they happen in the world. As Borges has vividly illustrated, when the map becomes the territory, it ceases to be useful in its own right. So, whilst OCDS can be used to provide guidance on how to collect and store data, it should not be seen as definitive on such matters.
This might at times seem counter-intuitive. Why would we suggest that there are cases where an implementer would not follow OCDS for their internal systems? The answer relates to the design principles and user needs guiding OCDS.
We can see this illustrated with the example of date information:
N | OCDS | ISO8601 with duration |
1 | {"startDate":"2007-03-01T13:00:00Z", "durationInDays":"423"} | 2007-03-01T13:00:00Z/P1Y2M10D |
2 | {"durationInDays":"55"} | P1M25D |
3 | {"durationInDays":"423", "endDate":"2007-03-01T13:00:00Z"} | P1Y2M10D/2007-03-01T13:00:00Z |
In each row, the two representations describe (more or less) the same thing:
- (1) is a period starting at 1pm on 1st March 2007 and lasting 1 year, 2 months and 10 days
- (2) is a period of approximately 55 days (1 month, 25 days)
- (3) is a period running for 1 year, 2 months and 10 days, and ending at 1pm on 1st March 2007
So why, if we’re using ISO8601 date-time strings to validate startDate and endDate, don’t we use the whole of ISO8601 in OCDS, and allow date strings that include periods directly? That would replace three fields in OCDS with just one and allow use of all sorts of additional features of ISO8601, such as fuzzy dates.
The main reason is usability. We know that many users of OCDS data will ultimately end up working with it using spreadsheet software. What is easier to use? A column containing:
period |
2007-03-01T13:00:00Z/P1Y2M10D |
P1M25D |
P1Y2M10D/2007-03-01T13:00:00Z |
2007-03-01T13:00:00Z/2008-04-27T09:00:00Z |
Or a table with:
period/startDate | period/endDate | period/durationInDays |
2007-03-01T13:00:00Z | | 423 |
| | 55 |
| 2007-03-01T13:00:00Z | 423 |
2007-03-01T13:00:00Z | 2008-04-27T09:00:00Z | |
This latter format is arguably much easier to read, and allows a spreadsheet user to perform simple calculations (e.g. finding the difference between two dates, or using the duration to estimate the start or end date of a period).
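The kind of simple calculation described above can also be sketched in a few lines of Python. This is only an illustration, not part of OCDS: the helper names parse_ocds_datetime, estimate_end_date and length_in_days are hypothetical, and the sketch assumes date-times are published in UTC with a trailing Z.

```python
from datetime import datetime, timedelta, timezone

# Assumption: date-times are in UTC and end with "Z", as in the examples above.
OCDS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_ocds_datetime(value):
    """Parse a string such as 2007-03-01T13:00:00Z into an aware datetime."""
    return datetime.strptime(value, OCDS_FORMAT).replace(tzinfo=timezone.utc)


def estimate_end_date(period):
    """Return endDate if given, else startDate + durationInDays, else None."""
    if period.get("endDate"):
        return parse_ocds_datetime(period["endDate"])
    if period.get("startDate") and period.get("durationInDays") is not None:
        start = parse_ocds_datetime(period["startDate"])
        # int() accepts both 423 and "423"
        return start + timedelta(days=int(period["durationInDays"]))
    return None


def length_in_days(period):
    """Whole days between startDate and endDate, else durationInDays, else None."""
    if period.get("startDate") and period.get("endDate"):
        start = parse_ocds_datetime(period["startDate"])
        end = parse_ocds_datetime(period["endDate"])
        return (end - start).days
    duration = period.get("durationInDays")
    return int(duration) if duration is not None else None
```

For example, estimate_end_date({"startDate": "2007-03-01T13:00:00Z", "durationInDays": 423}) gives 1pm on 27th April 2008. Doing the same with a single ISO8601 period string would first require a parser for that string.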
In practice, many tools do not fully implement ISO8601 periods, and so we would either have situations where OCDS files contain dates that cannot be easily parsed, or we would have to specify which subset of ISO8601 is allowed in OCDS.
By using date-time values for startDate and endDate, and introducing durationInDays and maxExtentDate for cases where a fixed start or end date is not specified, we seek to strike a compromise between the full expression of dates and periods, and generating data that can be more easily analysed.
We know that there may be cases where a publisher does not have specific time information, or may not yet even know the month or day of the month on which a period will start. However, in asking publishers in these cases to provide a reasonable estimate, or to set a business rule (such as setting all unknown times to 00:00:00 or 23:59:59), we place the burden of working out how such ambiguous dates should be handled onto the publisher rather than the user*.
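A business rule of this kind might look something like the sketch below. The function name apply_time_rule and the choice of UTC are assumptions made for illustration; each publisher would need to decide its own rule and time zone.

```python
import re

# Matches a date with no time component, e.g. 2007-03-01
DATE_ONLY = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def apply_time_rule(value, is_end=False):
    """Add a default time to a date-only string; leave full date-times alone.

    Assumed rule: unknown start times become 00:00:00 and unknown end
    times become 23:59:59, both expressed in UTC.
    """
    if DATE_ONLY.match(value):
        default_time = "23:59:59" if is_end else "00:00:00"
        return f"{value}T{default_time}Z"
    return value
```

Here apply_time_rule("2007-03-01") returns 2007-03-01T00:00:00Z, and apply_time_rule("2007-03-01", is_end=True) returns 2007-03-01T23:59:59Z. Documenting the rule alongside the published data lets users know how much precision to read into the times.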
Of course, this compromise does not meet all internal needs of transactional procurement systems. There may be cases where these systems need to store fuzzy dates, or manage information on periods which cannot be easily managed by the reduction in OCDS to durationInDays (e.g. capturing that a period should be 6 months, but accepting that the duration in days will only be set later when the start date is known). Because the full ISO8601 standard is more expressive than OCDS dates, it is possible to store data internally in ISO8601, and then, making a number of assumptions, to publish this data in the OCDS format.
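As a rough sketch of that publication step, the Python below converts an internally stored ISO8601 interval string into an OCDS-style period. Everything here is an assumption made for illustration: the function names are hypothetical, only year, month and day designators are handled, and a year is counted as 365 days and a month as 30 days. A publisher anchoring the duration to a known start date could calculate exact calendar days instead.

```python
import re

# Assumed conversion factors; a real publisher would choose its own rule.
DAYS_PER_YEAR = 365
DAYS_PER_MONTH = 30

# Handles only durations such as P1Y2M10D (no weeks, no time part).
DURATION = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$")


def duration_to_days(duration):
    """Approximate an ISO8601 duration as a whole number of days."""
    match = DURATION.match(duration)
    if not match or duration == "P":
        raise ValueError(f"Unsupported duration: {duration}")
    years, months, days = (int(g) if g else 0 for g in match.groups())
    return years * DAYS_PER_YEAR + months * DAYS_PER_MONTH + days


def to_ocds_period(interval):
    """Map start/duration, duration/end, start/end or a bare duration to a dict."""
    parts = interval.split("/")
    if len(parts) == 1:
        return {"durationInDays": duration_to_days(parts[0])}
    if len(parts) != 2:
        raise ValueError(f"Unsupported interval: {interval}")
    first, second = parts
    if first.startswith("P"):
        return {"durationInDays": duration_to_days(first), "endDate": second}
    if second.startswith("P"):
        return {"startDate": first, "durationInDays": duration_to_days(second)}
    return {"startDate": first, "endDate": second}
```

Under these assumptions, to_ocds_period("P1M25D") returns {"durationInDays": 55}. A different rule for the length of a month or year would give a different figure, which is exactly the kind of assumption a publisher has to make, and ideally document, when moving from the richer internal format to OCDS.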
This is why we say that there are cases where the internal storage of a field and the OCDS representation of that field legitimately vary, and why we encourage implementers to think carefully about the different use cases for data capture and for the standardised sharing of data.
(*Though this is a balance we will be reviewing in the next iteration of OCDS.)