A gentle introduction to applying AI in procurement
Thanks to a small grant from Schmidt Futures, we had the opportunity to experiment with applying artificial intelligence technologies to green public procurement.
With so many actors in our field talking about AI, this blog post is meant as a gentle introduction to getting started with applying AI in practice. The goal is to build a better understanding of AI in public procurement by walking through the key decisions we had to make about our approach.
We set out to use AI to label sentences in procurement documents that match green criteria (that is, criteria designed to reduce the environmental impact of a purchase). This is relevant to many use cases: measuring the adoption of green criteria by different governments and parts of government, determining whether green criteria affect the success of the contracting process, checking whether a “green” procurement actually uses green criteria (greenwashing), and so on. This analysis could then be used to motivate and prioritize increased use of green criteria, as part of the green transition.
For our project, we compared green criteria against real tender notices, using data from the European Union and the Dominican Republic. We chose to focus initially on furniture purchases as governments buy products in this category regularly and their green requirements are well documented. We’ll be posting about the results of that work in an upcoming blog, but we wanted to start here by considering how artificial intelligence can be applied to procurement problems more generally.
1. Identify your task
“Artificial intelligence” describes a wide range of methods. To apply AI successfully, you first need to identify your task so that you can select an appropriate method.
Let’s look at some of its potential applications in public procurement and their corresponding tasks.
Imagine you had a collection of long Request for Proposal documents and wanted to generate summaries of them. AI could help you create short versions of the documents that retain the important information, either by extracting existing text or by generating new text. Your task would be summarization.
Or if a government was managing a large infrastructure construction project, it might want to monitor the community’s reaction to it. You could use AI to review social media posts or monitoring reports discussing the project, and label the text as having either a positive, negative or neutral emotion or opinion. Your task here would be sentiment analysis.
In our case, we’re looking for phrases that are alike in tender notices and predefined green criteria (including various ecolabels and certifications, requirements for recyclable or low toxicity materials, sustainably-sourced components and so on). Our task is sentence similarity. We want to compare two sentences and assign them a similarity score (more on that in a moment).
There are many more tasks to which AI can be applied, and the inputs and outputs can be almost any kind of data: text, numbers, images, audio, video and/or data tables. The Hugging Face website has a good overview of AI tasks. Have a browse, and perhaps you’ll think of other uses for AI in procurement we haven’t mentioned here.
2. Select your method
Once you know your task, a web search is often enough to find an appropriate method.
For sentence similarity, an appropriate method is to convert the text into vector embeddings and to use cosine similarity as the similarity score.
“You lost me.”
Computers don’t understand human language. They need to operate on numbers. We can represent text and other information as numerical values with vector embeddings. A vector is a list of numbers that, in the context of AI, helps us express the meaning of information and its relationship to other information.
Text can be converted into vectors using a model. This Sentence Transformers model, which we used for our project, converts a sentence into a vector of 384 numbers. For example, the sentence “don’t panic and always carry a towel” becomes the numbers 0.425…, 0.385…, 0.072…, and so on.
These numbers represent the meaning of the sentence.
Let’s compare this sentence to another: “keep calm and never forget your towel”, which has the vector (0.434…, 0.264…, 0.123…, …).
One way to determine their similarity score is to use cosine similarity, which measures the angle between the two sentences’ vectors. Put simply, the more closely the vectors point in the same direction, the more alike the sentences are. The result of this calculation is always a number from -1 (the sentences have opposite meanings) to 1 (the same meaning). You could also compare vectors using other measures, such as Euclidean distance.
For our two sentences above, performing this mathematical operation returns a similarity score of 0.869.
Now let’s consider the sentence “do you like cheese?” which has the vector (-0.167…, -0.557…, 0.066…, …). It returns a similarity score of 0.199. Hooray! The computer is correct!
But this method is not foolproof. Let’s try another: “do panic and never bring a towel” (0.589…, 0.255…, 0.0884…, …). The similarity score is 0.857. The score is high because the words are similar… but the meaning is the opposite!
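If you’d like to try this yourself, here’s a minimal sketch using the open-source sentence-transformers library. The model name below (all-MiniLM-L6-v2) is an illustrative general-purpose model that outputs 384-number vectors; it isn’t necessarily the model we used, so your exact scores will differ from the ones above.

```python
# A minimal sketch using the sentence-transformers library.
# The model name is illustrative: all-MiniLM-L6-v2 is a small,
# general-purpose model that outputs 384-dimensional vectors.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "don't panic and always carry a towel",
    "keep calm and never forget your towel",
    "do you like cheese?",
    "do panic and never bring a towel",
]

# Encode each sentence into a vector of 384 numbers.
embeddings = model.encode(sentences)

# Compare the first sentence against the other three using cosine similarity.
scores = util.cos_sim(embeddings[0], embeddings[1:])
for sentence, score in zip(sentences[1:], scores[0]):
    print(f"{float(score):.3f}  {sentence}")
```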
“This doesn’t sound very intelligent.”
If you got the feeling that 384 numbers couldn’t possibly capture the detail, nuance and beauty of all human expression, then you are correct! AI has limits, and, sorry, increasing the length of the vector embedding is not a simple fix.
Training models is the hard part (making everything fast is the other hard part). It takes enormous computing power to run calculations on the very large datasets used to train models, and state-of-the-art models still have performance issues. But researchers are continually improving and inventing techniques for training models. For example, the model used for our green project takes a mere 5 seconds to run a calculation that takes 65 hours with the older model it’s based on!
Besides the techniques used in training, a model’s performance also depends on the datasets it was trained on. For example, MS MARCO is an appropriate dataset for question-answering tasks (as you might use for a chatbot that answers customer queries based on the company’s FAQs). For our case, the paper Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks describes how to create an appropriate model.
After selecting your method, the next step is to choose an appropriate model, which depends on the details of your task and the input that you provide to the model (in our case, rather technical text related to procurement in 27 languages).
3. Narrow your task
Spend some time reading about your task, to find out whether there are any important distinctions to make.
For example, remember how we mentioned using sentiment analysis to gauge people’s opinions earlier? Two variations on that task involve measuring the polarity of the sentiment (positive or negative) or also its degree (intensity).
In our case, “sentence similarity” covers a variety of subtasks. The most important distinction concerns the relative lengths of the query and the match. For our project, we were interested in symmetric similarity: when we ran a search, we wanted our query (a green requirement) to return a match of similar length and content. For example, the query “sustainable metal parts” might retrieve “eco-friendly metal work” as a result. In contrast, asymmetric similarity involves a short query (like a question or keywords) and a longer match (like an answer or a paragraph). For example, a web user might search “What is Jubislide?” and the search engine should return a longer description of the dance move.
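To make the symmetric case concrete, here’s a hypothetical sketch that ranks a few made-up tender sentences against a green criterion. The sentences and the model name are placeholders, not our real data or final model choice.

```python
# Hypothetical example of symmetric similarity search: a short green
# criterion as the query, short tender sentences as the candidates.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

criterion = "sustainable metal parts"
tender_sentences = [
    "eco-friendly metal work",
    "delivery within 30 days of contract signature",
    "chairs upholstered in recycled fabric",
]

query_embedding = model.encode(criterion)
corpus_embeddings = model.encode(tender_sentences)

# Rank the candidates by cosine similarity to the query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)
for hit in hits[0]:
    print(f"{hit['score']:.3f}  {tender_sentences[hit['corpus_id']]}")
```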
4. Understand your input
A model performs best if the input data resembles its training datasets. For example, if your input is in multiple languages, then a model trained on only English text will not perform well. If your input is wildlife photography, then a model trained on scanned statements will not perform well. And so on.
In this step, try to think of anything that is likely to trip up a model. Are your images action shots, whereas the datasets are stock photos and headshots? Is your text academic dissertations, whereas the datasets are Reddit posts? (If your task is sensitive, like criminal sentencing, consider reading books like Weapons of Math Destruction first.)
In our case, our input is in multiple languages, uses formal expressions, and is about public procurement, among other relevant factors.
5. Select your model
You now have enough information to select your first pre-trained model (or three!). Hugging Face is a popular platform for sharing models and datasets. Try to find a model that fits “well enough” with your input.
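As a starting point, trying out a candidate model takes only a few lines of code. The model below (paraphrase-multilingual-MiniLM-L12-v2) is one plausible option for multilingual input like ours; take it as an example, not a recommendation.

```python
# Sketch of a quick sanity check on a candidate multilingual model.
# The model name is one plausible option, not our final choice.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# The same requirement in English and Spanish should land close together.
pair = model.encode([
    "recyclable packaging materials",
    "materiales de embalaje reciclables",
])
print(f"{util.cos_sim(pair[0], pair[1]).item():.3f}")
```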
As you test different models, you might discover that you need to prepare your inputs to achieve better performance. In our case, we split paragraphs into sentences, removed short sentences, and so on.
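To illustrate, here’s a minimal sketch of that kind of preparation. The splitting rule and the word-count threshold are simplistic placeholders; real multilingual procurement text needs a proper sentence splitter.

```python
import re

def prepare_sentences(paragraph: str, min_words: int = 4) -> list[str]:
    """Split a paragraph into sentences and drop very short ones."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    # Drop fragments below the (illustrative) word-count threshold.
    return [s for s in sentences if len(s.split()) >= min_words]

paragraph = (
    "All wooden components must be FSC certified. "
    "See annex B. Packaging must be recyclable and free of PVC."
)
print(prepare_sentences(paragraph))
# ['All wooden components must be FSC certified.',
#  'Packaging must be recyclable and free of PVC.']
```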
You might also discover that no pre-trained model is good enough. In that case, you might consider training your own model (gulp!).
“Why all this trouble? Can’t ChatGPT do everything?”
Short answer: No. Generative large language models (LLMs) are not general-purpose natural language processing (NLP) task solvers. That said, OpenAI does offer APIs to turn text into embeddings using its small selection of models.
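If you do want to go that route, here’s a hedged sketch using the openai Python client (version 1.0 or later). The model name is one of the embedding models on offer; check OpenAI’s documentation for the current list.

```python
# Sketch of the OpenAI embeddings endpoint (openai>=1.0).
# The model name is illustrative; consult OpenAI's docs for current models.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="don't panic and always carry a towel",
)
print(len(response.data[0].embedding))  # length of the returned vector
```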
You now have all the key pieces you need to reuse open-source models and libraries to perform your task.
We hope this article has taken some of the mystery out of AI, and that you feel more confident navigating this technological landscape.
And look out for our next post, where we’ll share our lessons learned and next steps.