OCDS Analytics SaaS and predictions

Date: 18-04-2019

What is OCDS?

Developed by the Open Contracting Partnership, the Open Contracting Data Standard (OCDS) is a recognized open data standard for the structured presentation of data about the contracting process. The standard was designed to make contracting information more accessible and to facilitate analysis of contracting metadata. It makes information easier to share and helps systems connect with each other by promoting data interoperability.

Using the OCDS enables analysts and citizens to work with joined-up data across countries and sectors, which supports more thorough analysis, new approaches, and deeper insights. The OCDS also serves as a yardstick for well-structured data presentation. It describes in detail how to release data and associated documents at each stage of the contracting process, giving governments a framework for continuously collecting and publishing their information.

Why is there a need to publish OCDS data? The answer is the same as for the question of why we replace traditional paper tenders with an e-procurement system: it is a great tool for fighting corruption because it enables greater transparency, and it is a gold mine for data scientists because it lets them analyze the efficiency, effectiveness, and fairness of public procurement of goods, works, and services. The conclusions drawn from the data can help improve the contracting process.

What is OCDS Analytics SaaS?

OCDS Analytics SaaS is an API-accessible service offering analytic tools for OCDS datasets. We provide tools to import large volumes of data quickly, synchronize with the OCDS API in real time, and write complex queries via our API or GraphQL interface, including aggregation, filtering, and other analytic instruments.

We are starting with ProZorro, the Ukrainian public procurement system. The main aim of this e-procurement system is to ensure transparent and efficient spending of public funds. ProZorro has been operating since 2015 and currently contains over 3 million tenders.

We intend to import the ProZorro database and provide you with tools to analyze its data. Please follow our Facebook page or subscribe to our newsletter to keep up-to-date with any important news and new releases.

How do machine learning and artificial intelligence come into play?

Machine learning and artificial intelligence (AI) have been successfully applied in Predictions, a subsystem of OCDS Analytics. Here we explain how it works.

When you create a tender in the ProZorro system (or in any other e-procurement system, really), there are required fields to fill in, such as the tender’s title, your organization’s contact details, the estimated price and minimum step, the items you are going to procure (including their units of measurement and official classification codes), deadlines, etc.

OCDS defines standard fields for procurement and contracting information, and ProZorro uses these fields to store your information. For instance:

tender.title - The name of the tender, displayed in an e-procurement system and/or electronic marketplaces. The procuring entity usually describes the item being procured here.

tender.description - A detailed description of the tender: requirements for the item being procured, timelines, periodicity, etc.

item.description - A description of the goods or services to be provided.

item.classification.id - The classification code drawn from the scheme/codelist selected for the tender. CPV scheme codes are recommended.

item.unit.id - The measurement unit of the item being procured. UN/CEFACT Recommendation 20 unit codes are commonly used for this field.
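To make the dotted field names concrete, here is a minimal Python sketch that stores the fields above in a nested dictionary and looks them up by path. The nesting simply mirrors the names used in this post, the sample values are borrowed from the coffee example further down, and the `get_field` helper is a hypothetical name introduced for illustration; none of this is a definitive picture of how ProZorro stores tenders.

```python
# Illustrative record: the nesting mirrors the dotted names used in this post.
# Values are sample data from the coffee example, not a real ProZorro tender.
record = {
    "tender": {
        "title": "Кава",
        "description": "",
    },
    "item": {
        "description": "Кава та кавові напої",
        "classification": {"scheme": "CPV", "id": "15861000-1"},
        "unit": {"id": "KGM"},
    },
}


def get_field(data, path):
    """Resolve a dotted path such as "item.unit.id" (hypothetical helper)."""
    value = data
    for key in path.split("."):
        value = value[key]  # raises KeyError if the field is missing
    return value


print(get_field(record, "item.classification.id"))
print(get_field(record, "item.unit.id"))
```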

Let’s take classification as an example. A classification code must be specified for each item being procured. ProZorro and many other public e-procurement systems use the Common Procurement Vocabulary (CPV), a scheme developed by the European Union to facilitate the description of the subject matter of public contracts. Here are some examples:

  • 15810000-9 Bread products, fresh pastry goods and cakes
  • 15861000-1 Coffee
  • 15863000-5 Tea
  • 30000000-9 Office and computing machinery, equipment and supplies except furniture and software packages
  • 71355000-1 Surveying services
  • 72000000-5 IT services: consulting, software development, Internet and support
  • 73000000-2 Research and development services and related consultancy services
  • 80420000-4 E-learning services
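CPV codes are hierarchical: per the CPV regulation, the first two digits identify the division, the first three the group, the first four the class, and the first five the category, while the digit after the hyphen is a check digit. The sketch below splits a code into those levels. The function name `cpv_levels` is an assumption made for this example, and the check digits of the parent codes are deliberately not computed, so parents are shown as eight-digit prefixes only.

```python
def cpv_levels(code):
    """Return the parent levels of a CPV code as eight-digit prefixes.

    Hypothetical helper for illustration; the check digit is ignored.
    """
    digits, _, _check = code.partition("-")
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError(f"not a CPV code: {code!r}")
    return {
        "division": digits[:2].ljust(8, "0"),
        "group": digits[:3].ljust(8, "0"),
        "class": digits[:4].ljust(8, "0"),
        "category": digits[:5].ljust(8, "0"),
    }


coffee = cpv_levels("15861000-1")
tea = cpv_levels("15863000-5")
print(coffee)
# Coffee and tea from the list above fall into the same class.
print(coffee["class"] == tea["class"])
```

Knowing the hierarchy helps when a suggested code looks too specific: a parent level of the same branch may be the safer choice.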

Sometimes it is hard to find and specify the correct code when creating a tender. Our predictions can help in this situation.

How to use Predictions?

Because OCDS Analytics prepares structured, high-quality datasets from the data in ProZorro, the Ukrainian public procurement system, we have a great source of historical data for machine learning. We used a machine learning framework to create and train a model on these datasets. As a result, we developed a plugin for the analytics system that "predicts", or recommends, a classification (scheme code and title) with high probability based on the values of fields such as:

  • tender.title - "Tender title" / "Назва"
  • tender.description - "Tender description" / "Опис"
  • item.description - "Item description" / "Предмет закупівлі"
  • item.unit.id - "Item unit" / "Одиниця виміру"
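To give a feel for how text fields can be mapped to classification codes, here is a toy sketch in pure Python. It is not the model behind Predictions: this post does not name the framework or algorithm used, so the example swaps in a simple multinomial naive Bayes classifier with Laplace smoothing, and the four training phrases are invented for illustration (only the two CPV codes come from the list above).

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class ToyNaiveBayes:
    """Bag-of-words naive Bayes; an illustrative stand-in, not the real model."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples):
        for text, label in samples:
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict_proba(self, text):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                # Laplace smoothing keeps unseen words from zeroing the score.
                score += math.log((self.word_counts[label][word] + 1) / denom)
            log_scores[label] = score
        # Normalize log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(weights.values())
        ranked = ((k, w / norm) for k, w in weights.items())
        return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Invented training phrases; real training would use historical tenders.
model = ToyNaiveBayes()
model.fit([
    ("кава мелена", "15861000-1"),
    ("кава в зернах", "15861000-1"),
    ("чай чорний", "15863000-5"),
    ("чай зелений", "15863000-5"),
])
for code, probability in model.predict_proba("кава"):
    print(code, round(probability, 3))
```

The shape of the output, a ranked list of codes with probabilities, is the same idea as what the Predictions plugin returns, even though the underlying model is certainly more capable than this sketch.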

You can use either our API or the GraphQL interface to issue GraphQL queries manually. When you submit the fields listed above, you receive a list of the most probable CPV codes together with their probabilities (returned as fractions between 0 and 1).

If you are procuring coffee, your request will look as follows:

  • tender.title - "Кава" (coffee)
  • item.description - "Кава та кавові напої" (coffee and coffee drinks)
  • item.unit.id - "KGM" (code for kilogram measurement unit)
  • probability - minimum probability of 5% (passed as "0.05")

{
  Predictions {
    Classification(
      page: { limit: 3 }
      filters: [
        { eq: { field: "tender.title", value: "Кава" } }
        { eq: { field: "tender.description", value: "" } }
        { eq: { field: "item.description", value: "Кава та кавові напої" } }
        { eq: { field: "item.unit.id", value: "KGM" } }
        { gte: { field: "probability", value: "0.05" } }
      ]
    ) {
      values {
        entity {
          id
          description
          scheme
        }
        probability
      }
    }
  }
}
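If you want to issue this query from code rather than by hand, a minimal Python sketch might look like the following. It builds the same query from a dictionary of field values and wraps it in the JSON body that GraphQL servers conventionally accept over HTTP. The function names, the `endpoint` argument, and the request format are assumptions for illustration; check the service documentation for the actual endpoint and any authentication requirements.

```python
import json
import urllib.request


def build_classification_query(fields, min_probability=0.05, limit=3):
    """Build the Predictions query shown above (hypothetical helper)."""
    # json.dumps yields a double-quoted, escaped string literal.
    filters = [
        f'{{ eq: {{ field: {json.dumps(name)}, '
        f'value: {json.dumps(value, ensure_ascii=False)} }} }}'
        for name, value in fields.items()
    ]
    filters.append(
        f'{{ gte: {{ field: "probability", '
        f'value: {json.dumps(str(min_probability))} }} }}'
    )
    filter_block = "\n        ".join(filters)
    return (
        "{\n"
        "  Predictions {\n"
        f"    Classification(page: {{ limit: {limit} }}, filters: [\n"
        f"        {filter_block}\n"
        "    ]) {\n"
        "      values { entity { id description scheme } probability }\n"
        "    }\n"
        "  }\n"
        "}"
    )


def post_query(endpoint, query):
    """POST the query as JSON; `endpoint` is a placeholder you must supply."""
    body = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


query = build_classification_query({
    "tender.title": "Кава",
    "tender.description": "",
    "item.description": "Кава та кавові напої",
    "item.unit.id": "KGM",
})
print(query)
```

Only `build_classification_query` is exercised here; `post_query` is left uncalled because the endpoint URL is not given in this post.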

You receive the following results:

{
  "data": {
    "Predictions": {
      "Classification": {
        "values": [
          {
            "entity": {
              "id": "15861000-1",
              "description": "Кава",
              "scheme": "CPV"
            },
            "probability": 0.7185789
          },
          {
            "entity": {
              "id": "15862000-8",
              "description": "Замінники кави",
              "scheme": "CPV"
            },
            "probability": 0.1997253
          },
          {
            "entity": {
              "id": "15860000-4",
              "description": "Кава, чай та супутня продукція",
              "scheme": "CPV"
            },
            "probability": 0.06485825
          }
        ]
      }
    }
  }
}

So you will most probably need CPV code 15861000-1 for coffee (probability of about 72%). Alternatively, you can try code 15862000-8 for coffee substitutes (about 20%).
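Reading the response programmatically is straightforward. The sketch below parses a response with the structure shown above and prints each suggestion as a percentage; the `top_predictions` helper is a hypothetical name, and the embedded JSON is a shortened copy of the example response.

```python
import json

# Shortened copy of the example response above.
response_text = """
{
  "data": {
    "Predictions": {
      "Classification": {
        "values": [
          {"entity": {"id": "15861000-1", "description": "Кава", "scheme": "CPV"},
           "probability": 0.7185789},
          {"entity": {"id": "15862000-8", "description": "Замінники кави", "scheme": "CPV"},
           "probability": 0.1997253},
          {"entity": {"id": "15860000-4", "description": "Кава, чай та супутня продукція", "scheme": "CPV"},
           "probability": 0.06485825}
        ]
      }
    }
  }
}
"""


def top_predictions(response):
    """Return (code, description, probability) tuples, most probable first."""
    values = response["data"]["Predictions"]["Classification"]["values"]
    rows = [
        (v["entity"]["id"], v["entity"]["description"], v["probability"])
        for v in values
    ]
    # Sort explicitly rather than relying on the order of the response.
    return sorted(rows, key=lambda row: row[2], reverse=True)


predictions = top_predictions(json.loads(response_text))
for code, description, probability in predictions:
    print(f"{code}  {probability:.1%}  {description}")
```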

Sounds interesting and useful?

Read our documentation for more information or try it out live.

There is also an online example of machine learning and AI integration. We will describe it in more detail in our next blog post. Stay tuned!