
Overview
This service concerns the curation of ground-truth annotations for testing and experimentation data. More specifically, data labelling associates portions of data (e.g., image regions) with descriptions that convey the significance of those portions. In particular, labelling is required whenever the data are meant to be used for model training. We also provide labelling services for domain-specific data attributes, such as disease indicators and their confounding factors, carried out by experts in agronomy. The labelled data will be accompanied by a report synthesising quality metrics and statistics such as the number of data points annotated, the percentage of the full set covered by the annotations, the number of annotators and the inter-annotator agreement.
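As an illustration of the count-based statistics in such a report, the following minimal sketch derives the number of annotated data points, the coverage percentage and the number of annotators from a list of annotation events. The function name `summarise_annotations` and the `(item_id, annotator_id)` input layout are assumptions made for this example, not the format of the actual report.

```python
from typing import Dict, Iterable, Tuple


def summarise_annotations(
    annotations: Iterable[Tuple[str, str]], total_items: int
) -> Dict[str, float]:
    """Summarise (item_id, annotator_id) pairs against the size of the full set.

    The input layout is a hypothetical one chosen for this sketch.
    """
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    pairs = list(annotations)  # materialise so the input can be scanned twice
    items = {item_id for item_id, _ in pairs}
    annotators = {annotator_id for _, annotator_id in pairs}
    return {
        "items_annotated": len(items),
        "coverage_percent": 100.0 * len(items) / total_items,
        "annotators": len(annotators),
    }


# Made-up example: 3 of 5 images annotated, by 2 annotators in total.
example = [
    ("img_1", "A1"), ("img_1", "A2"),
    ("img_2", "A1"), ("img_2", "A2"),
    ("img_3", "A1"),
]
report = summarise_annotations(example, total_items=5)
```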
More about the service
This service gives the customer access to an expert team comprising engineers trained in the development and use of AI models and agronomists with extensive knowledge of real-world use cases.
The service begins with a definition phase, whose initial analysis also provides an estimate of the time required to execute the service. In this phase, we will select the most appropriate type of annotations to produce (e.g., pixel-level annotations of images vs. a single label for the whole image), as well as the level of granularity of the labels (e.g., plant colour, variety, crop class).
We will also agree with the customer on the expected annotation format (e.g., JSON, CSV) and on the annotation tool to be used (for example, so that the customer can label additional data independently in the future).
Once the definition phase of the service is complete, AgrifoodTEF will proceed with the setup and execution of the labelling. The final outcome of the service is the labelled datasets, accompanied by a report that evaluates the data against quality metrics.
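To make the choice of annotation format more concrete, the sketch below serialises one annotation record to both JSON and CSV using only the Python standard library. The field names (`image_id`, `annotator_id`, `label`, `polygon`) and the sample values are hypothetical and chosen for illustration; the actual schema is whatever is agreed with the customer in the definition phase.

```python
import csv
import io
import json
from typing import Any, Dict, List

# Hypothetical record layout; the real schema is agreed with the customer.
FIELDS = ["image_id", "annotator_id", "label", "polygon"]

record: Dict[str, Any] = {
    "image_id": "img_0001",
    "annotator_id": "A1",
    "label": "crop_class_x",
    # Pixel-level annotation as a list of [x, y] polygon vertices.
    "polygon": [[10, 12], [40, 15], [35, 48], [12, 40]],
}


def to_json(records: List[Dict[str, Any]]) -> str:
    """JSON keeps nested structures such as polygons as they are."""
    return json.dumps(records, indent=2)


def to_csv(records: List[Dict[str, Any]]) -> str:
    """CSV is flat, so the polygon is stored as a JSON string in one column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        row = dict(rec)
        row["polygon"] = json.dumps(rec["polygon"])
        writer.writerow(row)
    return buffer.getvalue()


json_text = to_json([record])
csv_text = to_csv([record])
```

JSON suits nested annotations such as polygons, while CSV is convenient for a single label per image; in this sketch the CSV variant has to embed the polygon as an encoded string.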
Peronospora symptoms can be detected by inspecting changes on the leaf surface (appearance of small spots, gradual changes in the leaf colour). The customer has already implemented a computer vision algorithm to classify leaves as healthy or unhealthy from images. However, additional data are required to improve the performance and robustness of the solution. These data have been collected in the field via service S00113, but the customer requires that they be thoroughly annotated by expert agronomists who can recognise the presence of Peronospora symptoms at the level of individual leaves.
To fulfil the request from the customer, service S00290 is executed by first identifying segmentation masks as the most suitable annotation format. Although more costly and laborious to produce, polygonal masks represent the leaf regions affected by the disease more precisely than, for example, rectangular bounding boxes, which also enclose healthy leaf regions.
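The difference between the two annotation types can be quantified with a small geometric sketch: the area of a polygonal mask (computed with the shoelace formula) compared with the area of its axis-aligned bounding box. The triangular region below is a made-up example, not data from the use case, and the function names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def bounding_box_area(points: List[Point]) -> float:
    """Area of the axis-aligned rectangle enclosing all vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Made-up triangular "affected region" in pixel coordinates.
region = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
mask_area = polygon_area(region)
box_area = bounding_box_area(region)
# Share of the bounding box that lies outside the annotated region.
excess_fraction = 1.0 - mask_area / box_area
```

For this triangle, half of the bounding box lies outside the annotated region, which is the kind of healthy tissue a box-based label would wrongly include.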
Together with the customer, it is then agreed that 60% of the collected images will be annotated by 5 agronomy experts with: i) a segmentation mask for each region affected by the disease, if any such region is found; and ii) a label of either “healthy” or “unhealthy” for each region. Moreover, we will rely on Fleiss’ Kappa to verify that a sufficient level of agreement (a Kappa of 0.60 or higher) has been achieved among annotators when marking leaves as “healthy/unhealthy”.