DATA: the fuel of machine learning

The environment that surrounds us defines the risks we bear every day. At Perceptolab we devote significant effort to disentangling the complexity behind the perils of everyday surroundings.

Indoor environments are usually tied to the activity for which they are used and to the objects they contain; objects are therefore de facto connected with the concepts of risk and safety. Items present in the environment interact with each other and might cause dangerous events. Materials play their role as well: properties such as flammability and water resistance restrict the range of objects with which a material can safely interact. As an example, a house can easily catch fire from the interaction of cooking equipment with pots, the proximity of portable heaters to fabrics, or the closeness of candles to paper [7]. The landscape of objects commonly found in everyday environments is broad and diversified. Some items are more dangerous than others (e.g. fireplaces are often involved in house fires), apparently harmless objects become dangerous when misused (e.g. electrical appliances in wet areas), some materials are more prone to damage (e.g. parquet vs. ceramic flooring), and there are objects known as “safety devices” which can drastically change the outcome of a home accident.

The smartphones in our pockets are equipped with sensors that grow more capable of representing reality every day. The Deep Learning team at Perceptolab takes on the challenge of using those capabilities to increase safety in daily surroundings. Through computer vision, a single picture is used to detect objects, their interactions, the type of environment, and the materials involved. Object detection is a core capability Perceptolab has been developing over the past years. The object detection engine aims to detect valuables which might be damaged, risky objects, and a broad range of safety devices. A risk analysis is then carried out to detect dangerous situations and to estimate a risk index. Perceptolab makes it possible to create risk mitigation strategies tailored to the specific situation being processed. Each user is profiled with a risk index and, in case of a claim, a simplified and controlled procedure is triggered on top of the digitized inventory. The whole pipeline is designed to be modular, and new capabilities will be added in future releases. As an example, the Object Identification Engine will be able to identify a specific object instance (as face recognition does with faces), enabling accurate fraud detection and possible item-substitution policies.

The proposed pipeline relies heavily on large-scale datasets, which are often complex to gather and manage. In the next section, we present some of the problems that Perceptolab has overcome to build a stable and reliable solution. For an in-depth view of the connection between the technology developed by Perceptolab and a real product, check out our LinkedIn article.


Recently, computer vision has reached impressive performance thanks to deep learning. Convolutional Neural Networks (CNNs) have been applied to all computer vision tasks, beating older approaches based on hand-crafted features in the vast majority of benchmarks, particularly for object detection [4] in the supervised learning setting.

In real-world pictures, a single object might appear very different depending on the perspective, the lighting conditions, possible occlusions, and several other factors. State-of-the-art approaches might require thousands of intermediate representations (hidden layers) [2] and millions of parameters to learn such complexities. The newest CNN architectures, such as ResNeXt-152, have more than 90 million parameters whose values have to be tuned on data.
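To give a concrete sense of where those millions of parameters come from, here is a minimal back-of-the-envelope sketch: the parameter count of a single convolutional layer. The layer sizes below are illustrative, not taken from any specific architecture.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Number of learnable parameters in one 2D convolution layer:
    one k x k filter per (input channel, output channel) pair,
    plus an optional bias term per output channel."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)

# A single 3x3 convolution mapping 256 feature maps to 512:
# 3*3*256*512 + 512 = 1,180,160 parameters. One mid-network layer
# alone already contributes over a million values to tune on data.
print(conv2d_params(256, 512, 3))  # 1180160
```

Stacking dozens of such layers is how modern detectors quickly reach tens of millions of parameters.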

Data is one of the key factors in the success of these methodologies. However, images are not the only data required by the training procedure of object detectors: each image used for training and evaluation must be annotated with the position and the category of every object of interest.
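As an illustration of what such annotations look like, here is a minimal COCO-style record; the field names follow the COCO convention, while the category taxonomy is hypothetical.

```python
# A minimal COCO-style object detection annotation (illustrative values).
annotation = {
    "image_id": 42,
    "category_id": 3,                      # e.g. "fireplace" in a custom taxonomy
    "bbox": [120.0, 80.0, 200.0, 150.0],   # [x, y, width, height] in pixels
}

def bbox_area(bbox):
    """Area of an [x, y, w, h] box, useful e.g. to filter out tiny annotations."""
    _, _, w, h = bbox
    return w * h

print(bbox_area(annotation["bbox"]))  # 30000.0
```

Every object in every image needs one such record, which is what makes annotation at scale so expensive.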

Perceptolab aims to recognize a broad range of objects connected to risk: appliances, furniture, valuables, safety devices, fireplaces, stairs, and other structural elements connected to danger. To avoid distribution shift [8], the images must come from the wide range of environments and scenarios in which the engine operates in production, and the process of building and refining such a dataset is long and complex.

The annotation process for a significant number of images often requires a huge effort. A large-scale dataset such as Open Images [6] has approximately 2M images, 600 object categories, and 15M annotations for object detection. To make things worse, every mistake in the annotation process (a wrong, inaccurate, or missing annotation) implies a wrong signal optimized during training or wrong feedback during testing.

“Fully annotated datasets” are defined as sets of images in which every object of interest has been correctly annotated. However, in real-world scenarios it is uncommon to deal with a perfect dataset. COCO [9] and Open Images are two of the most widely used datasets for research on object detection. They are considered benchmarks for developing and testing new algorithms, and thousands of researchers all over the world make use of them. Yet, looking deeper into these resources, it is possible to find questionable annotations. Two such annotations found in the Open Images dataset are reported here.

Large-scale datasets, often collected from users, tend to have noisy, limited, or imprecise annotations and are then defined as “weak supervision datasets”. Unfortunately, the errors present in weak supervision annotations often lead to poor performance when such data is used to train machine learning models.


The work of [1] studies the effect of a particular type of error in weak supervision datasets for object detection: missing annotations.

The publicly available PASCAL VOC dataset is considered a fully annotated dataset. Different percentages of annotations (10%, 20%, 30%, …, 90%) are deleted from the initial annotations to simulate errors in the dataset.
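The deletion step of this experiment can be sketched in a few lines; this is a simplified simulation under our own assumptions, not the authors' exact protocol.

```python
import random

def drop_annotations(annotations, fraction, seed=0):
    """Randomly delete roughly `fraction` of the annotations to
    simulate missing labels in an otherwise fully annotated dataset."""
    rng = random.Random(seed)  # fixed seed for reproducible experiments
    return [a for a in annotations if rng.random() >= fraction]

# Toy dataset: 1000 annotations, progressively corrupted.
full = [{"image_id": i, "bbox": [0, 0, 10, 10]} for i in range(1000)]
for fraction in (0.1, 0.5, 0.9):
    print(fraction, len(drop_annotations(full, fraction)))
```

Training the same detector on each corrupted copy, and evaluating on the untouched test set, isolates the effect of missing labels from every other variable.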

Methods able to reach state-of-the-art performance on the fully annotated PASCAL VOC (R-CNN, Faster R-CNN, YOLO, and SSD) [5], along with weak supervision methodologies (the WSOD approach proposed by [3]), are trained on the dataset with missing labels, and their performance is evaluated on the test set (with no missing labels).

Their results show that the performance of state-of-the-art detectors depends drastically on the quality of the dataset. The best performance can be reached only when a fully annotated dataset is available, and all the state-of-the-art detectors show a significant decrease in performance from the very first missing annotations. Thus, even a small number of errors in the dataset can make a significant difference when training deep learning models. Weakly supervised methodologies, instead, are independent of annotation quality, but as a drawback they reach much lower performance.


A fully annotated dataset is worth much more than a weak supervision dataset because it leads to substantially more performant machine learning solutions. It allows the development of capabilities that aren’t possible otherwise and thus creates new value.

The vast majority of datasets available in corporations either have no annotations or can be considered large-scale weak supervision datasets. Thus, several efforts have been made to define a procedure to correct annotation errors in such cases.

In the work of [1], a method to iteratively add new annotations to a weak supervision dataset is proposed. In each iteration, the most confident predictions of a Teacher model are added to the dataset annotations and are then used to train a Student model, which later becomes the Teacher model. However, there are two main weaknesses. Firstly, an initial Teacher model trained on the same task with “decent performance” is assumed to be available, but this is not always the case. If the initial predictions are not precise enough, the student model is fed with even noisier signals and the whole procedure could diverge. Secondly, only missing labels are taken into account, yet this is only one of the possible errors in weak supervision datasets.
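The teacher-student loop described above can be sketched as follows. `train` and `predict` are placeholders for a real detection pipeline, and the confidence threshold is illustrative; this is a sketch of the idea, not the authors' implementation.

```python
def add_confident_predictions(annotations, predictions, threshold=0.9):
    """One step of the self-training loop: keep only the teacher's
    high-confidence detections and fold them into the existing
    (weak) annotations."""
    confident = [p for p in predictions if p["score"] >= threshold]
    return annotations + [{"bbox": p["bbox"], "category": p["category"]}
                          for p in confident]

def self_training(dataset, train, predict, rounds=3, threshold=0.9):
    """Iterate: train a student on the current labels, promote it to
    teacher, and merge its confident detections back into the dataset.
    `train(dataset) -> model` and `predict(model, dataset) -> preds`
    stand in for a full object detection training/inference pipeline."""
    for _ in range(rounds):
        teacher = train(dataset)
        preds = predict(teacher, dataset)
        dataset = add_confident_predictions(dataset, preds, threshold)
    return dataset
```

The divergence risk mentioned above is visible in the code: if `predict` returns confident but wrong boxes, those errors are appended to the dataset and amplified in every subsequent round.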

At Perceptolab, huge efforts have been made to transform weak supervision datasets into fully annotated datasets. Our procedures mix the predictions of a Teacher model with the existing annotations to detect all types of annotation errors:

1. missing annotation

2. correct location (bounding box) but wrong category

3. correct category but imprecise location (bounding box)

4. wrong annotation (wrong category and location)
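One simple way to map a trusted detection against an existing annotation onto these error types is an IoU-plus-category check. The sketch below uses illustrative thresholds and is not Perceptolab's actual procedure; a prediction that matches no annotation at all would fall under case 1, a missing annotation.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def classify_error(prediction, annotation, loc_thr=0.5):
    """Compare a trusted prediction with an existing annotation and
    return which of the error types (2-4) applies, or "correct"."""
    same_loc = iou(prediction["bbox"], annotation["bbox"]) >= loc_thr
    same_cat = prediction["category"] == annotation["category"]
    if same_loc and same_cat:
        return "correct"
    if same_loc:
        return "wrong category"       # case 2: right box, wrong label
    if same_cat:
        return "imprecise location"   # case 3: right label, box off
    return "wrong annotation"         # case 4: both wrong
```

For example, a `"tv"` prediction at `[0, 0, 10, 10]` against a `"sofa"` annotation at the same box would be classified as a wrong category, while the same prediction against a `"tv"` box at `[8, 0, 18, 10]` would be an imprecise location.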

Once errors have been detected, we correct the existing dataset annotations. The corrections can be made automatically or supervised manually in a time-efficient way. Manual supervision might be preferred to deal with imprecise initial detections and avoid divergence.

A specific tool was developed for the manual supervision. Shown here are two cases: a missing annotation, and a correct location with a wrong category.

This framework has been able to detect a significant number of dataset errors. The errors mostly concerned objects in the background of the image, which nevertheless had a significant impact on performance, raising recall by up to 10% for some object categories.

At Perceptolab, a constant loop is performed to improve the quality of the datasets and, consequently, the performance of the predictive models. Periodically, data collected from users is passed through the correction process and then included in the existing datasets. A new model is trained on the augmented dataset and will hopefully perform better. The model then goes through a strict evaluation procedure and is included in the engine.

At each iteration, the correction process becomes faster thanks to the improved performance of the models, the risk analysis becomes more accurate, and the processed environments become less risky.


1. Xu, Mengmeng, Yancheng Bai, and Bernard Ghanem. "Missing Labels in Object Detection."

2. Chang, Bo et al. “Reversible Architectures for Arbitrarily Deep Residual Neural Networks.” AAAI (2017).

3. Zhang, Yongqiang, et al. "W2f: A weakly-supervised to fully-supervised framework for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

4. Zhao, Zhong-Qiu, et al. "Object detection with deep learning: A review." IEEE transactions on neural networks and learning systems (2019).




8. Quionero-Candela, Joaquin, et al. Dataset shift in machine learning. The MIT Press, 2009.

