Image Classification vs Semantic Segmentation vs Instance Segmentation


Image Classification

Suppose we have thousands of dog and cat images and need to sort them into two groups. Doing this by hand would take a great deal of time, so we design a model to do the task instead. There are two broad ways of doing this. In the first, we teach the model with a set of labelled sample images and then let it classify new ones. In the second, the model receives no labels and has to group similar images on its own. The first method is supervised learning and the second is unsupervised learning. Both have their own merits and demerits: supervised learning needs labelled data but outputs named classes directly, whereas unsupervised learning needs no labels but produces groups that still have to be matched to class names. The final output of image classification is a class, which is the label the model gives to the input image. This task can be done using a simple CNN.
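To make "the output is a class" concrete, here is a minimal sketch of the last step of a classifier. The scores are made-up numbers standing in for the output of a CNN, and the function name classify is only illustrative.

```python
import numpy as np

def classify(scores, class_names):
    """Return the single most likely label for one image, with its probability."""
    scores = np.asarray(scores, dtype=float)
    # Softmax turns raw scores into probabilities that sum to 1.
    exp = np.exp(scores - scores.max())
    probs = exp / exp.sum()
    best = int(np.argmax(probs))
    return class_names[best], float(probs[best])

# Hypothetical raw scores for one image (assumption: index 0 = dog, 1 = cat).
label, confidence = classify([2.0, 0.5], ["dog", "cat"])
print(label, round(confidence, 2))
```

Whatever the image contains, the result is one label for the whole image and nothing about where the object is.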


Now that we have classified the image, we know the label of the object, but not where in the image it is. To find the area covered by the object, we can generate a box around it that marks the object's extremes. This box is called the bounding box. Only one bounding box is generated, around the single object of interest, and this step is called localisation. In this basic form, both image classification and localisation assume an image with one main object.
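The idea that the box marks the extremes of the object can be sketched in a few lines. This example assumes we already have a binary mask of the object's pixels, which a real localisation model would not be given; the helper name bounding_box is hypothetical.

```python
import numpy as np

def bounding_box(mask):
    """Return (row_min, col_min, row_max, col_max) of the True pixels, or None."""
    rows = np.where(np.any(mask, axis=1))[0]  # rows that contain the object
    cols = np.where(np.any(mask, axis=0))[0]  # columns that contain the object
    if rows.size == 0:
        return None  # empty mask: no object to localise
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])

# Toy 6x8 image with one rectangular object.
mask = np.zeros((6, 8), dtype=bool)
mask[2:5, 3:7] = True
box = bounding_box(mask)
print(box)
```

The four numbers are the topmost, leftmost, bottommost and rightmost pixels of the object, which is all a bounding box records.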

Object Detection

Consider an image containing an apple and an orange. We can draw a box around each object and label one as an apple and the other as an orange. This is where the difference between classification and detection comes into the picture. Classification assigns a single label to the whole image, for example ‘fruit’, whereas object detection draws a box around each object and names the fruit inside it.
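The difference shows up in the shape of the output. The sketch below uses a hypothetical Detection record and invented pixel coordinates; it is not the output format of any particular detector.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels

# A classifier returns one label for the whole image.
image_label = "fruit"

# A detector returns one labelled box per object (coordinates are made up).
detections = [
    Detection("apple", (10, 20, 110, 130)),
    Detection("orange", (150, 30, 260, 140)),
]

for d in detections:
    width = d.box[2] - d.box[0]
    height = d.box[3] - d.box[1]
    print(f"{d.label}: {width}x{height} px box at {d.box[:2]}")
```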

Semantic Segmentation

Segmentation goes one level beyond object detection. Instead of generating a box around each object, we trace the boundary of every object and obtain pixel-level detail. Semantic segmentation means labelling every pixel in the image with the class it belongs to. We can assign a colour to every class. For example, in the image above there are three types of objects, namely ‘sheep’, ‘road’ and ‘grass’. Every class is assigned one colour. While segmenting the image, if the model encounters a pixel that belongs to the class ‘grass’, it assigns the colour green to that pixel. Similarly, if it encounters some other class, it assigns the corresponding colour of that class. This type of segmentation is very useful in autonomous driving.
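The colouring step can be sketched with a lookup table. Here the label map is a tiny hand-written array standing in for a model's per-pixel prediction, and the colours in PALETTE are arbitrary choices.

```python
import numpy as np

CLASS_NAMES = ["sheep", "road", "grass"]
# One RGB colour per class, indexed by class ID (colours are assumptions).
PALETTE = np.array([
    [255, 255, 255],  # 0: sheep -> white
    [128, 128, 128],  # 1: road  -> grey
    [0, 128, 0],      # 2: grass -> green
], dtype=np.uint8)

# Toy 3x4 prediction: every pixel holds a class ID.
label_map = np.array([
    [2, 2, 2, 2],
    [2, 0, 0, 2],
    [1, 1, 1, 1],
])

# Indexing the palette with the label map colours every pixel at once.
colour_image = PALETTE[label_map]
print(colour_image.shape)
```

Note that the two sheep pixels receive the same colour even if they belong to different animals, since only the class is recorded.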

Instance Segmentation

Instance segmentation goes one step beyond semantic segmentation. Instead of assigning the same pixel value to all objects of the same class, it separates and shows the different instances of that class. It assigns an instance ID to each instance of a class, and the end result is an image in which all the objects are separated by pixel boundaries. For example, if two pens are present in the image, it labels them pen1 and pen2. Instance segmentation has various applications such as ship detection, human pose estimation and crowd counting. Mask R-CNN is a widely used model for instance segmentation, and its per-object masks can also be merged by class to give a semantic-style labelling of the detected objects.
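To show what an instance ID map looks like, the sketch below splits a single-class mask into separate blobs with a simple flood fill. This is a toy stand-in and not how Mask R-CNN works: it would merge two pens that touch, whereas a learned model predicts a mask per object. The name label_instances is hypothetical.

```python
from collections import deque
import numpy as np

def label_instances(class_mask):
    """Give each 4-connected blob of True pixels its own ID (1, 2, ...)."""
    ids = np.zeros(class_mask.shape, dtype=int)  # 0 means background
    height, width = class_mask.shape
    next_id = 0
    for r in range(height):
        for c in range(width):
            if not class_mask[r, c] or ids[r, c]:
                continue  # background, or already part of an instance
            next_id += 1
            ids[r, c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and class_mask[ny, nx] and not ids[ny, nx]:
                        ids[ny, nx] = next_id
                        queue.append((ny, nx))
    return ids, next_id

# Semantic mask for the class 'pen': two separate pens, one shared label.
pen_mask = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=bool)

instance_ids, count = label_instances(pen_mask)
print(count)
print(instance_ids)
```

Pixels with ID 1 correspond to pen1 and pixels with ID 2 to pen2, while the semantic mask alone could not tell them apart.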



Nirmala Murali

Research scholar at Indian Institute of Space Science and Technology