Laboratory One Research Blog

Naive Object Detector

August 04, 2018

Cats and Dogs - 16

Let’s explore a very naive method of detecting objects in an image. We will attempt to use an image classifier to detect objects and draw bounding boxes around them. While we know this is possible, we aren’t sure how it’s done. How close can we get before we need to explore the literature?

The Problem

For a given image, how can we find objects, and how well can we draw bounding boxes around them?


There are a few components to this problem:

  • object classification
  • object localization
  • drawing appropriately sized bounding boxes

We will start by trying to handle the simplest case first.

A single, well-cropped object

Given an image of a single object, which is well-cropped, how could we proceed?

First, we will classify the object in the image. I decided to use a pretrained Convolutional Neural Network, ResNet50. This allowed me to focus on the problem instead of building and training an image classifier. We can always swap out the classifier for better accuracy later.

Next, let’s learn to draw a bounding box. I used Matplotlib to draw the bounding box around the whole image. After that, I added the label to the image. We will abstract this into a function and reuse it later.
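The drawing step might look something like the sketch below. This is not the code from the original experiment: the function name `draw_box`, the `(left, top, width, height)` box format, and the colors are all assumptions made for illustration. It only relies on Matplotlib's `Rectangle` patch and `Axes.text`.

```python
from matplotlib.patches import Rectangle


def draw_box(ax, box, label, color="red"):
    """Draw one labelled bounding box on a Matplotlib Axes.

    `box` is (left, top, width, height) in pixel coordinates.
    This helper and its box format are hypothetical, not from the post.
    """
    left, top, width, height = box
    # An unfilled rectangle, so the image underneath stays visible.
    rect = Rectangle((left, top), width, height,
                     fill=False, edgecolor=color, linewidth=2)
    ax.add_patch(rect)
    # Anchor the label at the top-left corner of the box.
    ax.text(left, top, label, color="white", va="top",
            bbox={"facecolor": color, "edgecolor": "none", "pad": 1})
    return rect
```

For the well-cropped case, the box is simply the whole image: after `ax.imshow(image)`, calling `draw_box(ax, (0, 0, image_width, image_height), "cup")` outlines the full frame and labels it.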

Cup - 1

Ok, that was the simplest case. How could we adapt this to multiple objects?

Multiple objects

An image with multiple objects will not be well-cropped. What do we do in that case? Perhaps we could divide up the image evenly using a grid, hoping for there to be at most 1 object in each cell.

I started by evenly splitting up the image. Then I ran each cell through the classifier and drew bounding boxes with their labels. Clearly this is ineffective: there is only 1 object, but there are many boxes and many labels.
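As a rough sketch of the grid step, assuming the image is (or can be converted to) a NumPy array of shape `(height, width, channels)`: the names `grid_cells` and `classify_cells` are made up for this example, and `classify` is a hypothetical stand-in for whatever classifier is used (for instance a wrapper around ResNet50) that takes a crop and returns a `(label, confidence)` pair.

```python
import numpy as np


def grid_cells(width, height, rows, cols):
    """Return (left, top, right, bottom) pixel boxes for an even grid."""
    # Integer edges computed from the full size, so the cells tile the
    # image with no gaps even when it does not divide evenly.
    xs = [width * c // cols for c in range(cols + 1)]
    ys = [height * r // rows for r in range(rows + 1)]
    return [(xs[c], ys[r], xs[c + 1], ys[r + 1])
            for r in range(rows) for c in range(cols)]


def classify_cells(image, classify, rows=2, cols=2):
    """Run `classify` on every grid cell; one detection per cell.

    `classify` is a hypothetical callable: crop -> (label, confidence).
    """
    pixels = np.asarray(image)
    height, width = pixels.shape[:2]
    detections = []
    for left, top, right, bottom in grid_cells(width, height, rows, cols):
        crop = pixels[top:bottom, left:right]
        label, confidence = classify(crop)
        detections.append((label, confidence, (left, top, right, bottom)))
    return detections
```

Because every cell always produces a label, a 2x2 grid yields four boxes no matter how many objects are actually in the image, which is exactly the problem described above.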

Cup - 4

This makes more sense if we choose an image with multiple objects.

Cups - 4

Next, let’s get rid of poor labels by filtering out classifications of low confidence.
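A minimal sketch of that filter, assuming each detection is a `(label, confidence, box)` tuple. The tuple layout, the function name `keep_confident`, and the default threshold of 0.5 are placeholders chosen for this example, not values from the experiment.

```python
def keep_confident(detections, threshold=0.5):
    """Keep only detections whose confidence is at least `threshold`.

    Each detection is assumed to be a (label, confidence, box) tuple.
    The default threshold is an arbitrary placeholder.
    """
    return [(label, confidence, box)
            for label, confidence, box in detections
            if confidence >= threshold]
```

Raising the threshold removes more boxes, so it trades missed objects against spurious labels.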

Cups - 3

Cups - 2

Cups - 1

We have a method to deal with multiple objects, but the boxes are too large and poorly localized.

Bounding box sizes

We’ve been dividing the input image into 4 cells. Could we get better sized bounding boxes if we increase the number of cells?

Cups - 5

Doing so results in much more accurate box placement and sizing. However, it seems that we need to play with the threshold to handle multiple objects.

Now that we are able to create better bounding boxes, how can we deal with different objects?

Multiple different objects

Given an image with many different objects, how well does a 2x2 grid perform?

Cats and Dogs - 4

It would seem that it can classify the different objects but the boxes are too large.

Cats and Dogs - 9

Cats and Dogs - 16

Even changing the number of cells isn’t very effective. It would seem our grid system can’t overcome some issues in localization and sizing.

Next steps

We were able to perform object detection with our very naive method, but the results were weak. Using a grid is a good idea, but I’m missing some key components. I’m unable to overcome the issue of bounding box sizing and position. I’m sure this has to do with the lack of a feedback loop to assess the bounding box size and location. We’ve been hand-coding the boxes and visually assessing the results.

I’ll need to study the literature to understand how the best results have been obtained. I found Andrew Ng’s Convolutional Neural Network course to be enlightening on this subject. He builds up to the highly effective, real-time YOLO algorithm, a sophisticated object detector.
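One common way to replace eyeballing with a number is Intersection over Union (IoU), which scores a predicted box against a hand-labelled ground-truth box. This was not part of the experiment above; the sketch below is only an illustration of what such a measurement could look like, and it assumes boxes are `(left, top, right, bottom)` tuples in pixels.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (left, top, right, bottom) boxes."""
    # Corners of the overlapping region, if there is one.
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[2], box_b[2])
    bottom = min(box_a[3], box_b[3])
    # Clamp at zero so boxes that do not overlap give no intersection.
    intersection = max(0, right - left) * max(0, bottom - top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0
```

A score of 1.0 means the boxes match exactly and 0.0 means they do not overlap at all, so an oversized grid cell around a small object would score low even when the label is right.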

I will return with a YOLO implementation. It will be lit 🔥.

Peter Chau

Written by Peter Chau, a Canadian Software Engineer building AIs, APIs, UIs, and robots.