Object Detection Pipeline using Remo

In this tutorial, we will use Remo to accelerate and improve the process of building a transfer learning pipeline for an Object Detection task.

In particular, we will:

  • Use Remo to browse through our images and annotations
  • Use Remo to understand the properties of the dataset and annotations by visualizing statistics
  • Create a custom train, test, valid split in-place using Remo image tags
  • Fine-tune a pre-trained Faster R-CNN model from torchvision and run inference on the test images
  • Visually compare bounding box predictions with the ground truth

Along the way, we will see how the dataset visualization provided by Remo helps us gather insights to improve both the dataset and the model.

Let's start by importing the relevant libraries:

from PIL import Image
import os
import glob
import random
import csv

import pandas as pd
import numpy as np
import tqdm

import torch
from torch.utils.data import DataLoader, Dataset

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
import torchvision.transforms as transforms

import remo

Adding Data to Remo

  • The dataset used in this example is a subset of the Open Images Dataset.

  • The directory structure of the dataset is:

    ├── object_detection_dataset
        ├── images
            ├── image_1.jpg
            ├── image_2.jpg
            ├── ...
        ├── annotations
            ├── annotations.csv
            ├── model_predictions.csv
# Download the dataset and extract it into a new folder (skipped if the archive is already present)
if not os.path.exists('object_detection_dataset.zip'):
    !wget https://s-3.s3-eu-west-1.amazonaws.com/object_detection_dataset.zip
    !unzip -qq object_detection_dataset.zip
else:
    print('Files already downloaded')
# The path to the folders
path_to_images =  './object_detection_dataset/images/'
path_to_annotations = './object_detection_dataset/annotations/'

annotations_file_path = os.path.join(path_to_annotations, 'annotations.csv')

To visualise the labels as strings rather than IDs, we can use a dictionary mapping the two of them.

# Mapping between Class name and Index
cat_to_index = {'Wheel'        : 1, 
                'Car'          : 2,
                'Person'       : 3, 
                'Land vehicle' : 4, 
                'Human body'   : 5, 
                'Plant'        : 6, 
                'Tire'         : 7, 
                'Vehicle'      : 8, 
                'Vehicle registration plate' : 9}

Train / test split

In Remo, we can use tags to organise our images. Among other things, this allows us to generate train / test splits without the need to move image files around.

To do this, we just need to pass a dictionary (mapping each tag to the relevant image paths) to the function remo.generate_image_tags().

im_list = [os.path.abspath(i) for i in glob.glob(path_to_images + '/**/*.jpg', recursive=True)]
im_list = random.sample(im_list, len(im_list))

# Defining the train / valid / test split (40% / 30% / 30%)
train_idx = round(len(im_list) * 0.4)
valid_idx = train_idx + round(len(im_list) * 0.3)
test_idx  = valid_idx + round(len(im_list) * 0.3)

# Creating a dictionary with tags
tags_dict =  {'train' : im_list[0:train_idx], 
              'valid' : im_list[train_idx:valid_idx], 
              'test' : im_list[valid_idx:test_idx]}

train_test_split_file_path = os.path.join(path_to_annotations, 'images_tags.csv') 
remo.generate_image_tags(tags_dictionary  = tags_dict, 
                         output_file_path = train_test_split_file_path, 
                         append_path = True)
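
Because each boundary above is computed with round(), the three slices are not guaranteed to cover every image for every dataset size (with 15 images, for example, the last image would fall outside all three slices). A minimal sketch of the same index arithmetic, using a hypothetical helper split_by_fraction (not part of Remo) that lets the test split take whatever remains:

```python
import random


def split_by_fraction(items, train_frac=0.4, valid_frac=0.3, seed=0):
    """Shuffle items and split them into train/valid/test.

    Hypothetical helper for illustration: the test split takes the remainder,
    so every item ends up in exactly one split.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    train_idx = round(len(shuffled) * train_frac)
    valid_idx = train_idx + round(len(shuffled) * valid_frac)
    return {
        'train': shuffled[:train_idx],
        'valid': shuffled[train_idx:valid_idx],
        'test': shuffled[valid_idx:],
    }


# Made-up file names, for illustration only
example_paths = [f'image_{i}.jpg' for i in range(1, 16)]
example_tags = split_by_fraction(example_paths)
print({tag: len(paths) for tag, paths in example_tags.items()})
```

The resulting dictionary has the same shape as tags_dict, so it could be passed to remo.generate_image_tags() in the same way.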

Create a dataset

To create a dataset we can use remo.create_dataset(), specifying the path to data and annotations.

The class encoding (if required) is passed via a dictionary.

For a complete list of formats supported, you can refer to the docs.

# The annotations.csv is generated in the same path of the sub-folder
object_detection_dataset =  remo.create_dataset(name = 'object-detection-dataset', 
                                                local_files = [path_to_images, path_to_annotations],
                                                annotation_task = 'Object Detection')

Visualizing the dataset

To view and explore images and labels, we can use Remo directly from the notebook. We just need to call dataset.view().



Looking at the dataset, we notice some interesting points:

  • There is a significant degree of overlap in bounding boxes of different classes (e.g. Wheel and Car)
  • Bounding box sizes vary a good amount across Wheel and Car objects
  • Pictures of Cars can be taken from different angles

Dataset Statistics

Using Remo, we can quickly visualize some key Dataset properties that can help us with our modelling, without needing to write extra boilerplate code.

This can be done either from code, or using the visual interface.


[{'AnnotationSet ID': 347, 'AnnotationSet name': 'Object detection', 'n_images': 7, 'n_classes': 9, 'n_objects': 50, 'top_3_classes': [{'name': 'Wheel', 'count': 28}, {'name': 'Car', 'count': 9}, {'name': 'Tire', 'count': 4}], 'creation_date': None, 'last_modified_date': '2020-09-01T11:10:37.164406Z'}]



Looking at the statistics we can gain some useful insights like:

  • Some labels are present in the training set but not in the test and valid sets. This means we will not get an indicative measure of model performance for these classes (which is fine for the tutorial's sake, but in real life we would want to fix that)

  • The Wheel class has more instances than any other class in the dataset. Higher reported performance on this class might be caused by this.
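
The first check can also be done in a few lines of plain Python. The sketch below uses made-up annotation rows and tags (not the tutorial's data) to count objects per class within each split and list the classes that never appear in the test split:

```python
from collections import Counter, defaultdict

# Made-up (file_name, class) annotation rows and image tags, for illustration only
annotation_rows = [
    ('image_1.jpg', 'Wheel'), ('image_1.jpg', 'Car'),
    ('image_2.jpg', 'Wheel'), ('image_2.jpg', 'Plant'),
    ('image_3.jpg', 'Wheel'), ('image_3.jpg', 'Car'),
]
image_tags = {'image_1.jpg': 'train', 'image_2.jpg': 'train', 'image_3.jpg': 'test'}

# Count objects per class within each split
counts_per_split = defaultdict(Counter)
for file_name, class_name in annotation_rows:
    counts_per_split[image_tags[file_name]][class_name] += 1

# Classes seen in training but absent from the test split
missing_in_test = set(counts_per_split['train']) - set(counts_per_split['test'])
print({split: dict(counts) for split, counts in counts_per_split.items()})
print('Missing in test:', sorted(missing_in_test))
```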

Feeding Data into PyTorch

Here we start working with PyTorch. To load the data, we will define a custom PyTorch Dataset object (as usual with PyTorch).

In order to adapt this to your dataset, the following are required:

  • train_test_valid_split (Path to Tags): path to the tags CSV file for the train, test, valid split. Format: file_name, tag
  • annotations (Path to Annotations): path to the annotations CSV file. Format: file_name, classes, xmin, ymin, xmax, ymax
  • mapping (Mapping): a dictionary mapping each class name to its class index. Format: {'class_name' : 'class_index'}
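
To make the two CSV formats concrete, here is a small sketch with made-up file contents. The header names are taken from the format descriptions above; the rows themselves are hypothetical. It joins each annotation row with the tag of its image, which is the same join the Dataset class performs with pandas:

```python
import csv
import io

# Made-up file contents following the formats listed above
annotations_csv = io.StringIO(
    'file_name,classes,xmin,ymin,xmax,ymax\n'
    'image_1.jpg,Wheel,10,20,50,60\n'
    'image_1.jpg,Car,0,5,200,120\n'
)
tags_csv = io.StringIO(
    'file_name,tag\n'
    'image_1.jpg,train\n'
)

# One tag per image, then one output row per annotated object
tag_by_file = {row['file_name']: row['tag'] for row in csv.DictReader(tags_csv)}
rows = [dict(row, tag=tag_by_file[row['file_name']])
        for row in csv.DictReader(annotations_csv)]
print(rows[0])
```
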
class ObjectDetectionDataset(Dataset):
    """
    Custom PyTorch Dataset Class to facilitate loading data for the Object Detection Task
    """
    def __init__(self, 
                 annotations,
                 train_test_valid_split,
                 mapping = None, 
                 mode = 'train', 
                 transform = None): 
        """
        Args:
            annotations: The path to the annotations CSV file. Format: file_name, classes, xmin, ymin, xmax, ymax
            train_test_valid_split: The path to the tags CSV file for train, test, valid split. Format: file_name, tag
            mapping: a dictionary containing mapping of class name and class index. Format : {'class_name' : 'class_index'}, Default: None
            mode: Mode in which to instantiate class. Default: 'train'
            transform: The transforms to be applied to the image data

        Returns:
            image : Torch Tensor, target: Torch Tensor, file_name : str
        """
        self.mapping = mapping
        self.transform = transform
        self.mode = mode

        self.path_to_images = './object_detection_dataset/images/'
        # Loading the annotation file (same format as Remo's)
        my_data = pd.read_csv(annotations)
        # Here we append the file path to the filename. 
        # If dataset.export_annotations_to_file was used to create the annotation file, it would feature by default image file paths
        my_data['file_name'] = my_data['file_name'].apply(lambda x : os.path.abspath(f'{self.path_to_images}{x}'))
        my_data = my_data.set_index('file_name')

        # Loading the train/test split file (same format as Remo's)
        my_data['tag'] = pd.read_csv(train_test_valid_split, index_col='file_name')

        my_data = my_data.reset_index()
        # Load only Train/Test/Split depending on the mode
        my_data = my_data.loc[my_data['tag'] == mode].reset_index(drop=True)
        self.data = my_data

        self.file_names = self.data['file_name'].unique()

    def __len__(self) -> int:
        return self.file_names.shape[0]

    def __getitem__(self, index: int):

        file_name = self.file_names[index]
        records = self.data[self.data['file_name'] == file_name].reset_index()       
        image = np.array(Image.open(file_name), dtype=np.float32)
        image /= 255.0

        if self.transform:
            image = self.transform(image)  

        # here we are assuming we don't have labels for the test set
        if self.mode != 'test':
            boxes = records[['xmin', 'ymin', 'xmax', 'ymax']].values
            area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
            area = torch.as_tensor(area, dtype=torch.float32)

            if self.mapping is not None:
                labels = np.zeros((records.shape[0],))

                for i in range(records.shape[0]):
                    labels[i] = self.mapping[records.loc[i, 'classes']]

                labels = torch.as_tensor(labels, dtype=torch.int64)
            else:
                # Without a mapping, every object is assigned to a single class
                labels = torch.ones((records.shape[0],), dtype=torch.int64)

            iscrowd = torch.zeros((records.shape[0],), dtype=torch.int64)

            target = {}
            target['boxes'] = boxes
            target['labels'] = labels
            target['image_id'] = torch.tensor([index])
            target['area'] = area
            target['iscrowd'] = iscrowd 
            target['boxes'] = torch.stack(list((map(torch.tensor, target['boxes'])))).type(torch.float32)

            return image, target, file_name
        else:
            return image, file_name

def collate_fn(batch):
    return tuple(zip(*batch))
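
Each image can contain a different number of boxes, so the samples cannot be stacked into a single tensor as PyTorch's default collate function would try to do. The collate_fn above instead transposes the batch: a list of (image, target, file_name) samples becomes one tuple of images, one of targets and one of file names. A small sketch with placeholder values in place of tensors:

```python
def collate_fn(batch):
    # Transpose a list of samples into per-field tuples
    return tuple(zip(*batch))


# Placeholder samples standing in for (image, target, file_name) triples
batch = [
    ('image_a', {'boxes': [[0, 0, 10, 10]]}, 'a.jpg'),
    ('image_b', {'boxes': [[1, 1, 5, 5], [2, 2, 8, 8]]}, 'b.jpg'),
]

images, targets, file_names = collate_fn(batch)
print(file_names)
print([len(t['boxes']) for t in targets])
```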

The train and test datasets are instantiated and each is wrapped in a DataLoader.

tensor_transform = transforms.Compose([transforms.ToTensor()])

# Here the train / test tags generated with Remo are integrated into a PyTorch workflow
# through the custom ObjectDetectionDataset class.

train_dataset = ObjectDetectionDataset(annotations = annotations_file_path,  
                                       train_test_valid_split = train_test_split_file_path,
                                       transform = tensor_transform,
                                       mapping = cat_to_index,
                                       mode = 'train')

test_dataset = ObjectDetectionDataset(annotations = annotations_file_path,  
                                      train_test_valid_split = train_test_split_file_path,
                                      transform = tensor_transform,
                                      mapping = cat_to_index,
                                      mode = 'test')

train_data_loader = DataLoader(train_dataset, batch_size=1, shuffle=False, num_workers=0, collate_fn=collate_fn)
test_data_loader  = DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=0, collate_fn=collate_fn)

Training the Model

In this tutorial, we use a Faster R-CNN architecture with a ResNet-50 backbone, pre-trained on COCO train2017. This is loaded directly from torchvision.models.

To train the model, we specify the following details:

  • model: The pre-trained model, with its box predictor head replaced to match our classes
  • num_classes: The number of classes present in the dataset = actual number of classes + 1 for the background of the image (that's a peculiarity of Faster R-CNN)
  • optimizer: The optimizer used for training the network
  • num_epochs: The number of epochs for which we would like to train the network
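
Rather than hard-coding it, num_classes can be derived from the class mapping. In torchvision's detection models, index 0 is reserved for the background, which is why the mapping starts at 1. A minimal sketch (the mapping is repeated here only so that the snippet runs on its own):

```python
# Class mapping defined earlier in the tutorial
cat_to_index = {'Wheel': 1, 'Car': 2, 'Person': 3,
                'Land vehicle': 4, 'Human body': 5, 'Plant': 6,
                'Tire': 7, 'Vehicle': 8,
                'Vehicle registration plate': 9}

# One output per class, plus one for the background (index 0)
num_classes = len(cat_to_index) + 1

# Sanity checks: index 0 is left free and every index fits in the output layer
assert 0 not in cat_to_index.values()
assert max(cat_to_index.values()) < num_classes
print(num_classes)
```
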
device      = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
num_classes = 10
loss_value  = 0.0
num_epochs  = 5
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

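# Replace the box predictor head so that it outputs num_classes scores
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Move the model to the same device as the images and targets
model.to(device)

params = [p for p in model.parameters() if p.requires_grad]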

optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
# The training loop trains the model for the total number of epochs.
# (1 epoch = one complete pass over the entire dataset)

# In training mode the model returns a dictionary of losses
model.train()

for epoch in range(num_epochs):

    for images, targets, image_ids in tqdm.tqdm(train_data_loader):

        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        loss_dict = model(images, targets)

        losses = sum(loss for loss in loss_dict.values())
        loss_value = losses.item()

        # Backpropagate the total loss and update the weights
        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

    # Loss of the last batch in the epoch
    print('\nTraining Loss : {:.5f}'.format(loss_value))
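
The loop above reports only the loss of the last batch in each epoch, which can be noisy with a batch size of 1. One option is to report the mean loss over the epoch instead. Below is a minimal sketch of a running-average helper; the Averager class is a hypothetical addition, not part of the tutorial's pipeline, and the loss values are made up:

```python
class Averager:
    """Running mean of the values passed to send() (hypothetical helper)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def send(self, value):
        self.total += value
        self.count += 1

    @property
    def value(self):
        # Mean of all values seen since the last reset (0.0 if none)
        return self.total / self.count if self.count else 0.0

    def reset(self):
        self.total = 0.0
        self.count = 0


# Made-up per-batch losses, for illustration only
loss_hist = Averager()
for batch_loss in [0.9, 0.7, 0.5]:
    loss_hist.send(batch_loss)

print('Mean training loss : {:.5f}'.format(loss_hist.value))
```

In the training loop, this would mean calling loss_hist.send(loss_value) after each batch and loss_hist.reset() at the start of each epoch.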

Visualizing Predictions

Using Remo, we can easily iterate through the images to compare the model predictions against the original labels.

To do this, we just need to save the model predictions to a CSV file and upload them to a new AnnotationSet, which we call model-predictions-oid.

# Mapping Between Predicted Index and Class Name
mapping = { value : key for (key, value) in cat_to_index.items()}

detection_threshold = 0.3
results = []

# Switch to evaluation mode so that the model returns predictions instead of losses
model.eval()

with torch.no_grad():
    for images, image_ids in tqdm.tqdm(test_data_loader):

        images = list(image.to(device) for image in images)
        outputs = model(images)

        for i, image in enumerate(images):

            scores = outputs[i]['scores'].cpu().numpy()
            # Apply the same score mask to boxes and labels so that they stay aligned
            keep = scores >= detection_threshold
            boxes = outputs[i]['boxes'].cpu().numpy()[keep].astype(np.int32)
            labels = outputs[i]['labels'].cpu().numpy()[keep]
            image_id = image_ids[i]

            for box, label in zip(boxes, labels):
                results.append({'file_name' : os.path.basename(image_id), 
                                'classes'   : mapping[int(label)], 
                                'xmin'      : box[0],
                                'ymin'      : box[1],
                                'xmax'      : box[2],
                                'ymax'      : box[3]})

model_predictions_path = path_to_annotations + 'model_predictions.csv'

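with open(model_predictions_path, 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=['file_name', 'classes', 'xmin', 'ymin', 'xmax', 'ymax'])
    # Write the header row followed by one row per predicted box
    writer.writeheader()
    writer.writerows(results)
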
predictions = object_detection_dataset.create_annotation_set(annotation_task='Object Detection', 
                                                             name = 'model-predictions-oid',
                                                             paths_to_files = [train_test_split_file_path, model_predictions_path])

By visualizing the predicted boxes against the ground truth, we can go past summary performance metrics, and visually inspect model biases and iterate to improve it.

For example, we might notice in the picture below how the model incorrectly but clearly predicts the left car lamp to be a "Wheel", perhaps due to the shape being similar.
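
Visual inspection can be complemented with a numeric overlap score for individual boxes. Below is a minimal sketch of intersection over union (IoU) for two boxes in the (xmin, ymin, xmax, ymax) format used throughout this tutorial. The box_iou helper and the example coordinates are illustrative and not part of the pipeline above:

```python
def box_iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Width and height of the overlapping region (zero if the boxes do not overlap)
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Made-up boxes, for illustration only
ground_truth = (10, 10, 50, 50)
prediction = (30, 30, 70, 70)
iou = box_iou(ground_truth, prediction)
print('IoU: {:.3f}'.format(iou))
```

An IoU of 1 means the two boxes coincide and 0 means they do not overlap at all.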