Image Classification Pipeline using Remo

In this tutorial, we will use Remo to speed up the process of building a transfer learning pipeline for an Image Classification task.

In particular, we will:

  • Use Remo to visualize and explore our images and annotations
  • Use Remo to quickly access some key statistics of our Dataset
  • Create custom train/test/val splits in PyTorch without needing to move data around (thanks to Remo image tags)
  • Visually compare our model predictions with our ground-truth

Along the way, we will see how the Dataset visualization provided by Remo helps us gather insights to improve the dataset and the model.

Let's start by importing the relevant libraries:

from PIL import Image
import os
import glob
import random

import pandas as pd
import numpy as np
import tqdm
from sklearn.model_selection import train_test_split

import torch
import torch.nn as nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torchvision import transforms
import torchvision.models as models
import torch.optim as optim

import remo

Adding Data to Remo

  • For this example, our dataset is a subset of the Flowers 102 Dataset.
  • In the next cell we download the data as a zip file and extract the files in a new folder.

  • The folder structure of the dataset is:

    ├── small_flowers
        ├── images
            ├── 0
                ├── image_1.jpg
                ├── image_2.jpg
                ├── ...
            ├── 1
                ├── image_3.jpg
                ├── image_4.jpg
                ├── ...
        ├── annotations
            ├── images_tags.csv
            ├── annotations.csv
# The dataset will be extracted in a new folder
if not os.path.exists(''):
    !unzip -qq
else:
    print('Files already downloaded')
# The path to the folders
path_to_images =  './small_flowers/images/'
path_to_annotations = './small_flowers/annotations/'


We can easily generate annotations from a series of folders, by passing the root directory path to remo.generate_annotations_from_folders().

annotations_file_path = os.path.join(path_to_annotations, 'annotations.csv')
remo.generate_annotations_from_folders(path_to_data_folder = path_to_images, 
                                       output_file_path = annotations_file_path)
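
As a rough illustration of the idea (not Remo's actual implementation), the sketch below walks a folder-per-class layout and writes a CSV with file_name and class_name columns, the format this tutorial uses for annotations. The helper name annotations_from_folders and the throw-away demo files are hypothetical.

```python
import csv
import os
import tempfile

def annotations_from_folders(root, output_csv):
    """Write one (file_name, class_name) row per image, using each sub-folder name as the class."""
    rows = []
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files in the root folder
        for file_name in sorted(os.listdir(class_dir)):
            rows.append((os.path.join(class_dir, file_name), class_name))
    with open(output_csv, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['file_name', 'class_name'])
        writer.writerows(rows)
    return rows

# Tiny throw-away dataset with two classes, to show the shape of the output
with tempfile.TemporaryDirectory() as tmp:
    for class_name, names in {'0': ['image_1.jpg'], '1': ['image_3.jpg', 'image_4.jpg']}.items():
        os.makedirs(os.path.join(tmp, 'images', class_name))
        for name in names:
            open(os.path.join(tmp, 'images', class_name, name), 'w').close()
    demo_rows = annotations_from_folders(os.path.join(tmp, 'images'),
                                         os.path.join(tmp, 'annotations.csv'))
    demo_classes = [class_name for _, class_name in demo_rows]
```

Each sub-folder name becomes the class label, which is why the folders in this dataset are named 0, 1 and 2.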

To visualise the labels as strings rather than IDs, we can use a dictionary mapping the two of them.

cat_to_index = { 0 : 'Pink Primrose',  
                 1 : 'Hard-leaved Pocket Orchid', 
                 2 : 'Canterbury Bells'}
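
If a later step needs to go from class names back to IDs, the same dictionary can be inverted. The snippet below is a small sketch; name_to_index and label_name are hypothetical names, and lower-casing the keys is an assumption chosen to make lookups case-insensitive.

```python
cat_to_index = {0: 'Pink Primrose',
                1: 'Hard-leaved Pocket Orchid',
                2: 'Canterbury Bells'}

# Inverse lookup: lower-cased class name -> class ID
name_to_index = {name.lower(): idx for idx, name in cat_to_index.items()}

def label_name(class_id):
    """Return the readable name for a class ID, or a placeholder if the ID is unknown."""
    return cat_to_index.get(class_id, 'unknown ({})'.format(class_id))
```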

Train / test split

In Remo, we can use tags to organise our images. Among other things, this allows us to generate train / test splits without the need to move image files around.

To do this, we just need to pass a dictionary (mapping tags to the relevant image paths) to the function remo.generate_image_tags().

annotations = pd.read_csv(annotations_file_path)

temp_train, test = train_test_split(annotations, stratify=annotations["class_name"], test_size=0.1)
train, val = train_test_split(temp_train, stratify=temp_train["class_name"], test_size=0.1)

# Creating a dictionary with tags
tags_dict =  {'train' : train["file_name"].to_list(), 
              'valid' : val["file_name"].to_list(), 
              'test' : test["file_name"].to_list()}

train_test_split_file_path = os.path.join(path_to_annotations, 'images_tags.csv') 
remo.generate_image_tags(tags_dictionary  = tags_dict, 
                         output_file_path = train_test_split_file_path)
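
To make explicit what a stratified split does, here is a minimal pure-Python sketch, assuming a list of (file_name, class_name) pairs. It is not how scikit-learn implements train_test_split; the helper stratified_tags and the made-up file names are for illustration only.

```python
import random
from collections import defaultdict

def stratified_tags(rows, test_frac=0.1, seed=0):
    """Split files into 'train' and 'test' tags, holding out the same fraction of every class."""
    by_class = defaultdict(list)
    for file_name, class_name in rows:
        by_class[class_name].append(file_name)

    rng = random.Random(seed)
    tags = {'train': [], 'test': []}
    for class_name in sorted(by_class):
        files = sorted(by_class[class_name])
        rng.shuffle(files)
        n_test = max(1, round(len(files) * test_frac))  # at least one test image per class
        tags['test'].extend(files[:n_test])
        tags['train'].extend(files[n_test:])
    return tags

# Made-up file names using class sizes of 40, 60 and 40 images
rows = [('{}/image_{}.jpg'.format(c, i), c)
        for c, n in [(0, 40), (1, 60), (2, 40)]
        for i in range(n)]
tags = stratified_tags(rows)
```

Because the files never move, the same image can be re-tagged later simply by regenerating the dictionary.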

Create a dataset

To create a dataset we can use remo.create_dataset(), specifying the path to data and annotations.

The class encoding is passed via a dictionary.

For a complete list of formats supported, you can refer to the docs.

# The annotations.csv is generated in the same path of the sub-folder
flowers =  remo.create_dataset(name = 'flowers', 
                              local_files = [path_to_images, path_to_annotations],
                              annotation_task = 'Image classification',
                              class_encoding = cat_to_index)

Visualizing the dataset

To view and explore images and labels, we can use Remo directly from the notebook. We just need to call dataset.view().



Looking at the dataset, we notice some interesting points:

  • The flowers have a distinct structure, which is useful for distinguishing between the classes
  • The images contain not only the flower but also background such as grass and leaves, which might confuse the classifier.
  • The colors of the three flowers are similar.
  • The images have been taken from varied angles and distances.

Dataset Statistics

Remo alleviates the need to write extra boilerplate for accessing dataset properties.

This can be done either using code, or via the visual interface.


[{'AnnotationSet ID': 348, 'AnnotationSet name': 'Image classification', 'n_images': 140, 'n_classes': 3, 'n_objects': 0, 'top_3_classes': [{'name': 'Hard-leaved pocket orchid', 'count': 60}, {'name': 'Canterbury bells', 'count': 40}, {'name': 'Pink primrose', 'count': 40}], 'creation_date': None, 'last_modified_date': '2020-09-02T08:30:19.135869Z'}, {'AnnotationSet ID': 349, 'AnnotationSet name': 'model_predictions', 'n_images': 14, 'n_classes': 3, 'n_objects': 0, 'top_3_classes': [{'name': 'Hard-leaved pocket orchid', 'count': 7}, {'name': 'Pink primrose', 'count': 4}, {'name': 'Canterbury bells', 'count': 3}], 'creation_date': None, 'last_modified_date': '2020-09-02T08:32:51.212628Z'}]



Looking at the statistics, we gain some insights like:

  • There are more examples of Hard-Leaved Pocket Orchid compared to the other two classes.
  • Pink Primrose and Canterbury Bells have the same number of examples.
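
The class balance can also be quantified directly from the counts. The snippet below is a small sketch that copies the three counts from the statistics output above and computes each class's share and the ratio between the largest and smallest class.

```python
# Class counts copied from the 'Image classification' statistics output above
class_counts = {'Hard-leaved pocket orchid': 60,
                'Canterbury bells': 40,
                'Pink primrose': 40}

total_images = sum(class_counts.values())
class_share = {name: count / total_images for name, count in class_counts.items()}
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())

for name, share in class_share.items():
    print('{:<28s} {:5.1f}%'.format(name, 100 * share))
print('Largest / smallest class: {:.2f}'.format(imbalance_ratio))
```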

Feeding Data into PyTorch

Here we start working with PyTorch. To load the data, we will define a custom PyTorch Dataset object (as usual with PyTorch).

In order to adapt this to your dataset, the following are required:

  • train_test_valid_split (Path to Tags): path to tags csv file for Train, Test, Validation split. Format: file_name, tag.
  • annotations (Path to Annotations): path to the annotations CSV File. Format : file_name, class_name
  • mapping (Mapping): a dictionary containing mapping of class name and class index. Format : {'class_name' : 'class_index'}
class FlowerDataset(Dataset):
    """
    Custom PyTorch Dataset Class to facilitate loading data for the Image Classification Task
    """
    def __init__(self, annotations, train_test_valid_split, mapping = None, mode = 'train', transform = None):
        """
        Args:
            annotations: The path to the annotations CSV file. Format: file_name, class_name
            train_test_valid_split: The path to the tags CSV file for train, test, valid split. Format: file_name, tag
            mapping: a dictionary containing mapping of class name and class index. Format : {'class_name' : 'class_index'}, Default: None
            mode: Mode in which to instantiate class. Default: 'train'
            transform: The transforms to be applied to the image data

        Returns:
            image : Torch Tensor, label_tensor : Torch Tensor, file_name : str
        """
        # Join annotations and tags on the file name
        my_data = pd.read_csv(annotations, index_col='file_name')
        my_data['tag'] = pd.read_csv(train_test_valid_split, index_col='file_name')
        my_data = my_data.reset_index()

        self.mapping = mapping
        self.transform = transform
        self.mode = mode

        # Keep only the rows tagged with the requested split
        my_data = my_data.loc[my_data['tag'] == mode].reset_index(drop=True)
        self.data = my_data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        if self.mapping is not None:
            labels = int(self.mapping[self.data.loc[idx, 'class_name'].lower()])
        else:
            labels = int(self.data.loc[idx, 'class_name'])

        im_path = self.data.loc[idx, 'file_name']

        label_tensor =  torch.as_tensor(labels, dtype=torch.long)
        im = Image.open(im_path)

        if self.transform:
            im = self.transform(im)

        if self.mode == 'test':
            # For saving the predictions, the file name is required
            return {'im' : im, 'labels': label_tensor, 'im_name' : self.data.loc[idx, 'file_name']}
        else:
            return {'im' : im, 'labels' : label_tensor}

# Channel wise mean and standard deviation for normalizing according to ImageNet Statistics
means =  [0.485, 0.456, 0.406]
stds  =  [0.229, 0.224, 0.225]

# Transforms to be applied to Train-Test-Validation
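# Normalize operates on tensors, so images are first resized to a common shape
# (224 x 224 is an assumed input size) and converted with ToTensor
train_transforms      =  transforms.Compose([
                         transforms.Resize((224, 224)),
                         transforms.ToTensor(),
                         transforms.Normalize(means, stds)])

test_valid_transforms =  transforms.Compose([
                         transforms.Resize((224, 224)),
                         transforms.ToTensor(),
                         transforms.Normalize(means, stds)])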
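
For reference, per-channel normalization computes (value - mean) / std for each of the three colour channels. The sketch below works this through in plain Python for a single pixel whose channels are already scaled to [0, 1]; normalize_pixel is a hypothetical helper, not part of torchvision.

```python
means = [0.485, 0.456, 0.406]
stds = [0.229, 0.224, 0.225]

def normalize_pixel(rgb, means, stds):
    """Apply (value - mean) / std to each channel of one pixel scaled to [0, 1]."""
    return [(value - mean) / std for value, mean, std in zip(rgb, means, stds)]

mid_grey = [0.5, 0.5, 0.5]
normalized = normalize_pixel(mid_grey, means, stds)
print(['{:.3f}'.format(v) for v in normalized])
```

A pixel equal to the channel means maps to zero, so after normalization the inputs are roughly centred, matching what the ImageNet-pretrained weights saw during training.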

The train, validation and test datasets are instantiated and each is wrapped in a DataLoader.

train_dataset = FlowerDataset(annotations =  annotations_file_path,
                              train_test_valid_split = train_test_split_file_path,
                              transform =  train_transforms,
                              mode =  'train')

valid_dataset = FlowerDataset(annotations = annotations_file_path,
                              train_test_valid_split = train_test_split_file_path,
                              transform = test_valid_transforms,
                              mode = 'valid')

test_dataset  = FlowerDataset(annotations = annotations_file_path,
                              train_test_valid_split = train_test_split_file_path,
                              transform = test_valid_transforms,
                              mode = 'test')

# If you face issues in operating systems like Windows, you can set num_workers=0.
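train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=5, shuffle=True, num_workers=1)
val_loader   = torch.utils.data.DataLoader(valid_dataset, batch_size=1, shuffle=False, num_workers=1)
test_loader  = torch.utils.data.DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=1)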
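
Conceptually, batch_size and shuffle control how sample indices are grouped on each pass over the data. The sketch below illustrates only that grouping; it is not how DataLoader is implemented (the real class also collates tensors and manages worker processes), and iter_batches is a hypothetical helper.

```python
import random

def iter_batches(n_samples, batch_size, shuffle=False, seed=0):
    """Yield lists of sample indices, optionally shuffled, in chunks of batch_size."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]

# 12 samples in batches of 5: the last batch is smaller
batches = list(iter_batches(n_samples=12, batch_size=5, shuffle=True))
batch_lengths = [len(batch) for batch in batches]
```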

Training the Model

We use a ResNet-18 architecture, with weights pre-trained on ImageNet.

To train the model, the following details are to be specified:

  1. Model: The edited version of the pre-trained model.
  2. Data Loaders: The dictionary containing our training and validation dataloaders
  3. Criterion: The loss function used for training the network
  4. Num_epochs: The number of epochs for which we would like to train the network.
  5. dataset_size: The number of samples in each split, used to scale the accumulated loss into a per-sample average
  6. num_classes: Number of classes in the dataset
model = models.resnet18(pretrained=True)

num_classes = 3

# Freezing the weights
for param in model.parameters():
    param.requires_grad = False

# Replacing the final layer
# NLLLoss expects log-probabilities, so the new head ends with LogSoftmax
model.fc =  nn.Sequential(nn.Linear(512, 256), 
                         nn.Linear(256, num_classes), 
                         nn.LogSoftmax(dim=1))

# Model Parameters
optimizer    =  optim.Adam(model.fc.parameters(), lr=0.001)
criterion    =  nn.NLLLoss()
num_epochs   =  2
data_loaders =  {'train' : train_loader, 'valid': val_loader}
dataset_size =  {'train' : len(train_dataset), 'valid' : len(valid_dataset)}

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# This method pushes the model to the device.
model = model.to(device)

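
nn.NLLLoss expects log-probabilities as input, which are typically produced by a log-softmax over the raw class scores. The plain-Python sketch below works through that arithmetic for one sample with three made-up scores; it is an illustration of the formula rather than PyTorch's implementation.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax: subtract the max before exponentiating."""
    shift = max(logits)
    log_sum = math.log(sum(math.exp(x - shift) for x in logits))
    return [x - shift - log_sum for x in logits]

def nll(log_probs, target):
    """Negative log-likelihood of the target class for a single sample."""
    return -log_probs[target]

logits = [2.0, 0.5, -1.0]           # made-up raw scores for the three classes
log_probs = log_softmax(logits)
loss_if_class_0 = nll(log_probs, 0)  # target is the highest-scoring class
loss_if_class_2 = nll(log_probs, 2)  # target is the lowest-scoring class
```

The loss is small when the target class already has the highest score and grows as the model assigns it less probability.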
# The training loop trains the model for the total number of epochs.
# (1 epoch = one complete pass over the entire dataset)

for epoch in range(num_epochs):

    model.train() # This sets the model back to training after the validation step
    print('\nEpoch Number {}'.format(epoch+1))

    training_loss = 0.0
    val_loss = 0.0
    val_acc = 0
    correct_preds = 0
    best_acc = 0
    validation = 0.0
    total = 0

    train_data_loader = tqdm.tqdm(data_loaders['train'])

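    for x, data in enumerate(train_data_loader):
        inputs, labels = data['im'].to(device), data['labels'].to(device)

        optimizer.zero_grad()  # clear gradients left over from the previous batch
        outputs = model(inputs)

        loss = criterion(outputs, labels)
        loss.backward()        # backpropagate the loss
        optimizer.step()       # update the weights of the new final layers

        # criterion returns the batch mean, so weight it by the batch size
        training_loss += loss.item() * inputs.size(0)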

    epoch_loss = training_loss / dataset_size['train']
    print('Training Loss : {:.5f}'.format(epoch_loss))
    valid_data_loader = tqdm.tqdm(data_loaders['valid'])

    # Validation step after every epoch
    # Gradients are not required at inference time, so the model is set to eval mode
    # and gradient tracking is switched off
    model.eval()
    with torch.no_grad():
        for x, data in enumerate(valid_data_loader):
            inputs, labels = data['im'].to(device), data['labels'].to(device)
            outputs = model(inputs)

            val_loss = criterion(outputs, labels)
            _, index = torch.max(outputs, 1)

            total += labels.size(0)
            correct_preds += (index == labels).sum().item()

            validation += val_loss.item()

        val_acc = 100 * (correct_preds / total)

        print('Validation Loss : {:.5f}'.format(validation / dataset_size['valid']))
        print('Validation Accuracy is: {:.2f}%'.format(val_acc))
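
A note on loss scaling: PyTorch loss functions such as nn.NLLLoss average over the batch by default, so when an accumulated loss is divided by the dataset size, each batch mean needs to be weighted by its batch length. The sketch below uses made-up per-sample losses to show the difference between the weighted and unweighted running sums.

```python
# Per-sample losses for 7 samples, split into batches of 3, 3 and 1 (made-up numbers)
batches = [[0.9, 0.6, 0.3], [0.8, 0.4, 0.6], [0.2]]
dataset_size = sum(len(batch) for batch in batches)

running_weighted = 0.0
running_unweighted = 0.0
for batch in batches:
    batch_mean = sum(batch) / len(batch)         # what a mean-reduced criterion returns
    running_weighted += batch_mean * len(batch)  # recovers the sum of per-sample losses
    running_unweighted += batch_mean

true_mean = sum(sum(batch) for batch in batches) / dataset_size
weighted_epoch_loss = running_weighted / dataset_size
unweighted_epoch_loss = running_unweighted / dataset_size
```

With a batch size of 1, as used for validation here, the two quantities coincide.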
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

test_data_loader =  tqdm.tqdm(test_loader)

total =  0
correct_preds =  0
pred_list =  {}

model.eval()  # inference mode for layers such as batch norm
with torch.no_grad():
    for x, data in enumerate(test_data_loader):
        single_im, label = data['im'].to(device), data['labels'].to(device)
        im_name = data['im_name']

        pred = model(single_im)

        _, index = torch.max(pred, 1)

        total += label.size(0)
        correct_preds += (index == label).sum().item()

        pred_list[os.path.basename(im_name[0])] = cat_to_index[index.item()]

df = pd.DataFrame(pred_list.items(), columns=['file_name', 'class_name'])

model_predictions_path = os.path.join(path_to_annotations, 'model_predictions.csv')

with open(model_predictions_path, 'w') as f:
    df.to_csv(f, index=False)

print('Accuracy of the network on the test images: %d %%' % (100 * (correct_preds / total)))

Visualizing Predictions

Using Remo, we can visually compare the model predictions against the original labels.

To do this, we create a new AnnotationSet and upload the predictions as a CSV file.

predictions = flowers.create_annotation_set(annotation_task='Image classification', 
                                            name = 'model_predictions',
                                            paths_to_files = [train_test_split_file_path, model_predictions_path])

By visualizing the predicted labels against the ground truth, we can get visual feedback on how the model is performing and gain some insights.

Via this workflow, we can not only understand the model through its metrics, but also visually inspect its biases and iterate to improve the model.

One insight is that the model performs well on both the validation and test sets, which suggests it has learned to classify the flowers in this dataset accurately.
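
Alongside the visual comparison, the same two sets of labels can be tallied in code. The sketch below counts (true class, predicted class) pairs from two dictionaries keyed by file name; the helper confusion_counts and the four example labels are made up for illustration and are not results from this model.

```python
from collections import Counter

def confusion_counts(ground_truth, predictions):
    """Count (true class, predicted class) pairs for files present in both dictionaries."""
    pairs = Counter()
    for file_name, predicted in predictions.items():
        if file_name in ground_truth:
            pairs[(ground_truth[file_name], predicted)] += 1
    return pairs

# Made-up labels for four images, for illustration only
ground_truth = {'a.jpg': 'Pink Primrose', 'b.jpg': 'Pink Primrose',
                'c.jpg': 'Canterbury Bells', 'd.jpg': 'Hard-leaved Pocket Orchid'}
predictions = {'a.jpg': 'Pink Primrose', 'b.jpg': 'Canterbury Bells',
               'c.jpg': 'Canterbury Bells', 'd.jpg': 'Hard-leaved Pocket Orchid'}

pairs = confusion_counts(ground_truth, predictions)
accuracy = sum(n for (true, pred), n in pairs.items() if true == pred) / sum(pairs.values())
mistakes = {pair: n for pair, n in pairs.items() if pair[0] != pair[1]}
```

The mistakes dictionary points to the class pairs worth inspecting first in the visual interface.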

Results Comparison
