Annotation Formats¶
Remo can read and output annotations in a variety of formats. In this section, we present details about the specific formats and options available.
At a high level, we aim to offer native support for:
- the formats most widely used by the Computer Vision community
- simple formats that can serve as conversion points for processing custom annotation formats
Supported formats¶
Here is a quick summary of the supported file formats across different tasks. For each format, you can refer to the relevant section further below for more information.
Tags
Regardless of the task selected, you can upload or export image-level tags using a CSV file.
Supported Format | Example | Input / Output |
---|---|---|
Plain CSV | tags.csv | Both |
Image Classification
Supported Format | Example | Input / Output |
---|---|---|
Plain CSV | plain_img_class.csv | Both |
Pascal VOC XML | voc_img_class.xml | Input |
Remo JSON | remo_img_class.json | Output |
Object Detection
Supported Format | Example | Input / Output |
---|---|---|
Coco JSON | coco_obj_det_io.json | Both |
Plain CSV | obj_det.csv | Both |
Pascal VOC XML | voc_obj_det.xml | Input |
Open Images CSV | old_obj_det_i.csv | Input |
Remo JSON | plain_obj_det_o.json | Output |
Instance Segmentation
Supported Format | Example | Input / Output |
---|---|---|
Coco JSON | coco_inst_seg.json | Both |
Remo JSON | plain_inst_seg.json | Output |
Plain CSV | plain_inst_seg_o.csv | Output |
Additional info¶
Image files¶
Image files can have any of these extensions: .png, .jpg, .jpeg, .tif, .tiff.
Class encoding¶
By class encoding we mean a set of rules that maps actual labels to IDs. Some prominent Computer Vision datasets make use of this.
A custom class encoding CSV file can be passed to convert between IDs and labels. The file must be formatted as follows:
- use commas to delimit the columns
- no headers
- example: class_encoding_example.csv
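As a rough illustration of the formatting rules above, the following sketch parses a headerless, comma-delimited encoding into a dictionary. The column order (ID first, label second), the function name `load_class_encoding`, and the sample rows are assumptions made for this example; they are not taken from class_encoding_example.csv.

```python
import csv
import io


def load_class_encoding(csv_text):
    """Parse a headerless, comma-delimited class encoding into a dict.

    Assumption: column 0 holds the ID and column 1 holds the label.
    """
    reader = csv.reader(io.StringIO(csv_text))
    # Skip blank or incomplete rows rather than failing on them.
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}


# Hypothetical sample content, used only to show the expected shape.
sample = "id_001,dog\nid_002,cat\n"
encoding = load_class_encoding(sample)
```

Inverting the dictionary (`{label: id for id, label in encoding.items()}`) would give the label-to-ID direction.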
Some pre-made class encodings are also available:
Objects coordinates¶
Across formats, object coordinates follow this logic:
- they can be expressed in either pixels or percentages wherever possible
- (0,0) indicates the top left corner of an image
- percentage example: (x, y) = (0.6, 0.9) means (x = 0.6*width, y = 0.9*height)
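The percentage convention can be sketched in a few lines of Python. The helper name `percent_to_pixels` and the image size are hypothetical and chosen only for illustration.

```python
def percent_to_pixels(x, y, width, height):
    """Convert percentage coordinates (0..1, origin at the top left) to pixels."""
    return x * width, y * height


# A 200x100 image is assumed here purely as an example.
px, py = percent_to_pixels(0.6, 0.9, width=200, height=100)
```

With these inputs, `px` is 0.6 of the width and `py` is 0.9 of the height, both measured from the top left corner.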
Image Classification¶
Image classification refers to the task of assigning labels to an image as a whole. Remo supports the following formats:
Format | File type | Required fields | Optional Fields | Example |
---|---|---|---|---|
Pascal VOC | XML | filename, object | - | voc_img_class.xml |
Plain CSV file | CSV | file_name, class_name | - | plain_img_class.csv |
Pascal VOC¶
The Pascal VOC XML annotation format became popular thanks to the ImageNet challenge, which adopted the format of the PASCAL VOC challenge.
By default, classes are assumed on import to follow the WordNet encoding, which you can download here.
Plain CSV¶
This is a custom simple format we support to facilitate converting data between formats.
- Columns in the file have to be separated by commas
- You can use any of the following variations on column headers:
file_name | class_name |
---|---|
image, image_id, image id, file_name, filename, file, file name, file_path, path, file path | class, label, name, class_name, class name, label_name, label name, category |
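To make the accepted header variations concrete, here is a minimal sketch that maps any of the listed headers to its canonical column name before reading the rows. It illustrates the idea only and is not Remo's own parser. The function names are hypothetical, and trimming whitespace and ignoring case are assumptions of this example.

```python
import csv
import io

# Header variations as listed in the table above.
FILE_NAME_ALIASES = {
    "image", "image_id", "image id", "file_name", "filename",
    "file", "file name", "file_path", "path", "file path",
}
CLASS_NAME_ALIASES = {
    "class", "label", "name", "class_name", "class name",
    "label_name", "label name", "category",
}


def canonical_header(header):
    """Return the canonical column name for a header, or the cleaned header."""
    key = header.strip().lower()  # assumption: case and padding are ignored
    if key in FILE_NAME_ALIASES:
        return "file_name"
    if key in CLASS_NAME_ALIASES:
        return "class_name"
    return key


def read_plain_csv(csv_text):
    """Read a comma-separated file into dicts keyed by canonical names."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = [canonical_header(h) for h in next(reader)]
    return [dict(zip(headers, row)) for row in reader]


# Hypothetical file content using the "image" and "label" variations.
rows = read_plain_csv("image,label\ncat_01.jpg,cat\ndog_07.jpg,dog\n")
```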
Object Detection¶
Object detection refers to the task of identifying objects in an image using rectangular boxes. Remo supports the following formats:
Format | File type | Required fields | Optional Fields | Example |
---|---|---|---|---|
Coco Object Detection | JSON | file_name, segmentation, area, iscrowd, image_id, bbox, category_id, id | info | coco_obj_det_io.json |
Pascal VOC | XML | filename, size, object | - | voc_obj_det.xml |
Open Images | CSV | ImageID, LabelName, Confidence, XMin, XMax, YMin, YMax | Source, IsOccluded, IsGroupOf, IsDepiction, IsInside | old_obj_det_i.csv |
Plain CSV file | CSV | file_name, class_name, xmin, ymin, xmax, ymax | - | obj_det.csv |
Coco¶
You can read more about the Common Objects in Context challenge here.
Coordinates for bounding boxes are expressed in pixel values as [xmin, ymin, width, height].
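Since the Plain CSV format uses corner coordinates (xmin, ymin, xmax, ymax), converting from the Coco layout is a common step. The sketch below shows the arithmetic; the helper name `coco_to_corners` and the sample box are made up for this example.

```python
def coco_to_corners(bbox):
    """Convert [xmin, ymin, width, height] to (xmin, ymin, xmax, ymax)."""
    xmin, ymin, width, height = bbox
    return xmin, ymin, xmin + width, ymin + height


# Sample box in pixel values.
corners = coco_to_corners([10, 20, 30, 40])
```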
Pascal VOC¶
You can read more about the Pascal Visual Objects Classes challenge here.
Classes follow the WordNet encoding, which you can download here.
Open Images¶
You can read more about the Open Images challenge here.
Classes (LabelName) follow the Google Knowledge Graph encoding, which you can access here.
Plain CSV¶
We support the following variations on column headers:
file_name | class_name | xmin | xmax | ymin | ymax |
---|---|---|---|---|---|
image, image_id, image id, file_name, filename, file, file name, file_path, path, file path | class, label, name, class_name, class name, label_name, label name, category | xmin, x_min, x min, x1, x_1 | xmax, x_max, x max, x2, x_2 | ymin, y_min, y min, y1, y_1 | ymax, y_max, y max, y2, y_2 |
Instance Segmentation¶
Instance segmentation refers to the task of detecting and delineating each distinct object of interest appearing in an image.
Remo supports the following formats:
Format | File type | Required fields | Optional Fields | Example |
---|---|---|---|---|
Coco Instance Segmentation | JSON | file_name, segmentation, area, iscrowd, image_id, category_id, id | info | coco_inst_seg.json |
Coco¶
You can read more about the Common Objects in Context challenge here.
Coordinates are expressed in pixel values as [x, y] pairs.
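In Coco JSON files, each polygon in the `segmentation` field is stored as a flat list `[x1, y1, x2, y2, ...]`. The following sketch groups such a list into explicit (x, y) pairs. The helper name `to_xy_pairs` and the sample polygon are hypothetical.

```python
def to_xy_pairs(flat_polygon):
    """Group a flat [x1, y1, x2, y2, ...] list into (x, y) tuples."""
    if len(flat_polygon) % 2 != 0:
        raise ValueError("polygon must contain an even number of values")
    # Even positions are x values, odd positions are y values.
    return list(zip(flat_polygon[0::2], flat_polygon[1::2]))


# A made-up triangle in pixel values.
pairs = to_xy_pairs([10, 20, 30, 20, 30, 40])
```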