Dataset
A Dataset combines images and annotations and provides quick ways to manage the data.
Use a Dataset to:
-
query images and annotations in remo
-
annotate
-
export annotations
-
feed data to a training model
-
upload model predictions
class remo.Dataset
Remo dataset
documentation
class remo.Dataset(id: int = None, name: str = None, quantity: int = 0, **kwargs)
-
Parameters
-
id – dataset id
-
name – dataset name
-
quantity – number of images
-
add_annotations
Fast upload of annotations to the Dataset.
If annotation_set_id is not provided, annotations will be added to:
-
the only annotation set present, if the Dataset has exactly one Annotation Set and the tasks match
-
a new annotation set, if the Dataset doesn’t have any Annotation Sets or if create_new_annotation_set = True
Otherwise, annotations will be added to the Annotation Set specified by annotation_set_id.
Example::
import remo
# Create a dataset from a sample archive hosted online
urls = ['https://remo-scripts.s3-eu-west-1.amazonaws.com/open_images_sample_dataset.zip']
my_dataset = remo.create_dataset(name='D1', urls=urls)
image_name = '000a1249af2bc5f0.jpg'
annotations = []
# First bounding box on the image
annotation = remo.Annotation()
annotation.img_filename = image_name
annotation.classes = 'Human hand'
annotation.bbox = [227, 284, 678, 674]
annotations.append(annotation)
# Second bounding box on the same image
annotation = remo.Annotation()
annotation.img_filename = image_name
annotation.classes = 'Fashion accessory'
annotation.bbox = [496, 322, 544, 370]
annotations.append(annotation)
# Upload both annotations in one call
my_dataset.add_annotations(annotations)
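The rules for choosing the target annotation set can be read as a small decision procedure. The sketch below is not remo's implementation: `pick_annotation_target` is a hypothetical helper, annotation sets are modelled as plain `(set_id, task)` tuples, and the `"unspecified"` outcome stands for the cases the description does not cover (for example several annotation sets and no id). Letting an explicit annotation_set_id take precedence over create_new_annotation_set is also an assumption.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for an annotation set: (set_id, annotation_task)
AnnotationSetInfo = Tuple[int, str]


def pick_annotation_target(
    existing_sets: List[AnnotationSetInfo],
    task: str,
    annotation_set_id: Optional[int] = None,
    create_new_annotation_set: bool = False,
) -> Tuple[str, Optional[int]]:
    """Return ("existing", id), ("new", None) or ("unspecified", None)."""
    # An explicit id wins (assumed precedence over create_new_annotation_set)
    if annotation_set_id is not None:
        return ("existing", annotation_set_id)
    # A new set is created on request, or when the dataset has none yet
    if create_new_annotation_set or not existing_sets:
        return ("new", None)
    # Exactly one set whose task matches: reuse it
    if len(existing_sets) == 1 and existing_sets[0][1] == task:
        return ("existing", existing_sets[0][0])
    # Anything else is not covered by the description
    return ("unspecified", None)


print(pick_annotation_target([], "Object detection"))
print(pick_annotation_target([(7, "Object detection")], "Object detection"))
```

Modelling the uncovered cases as a separate outcome keeps the sketch from guessing at behaviour the documentation does not state.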
documentation
add_annotations(annotations: List[remo.domain.annotation.Annotation], annotation_set_id: int = None, create_new_annotation_set: bool = False)
-
Parameters
-
annotations – list of Annotation objects
-
annotation_set_id – annotation set id
-
create_new_annotation_set – if True, a new annotation set will be created
-
add_data
Adds images and/or annotations to the dataset.
Use the parameters as follows:
-
Use local_files to link (rather than copy) images.
-
Use paths_to_upload if you want to copy image files or archive files.
-
Use urls to download images, annotations or archives from the web.
In terms of supported formats:
-
Adding images: support for jpg, jpeg, png, tif
-
Adding annotations: to add annotations, you need to specify the annotation task and make sure the file format is one of those supported. See documentation here: https://remo.ai/docs/annotation-formats/
-
Adding archive files: support for zip, tar, gzip
Example::
import remo
# In a notebook: download and unpack the sample archive
! wget 'https://s-3.s3-eu-west-1.amazonaws.com/open-images.zip'
! unzip open-images.zip
# The same archive could instead be passed to add_data through the urls parameter
urls = ['https://s-3.s3-eu-west-1.amazonaws.com/open-images.zip']
my_dataset = remo.create_dataset(name='D1')
# Link the unpacked folder in place and parse its annotations as object detection
my_dataset.add_data(local_files=['./open-images'], annotation_task='Object detection')
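Before calling add_data, it can help to check which of your paths fall under the supported image and archive formats listed above. The following is a minimal sketch, not part of remo: `classify_path` and `group_paths` are hypothetical helpers, and treating a `.gz` suffix as the marker of a gzip archive is an assumption. Anything else, including annotation files, lands in `"other"`.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Image extensions taken from the supported formats listed in this section
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif"}
# Assumption: gzip archives are recognised by a ".gz" suffix
ARCHIVE_SUFFIXES = {".zip", ".tar", ".gz"}


def classify_path(path: str) -> str:
    """Return "image", "archive" or "other" based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return "image"
    if suffix in ARCHIVE_SUFFIXES:
        return "archive"
    return "other"


def group_paths(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group paths by the category returned by classify_path."""
    groups: Dict[str, List[str]] = {"image": [], "archive": [], "other": []}
    for path in paths:
        groups[classify_path(path)].append(path)
    return groups


print(group_paths(["cat.JPG", "open-images.zip", "labels.csv"]))
```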
documentation
add_data(local_files: List[str] = None, paths_to_upload: List[str] = None, urls: List[str] = None, annotation_task: str = None, folder_id: int = None, annotation_set_id: int = None, class_encoding=None, wait_for_complete=True)
-
Parameters
-
local_files – list of files or directories containing annotations and image files. Remo will create smaller copies of your images for quick previews, but it will point at the original files to show images at their original resolution. Folders will be recursively scanned for image files.
-
paths_to_upload – list of files or directories containing images, annotations and archives. These files will be copied inside the .remo folder. Folders will be recursively scanned for image files. Unpacked archives will be scanned for images, annotations and nested archives.
-
urls – list of URLs pointing to downloadable targets, each of which can be an image, an annotation file or an archive.
-
annotation_task – annotation tasks tell remo how to parse annotations. See also: remo.task.
-
folder_id – specifies the target virtual folder in the remo dataset. If None, it adds to the root level.
-
annotation_set_id – specifies the target annotation set in the dataset. If None, it adds to the default annotation set.
-
class_encoding – specifies how to convert labels in annotation files to readable labels. If None, Remo will try to interpret the encoding automatically, which for standard words means they will be read as they are. See also: remo.class_encodings.
-
wait_for_complete – if True, the function waits for the data upload to complete
-
-
Returns
Dictionary with results for linking files, upload files and upload urls:
{ 'files_link_result': ..., 'files_upload_result': ..., 'urls_upload_result': ... }
annotation_sets
Lists the annotation sets within the dataset.
documentation
annotation_sets()
-
Returns
List[remo.AnnotationSet]
annotations
Returns all annotations for a given annotation set. If no annotation set is specified, the default annotation set will be used.
documentation
annotations(annotation_set_id: int = None)
-
Parameters
annotation_set_id – annotation set id
-
Returns
List[remo.Annotation]
classes
Lists all the classes within the dataset.
documentation
classes(annotation_set_id: int = None)
-
Parameters
annotation_set_id – annotation set id. If not specified, the default annotation set is used.
-
Returns
List of classes
create_annotation_set
Creates a new annotation set within the dataset. If paths_to_files is provided, the new annotation set is populated with the given annotations. The first annotation set created for a given dataset is considered the default one.
documentation
create_annotation_set(annotation_task: str, name: str, classes: List[str] = [], paths_to_files: List[str] = None)
-
Parameters
-
annotation_task – annotation task. See also: remo.task
-
name – annotation set name
-
classes – list of classes to prepopulate the annotation set. Example: ['Cat', 'Dog']. Default is no classes
-
paths_to_files – list of paths to files or directories containing files to be uploaded. Useful to upload annotations while creating an annotation set. Default: None
-
-
Returns
remo.AnnotationSet
default_annotation_set
If the dataset has only one annotation set, it returns that annotation set. Otherwise, it raises an exception.
documentation
default_annotation_set()
delete
Deletes the dataset.
documentation
delete()
export_annotations_to_file
Exports annotations for a given annotation set in a given format and saves them to a file. If export_tags = True, output_file needs to be a .zip file.
It offers some convenient export options, including:
-
Appending the full path to image filenames
-
Choosing between coordinates in pixels or percentages
-
Exporting tags to a separate file
-
Exporting annotations filtered by user-defined tags
Example::
import remo
# Download and unzip this sample dataset: https://s-3.s3-eu-west-1.amazonaws.com/dogs_dataset.json
dogs_dataset = remo.create_dataset(name='dogs_dataset',
                                   local_files=['dogs_dataset.json'],
                                   annotation_task='Instance Segmentation')
# Export only images tagged 'train', in COCO format, without image paths or a tags file
dogs_dataset.export_annotations_to_file(output_file='./dogs_dataset_train.json',
                                        annotation_format='coco',
                                        append_path=False,
                                        export_tags=False,
                                        filter_by_tags='train')
documentation
export_annotations_to_file(output_file: str, annotation_set_id: int = None, annotation_format: str = 'json', export_coordinates: str = 'pixel', append_path: bool = True, export_tags: bool = True, filter_by_tags: list = None)
-
Parameters
-
output_file – output file to save. Includes file extension and can include file path. If export_tags = True, output_file needs to be a .zip file
-
annotation_set_id – annotation set id
-
annotation_format – can be one of ['json', 'coco', 'csv']. Default: 'json'
-
append_path – if True, it appends the image path to the filename, otherwise it uses just the filename. Default: True
-
export_coordinates – converts output values to percentage or pixels, can be one of ['pixel', 'percent']. Default: 'pixel'
-
export_tags – if True, it also exports tags to a separate CSV file. Default: True
-
filter_by_tags – exports annotations only for images that have the given image tags. It can be of type List[str] or str. Default: None
-
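To illustrate the difference between the 'pixel' and 'percent' options of export_coordinates, here is a minimal sketch of converting one bounding box by hand. It is not remo's export code: `bbox_to_percent` is a hypothetical helper, the box layout `[xmin, ymin, xmax, ymax]` is an assumption based on the add_annotations example, and expressing percentages on a 0 to 100 scale relative to the image width and height is also an assumption.

```python
from typing import List, Sequence


def bbox_to_percent(bbox: Sequence[float], width: int, height: int) -> List[float]:
    """Convert [xmin, ymin, xmax, ymax] in pixels to percentages of the image size."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    xmin, ymin, xmax, ymax = bbox
    # x values are scaled by the image width, y values by the image height
    return [
        100.0 * xmin / width,
        100.0 * ymin / height,
        100.0 * xmax / width,
        100.0 * ymax / height,
    ]


# A 200x100 pixel box inside a 400x200 pixel image
print(bbox_to_percent([100, 50, 300, 150], width=400, height=200))
```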
fetch
Updates dataset information from server
documentation
fetch()
get_annotation_set
Retrieves the annotation set with the given id. If no annotation set id is passed:
-
if the dataset has only one annotation set, it returns that one
-
if the dataset has multiple annotation sets, it raises an error
documentation
get_annotation_set(annotation_set_id: int = None)
-
Parameters
annotation_set_id – annotation set id
-
Returns
remo.AnnotationSet
get_annotation_statistics
Retrieves annotation statistics of a given annotation set. If annotation_set_id is not provided, it retrieves the statistics of all the available annotation sets within the dataset.
documentation
get_annotation_statistics(annotation_set_id: int = None)
-
Returns
List of dictionaries with the fields: annotation set id, name, number of images, number of classes, number of objects, top 3 classes, release and update dates
image
Returns the remo.Image with matching img_filename or img_id. Pass either img_filename or img_id.
documentation
image(img_filename=None, img_id=None)
-
Parameters
-
img_filename – filename of the Image to retrieve
-
img_id – id of the Image to retrieve
-
-
Returns
remo.Image
images
Lists images within the dataset
documentation
images(limit: int = None, offset: int = None)
-
Parameters
-
limit – the number of images to be listed
-
offset – specifies offset
-
-
Returns
List[remo.Image]
Example::
my_dataset.images()
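For large datasets, limit and offset can be combined to walk through the images page by page. The sketch below is an illustration under assumptions: `iter_pages` is a hypothetical helper, and `fake_images` is a stand-in that mimics `images(limit, offset)` with list slicing so the code runs without a remo server. It assumes that offset skips that many images and that a short or empty page marks the end.

```python
from typing import Callable, Iterator, List, Optional

# Stand-in data so the sketch runs without a remo server
FILENAMES = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]


def fake_images(limit: Optional[int] = None, offset: Optional[int] = None) -> List[str]:
    """Mimic images(limit, offset) with plain list slicing (assumed semantics)."""
    start = offset or 0
    end = None if limit is None else start + limit
    return FILENAMES[start:end]


def iter_pages(fetch: Callable[..., List], page_size: int) -> Iterator[List]:
    """Yield successive pages until a short or empty page is returned."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch(limit=page_size, offset=offset)
        if not page:
            return
        yield page
        if len(page) < page_size:
            return
        offset += len(page)


for page in iter_pages(fake_images, page_size=2):
    print(page)
```

With a real dataset, the same loop would presumably take `my_dataset.images` in place of `fake_images`.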
info
Prints basic info about the dataset:
-
Dataset name
-
Dataset ID
-
Number of images contained in the dataset
-
Number of annotation sets contained in the dataset
documentation
info()
list_image_annotations
Retrieves annotations for a given image
documentation
list_image_annotations(annotation_set_id: int, image_id: int)
-
Parameters
-
annotation_set_id – annotation set id
-
image_id – image id
-
-
Returns
List[remo.Annotation]
search_images
Searches images by filename, classes and tags.
Examples::
my_dataset.search_images(classes = ["dog", "person"])
my_dataset.search_images(image_name_contains = "pic2")
documentation
search_images(annotation_sets_id: int = None, classes: str = None, classes_not: str = None, tags: str = None, tags_not: str = None, image_name_contains: str = None, limit: int = None)
-
Parameters
-
annotation_sets_id – the IDs of the annotation sets to search in (can be multiple, e.g. [1, 2]). No need to specify it if the dataset has only one annotation set
-
classes – string or list of strings - search for images which have objects of all the given classes
-
classes_not – string or list of strings - search for images excluding those that have objects of all the given classes
-
tags – string or list of strings - search for images having all the given tags
-
tags_not – string or list of strings - search for images excluding those that have all the given tags
-
image_name_contains – search for images whose name contains the given string
-
limit – limits number of search results (by default returns all results)
-
-
Returns
List[remo.AnnotatedImage]
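The filter semantics described in the parameters (all of the given classes or tags must be present, and the `_not` variants exclude images that have all of the given values) can be sketched as a plain predicate. This is an illustration only, not remo's search code: `matches` and `_as_set` are hypothetical helpers, and an image is modelled by its name, its classes and its tags.

```python
from typing import Iterable, List, Optional, Set, Union

StrOrList = Union[str, List[str], None]


def _as_set(value: StrOrList) -> Set[str]:
    """Accept a single string, a list of strings or None."""
    if value is None:
        return set()
    if isinstance(value, str):
        return {value}
    return set(value)


def matches(
    image_name: str,
    image_classes: Iterable[str],
    image_tags: Iterable[str],
    classes: StrOrList = None,
    classes_not: StrOrList = None,
    tags: StrOrList = None,
    tags_not: StrOrList = None,
    image_name_contains: Optional[str] = None,
) -> bool:
    have_classes, have_tags = set(image_classes), set(image_tags)
    # The image must contain every requested class and every requested tag
    if not _as_set(classes) <= have_classes or not _as_set(tags) <= have_tags:
        return False
    # Exclude the image only if it has ALL of the classes_not / tags_not values
    excluded_classes, excluded_tags = _as_set(classes_not), _as_set(tags_not)
    if excluded_classes and excluded_classes <= have_classes:
        return False
    if excluded_tags and excluded_tags <= have_tags:
        return False
    # Substring match on the file name
    if image_name_contains is not None and image_name_contains not in image_name:
        return False
    return True


print(matches("pic2.jpg", ["dog", "person"], ["train"], classes=["dog", "person"]))
```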
view
Opens browser on dataset page
documentation
view()
view_annotate
Opens browser on the annotation tool for the given annotation set
documentation
view_annotate(annotation_set_id: int = None)
-
Parameters
annotation_set_id – annotation set id. If the dataset has only one annotation set, there is no need to specify the annotation_set_id.
view_annotation_stats
Opens browser on annotation set insights page
documentation
view_annotation_stats(annotation_set_id: int = None)
-
Parameters
annotation_set_id – annotation set id. If the dataset has only one annotation set, there is no need to specify the annotation_set_id.
view_image
Opens browser on image view page for the given image
documentation
view_image(image_id: int)
-
Parameters
image_id – image id