First steps with the Remo Python library¶
The Remo Python library provides an intuitive way to visualize, clean and work with images for a variety of computer vision tasks.
Create a dataset¶
Adding data to Remo is as easy as passing the path or URL of your data's location to the remo.create_dataset() method of the library.
In this example, a sample dataset hosted online is added to Remo directly via URL:
import remo
import pandas as pd
# To seamlessly use Remo within the Jupyter Notebook, use the following setting
remo.set_viewer('jupyter')
urls = ['https://remo-scripts.s3-eu-west-1.amazonaws.com/open_images_sample_dataset.zip']
my_dataset = remo.create_dataset(name='open images detection',
                                 urls=urls,
                                 annotation_task='Object detection')
Acquiring data - completed
Processing data - completed
Data upload completed
That's it! Your images are now accessible and stored in a centralised place.
Remo supports a number of annotation formats and tasks out of the box. You can read more in the documentation.
Manage multiple datasets¶
Within Remo, you can host multiple datasets and retrieve one when needed.
This allows you to organize and reuse your data across projects.
Let's list all the datasets and retrieve one:
remo.list_datasets()
[Dataset 1 - 'ocr_symbols', Dataset 2 - 'test', Dataset 8 - 'open_images', Dataset 9 - 'test', Dataset 12 - 'open images detection']
# make sure to use the right ID when running the tutorial
new_dataset = remo.get_dataset(1)
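The listing above contains two datasets named 'test' (IDs 2 and 9), so the name alone does not identify a dataset. The following is a minimal sketch of a lookup by name. It assumes that each dataset object exposes `id` and `name` attributes, which the source does not confirm; `find_datasets_by_name` is a hypothetical helper, not part of Remo. Stand-in objects are used so the snippet runs without a Remo server.

```python
from types import SimpleNamespace


def find_datasets_by_name(datasets, name):
    """Return every dataset whose name matches exactly (hypothetical helper)."""
    return [ds for ds in datasets if ds.name == name]


# Stand-in objects mirroring part of the listing above. With Remo running,
# you would pass remo.list_datasets() instead, assuming its items expose
# `id` and `name` attributes (an assumption, not confirmed by this tutorial).
datasets = [
    SimpleNamespace(id=1, name='ocr_symbols'),
    SimpleNamespace(id=2, name='test'),
    SimpleNamespace(id=9, name='test'),
    SimpleNamespace(id=12, name='open images detection'),
]

matches = find_datasets_by_name(datasets, 'test')
print([ds.id for ds in matches])  # [2, 9]
```

When more than one ID comes back, pick the one you need and pass it to remo.get_dataset().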
Visualize¶
Remo also makes working on a computer vision project easier by providing an easy-to-use interface.
By calling the dataset.view() method, you can open an interactive interface which allows you to visually inspect your images and the corresponding annotations.
You can visualise your dataset directly in Jupyter (or in a separate window if you are not a fan of notebooks):
my_dataset.view()
Annotation Statistics¶
Once data is in Remo, you can easily explore the statistics and other important properties of your data.
For example, you can quickly:
- see what's contained in the annotations
- check if there are unbalanced classes
- spot if some objects are only contained in a few images
You can do this by printing the stats of an annotation set or using the interactive UI.
Calling my_dataset.get_annotation_statistics() will print annotation statistics to the screen:
my_dataset.get_annotation_statistics()
[{'AnnotationSet ID': 41, 'AnnotationSet name': 'Object detection', 'n_images': 10, 'n_classes': 18, 'n_objects': 98, 'top_3_classes': [{'name': 'Fruit', 'count': 27}, {'name': 'Sports equipment', 'count': 12}, {'name': 'Human arm', 'count': 10}], 'creation_date': None, 'last_modified_date': '2020-05-29T13:38:52.259776Z'}]
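To check for class imbalance from code, the statistics can be processed as ordinary Python data. The sketch below copies the relevant fields from the output shown above and computes each top class's share of all annotated objects. `class_shares` is a hypothetical helper, and the snippet assumes the statistics keep the key names shown in that output.

```python
# Fields copied from the statistics output shown above.
stats = [{
    'AnnotationSet ID': 41,
    'AnnotationSet name': 'Object detection',
    'n_images': 10,
    'n_classes': 18,
    'n_objects': 98,
    'top_3_classes': [
        {'name': 'Fruit', 'count': 27},
        {'name': 'Sports equipment', 'count': 12},
        {'name': 'Human arm', 'count': 10},
    ],
}]


def class_shares(annotation_set_stats):
    """Map each listed class to its fraction of all annotated objects."""
    total = annotation_set_stats['n_objects']
    return {
        entry['name']: entry['count'] / total
        for entry in annotation_set_stats['top_3_classes']
    }


shares = class_shares(stats[0])
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
# Fruit: 27.6%
# Sports equipment: 12.2%
# Human arm: 10.2%
```

With 18 classes, an even split would give each class roughly 5.6% of the objects, so 'Fruit' is clearly over-represented in this sample.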
Calling my_dataset.view_annotation_stats() will show an interactive dashboard.
Here you can inspect annotations in more detail and manage your classes and tags:
my_dataset.view_annotation_stats()
Export Annotations¶
To use the dataset for training a model, you can export the annotations to a standardised format such as CSV or JSON:
my_dataset.export_annotations_to_file('output.zip', annotation_format='csv')
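Before feeding the export into a training pipeline, it can help to look at what the archive contains. The sketch below uses only the Python standard library to read every CSV file found in a zip archive. It assumes that output.zip holds one or more .csv files; the tutorial does not describe the archive layout or the column names, so the code makes no assumption about either. `read_csv_members` is a hypothetical helper.

```python
import csv
import io
import zipfile
from pathlib import Path


def read_csv_members(zip_path):
    """Map each .csv member name in the archive to its parsed rows."""
    tables = {}
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith('.csv'):
                with archive.open(member) as raw:
                    # Decode bytes to text so the csv module can parse them.
                    text = io.TextIOWrapper(raw, encoding='utf-8', newline='')
                    tables[member] = list(csv.reader(text))
    return tables


# Only runs if the export from the previous step is present.
if Path('output.zip').exists():
    for name, rows in read_csv_members('output.zip').items():
        header = rows[0] if rows else None
        print(f"{name}: {len(rows)} rows, first row: {header}")
```

Printing the first row of each file is a quick way to confirm the column layout before writing any parsing code against it.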
Further functionalities¶
You can refer to other tutorials and the documentation to further explore the library and see how to use it to better manage your datasets.
Some of the other things you can do include:
- Easily experimenting with the choice of annotations from code
- Custom uploading of annotations and predictions and joint visualization
- Advanced image search by classes, tags and filenames