Adding data
Remo is designed as a layer on top of your data that makes it easier to manage. Adding your data to Remo is therefore the first step to taking advantage of its smart dataset management features.
When it comes to adding images, there are two main options available:
- Linking images
- Uploading images
When you add annotations, Remo decodes them and stores the information in its PostgreSQL database.
Duplicate images
Once an image has been added to Remo, you can add the same image to other datasets. These additional copies are virtual, so they take up minimal disk space.
Remo also automatically checks for and prevents duplicate images within the same dataset.
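This page does not say how Remo detects duplicates. As a rough illustration only, the sketch below assumes content hashing with SHA-256: the bytes of each unique image are stored once, datasets hold references to them (virtual copies), and adding the same content twice to one dataset is rejected. ImageStore and add_image are hypothetical names, not part of the Remo API.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ImageStore:
    """Toy content-addressed store (hypothetical model, not Remo's implementation)."""

    blobs: dict = field(default_factory=dict)     # digest -> image bytes, stored once
    datasets: dict = field(default_factory=dict)  # dataset name -> set of digests

    def add_image(self, dataset: str, data: bytes) -> bool:
        """Add an image to a dataset; return False if that content is already in it."""
        digest = hashlib.sha256(data).hexdigest()
        members = self.datasets.setdefault(dataset, set())
        if digest in members:
            # Identical content already present in this dataset: reject the duplicate.
            return False
        # Keep the bytes only the first time this content is seen in any dataset;
        # every other dataset just holds a reference (a "virtual copy").
        self.blobs.setdefault(digest, data)
        members.add(digest)
        return True


store = ImageStore()
print(store.add_image("cats", b"img-1"))  # True
print(store.add_image("cats", b"img-1"))  # False: duplicate within the same dataset
print(store.add_image("pets", b"img-1"))  # True: virtual copy in another dataset
print(len(store.blobs))                   # 1: the bytes are stored only once
```

Hashing the content rather than comparing file names means that renamed copies of the same image are still recognised as duplicates.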
Linking images
When linking images, Remo opens them directly from your hard disk. The advantage is that Remo does not create new copies of your data, which minimizes disk usage. However, this also means that if you move or delete your images after linking them, Remo will no longer be able to find them.
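Because linked images are read from their original location, it can help to check that they are still there before working with a dataset. The following is a minimal sketch using a hypothetical helper, find_missing_links, which takes the list of paths you linked and returns the ones that no longer exist. It is not part of the Remo library.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def find_missing_links(linked_paths: Iterable[str]) -> List[str]:
    """Return the linked image paths that no longer point to an existing file."""
    return [p for p in linked_paths if not Path(p).is_file()]


# Example: one file still exists, the other was "moved" (never created here).
with tempfile.TemporaryDirectory() as folder:
    kept = Path(folder) / "kept.jpg"
    kept.write_bytes(b"")
    moved = Path(folder) / "moved.jpg"
    print(find_missing_links([str(kept), str(moved)]))
```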
NB: even when linking data, Remo still creates smaller copies of the images to display thumbnails and previews efficiently. These should not amount to more than 15% of the original data size.
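To put the 15% figure in concrete terms, here is a small calculation of the worst-case space taken by thumbnails and previews for linked data. The function name is hypothetical, and 15% is the upper bound stated above, not an exact value.

```python
GIB = 1024 ** 3


def max_preview_bytes(original_bytes: int, max_percent: int = 15) -> int:
    """Worst-case thumbnail/preview storage when data is linked."""
    # Integer arithmetic avoids float rounding on large byte counts.
    return original_bytes * max_percent // 100


# Linking 10 GiB of images should need at most about 1.5 GiB for previews.
print(max_preview_bytes(10 * GIB) / GIB)  # 1.5
```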
To link data, select the "Use local data" tab in the UI, or use the local_files parameter of the add_data method in the Python library.
Linking is not possible when passing archive files or remote URLs.
Also, if you are running Remo in Docker, you have to explicitly expose the data folder to the container. See the Remo Docker installation page for more information.
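The conditions for linking can be summarized in a small decision function. This is a hypothetical sketch, not part of the Remo library; the list of archive extensions and the folder_exposed flag are assumptions used for illustration.

```python
from urllib.parse import urlparse

# Assumed archive extensions; the formats Remo actually accepts may differ.
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz")


def can_link(source: str, running_in_docker: bool = False,
             folder_exposed: bool = False) -> bool:
    """Return True if `source` could be linked rather than uploaded (hypothetical)."""
    parsed = urlparse(source)
    if parsed.scheme and parsed.netloc:
        return False  # remote URLs cannot be linked
    if source.lower().endswith(ARCHIVE_SUFFIXES):
        return False  # archive files cannot be linked
    if running_in_docker and not folder_exposed:
        return False  # the container cannot see a folder that was not exposed
    return True


print(can_link("/data/images/cat.jpg"))                          # True
print(can_link("https://example.com/cat.jpg"))                   # False
print(can_link("/data/images.zip"))                              # False
print(can_link("/data/images/cat.jpg", running_in_docker=True))  # False
```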
Uploading images
When uploading images, Remo stores a copy of the data in the media subfolder of the .remo folder.
This is the default behaviour when passing archive files or files from web URLs, and when using a Docker installation of Remo.
The advantage of this method is that, regardless of what happens to your original data, a copy of it remains stored in Remo.