[FEATURE] Controls for data schema for images when exporting datasets and records #5458

burtenshaw · 2024-09-04T10:58:07Z

Is your feature request related to a problem? Please describe.

When using Argilla responses in a downstream task such as model training, only some of the information stored in Argilla is needed, mainly the responses to questions.

Also, if Argilla datasets contain larger media such as images, retrieving just these responses is cumbersome and time-consuming. Users might want to skip these fields, or get the original local file paths instead.

Describe the solution you'd like

A simple solution is to support with_fields=False in DatasetRecords so that a user can iterate over only the responses and align them with the source dataset based on record id
A more advanced feature would allow the user to define a mapping between Argilla and an HF dataset, in the same way that DatasetRecord.log works, so that subcomponents of Argilla fields and questions could be assigned to specific dataset columns using dot notation.
For ImageField specifically, a record attribute holding a string representation of the image (URL, URI, or file path) could be stored, so that users can retrieve that instead of the PIL object.
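To make the first two ideas more concrete, here is a rough sketch of aligning responses with a source dataset by record id and mapping nested values to columns with dot notation. It uses plain dictionaries rather than the Argilla SDK, so the record layout, the `align_responses` and `get_by_path` helpers, and the `image_path` column are all assumptions for illustration, not existing API.

```python
from typing import Any, Dict, Iterable, List


def get_by_path(data: Dict[str, Any], path: str) -> Any:
    """Resolve a dot-notation path such as 'responses.label' in nested dicts."""
    current: Any = data
    for key in path.split("."):
        current = current[key]
    return current


def align_responses(
    source_rows: Iterable[Dict[str, Any]],
    records: Iterable[Dict[str, Any]],
    mapping: Dict[str, str],
) -> List[Dict[str, Any]]:
    """Join field-less records onto source rows by id.

    `mapping` maps an output column name to a dot-notation path
    inside each record (hypothetical layout, not the Argilla schema).
    """
    rows_by_id = {row["id"]: row for row in source_rows}
    aligned = []
    for record in records:
        # Copy the source row so the original dataset is left untouched.
        row = dict(rows_by_id[record["id"]])
        for column, path in mapping.items():
            row[column] = get_by_path(record, path)
        aligned.append(row)
    return aligned


# Source dataset keeps the lightweight file paths instead of image payloads.
source_rows = [
    {"id": "rec-1", "image_path": "images/cat.png"},
    {"id": "rec-2", "image_path": "images/dog.png"},
]

# What iterating records without fields might yield (assumed shape).
records = [
    {"id": "rec-2", "responses": {"label": "dog"}},
    {"id": "rec-1", "responses": {"label": "cat"}},
]

aligned = align_responses(source_rows, records, {"label": "responses.label"})
```

The output follows the order of the records, and each row carries the original file path plus the mapped response, so no image data has to be transferred.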

Describe alternatives you've considered

The only current solution is to export everything with to_datasets and drop or manipulate rows.
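For reference, the workaround amounts to something like the sketch below. The exported rows are modelled as plain dictionaries rather than a real exported dataset, the column names are made up, and `drop_columns` is a hypothetical helper. It also assumes the unwanted image data lives in its own column.

```python
from typing import Any, Dict, Iterable, List


def drop_columns(
    rows: Iterable[Dict[str, Any]], columns: Iterable[str]
) -> List[Dict[str, Any]]:
    """Return copies of the rows without the given columns."""
    unwanted = set(columns)
    return [
        {key: value for key, value in row.items() if key not in unwanted}
        for row in rows
    ]


# Stand-in for a full export: every row already carries the image payload,
# so the cost of fetching it has been paid before anything is dropped.
exported_rows = [
    {"id": "rec-1", "image": b"<large image payload>", "label": "cat"},
    {"id": "rec-2", "image": b"<large image payload>", "label": "dog"},
]

responses_only = drop_columns(exported_rows, ["image"])
```

The point of the request is to avoid this extra step, since the images are downloaded before they can be discarded.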

Additional context


burtenshaw · 2024-09-26T12:50:16Z

I think that we should implement:

A simple solution is to support with_fields=False in DatasetRecords so that a user can iterate over only the responses and align them with the source dataset based on record id

However, this goes against the backend data model, where fields are part of the Record object and other attributes are not. Fields would therefore have to be removed after the fact, whereas suggestions and similar attributes can simply be left out.

@frascuchon @jfcalvo How do you think we should approach this?

burtenshaw added this to the v2.2.0 milestone Sep 4, 2024

burtenshaw assigned frascuchon and burtenshaw Sep 4, 2024

burtenshaw changed the title ~~[FEATURE] Controls for data schema when exporting datasets and records~~ [FEATURE] Controls for data schema for images when exporting datasets and records Sep 4, 2024

nataliaElv modified the milestones: v2.2.0, v2.3.0 Sep 10, 2024

burtenshaw modified the milestones: v2.3.0, v2.4.0 Oct 1, 2024
