Is your feature request related to a problem? Please describe.
When using Argilla responses in a downstream task like model training, only some of the information stored in Argilla is necessary, mainly the responses to questions.
Also, if Argilla datasets contain larger media formats like images, getting just these responses is cumbersome and time-consuming. Users might want to skip the media fields, or get the original local file paths instead.
Describe the solution you'd like
A simple solution is to support with_fields=False in DatasetRecords, so that a user can iterate over only the responses and align them with the source dataset by record id.
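To make the alignment step concrete, here is a minimal sketch in plain Python. It assumes responses arrive as simple dictionaries carrying a record id; the `align_responses` helper and the dictionary shapes are hypothetical stand-ins, not Argilla's API.

```python
def align_responses(source_rows, responses):
    """Attach responses to the source rows that share the same record id."""
    by_id = {}
    for response in responses:
        # Group response values by the id of the record they belong to.
        by_id.setdefault(response["record_id"], []).append(response["value"])
    # Rows without any response get an empty list rather than being dropped.
    return [{**row, "responses": by_id.get(row["id"], [])} for row in source_rows]


# Hypothetical source dataset that keeps local file paths instead of image data.
source_rows = [
    {"id": "a", "image_path": "imgs/a.png"},
    {"id": "b", "image_path": "imgs/b.png"},
]
# Hypothetical responses, as they might look when fields are skipped.
responses = [
    {"record_id": "a", "value": "cat"},
    {"record_id": "a", "value": "dog"},
]

aligned = align_responses(source_rows, responses)
```

The images never leave the source dataset here; only ids and response values are moved around.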
A more advanced feature would let the user define a mapping between Argilla and a Hugging Face dataset, in the same way that DatasetRecords.log works, so that subcomponents of Argilla fields and questions could be assigned to specific dataset columns using dot notation.
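A rough sketch of how a dot-notation export mapping could behave, using nested dictionaries in place of real records. The `resolve` and `export_row` helpers, the record layout, and the mapping direction (output column to Argilla path) are all assumptions made for illustration, not an existing Argilla feature.

```python
def resolve(record, path):
    """Follow a dotted path such as 'responses.label' through nested dicts."""
    value = record
    for key in path.split("."):
        value = value[key]
    return value


def export_row(record, mapping):
    """Build one output row containing only the mapped columns."""
    return {column: resolve(record, path) for column, path in mapping.items()}


# Hypothetical record layout; a real record would hold a PIL image here.
record = {
    "id": "a",
    "fields": {"image": "<PIL object>", "caption": "a cat"},
    "responses": {"label": "cat"},
}
# Output column name -> dotted path inside the record.
mapping = {
    "record_id": "id",
    "caption": "fields.caption",
    "label": "responses.label",
}

row = export_row(record, mapping)
```

Because the image field is not listed in the mapping, it never reaches the exported row.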
For ImageField specifically, a record attribute holding another string representation of the image (URL, URI, or file path) could be stored, so that users can retrieve that instead of the PIL object.
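One way this could look, sketched with a hypothetical `ImageValue` container and `export_image` helper (neither exists in Argilla): the decoded image and its original string reference are stored side by side, and the export prefers the reference when one is available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageValue:
    """Hypothetical container pairing image data with its original reference."""
    data: Optional[object] = None   # decoded image, e.g. a PIL object
    source: Optional[str] = None    # original URL, URI, or file path


def export_image(image, as_reference=True):
    """Return the string reference when requested and available, else the data."""
    if as_reference and image.source is not None:
        return image.source
    return image.data


with_path = ImageValue(data=b"raw-bytes", source="imgs/a.png")
without_path = ImageValue(data=b"raw-bytes")

exported = export_image(with_path)
fallback = export_image(without_path)
```

Falling back to the image data keeps the export usable for records that were logged without a reference.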
Describe alternatives you've considered
The only current solution is to export everything with to_datasets and then drop or manipulate rows.
Additional context
burtenshaw changed the title from "[FEATURE] Controls for data schema when exporting datasets and records" to "[FEATURE] Controls for data schema for images when exporting datasets and records" on Sep 4, 2024
A simple solution is to support with_fields=False in DatasetRecords so that a user can iterate over only the responses and align them with the source dataset based on record id
However, this goes against the backend data model, where fields are part of the Record object itself and other attributes, like suggestions, are not. The fields would therefore need to be removed after the record is fetched, rather than simply not being added the way suggestions etc. are.