Collaborator

@saphjra saphjra commented Aug 21, 2025

Description

Implemented loading text stimuli in the Dataset.load() method.

Implemented changes

Additions:

  • property Dataset.stimuli
  • method Dataset.load_text_stimuli(), called in Dataset.load()
  • function dataset_files.load_text_stimuli_files(), which takes the dataset definition, fileinfo, and path as arguments
  • function dataset_files.load_text_stimuli_file(), called by dataset_files.load_text_stimuli_files(), which uses TextStimulus.from_file() (which we made a static method)
  • content type 'stimuli' to the DatasetDefinition.filename_format
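
The call chain listed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the pymovements implementation: `SketchTextStimulus` and `SketchDataset` are hypothetical stand-ins, the two loader functions take a plain directory path instead of the dataset definition and fileinfo, and the `stimuli` subfolder name is an assumption.

```python
from __future__ import annotations

import csv
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SketchTextStimulus:
    """Hypothetical stand-in for TextStimulus: holds the AOI rows of one file."""

    rows: list[dict[str, str]]

    @staticmethod
    def from_file(path: Path) -> SketchTextStimulus:
        # Static constructor, so it can be called without an existing instance.
        with path.open(newline='', encoding='utf-8') as handle:
            return SketchTextStimulus(rows=list(csv.DictReader(handle)))


def load_text_stimuli_file(path: Path) -> SketchTextStimulus:
    """Load a single stimulus file (simplified signature, an assumption)."""
    return SketchTextStimulus.from_file(path)


def load_text_stimuli_files(stimuli_dir: Path) -> list[SketchTextStimulus]:
    """Load every .csv file in the stimuli folder, in sorted order."""
    return [load_text_stimuli_file(p) for p in sorted(stimuli_dir.glob('*.csv'))]


@dataclass
class SketchDataset:
    """Hypothetical stand-in for Dataset with a stimuli list."""

    path: Path
    stimuli: list[SketchTextStimulus] = field(default_factory=list)

    def load_text_stimuli(self) -> SketchDataset:
        self.stimuli = load_text_stimuli_files(self.path / 'stimuli')
        return self

    def load(self) -> SketchDataset:
        # Only load stimuli if the dataset actually ships a stimuli folder.
        if (self.path / 'stimuli').is_dir():
            self.load_text_stimuli()
        return self


def demo() -> int:
    """Build a tiny dataset on disk and return the number of loaded stimuli."""
    with tempfile.TemporaryDirectory() as tmp:
        stimuli_dir = Path(tmp) / 'stimuli'
        stimuli_dir.mkdir()
        (stimuli_dir / 'text_1.csv').write_text(
            'char,top_left_x\nA,10\nB,20\n', encoding='utf-8',
        )
        dataset = SketchDataset(path=Path(tmp)).load()
        return len(dataset.stimuli)
```

Keeping the per-file loader separate from the per-folder loader means a later change to the file format would only touch one function.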

How Has This Been Tested?

All the previously implemented tests passed

Added a dataset_type "ToyAOI", which includes a stimuli folder with files copied from tests/files/aoi_multipleye_stimuli_toy_x_1 (unlike the other dataset_types, these files are not generated).

We added the following tests:

  • test_stimuli_list_exists(), which runs for all types of toy datasets
  • test_stimuli_list_not_empty(), which checks our added toy dataset config "ToyAOI"
  • test_loaded_text_stimuli_list_correct(), which checks that the number of loaded files is correct (it equals the number of AOI files in the folder), that the first 10 rows of the first AOI file match the expected content, and that the number of columns matches
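
The three checks in the last test could look roughly like the sketch below. It is an illustration under assumptions, not the test from this PR: `load_aoi_table`, `check_loaded_stimuli`, and the `EXPECTED_HEAD` content are hypothetical, and a stimulus is modeled as a plain list of rows.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical expected content: header plus the first data rows of the first AOI file.
EXPECTED_HEAD = [
    ['char', 'top_left_x', 'top_left_y'],
    ['A', '10', '20'],
    ['B', '30', '20'],
]


def load_aoi_table(path: Path) -> list:
    """Stand-in loader: read one AOI file into a list of rows."""
    with path.open(newline='', encoding='utf-8') as handle:
        return list(csv.reader(handle))


def check_loaded_stimuli(loaded: list, stimuli_dir: Path, expected_head: list, n_rows: int = 10) -> None:
    """Run the three checks described above on a list of loaded tables."""
    # 1. One loaded stimulus per AOI file in the folder.
    assert len(loaded) == len(list(stimuli_dir.glob('*.csv')))
    first = loaded[0]
    # 2. The header and the first n_rows data rows match the expected content.
    assert first[:n_rows + 1] == expected_head[:n_rows + 1]
    # 3. Every row has the expected number of columns.
    assert all(len(row) == len(expected_head[0]) for row in first)


def run_demo() -> bool:
    """Write one small AOI file, load it, and run the checks."""
    with tempfile.TemporaryDirectory() as tmp:
        stimuli_dir = Path(tmp)
        (stimuli_dir / 'text_1.csv').write_text(
            'char,top_left_x,top_left_y\nA,10,20\nB,30,20\n', encoding='utf-8',
        )
        loaded = [load_aoi_table(p) for p in sorted(stimuli_dir.glob('*.csv'))]
        check_loaded_stimuli(loaded, stimuli_dir, EXPECTED_HEAD)
    return True
```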

Type of change

  • New functionality
  • Documentation update

Context

Partially resolves

related issues:

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Future work and comment on warnings

  • include more tests, e.g. for incompatible file types and other error cases
  • add support for a custom stimuli_dirname argument in dataset_files.load_text_stimuli_files() (this missing support is what causes the warnings right now)
  • only .csv AOI files can be loaded at the moment; other separators and Excel files are also common and need to be supported in the future
  • add a tutorial notebook using the new loading feature
  • right now only TextStimulus is supported; the ImageStimulus class needs to be modified significantly
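
For the separator item above, one possible direction is sketched here using only the standard library. `read_aoi_rows` and its `separator` parameter are hypothetical and not part of pymovements; Excel files are not covered, and fields with embedded newlines are not handled.

```python
from __future__ import annotations

import csv
import tempfile
from pathlib import Path


def read_aoi_rows(path: Path, separator: str | None = None) -> list[dict[str, str]]:
    """Read an AOI file, guessing the separator when none is given."""
    text = path.read_text(encoding='utf-8')
    if separator is None:
        # Restrict the guess to a few common separators to keep it predictable.
        separator = csv.Sniffer().sniff(text[:4096], delimiters=',;\t').delimiter
    return list(csv.DictReader(text.splitlines(), delimiter=separator))


def demo() -> list[dict[str, str]]:
    """Write a semicolon-separated AOI file and read it back without a hint."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / 'text_1.csv'
        path.write_text(
            'char;top_left_x;top_left_y\nA;10;20\nB;30;20\n', encoding='utf-8',
        )
        return read_aoi_rows(path)
```

An explicit `separator` argument keeps the behavior deterministic for callers who know their file format, while the guess remains a convenience fallback.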

codecov bot commented Aug 21, 2025

Codecov Report

❌ Patch coverage is 98.18182% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 99.97%. Comparing base (5734a24) to head (87f20cc).
⚠️ Report is 9 commits behind head on main.

Files with missing lines | Patch % | Lines
src/pymovements/dataset/dataset_files.py | 95.45% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##              main    #1267      +/-   ##
===========================================
- Coverage   100.00%   99.97%   -0.03%     
===========================================
  Files          104      104              
  Lines         4512     4554      +42     
  Branches       783      788       +5     
===========================================
+ Hits          4512     4553      +41     
- Partials         0        1       +1     

@saphjra saphjra marked this pull request as ready for review August 21, 2025 09:58
@saphjra saphjra changed the title Feature/add load stimuli in dataset Feature/add load text stimuli in dataset Aug 21, 2025
@dkrako dkrako changed the title Feature/add load text stimuli in dataset add: add load_text_stimuli() to Dataset Aug 21, 2025
@dkrako dkrako changed the title add: add load_text_stimuli() to Dataset feat: add load_text_stimuli() to Dataset Aug 21, 2025
@github-actions github-actions bot added the enhancement New feature or request label Aug 21, 2025
Member

@SiQube SiQube left a comment

really cool feature, thank you for working on it! some minor feedback (already)

@pytest.mark.parametrize(
    'expected',
    [
        EXPECTED_AOI_MULTIPLEYE_STIMULI_TOY_X_1_TEXT_1_1,
Member

please add an additional test using the existing aoi file in tests/files

def test_stimuli_list_exists(gaze_dataset_configuration):
    dataset = Dataset(**gaze_dataset_configuration['init_kwargs'])

    assert isinstance(dataset.stimuli, list)
Member

this could also be an empty list
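
One reading of this comment is that the type check alone also passes for an empty list, so the test says little about loading. A stricter check could pin the expected number of stimuli per dataset configuration. The helper below is a hypothetical sketch, not code from this PR.

```python
from __future__ import annotations


def check_stimuli(stimuli: object, expected_count: int) -> None:
    """Check the container type and the number of loaded stimuli."""
    assert isinstance(stimuli, list)
    # An expected_count of 0 covers datasets that ship no stimuli at all.
    assert len(stimuli) == expected_count
```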

Contributor

@dkrako dkrako left a comment

This is already looking really great! Apart from adding some tests for the existing AOI files, I would probably revert the hard removal of from_file() to avoid a breaking change for now. I will deprecate it in a follow-up PR.

@@ -71,6 +71,7 @@ def __init__(
        self.events: list[Events] = []
        self.precomputed_events: list[PrecomputedEventDataFrame] = []
        self.precomputed_reading_measures: list[ReadingMeasures] = []
        self.stimuli: list[TextStimulus] = []
Contributor

I think this is fine for this PR, but we should probably create a more general Stimulus base class. Moreover, we should think about Stimulus collections, as an individual Stimulus may be mapped to some trials in the experiment. I will create an issue regarding these.
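
To make the idea concrete, here is a rough sketch of what a base class and a collection could look like. Everything in it is hypothetical (`Stimulus`, `SketchTextStimulus`, `StimulusCollection`, and the integer trial ids); the actual design is left to the follow-up issue.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Stimulus(ABC):
    """Hypothetical base class shared by all stimulus types."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def kind(self) -> str:
        """Return a short label such as 'text' or 'image'."""


class SketchTextStimulus(Stimulus):
    """Hypothetical text stimulus built on the base class."""

    def kind(self) -> str:
        return 'text'


class StimulusCollection:
    """Hypothetical collection that maps trials to stimuli."""

    def __init__(self) -> None:
        self._stimuli: dict[str, Stimulus] = {}
        self._trial_to_stimulus: dict[int, str] = {}

    def add(self, stimulus: Stimulus, trials: list[int]) -> None:
        # One stimulus may be shown in several trials.
        self._stimuli[stimulus.name] = stimulus
        for trial in trials:
            self._trial_to_stimulus[trial] = stimulus.name

    def for_trial(self, trial: int) -> Stimulus:
        return self._stimuli[self._trial_to_stimulus[trial]]
```

Storing the trial mapping in the collection keeps individual stimulus objects free of experiment-specific bookkeeping.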

@@ -131,6 +133,8 @@ def load(
:py:meth:`pymovements.Dataset.path`.
This argument is used only for this single call and does not alter
:py:meth:`pymovements.Dataset.preprocessed_rootpath`. (default: None)
stimuli_dirname: str | None
:py:meth:`pymovements.Dataset.stimuli_rootpath`. (default: None)
Contributor

You probably forgot to paste:

            One-time usage of an alternative directory name to save data relative to
            :py:meth:`pymovements.Dataset.path`.
            This argument is used only for this single call and does not alter

if self.definition.resources.has_content('stimuli'):
    self.load_text_stimuli()
    # stimuli_dirname=stimuli_dirname, # TODO custom dir name
    # extension=extension,
Contributor

passing the extension won't be necessary, as it's more related to the gaze files.

@@ -203,6 +203,28 @@ class DatasetDefinition:
transformations. If not specified, the constant eye-to-screen distance will be taken from
the experiment definition. This column will be renamed to ``distance``. (default: None)

aoi_content_column: str | None
Contributor

For this PR I would leave these fields at this level, but they should probably be moved to the ResourceDefinition level before the next release. I need to think about how to integrate these best without crowding the ResourceDefinition with lots of fields.

Maybe we should create classes like StimulusResourceDefinition, SamplesResourceDefinition, EventsResourceDefinition, LabelsResourceDefinition. These would then inherit from ResourceDefinition but include the more specific fields associated with these content types.

Alternatively, #1270 paved the way for having load_kwargs in the DatasetDefinition. Maybe we could use these.

I'll need to think about this a bit more and write up an issue.
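
As a starting point for that issue, the subclass idea could be sketched with dataclasses as below. The class names carry a `Sketch` prefix because they are hypothetical, `filename_pattern` is an assumed field, and only `aoi_content_column` is taken from the definition fields added in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SketchResourceDefinition:
    """Hypothetical generic resource definition with shared fields only."""

    content: str
    filename_pattern: str | None = None


@dataclass
class SketchStimulusResourceDefinition(SketchResourceDefinition):
    """Hypothetical stimulus-specific definition with its own extra fields."""

    # Overriding the field gives this content type a fixed default.
    content: str = 'stimuli'
    aoi_content_column: str | None = None
```

With this layout, stimulus-specific fields stay out of the generic class, and code that only needs the shared fields can keep accepting the base type.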


def from_file(
Contributor

I would prefer to deprecate this function instead of removing it, to prevent a breaking change.
But I'll do this in a separate PR after merging this one. For now, just revert the removal and reuse TextStimulus.from_file() in the old function.
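
The suggested approach can be sketched as a thin wrapper that warns and delegates. This is an illustration under assumptions: `SketchTextStimulus` is a hypothetical stand-in, the real from_file() signature has more parameters, and the warning text is made up.

```python
import warnings


class SketchTextStimulus:
    """Hypothetical stand-in for TextStimulus."""

    def __init__(self, source: str) -> None:
        self.source = source

    @staticmethod
    def from_file(path: str) -> 'SketchTextStimulus':
        return SketchTextStimulus(source=path)


def from_file(path: str) -> SketchTextStimulus:
    """Old module-level entry point, kept so existing callers do not break."""
    warnings.warn(
        'from_file() is deprecated, use the static from_file() of the class instead.',
        DeprecationWarning,
        stacklevel=2,  # Point the warning at the caller, not at this wrapper.
    )
    return SketchTextStimulus.from_file(path)
```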
