feat: add `load_text_stimuli()` to `Dataset` #1267

saphjra · 2025-08-21T09:20:26Z

Description

Implemented loading text stimuli in the Dataset.load() method.

Implemented changes

Additions:

property Dataset.stimuli
method Dataset.load_text_stimuli(), called in Dataset.load()
method dataset_files.load_text_stimuli_files() which takes the dataset definition, fileinfo and path as arguments
method dataset_files.load_text_stimuli_file(), called in dataset_files.load_text_stimuli_files(); it uses TextStimulus.from_file(), which we turned into a static method
content type 'stimuli' to the DatasetDefinition.filename_format
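
The call chain described in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `TextStimulus`, `load_text_stimuli_file()` and `load_text_stimuli_files()` are simplified stand-ins that use only the standard library, and the `paths` argument is an assumption that replaces the dataset definition, fileinfo and path arguments of the real functions.

```python
from __future__ import annotations

import csv
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TextStimulus:
    """Stand-in for the real class: holds the rows of one AOI file."""

    aois: list[dict[str, str]]

    @staticmethod
    def from_file(path: Path) -> TextStimulus:
        # Read one AOI CSV file into a list of row dictionaries.
        with open(path, newline='', encoding='utf-8') as file:
            return TextStimulus(aois=list(csv.DictReader(file)))


def load_text_stimuli_file(path: Path) -> TextStimulus:
    # Load a single file via the static constructor.
    return TextStimulus.from_file(path)


def load_text_stimuli_files(paths: list[Path]) -> list[TextStimulus]:
    # Load every file; in the PR the result ends up in Dataset.stimuli.
    return [load_text_stimuli_file(path) for path in paths]
```

Because `from_file()` is static in this sketch, the loader can call it on the class without creating an instance first.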

How Has This Been Tested?

All the previously implemented tests passed

Added a dataset_type "ToyAOI", which includes a stimuli folder with files taken from tests/files/aoi_multipleye_stimuli_toy_x_1 (unlike the other dataset_types, these files are not generated)

We added the following tests:

test_stimuli_list_exists(), which runs for all types of toy datasets
test_stimuli_list_not_empty(), which checks our added toy dataset config "ToyAOI"
test_loaded_text_stimuli_list_correct(), which checks that the number of loaded files is correct (it must equal the number of AOI files in the folder), that the first 10 rows of the first AOI file match the expected content, and that the number of columns matches
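
The three checks in the last test can be sketched as a plain helper. This is an illustration under assumptions, not the test from this PR: `check_loaded_stimuli()` is a hypothetical name, and each loaded stimulus is represented as a list of row dictionaries rather than a `TextStimulus` object.

```python
def check_loaded_stimuli(loaded, expected_first_rows, n_files, n_columns):
    """Assert the three properties described above on loaded AOI tables.

    ``loaded`` is a list of tables, each a list of row dictionaries.
    """
    # 1. One stimulus per AOI file in the folder.
    assert len(loaded) == n_files
    # 2. The first rows (up to 10) of the first file match the expectation.
    assert loaded[0][:10] == expected_first_rows[:10]
    # 3. Every row of the first file has the expected number of columns.
    assert all(len(row) == n_columns for row in loaded[0])
```

Checking the length in addition to the type means an empty list fails the check instead of passing silently.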

Type of change

New functionality
Documentation update

Context

Partially resolves

add stimulus to Dataset #1233

related issues:

improve documentation #1210

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules
I have checked my code and corrected any misspellings

Future work and comment on warnings

add more tests, e.g. for incompatible file types and other error cases
add support for a custom stimuli_dirname argument in dataset_files.load_text_stimuli_files(); this is what currently causes the warnings
only .csv AOI files can be loaded at the moment; other separators and Excel files are also common and need to be supported in the future
add a tutorial notebook that uses the new loading feature
only TextStimulus is supported right now; the ImageStimulus class needs significant modification
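
One possible direction for the separator support mentioned above is an explicit separator argument. The sketch below is an assumption about how this could look, not planned pymovements API: `read_aoi_rows()` and its `separator` parameter are hypothetical, and only the standard library `csv` module is used.

```python
import csv
from pathlib import Path


def read_aoi_rows(path: Path, separator: str = ',') -> list[dict[str, str]]:
    """Read a delimited AOI file into a list of row dictionaries."""
    with open(path, newline='', encoding='utf-8') as file:
        # The separator is passed through, so ';' or tab files also work.
        return list(csv.DictReader(file, delimiter=separator))
```

Excel files would need a different reader and are not covered by this sketch.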

codecov · 2025-08-21T09:26:26Z

Codecov Report

❌ Patch coverage is 98.18182% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 99.97%. Comparing base (5734a24) to head (87f20cc).
⚠️ Report is 9 commits behind head on main.

Files with missing lines	Patch %	Lines
src/pymovements/dataset/dataset_files.py	95.45%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##              main    #1267      +/-   ##
===========================================
- Coverage   100.00%   99.97%   -0.03%     
===========================================
  Files          104      104              
  Lines         4512     4554      +42     
  Branches       783      788       +5     
===========================================
+ Hits          4512     4553      +41     
- Partials         0        1       +1

SiQube

really cool feature, thank you for working on it! some minor feedback (already)

SiQube · 2025-08-21T17:36:13Z

tests/unit/dataset/dataset_test.py

+@pytest.mark.parametrize(
+    'expected',
+    [
+        EXPECTED_AOI_MULTIPLEYE_STIMULI_TOY_X_1_TEXT_1_1,


please add an additional test using the existing aoi file in tests/files

SiQube · 2025-08-21T17:36:43Z

tests/unit/dataset/dataset_test.py

+def test_stimuli_list_exists(gaze_dataset_configuration):
+    dataset = Dataset(**gaze_dataset_configuration['init_kwargs'])
+
+    assert isinstance(dataset.stimuli, list)


this could also be an empty list

dkrako

This is already looking really great! Apart from adding some tests for the existing AOI files, I would probably revert the hard removal of from_file() to avoid a breaking change for now. I will deprecate it in a follow-up PR.

dkrako · 2025-08-27T07:28:02Z

src/pymovements/dataset/dataset.py

@@ -71,6 +71,7 @@ def __init__(
        self.events: list[Events] = []
        self.precomputed_events: list[PrecomputedEventDataFrame] = []
        self.precomputed_reading_measures: list[ReadingMeasures] = []
+        self.stimuli: list[TextStimulus] = []


I think this is fine for this PR, but we should probably create a more general Stimulus base class. Moreover, we should think about Stimulus collections, as an individual Stimulus may be mapped to some trials in the experiment. I will create an issue regarding these.

dkrako · 2025-08-27T07:29:42Z

src/pymovements/dataset/dataset.py

@@ -131,6 +133,8 @@ def load(
            :py:meth:`pymovements.Dataset.path`.
            This argument is used only for this single call and does not alter
            :py:meth:`pymovements.Dataset.preprocessed_rootpath`. (default: None)
+        stimuli_dirname: str | None
+            :py:meth:`pymovements.Dataset.stimuli_rootpath`. (default: None)


You probably forgot to paste the description:

One-time usage of an alternative directory name to save data relative to :py:meth:`pymovements.Dataset.path`. This argument is used only for this single call and does not alter
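
The one-time override described in this suggestion can be sketched as follows. `resolve_stimuli_dirpath()` is a hypothetical helper, not a pymovements function; it only illustrates that the argument affects a single call while the stored default stays unchanged.

```python
from __future__ import annotations

from pathlib import Path


def resolve_stimuli_dirpath(
        dataset_path: Path,
        default_dirname: str,
        stimuli_dirname: str | None = None,
) -> Path:
    """Return the stimuli directory for a single load call."""
    # Fall back to the stored default when no override is given.
    dirname = default_dirname if stimuli_dirname is None else stimuli_dirname
    return dataset_path / dirname
```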

dkrako · 2025-08-27T07:31:17Z

src/pymovements/dataset/dataset.py

+        if self.definition.resources.has_content('stimuli'):
+            self.load_text_stimuli()
+            # stimuli_dirname=stimuli_dirname, # TODO custom dir name
+            # extension=extension,


passing the extension won't be necessary, as it's more related to the gaze files.

dkrako · 2025-08-27T07:38:01Z

src/pymovements/dataset/dataset_definition.py

@@ -203,6 +203,28 @@ class DatasetDefinition:
        transformations. If not specified, the constant eye-to-screen distance will be taken from
        the experiment definition. This column will be renamed to ``distance``. (default: None)

+    aoi_content_column: str | None


For this PR I would leave these fields at this level, but they should probably be moved to the ResourceDefinition level before the next release. I need to think about how to integrate these best without crowding the ResourceDefinition with lots of fields.

Maybe we should create classes like StimulusResourceDefinition, SamplesResourceDefinition, EventsResourceDefinition, LabelsResourceDefinition. These would then inherit from ResourceDefinition but include the more specific fields associated with these content types.

Alternatively, #1270 paved the way for having load_kwargs in the DatasetDefinition. Maybe we could use these.

I'll need to think about this a bit more and write up an issue.
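
The subclass idea above could look roughly like this. It is only a sketch under assumptions: the `ResourceDefinition` shown here is a simplified stand-in, its `content` and `filename_pattern` fields are chosen for illustration, and `aoi_content_column` is the only field name taken from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ResourceDefinition:
    """Simplified stand-in with only the fields needed for this sketch."""

    content: str
    filename_pattern: str | None = None


@dataclass
class StimulusResourceDefinition(ResourceDefinition):
    """Adds stimulus-specific fields on top of the generic ones."""

    aoi_content_column: str | None = None
```

This keeps the generic class small, while each content type carries only the fields that apply to it.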

dkrako · 2025-08-27T07:48:00Z

src/pymovements/stimulus/text.py


-def from_file(


I would prefer to deprecate this function instead of removing it to prevent a breaking change.
But I'll do this in a separate PR after merging this. For now, just revert the removal and reuse TextStimulus.from_file() in the old function.
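
A minimal sketch of what reusing the static method in the old function could look like. The classes here are stand-ins: the real `from_file()` takes more parameters than the single `path` assumed below, and the stand-in `TextStimulus` only stores the path instead of loading anything.

```python
from __future__ import annotations

from pathlib import Path


class TextStimulus:
    """Stand-in for the real class; stores only the source path."""

    def __init__(self, path: Path) -> None:
        self.path = path

    @staticmethod
    def from_file(path: str | Path) -> TextStimulus:
        return TextStimulus(Path(path))


def from_file(path: str | Path) -> TextStimulus:
    """Module-level function kept for backward compatibility."""
    # Delegate to the static method so both entry points behave identically.
    return TextStimulus.from_file(path)
```

Existing callers of the module-level function keep working, and a deprecation warning can be added to it later without touching the static method.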

MirceaMM and others added 8 commits August 19, 2025 12:08

Add multipleye toy text aoi files in test files folder

a7f5207

add tests for stimuli path loading and make some preliminary changes …

88a99a1

…in Dataset

started adding the text stimulus to the dataset class

7ce61d9

- made from_file a static method of TextStimulus

aa733da

- fixed the test for correct text stimulus loading - changed mock_toy function and added a ToyAOI value to the fixture

fix broken docstrings

498c309

[pre-commit.ci] auto fixes from pre-commit.com hooks

8a791cb

for more information, see https://pre-commit.ci

Merge branch 'main' into feature/add-load-stimuli-in-dataset

8ce7fa0

fix doc type hint and remove separate from_file entry from documentation

7c16885

saphjra marked this pull request as ready for review August 21, 2025 09:58

saphjra requested review from dkrako, SiQube and prassepaul as code owners August 21, 2025 09:58

saphjra changed the title ~~Feature/add load stimuli in dataset~~ Feature/add load text stimuli in dataset Aug 21, 2025

saphjra and others added 11 commits August 21, 2025 14:19

Update toy_text_3_8_aoi.csv

1bc511c

shorten example aoi file

Delete tests/files/aoi_multipleye_stimuli_toy_x_1/toy_text_4_4_aoi.csv

c99ebb1

deleted unnecessary aoi file

deleted rows in toy_text_2_5_aoi.csv

5b4ff84

Update length of aois_list for test_loaded_text_stimuli_list_correct

2267c8c

Merge branch 'feature/add-load-stimuli-in-dataset' of https://github.…

665da50

…com/aeye-lab/pymovements into feature/add-load-stimuli-in-dataset

deleted unnecessary rows in toy_text_1_1_aoi.csv

ca5d4f2

write tests for missing column names ValueError

04e4e07

Merge branch 'feature/add-load-stimuli-in-dataset' of https://github.…

6d38c9f

…com/aeye-lab/pymovements into feature/add-load-stimuli-in-dataset

[pre-commit.ci] auto fixes from pre-commit.com hooks

59c6d4d

for more information, see https://pre-commit.ci

updated missing column value error test

4976c51

[pre-commit.ci] auto fixes from pre-commit.com hooks

4ba0d0b

for more information, see https://pre-commit.ci

dkrako changed the title ~~Feature/add load text stimuli in dataset~~ add: add load_text_stimuli() to Dataset Aug 21, 2025

dkrako changed the title ~~add: add load_text_stimuli() to Dataset~~ feat: add load_text_stimuli() to Dataset Aug 21, 2025

updated missing column value error test, fixed missing bracket

8f6602f

github-actions bot added the enhancement New feature or request label Aug 21, 2025

deleted the todos

87f20cc

SiQube requested changes Aug 21, 2025

dkrako requested changes Aug 27, 2025
