Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MRG: Update repository structure #21

Merged
merged 13 commits into from
Apr 29, 2022
Merged

MRG: Update repository structure #21

merged 13 commits into from
Apr 29, 2022

Conversation

mscheltienne
Copy link
Member

As per #19, the structure could be something like that.

I chose label as name for the entry-point: adding one for the entire package (mne_icalabel) and one for each model (submodule, only iclabel for now). If you have a better name in mind, please comment 😉
That would bring the public API to:

from mne_icalabel import label
label(raw, ica, method='iclabel')

And

from mne_icalabel.iclabel import label
label(raw, ica)

I added an underscore to all private functions. I chose to keep the feature extraction get_features, the network ICLabelNet, the forward pass run_iclabel public on top of the entry-point label.

WDYT?

@mscheltienne mscheltienne mentioned this pull request Apr 28, 2022
@codecov-commenter
Copy link

codecov-commenter commented Apr 28, 2022

Codecov Report

Merging #21 (5efbd0c) into main (7bf7956) will decrease coverage by 0.49%.
The diff coverage is 93.15%.

@@            Coverage Diff             @@
##             main      #21      +/-   ##
==========================================
- Coverage   97.09%   96.60%   -0.50%     
==========================================
  Files           4        7       +3     
  Lines         379      412      +33     
==========================================
+ Hits          368      398      +30     
- Misses         11       14       +3     
Impacted Files Coverage Δ
mne_icalabel/iclabel/features.py 96.46% <87.09%> (ø)
mne_icalabel/iclabel/utils.py 92.39% <87.50%> (ø)
mne_icalabel/__init__.py 100.00% <100.00%> (ø)
mne_icalabel/iclabel/__init__.py 100.00% <100.00%> (ø)
mne_icalabel/iclabel/label_components.py 100.00% <100.00%> (ø)
mne_icalabel/iclabel/network.py 100.00% <100.00%> (ø)
mne_icalabel/label_components.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7bf7956...5efbd0c. Read the comment docs.

@adam2392
Copy link
Member

adam2392 commented Apr 28, 2022

Thanks for getting to this right away!

I feel like label might be too generic and not explicit enough. Perhaps label_ic? WDTY?

@mscheltienne
Copy link
Member Author

How about label_components since MNE has the methods get_components and plot_components for ICA?

@adam2392
Copy link
Member

How about label_components since MNE has the methods get_components and plot_components for ICA?

I like it!

@mscheltienne
Copy link
Member Author

Done, and I added the simplest of test cases to run label_components on the MNE raw sample dataset.. and it's already failing! 😅

@adam2392
Copy link
Member

adam2392 commented Apr 28, 2022

make isort make black should do the auto-formatting now with #22

make run-checks runs all the checks we want (e.g. pydocstyle, isort, flake8, black, check-manifest, and mypy).

Hopefully this tightens up the process and makes life easier. I think I made the MANIFEST accurately reflect what was happening in this PR too, but if not, I can fix it later too.

The docs folder was added, but since this is on Jacob's gh, we'll just wait till we migrate the repo to mnetools to setup the actual CI to build the docs.

@mscheltienne
Copy link
Member Author

For the error, it's because autocorrelation ends with a resampling:

# resample to 1 second at 100 samples/sec
resamp = resample_poly(ac.T, 100, np.round(raw.info["sfreq"])).T

The sample dataset sampling rate is 600.614990234375 Hz. It was complaining that the up-scaling and the down-scaling factors must be integers, so I rounded raw.info['sfreq']. But that's not the correct fix as now, instead of resampling to 100 samples, it resampled to 99.. thus failing the reshaping at a later stage.

I am not sure how to correctly fix this at the moment, I'll think about it.

@adam2392
Copy link
Member

For the error, it's because autocorrelation ends with a resampling:

# resample to 1 second at 100 samples/sec
resamp = resample_poly(ac.T, 100, np.round(raw.info["sfreq"])).T

The sample dataset sampling rate is 600.614990234375 Hz. It was complaining that the up-scaling and the down-scaling factors must be integers, so I rounded raw.info['sfreq']. But that's not the correct fix as now, instead of resampling to 100 samples, it resampled to 99.. thus failing the reshaping at a later stage.

I am not sure how to correctly fix this at the moment, I'll think about it.

I think (pretty sure) this is an "extreme" edge case because the imperfect sampling rate usually arises due to some machine precision error. E.g. in this case the actual sampling rate is actually 600 Hz.

I'm leaning towards: The correct fix would be for the user to pass it in correctly. So the info['sfreq'] is checked when the user calls label_ic, and a custom error message is raised explaining the issue for ICLabel. Then, in the test, we could test two cases:

  1. the error is raised
  2. the error is not raised and things "work" when you pass in np.floor(raw.info['sfreq'])

WDYT?

@mscheltienne
Copy link
Member Author

mscheltienne commented Apr 29, 2022

I agree that some error checking is necessary with some warning:

  • Is the reference a common average? (we need more info than 'custom_ref_applied' provide, there is an issue somewhere on the main repo about storing the reference for EEG dataset)
  • Is the sampling frequency an integer? (not sure, but I think a float sfreq would crash in EEGLAB)
  • Is the dataset BP between 0 and 100 Hz?

and some raises:

  • Do we have the same channel in the ICA and in the instance? (ideally, you need to provide the instance used for fit)

We'll discuss those bullet points later, but for the frequency, I don't think we should force it to be integer because if a user has a dataset like the MNE sample, he can not easily change the sampling frequency. The field .info['sfreq'] is among the lock fields that should not be tampered with and raises if you try to directly change it. Resampling doesn't seem like the way to go either as it will alter the data.. for no real reason.


Instead, I would simply figure out if we have to floor or ceil the sampling frequency in the autocorrelation function to get the 100 samples. I am not (yet) convinced floor for every case is the correct method.

@mscheltienne
Copy link
Member Author

mscheltienne commented Apr 29, 2022

I think that fixes it, but it's not super clean.

# the resampling must output an array of shape (components, 101), thus
# respecting '100 < ac.T.shape[0] * 100 / down <= 101'.
down = int(raw.info['sfreq'])
if 101 < ac.shape[1] * 100 / down:
    down += 1
elif ac.shape[1] * 100 / down <= 100:
    down -= 1
resamp = resample_poly(ac.T, 100, down).T

EDIT: Removed the elif statement, because the conversion to int floors, so we should never have to lower further the sampling rate by 1. The only case that could occur is that the floor lowered too much the value, below the criteria.

@mscheltienne mscheltienne changed the title Update repository structure MRG: Update repository structure Apr 29, 2022
@adam2392
Copy link
Member

adam2392 commented Apr 29, 2022

I agree that some error checking is necessary with some warning:

Agreed w/ all your warning proposals. Shall we merge this and go for it in another PR?

Do we have the same channel in the ICA and in the instance? (ideally, you need to provide the instance used for fit)

I wonder if someone just passes in instance, that we should run ICA for them by default? I.e. make passing in the ICA instance optional.

We'll discuss those bullet points later, but for the frequency, I don't think we should force it to be integer because if a user has a dataset like the MNE sample, he can not easily change the sampling frequency. The field .info['sfreq'] is among the lock fields that should not be tampered with and raises if you try to directly change it. Resampling doesn't seem like the way to go either as it will alter the data.. for no real reason.

Perhaps we have an optional argument sfreq=None, which is used to override the sampling rate. I think intelligently flooring/ceiling it is not possible since one could easily have 127.9584 or 128.12040 for example sampling rate, which actually corresponds to 128 Hz. WDYT of this soln? Con is adding an additional kwarg, but I don't see an easy way around w/o too much hacking.

@@ -7,7 +7,7 @@
from numpy.typing import NDArray
from scipy.signal import resample_poly

from .utils import _next_power_of_2, gdatav4, mne_to_eeglab_locs, pol2cart
from .utils import _gdatav4, _mne_to_eeglab_locs, _next_power_of_2, _pol2cart


def get_features(inst: Union[BaseRaw, BaseEpochs], ica: ICA):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking maybe we even rename this to get_iclabel_features, since we'll presumably have other models which also have feature engineering which would have get_<model_X>_features.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good idea.

Comment on lines +20 to +21
@pytest.mark.filterwarnings("ignore::RuntimeWarning")
@pytest.mark.filterwarnings("ignore::FutureWarning")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will clean this up in the next PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much appreciated!

@adam2392
Copy link
Member

Going to merge this for the sake of working on the docs to address Alexandre Gramfort's issues.

We can continue discussing some of the points I raised here and then handle in a downstream PR.

@adam2392 adam2392 merged commit 32edd7d into mne-tools:main Apr 29, 2022
@mscheltienne
Copy link
Member Author

For the warnings, yes I will add those later on.
For the optional ICA, sounds good to me! But we will have to document which settings we use by default.

For the sampling rate, I think intelligent flooring/ceiling is possible, but your example is a very good point and I suspect it will fail with the current code. I am not that convince by the additional argument which complicates needlessly the API.
I will add a test case for the resampling alone, and hopefully a solution that will convince both of us that correct flooring/ceiling is possible :)

@mscheltienne mscheltienne deleted the structure branch April 29, 2022 15:41
@adam2392
Copy link
Member

adam2392 commented Apr 29, 2022

Sounds good! FYI the repo is moved to my GitHub for now so I can work on the CI and docs before doing another migration to mne.tools :p

You'll have to reset your remote URL

git remote remove <current one>
then
git add remote <new one>

Also renamed the repository to mne-icalabel to fit the naming scheme in mne.tools.

FYI: @anandsaini024 @mscheltienne @jacobf18 ^ will need to reset your GitHub config. Sorry but will need a bit of these book-keeping until we migrate officially to mne org.

@mscheltienne
Copy link
Member Author

Perfect, don't worry about the move(s) ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants