
Nbeatsx RuntimeError on Example #1143

Open
Songloading opened this issue Sep 5, 2024 · 5 comments · May be fixed by #1168
@Songloading

What happened + What you expected to happen

I was trying to run the Nbeatsx example and got the following error:

RuntimeError: It looks like your LightningModule has parameters that were not used in producing the loss returned by training_step. If this is intentional, you must enable the detection of unused parameters in DDP, either by setting the string value `strategy='ddp_find_unused_parameters_true'` or by setting the flag in the strategy with `strategy=DDPStrategy(find_unused_parameters=True)`.

The traceback points to the line where the Trainer tries to fit, i.e. around line 356 of neuralforecast/common/_base_model.py. However, I am able to run Nhits successfully with the same settings.
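As a possible workaround (not a fix for the underlying unused layer), the error message itself suggests enabling unused-parameter detection in DDP. Assuming neuralforecast forwards extra model keyword arguments to the Lightning Trainer (the `trainer_kwargs` pattern), a sketch would be:

```python
# Workaround sketch, not a verified fix: assumes NBEATSx forwards unknown
# keyword arguments to pytorch_lightning.Trainer (trainer_kwargs).
from neuralforecast.models import NBEATSx

model = NBEATSx(
    h=12,
    input_size=24,
    # String form taken from the error message; the equivalent object form
    # would be strategy=DDPStrategy(find_unused_parameters=True).
    strategy='ddp_find_unused_parameters_true',
)
```

Note this only silences DDP's check; the unused parameters still add overhead, and the real fix is to remove (or actually use) the offending layer.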

Versions / Dependencies

python==3.11.8
neuralforecast==1.7.4
pytorch_lightning==2.4.0
torch==2.4.0

Reproduction script

import pandas as pd
import matplotlib.pyplot as plt

from neuralforecast import NeuralForecast
from neuralforecast.models import NBEATSx
from neuralforecast.losses.pytorch import MQLoss
from neuralforecast.utils import AirPassengersPanel, AirPassengersStatic

Y_train_df = AirPassengersPanel[AirPassengersPanel.ds<AirPassengersPanel['ds'].values[-12]] # 132 train
Y_test_df = AirPassengersPanel[AirPassengersPanel.ds>=AirPassengersPanel['ds'].values[-12]].reset_index(drop=True) # 12 test

model = NBEATSx(h=12, input_size=24,
                loss=MQLoss(level=[80, 90]),
                scaler_type='robust',
                dropout_prob_theta=0.5,
                stat_exog_list=['airline1'],
                futr_exog_list=['trend'],
                max_steps=200,
                val_check_steps=10,
                early_stop_patience_steps=2)

nf = NeuralForecast(
    models=[model],
    freq='M'
)
nf.fit(df=Y_train_df, static_df=AirPassengersStatic, val_size=12)
Y_hat_df = nf.predict(futr_df=Y_test_df)

# Plot quantile predictions
Y_hat_df = Y_hat_df.reset_index(drop=False).drop(columns=['unique_id','ds'])
plot_df = pd.concat([Y_test_df, Y_hat_df], axis=1)
plot_df = pd.concat([Y_train_df, plot_df])

plot_df = plot_df[plot_df.unique_id=='Airline1'].drop('unique_id', axis=1)
plt.plot(plot_df['ds'], plot_df['y'], c='black', label='True')
plt.plot(plot_df['ds'], plot_df['NBEATSx-median'], c='blue', label='median')
plt.fill_between(x=plot_df['ds'][-12:], 
                 y1=plot_df['NBEATSx-lo-90'][-12:].values, 
                 y2=plot_df['NBEATSx-hi-90'][-12:].values,
                 alpha=0.4, label='level 90')
plt.legend()
plt.grid()
plt.plot()

Issue Severity

High: It blocks me from completing my task.

@Songloading Songloading added the bug label Sep 5, 2024
@jmoralez
Member

jmoralez commented Sep 5, 2024

Linking #1110. The fix should be very similar.

@elephaint elephaint linked a pull request Sep 24, 2024 that will close this issue
@elephaint
Contributor

Running the example code above on #1023 doesn't give an error, so I'll assume that merging that PR fixes this issue.

@jmoralez
Member

The error is produced when using DDP on multiple GPUs.

@elephaint
Contributor

elephaint commented Sep 25, 2024

The error is produced when using DDP on multiple GPUs.

I'd rather not tell you how I tested this (it's really shabby: it involves emulating multiple GPUs on a single GPU...), but I didn't get an error 😬

@jmoralez
Member

I'm able to reproduce the issue with two GPUs, and with the env variables described in #1110 I see this:

[rank0]:[I926 16:30:05.924442400 reducer.cpp:1989] [Rank 0] Parameter: out.bias did not get gradient in backwards pass.
[rank0]:[I926 16:30:05.924452500 reducer.cpp:1989] [Rank 0] Parameter: out.weight did not get gradient in backwards pass.

It doesn't seem like this layer is being used at all in the network:

self.out = nn.Linear(
    in_features=h, out_features=h * self.loss.outputsize_multiplier
)
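For context, this is how an unused layer surfaces in plain PyTorch (a minimal sketch with a hypothetical Toy module, not neuralforecast code): a parameter defined in __init__ but never called in forward() receives no gradient after backward(), which is exactly what DDP's reducer logs above.

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 1)
        # Defined but never called in forward(), analogous to NBEATSx's self.out.
        self.out = nn.Linear(4, 4)

    def forward(self, x):
        return self.used(x)

model = Toy()
loss = model(torch.randn(2, 4)).sum()
loss.backward()

# Parameters that did not contribute to the loss have no gradient.
unused = [name for name, p in model.named_parameters() if p.grad is None]
print(unused)  # ['out.weight', 'out.bias']
```

Under DDP, the reducer performs this same bookkeeping and raises the RuntimeError unless find_unused_parameters is enabled.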

@elephaint elephaint removed a link to a pull request Sep 30, 2024
@elephaint elephaint linked a pull request Sep 30, 2024 that will close this issue