
Nbeatsx RuntimeError on Example #1143

Open
Songloading opened this issue Sep 5, 2024 · 5 comments · May be fixed by #1168
@Songloading

What happened + What you expected to happen

I was trying to run the Nbeatsx example and got the following error:

RuntimeError: It looks like your LightningModule has parameters that were not used in producing the loss returned by training_step. If this is intentional, you must enable the detection of unused parameters in DDP, either by setting the string value `strategy='ddp_find_unused_parameters_true'` or by setting the flag in the strategy with `strategy=DDPStrategy(find_unused_parameters=True)`.

The traceback points to the line where the Trainer tries to fit, i.e. around line 356 of neuralforecast/common/_base_model.py. However, I am able to run Nhits successfully with the same settings.
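As a possible workaround (not a fix for the underlying unused layer), the error message itself suggests enabling unused-parameter detection in DDP. Assuming neuralforecast forwards extra model keyword arguments to the Lightning Trainer (the `trainer_kwargs` pattern), a sketch would be:

```python
# Workaround sketch, not a verified fix: assumes NBEATSx forwards unknown
# keyword arguments to pytorch_lightning.Trainer (trainer_kwargs).
from neuralforecast.models import NBEATSx

model = NBEATSx(
    h=12,
    input_size=24,
    # String form taken from the error message; the equivalent object form
    # would be strategy=DDPStrategy(find_unused_parameters=True).
    strategy='ddp_find_unused_parameters_true',
)
```

Note this only silences DDP's check; the unused parameters still add overhead, and the real fix is to remove (or actually use) the offending layer.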

Versions / Dependencies

python==3.11.8
neuralforecast==1.7.4
pytorch_lightning==2.4.0
torch==2.4.0

Reproduction script

import pandas as pd
import matplotlib.pyplot as plt

from neuralforecast import NeuralForecast
from neuralforecast.models import NBEATSx
from neuralforecast.losses.pytorch import MQLoss
from neuralforecast.utils import AirPassengersPanel, AirPassengersStatic

Y_train_df = AirPassengersPanel[AirPassengersPanel.ds<AirPassengersPanel['ds'].values[-12]] # 132 train
Y_test_df = AirPassengersPanel[AirPassengersPanel.ds>=AirPassengersPanel['ds'].values[-12]].reset_index(drop=True) # 12 test

model = NBEATSx(h=12, input_size=24,
                loss=MQLoss(level=[80, 90]),
                scaler_type='robust',
                dropout_prob_theta=0.5,
                stat_exog_list=['airline1'],
                futr_exog_list=['trend'],
                max_steps=200,
                val_check_steps=10,
                early_stop_patience_steps=2)

nf = NeuralForecast(
    models=[model],
    freq='M'
)
nf.fit(df=Y_train_df, static_df=AirPassengersStatic, val_size=12)
Y_hat_df = nf.predict(futr_df=Y_test_df)

# Plot quantile predictions
Y_hat_df = Y_hat_df.reset_index(drop=False).drop(columns=['unique_id','ds'])
plot_df = pd.concat([Y_test_df, Y_hat_df], axis=1)
plot_df = pd.concat([Y_train_df, plot_df])

plot_df = plot_df[plot_df.unique_id=='Airline1'].drop('unique_id', axis=1)
plt.plot(plot_df['ds'], plot_df['y'], c='black', label='True')
plt.plot(plot_df['ds'], plot_df['NBEATSx-median'], c='blue', label='median')
plt.fill_between(x=plot_df['ds'][-12:], 
                 y1=plot_df['NBEATSx-lo-90'][-12:].values, 
                 y2=plot_df['NBEATSx-hi-90'][-12:].values,
                 alpha=0.4, label='level 90')
plt.legend()
plt.grid()
plt.plot()

Issue Severity

High: It blocks me from completing my task.

@Songloading Songloading added the bug label Sep 5, 2024
@jmoralez
Member

jmoralez commented Sep 5, 2024

Linking #1110. The fix should be very similar.

@elephaint elephaint linked a pull request Sep 24, 2024 that will close this issue
@elephaint
Contributor

Running the example code above on #1023 doesn't give an error, so I'll assume that merging that PR fixes this issue.

@jmoralez
Member

The error is produced when using DDP on multiple GPUs.

@elephaint
Contributor

elephaint commented Sep 25, 2024

The error is produced when using DDP on multiple GPUs.

I'd rather not tell you how I tested this (it's really shabby: it involves emulating multiple GPUs on a single GPU...), but I didn't get an error 😬

@jmoralez
Member

I'm able to reproduce the issue with two GPUs, and with the env variables described in #1110 I see this:

[rank0]:[I926 16:30:05.924442400 reducer.cpp:1989] [Rank 0] Parameter: out.bias did not get gradient in backwards pass.
[rank0]:[I926 16:30:05.924452500 reducer.cpp:1989] [Rank 0] Parameter: out.weight did not get gradient in backwards pass.

It doesn't seem like this layer is being used at all in the network:

self.out = nn.Linear(
    in_features=h, out_features=h * self.loss.outputsize_multiplier
)
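For context, this is how an unused layer surfaces in plain PyTorch (a minimal sketch with a hypothetical Toy module, not neuralforecast code): a parameter defined in __init__ but never called in forward() receives no gradient after backward(), which is exactly what DDP's reducer logs above.

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 1)
        # Defined but never called in forward(), analogous to NBEATSx's self.out.
        self.out = nn.Linear(4, 4)

    def forward(self, x):
        return self.used(x)

model = Toy()
loss = model(torch.randn(2, 4)).sum()
loss.backward()

# Parameters that did not contribute to the loss have no gradient.
unused = [name for name, p in model.named_parameters() if p.grad is None]
print(unused)  # ['out.weight', 'out.bias']
```

Under DDP, the reducer performs this same bookkeeping and raises the RuntimeError unless find_unused_parameters is enabled.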

@elephaint elephaint removed a link to a pull request Sep 30, 2024
@elephaint elephaint linked a pull request Sep 30, 2024 that will close this issue