Speed up backtest in Ensembles #653

alex-hse-repository · 2022-04-19T07:12:33Z

🚀 Feature Request

Reduce the backtest time complexity from O(n*m) to O(n+m), where n is the number of folds in the backtest run during fit and m is the number of folds in the actual backtest.
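To make the difference concrete, here is a rough count of fold runs per base pipeline. It is a simplified model, not a measurement: the function names are hypothetical, and constant factors (such as the final fit of the ensemble) are ignored.

```python
def naive_fold_runs(n_fit_folds: int, n_backtest_folds: int) -> int:
    """Every outer backtest fold refits the ensemble, which reruns the inner backtest."""
    return n_fit_folds * n_backtest_folds


def cached_fold_runs(n_fit_folds: int, n_backtest_folds: int) -> int:
    """One combined backtest over n + m folds whose forecasts are then reused."""
    return n_fit_folds + n_backtest_folds


if __name__ == "__main__":
    # Example sizes chosen only for illustration.
    print(naive_fold_runs(3, 5), cached_fold_runs(3, 5))
```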

Motivation

Make our ensembles more efficient

Proposal

In BasePipeline:

Add attribute:
- fold_id: Optional[int] = None
In method _run_fold:
- Add parameter fold_id: int to the signature
- Set attribute fold_id after this line
In method backtest:
- Pass fold_id to the _run_fold here
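The BasePipeline changes above could look roughly like the toy class below. It is a minimal sketch, not the real ETNA class: `ToyPipeline`, `seen_fold_ids`, the list-based series, and the expanding-window fold layout are all assumptions made for illustration.

```python
from typing import List, Optional


class ToyPipeline:
    """Hypothetical stand-in for BasePipeline; not the real ETNA class."""

    def __init__(self) -> None:
        self.fold_id: Optional[int] = None  # new attribute from the proposal
        self.seen_fold_ids: List[int] = []  # illustration only

    def _run_fold(self, train: List[float], test: List[float], fold_id: int) -> float:
        # Record the running fold so that fit/forecast can look it up later.
        self.fold_id = fold_id
        self.seen_fold_ids.append(fold_id)
        return sum(test) / len(test)  # placeholder "metric": mean of the test window

    def backtest(self, series: List[float], n_folds: int, horizon: int) -> List[float]:
        results = []
        for fold_id in range(n_folds):
            # Assumed layout: fold 0 is the earliest, the last fold ends at the series end.
            test_end = len(series) - (n_folds - 1 - fold_id) * horizon
            test_start = test_end - horizon
            train, test = series[:test_start], series[test_start:test_end]
            results.append(self._run_fold(train, test, fold_id))  # pass fold_id through
        return results
```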

In StackingEnsemble and VotingEnsemble:

Change the signature of _backtest_pipeline and make it static:
- _backtest_pipeline(pipeline: BasePipeline, ts: TSDataset, n_folds: Union[int, List[FoldMask]])
Move it to EnsembleMixin
Update every call of this method accordingly (pass self.n_folds)
Add attribute:
- forecasts: Optional[List[List["TSDataset"]]] = None

In VotingEnsemble:

Override method backtest:

If self.weights == "auto" and n_folds is an integer:
- Set self.forecasts using the _backtest_pipeline with n_folds = n_folds + self.n_folds
Call the backtest of the superclass with the same parameters
Set self.forecasts = None

Change method fit:

Fit the pipelines here only if self.forecasts is None

Change method _process_weights:

If self.forecasts is not None, it should get the necessary forecasts from self.forecasts using self.fold_id here

Change method _forecast:

If self.forecasts is not None, it should get the necessary forecasts from self.forecasts using self.fold_id here
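The issue does not spell out how self.fold_id indexes into self.forecasts. One possible scheme is sketched below, assuming the layout forecasts[pipeline][fold], folds ordered oldest first, an expanding window, and a stride equal to the horizon. Under those assumptions, the inner folds for outer fold `fold_id` are the cached folds `fold_id .. fold_id + n_fit_folds - 1`, and the forecast for the outer fold itself is cached fold `fold_id + n_fit_folds`. The helper names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def folds_for_fit(forecasts: Sequence[Sequence[T]], fold_id: int, n_fit_folds: int) -> List[List[T]]:
    """Per pipeline, the cached folds an inner backtest at this outer fold would produce."""
    return [list(per_pipeline[fold_id : fold_id + n_fit_folds]) for per_pipeline in forecasts]


def folds_for_forecast(forecasts: Sequence[Sequence[T]], fold_id: int, n_fit_folds: int) -> List[T]:
    """Per pipeline, the cached forecast covering the test window of this outer fold."""
    return [per_pipeline[fold_id + n_fit_folds] for per_pipeline in forecasts]


if __name__ == "__main__":
    # Two pipelines, 3 backtest folds + 2 fit folds = 5 cached folds each.
    cached = [["a0", "a1", "a2", "a3", "a4"], ["b0", "b1", "b2", "b3", "b4"]]
    print(folds_for_fit(cached, fold_id=1, n_fit_folds=2))
    print(folds_for_forecast(cached, fold_id=1, n_fit_folds=2))
```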

In StackingEnsemble:

Override method backtest:

If n_folds is an integer:
- Set self.forecasts using the _backtest_pipeline with n_folds = n_folds + self.n_folds
Call the backtest of the superclass with the same parameters
Set self.forecasts = None

Change method fit:

If self.forecasts is not None, it should get the necessary forecasts from self.forecasts using self.fold_id here
Fit the pipelines here only if self.forecasts is None

Change method _forecast:

If self.forecasts is not None, it should get the necessary forecasts from self.forecasts using self.fold_id here
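The overall flow of the overridden backtest (fill the cache, delegate, reset the cache) can be sketched with a toy single-pipeline ensemble. Everything here is hypothetical: `ToyEnsemble`, `_fit_fold`, `_plain_backtest` (a stand-in for the superclass backtest), and the `fold_runs` counter are not part of ETNA. The `try`/`finally` is one way to guarantee the reset; the proposal only says to set self.forecasts to None afterwards.

```python
from typing import List, Optional


class ToyEnsemble:
    """Hypothetical single-pipeline ensemble; names mirror the proposal, not ETNA's code."""

    def __init__(self, n_folds: int) -> None:
        self.n_folds = n_folds  # folds used by the inner backtest during fit
        self.forecasts: Optional[List[str]] = None
        self.fold_runs = 0  # counts simulated pipeline fold runs

    def _backtest_pipeline(self, n_folds: int) -> List[str]:
        self.fold_runs += n_folds
        return [f"forecast-{i}" for i in range(n_folds)]

    def _fit_fold(self, fold_id: int) -> List[str]:
        if self.forecasts is None:
            # Old logic: rerun the inner backtest for this outer fold.
            return self._backtest_pipeline(self.n_folds)
        # New logic: reuse the cached folds (assumed oldest-first ordering).
        return self.forecasts[fold_id : fold_id + self.n_folds]

    def _plain_backtest(self, n_folds: int) -> List[List[str]]:
        # Stand-in for the superclass backtest.
        return [self._fit_fold(fold_id) for fold_id in range(n_folds)]

    def backtest(self, n_folds: int) -> List[List[str]]:
        # In the proposal n_folds may also be a list of fold masks; only ints are cached.
        if isinstance(n_folds, int):
            self.forecasts = self._backtest_pipeline(n_folds + self.n_folds)
        try:
            return self._plain_backtest(n_folds)
        finally:
            # Reset so that a later standalone fit falls back to the old logic.
            self.forecasts = None
```

With 2 fit folds and 3 backtest folds, the cached path performs 5 simulated fold runs, while calling `_plain_backtest` directly (the old logic) performs 6.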

Notes:

Everywhere, comparison with None should be done using is or is not
Wherever self.forecasts is None, keep the old logic
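A short illustration of why identity checks are preferred for None: `==` and `!=` dispatch to user-defined comparison methods, so they can give misleading answers, while `is` and `is not` cannot be overridden. The `AlwaysEqual` class is a contrived, hypothetical example.

```python
class AlwaysEqual:
    """Contrived object whose equality check claims it equals anything."""

    def __eq__(self, other: object) -> bool:
        return True


value = AlwaysEqual()

# The equality-based check is fooled by the custom __eq__ ...
looks_like_none = value == None  # noqa: E711 (deliberately the discouraged form)

# ... while the identity check is not.
really_is_none = value is None
```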

Test cases

No response

Alternatives

No response

Additional context

No response

Checklist

I discussed this issue with ETNA Team


alex-hse-repository · 2022-08-17T09:45:50Z

Will be closed by adding caching in #655

alex-hse-repository added the enhancement New feature or request label Apr 19, 2022

alex-hse-repository added this to the Optimization milestone Jun 8, 2022

alex-hse-repository closed this as completed Aug 17, 2022
