
Speed up backtest in Ensembles #653

Closed
1 task
alex-hse-repository opened this issue Apr 19, 2022 · 1 comment
Labels
enhancement New feature or request
Milestone

Comments

@alex-hse-repository
Collaborator

alex-hse-repository commented Apr 19, 2022

🚀 Feature Request

Reduce the backtest time complexity from O(n*m) to O(n+m), where n is the number of folds in the backtest run during fit and m is the number of folds in the actual backtest.
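To make the target concrete, here is a rough count of fold runs per pipeline under both schemes. The function names are hypothetical, and the count ignores the final fit done inside each backtest fold.

```python
def naive_fold_runs(n: int, m: int) -> int:
    """Each of the m backtest folds refits the ensemble, which reruns an n-fold backtest."""
    return n * m


def shared_fold_runs(n: int, m: int) -> int:
    """A single backtest with n + m folds is run once per pipeline and then reused."""
    return n + m


n, m = 3, 5
print(naive_fold_runs(n, m))   # 15
print(shared_fold_runs(n, m))  # 8
```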

Motivation

Make our ensembles more efficient

Proposal

In BasePipeline:

  1. Add attribute:
    • fold_id: Optional[int] = None
  2. In method _run_fold:
    • Add parameter fold_id: int to the signature
    • Set attribute fold_id after this line
  3. In method backtest:
    • Pass fold_id to the _run_fold here
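A minimal sketch of the BasePipeline change, using a toy class rather than the real etna one. The class name, the list-based series, the expanding-window split, and the mean "forecast" are all assumptions made for illustration; only the fold_id attribute and the extra _run_fold parameter come from the proposal.

```python
from typing import List, Optional


class BasePipelineSketch:
    """Toy stand-in for BasePipeline; not the real etna class."""

    def __init__(self) -> None:
        # Id of the fold currently being processed, None outside of a backtest.
        self.fold_id: Optional[int] = None
        self.seen_fold_ids: List[int] = []

    def _run_fold(self, train: List[float], test: List[float], fold_id: int) -> float:
        # Remember the fold id so that fit/forecast can look up cached results.
        self.fold_id = fold_id
        self.seen_fold_ids.append(fold_id)
        return sum(train) / len(train)  # placeholder "forecast"

    def backtest(self, series: List[float], n_folds: int, horizon: int) -> List[float]:
        results = []
        for fold_id in range(n_folds):
            # Expanding window: fold 0 is the earliest, the last fold ends at the series end.
            test_end = len(series) - (n_folds - fold_id - 1) * horizon
            train = series[: test_end - horizon]
            test = series[test_end - horizon : test_end]
            results.append(self._run_fold(train, test, fold_id=fold_id))
        return results
```

After `BasePipelineSketch().backtest(list(range(10)), n_folds=3, horizon=2)` the pipeline has seen fold ids 0, 1 and 2 in order.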

In StackingEnsemble and VotingEnsemble:

  1. Change the signature of _backtest_pipeline and make it static:
    • _backtest_pipeline(pipeline: BasePipeline, ts: TSDataset, n_folds: Union[int, List[FoldMask]])
  2. Move it to EnsembleMixin
  3. Update every call of this method to pass self.n_folds
  4. Add attribute:
    • forecasts: Optional[List[List["TSDataset"]]] = None
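The shape of this refactoring could look roughly like the sketch below. ToyPipeline, ToyEnsemble and _backtest_all are hypothetical stand-ins, plain float lists replace TSDataset, and the real helper also has to handle a List[FoldMask] value for n_folds, which is omitted here.

```python
from typing import List, Optional

Forecast = List[float]


class ToyPipeline:
    """Toy pipeline: the one-step forecast is the last train value times a scale."""

    def __init__(self, scale: float) -> None:
        self.scale = scale

    def backtest(self, ts: List[float], n_folds: int) -> List[Forecast]:
        forecasts = []
        for fold in range(n_folds):
            train_end = len(ts) - (n_folds - fold)
            forecasts.append([ts[train_end - 1] * self.scale])
        return forecasts


class EnsembleMixinSketch:
    @staticmethod
    def _backtest_pipeline(pipeline: ToyPipeline, ts: List[float], n_folds: int) -> List[Forecast]:
        # Static: everything it needs arrives through the arguments, not through self.
        return pipeline.backtest(ts, n_folds=n_folds)


class ToyEnsemble(EnsembleMixinSketch):
    def __init__(self, pipelines: List[ToyPipeline], n_folds: int) -> None:
        self.pipelines = pipelines
        self.n_folds = n_folds
        # One list of per-fold forecasts for each pipeline, None when nothing is cached.
        self.forecasts: Optional[List[List[Forecast]]] = None

    def _backtest_all(self, ts: List[float]) -> List[List[Forecast]]:
        # Callers now pass n_folds explicitly.
        return [
            self._backtest_pipeline(pipeline=p, ts=ts, n_folds=self.n_folds)
            for p in self.pipelines
        ]
```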

In VotingEnsemble:

  1. Override method backtest:
    • If self.weights == "auto" and n_folds is an integer:
      • Set self.forecasts using _backtest_pipeline with n_folds = n_folds + self.n_folds
    • Call the backtest of the superclass with the same parameters
    • Set self.forecasts = None
  2. Change method fit:
    • Fit the pipelines here only if self.forecasts is None
  3. Change method _process_weights:
    • If self.forecasts is not None, get the necessary forecasts from self.forecasts using self.fold_id here
  4. Change method _forecast:
    • If self.forecasts is not None, get the necessary forecasts from self.forecasts using self.fold_id here
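The flow above can be sketched with toy classes. Several things here are assumptions and not part of the proposal: the expanding-window, one-step-ahead folds; the index mapping (inner folds for outer fold i are cached folds i..i+n-1, the outer forecast is cached fold n+i); and the inverse-MAE weighting, which merely stands in for the real "auto" weights. The superclass backtest call is inlined as a plain loop.

```python
from typing import List, Optional


class CountingPipeline:
    """Toy pipeline: one-step forecast is the last train value plus a bias."""

    def __init__(self, bias: float) -> None:
        self.bias = bias
        self.fold_runs = 0  # how many folds this pipeline was fitted on

    def backtest(self, series: List[float], n_folds: int) -> List[float]:
        forecasts = []
        for fold in range(n_folds):
            train_end = len(series) - (n_folds - fold)
            self.fold_runs += 1
            forecasts.append(series[train_end - 1] + self.bias)
        return forecasts


class VotingSketch:
    def __init__(self, pipelines: List[CountingPipeline], n_folds: int) -> None:
        self.pipelines = pipelines
        self.n_folds = n_folds
        self.forecasts: Optional[List[List[float]]] = None
        self.fold_id: Optional[int] = None

    def _process_weights(self, series: List[float]) -> List[float]:
        total = len(self.forecasts[0])
        inverse_errors = []
        for cached in self.forecasts:
            # Inner folds of the current outer fold, taken from the cache.
            inner = range(self.fold_id, self.fold_id + self.n_folds)
            mae = sum(abs(cached[j] - series[len(series) - (total - j)]) for j in inner) / self.n_folds
            inverse_errors.append(1.0 / (mae + 1e-9))
        norm = sum(inverse_errors)
        return [value / norm for value in inverse_errors]

    def _forecast(self, weights: List[float]) -> float:
        outer = self.n_folds + self.fold_id  # cached fold matching the outer fold
        return sum(w * cached[outer] for w, cached in zip(weights, self.forecasts))

    def backtest(self, series: List[float], n_folds: int) -> List[float]:
        # One long backtest per pipeline instead of one inner backtest per outer fold.
        self.forecasts = [p.backtest(series, n_folds + self.n_folds) for p in self.pipelines]
        results = []
        for fold_id in range(n_folds):
            self.fold_id = fold_id
            results.append(self._forecast(self._process_weights(series)))
        self.forecasts = None
        self.fold_id = None
        return results


series = [float(v) for v in range(1, 11)]
pipelines = [CountingPipeline(bias=0.0), CountingPipeline(bias=1.0)]
ensemble = VotingSketch(pipelines, n_folds=2)
results = ensemble.backtest(series, n_folds=3)
```

With n = 2 inner folds and m = 3 outer folds, each toy pipeline is fitted on 5 folds in total, and the cache is cleared once the backtest finishes.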

In StackingEnsemble:

  1. Override method backtest:
    • If n_folds is an integer:
      • Set self.forecasts using _backtest_pipeline with n_folds = n_folds + self.n_folds
    • Call the backtest of the superclass with the same parameters
    • Set self.forecasts = None
  2. Change method fit:
    • If self.forecasts is not None, get the necessary forecasts from self.forecasts using self.fold_id here
    • Fit the pipelines here only if self.forecasts is None
  3. Change method _forecast:
    • If self.forecasts is not None, get the necessary forecasts from self.forecasts using self.fold_id here
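For stacking, the cached forecasts would serve as features for the final model. The helper below is a hypothetical sketch of that lookup, assuming one scalar forecast per fold and the same index mapping as an expanding-window backtest (inner folds fold_id..fold_id+n_folds-1 for fitting, fold n_folds+fold_id for prediction).

```python
from typing import List, Tuple


def stacking_features(
    forecasts: List[List[float]], fold_id: int, n_folds: int
) -> Tuple[List[List[float]], List[float]]:
    """Pick final-model features for one outer fold from cached per-pipeline forecasts."""
    inner = range(fold_id, fold_id + n_folds)
    # One row per inner fold, one column per base pipeline: used in fit.
    train_rows = [[per_pipeline[j] for per_pipeline in forecasts] for j in inner]
    # Base forecasts for the outer fold itself: used in _forecast.
    predict_row = [per_pipeline[n_folds + fold_id] for per_pipeline in forecasts]
    return train_rows, predict_row


cached = [[10, 11, 12, 13, 14], [20, 21, 22, 23, 24]]
train_rows, predict_row = stacking_features(cached, fold_id=1, n_folds=2)
```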

Notes:

  1. Everywhere, comparisons with None should use is or is not
  2. Wherever self.forecasts is None, keep the old logic
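A small illustration of why the first note matters. ElementwiseEq is a hypothetical class that overloads comparison the way some array-like containers do; whether the cached objects in etna behave like this is not stated in the issue.

```python
class ElementwiseEq:
    """Toy container whose == and != compare element by element."""

    def __init__(self, values):
        self.values = values

    def __eq__(self, other):
        return [v == other for v in self.values]

    def __ne__(self, other):
        return [v != other for v in self.values]


cached = ElementwiseEq([])

# The overloaded operator returns an empty list, which is falsy,
# so a check like `if cached != None:` would wrongly skip the cache.
print(bool(cached != None))

# Identity comparison cannot be overloaded and gives the intended answer.
print(cached is not None)
```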

Test cases

No response

Alternatives

No response

Additional context

No response

Checklist

  • I discussed this issue with ETNA Team
alex-hse-repository added the enhancement (New feature or request) label on Apr 19, 2022
alex-hse-repository added this to the Optimization milestone on Jun 8, 2022
@alex-hse-repository
Collaborator Author

Will be closed when caching is added in #655.

Projects
Status: Done