
POC: duplicated runs fold caching for backtest and stackingensemble via joblib.Memory #655

Draft
wants to merge 4 commits into
base: master
from


martins0n
Contributor

IMPORTANT: Please do not create a Pull Request without creating an issue first.

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use Numpy format for all the methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Type of Change

  • Examples / docs / tutorials / contributors update
  • Bug fix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change which improves an existing feature)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Proposed Changes

Related Issue

Closing issues

@martins0n
Contributor Author

```python
import shelve
import time
import warnings

from etna.datasets import TSDataset, generate_ar_df
from etna.ensembles import StackingEnsemble
from etna.metrics import MAE
from etna.models import NaiveModel
from etna.pipeline import Pipeline

warnings.filterwarnings("ignore")

# 20 synthetic AR segments with 500 daily points each
df = generate_ar_df(periods=500, start_time="2021-01-01", n_segments=20)
df_wide = TSDataset.to_dataset(df[["target", "segment", "timestamp"]])
ts = TSDataset(df_wide, freq="D")

pipe_naive_one = Pipeline(model=NaiveModel(7), horizon=7)
pipe_naive_two = Pipeline(model=NaiveModel(1), horizon=7)

# Same joblib settings for the ensemble's internal folds and the outer backtest
joblib_params = dict(verbose=0, backend="multiprocessing", mmap_mode="c")

stack_pipe = StackingEnsemble(
    pipelines=[pipe_naive_one, pipe_naive_two],
    n_folds=2,
    n_jobs=1,
    joblib_params=joblib_params,
)

# Reset the run counter; the code on this PR branch is assumed to increment it
# whenever work is actually computed rather than loaded from the cache.
with shelve.open("counter") as db:
    db["counter"] = 0

start = time.monotonic()
metrics, _, _ = stack_pipe.backtest(
    ts, metrics=[MAE()], n_folds=2, n_jobs=1, joblib_params=joblib_params
)
print(f"elapsed: {time.monotonic() - start:.2f} s")

with shelve.open("counter") as db:
    print(f"counter: {db['counter']}")

print(metrics["MAE"].mean())
```

Run with the cache enabled: `ETNA_CACHE=1 python script.py`

@github-actions

🚀 Deployed on https://deploy-preview-655--etna-docs.netlify.app

github-actions bot temporarily deployed to pull request April 20, 2022 15:47
martins0n marked this pull request as draft June 8, 2022 07:13
martins0n changed the title from "Fold cache poc" to "POC: duplicated runs fold caching for backtest and stackingensemble via joblib" Jun 22, 2022
martins0n changed the title from "POC: duplicated runs fold caching for backtest and stackingensemble via joblib" to "POC: duplicated runs fold caching for backtest and stackingensemble via joblib.Memory" Jun 22, 2022