This repository has been archived by the owner on Jul 27, 2023. It is now read-only.

⚠️ Data Leakage: Must not use test data when fitting MinMaxScaler() #126

Open
shure-dev opened this issue Apr 19, 2023 · 2 comments

Comments

@shure-dev

I think I have found a serious error.

If I'm correct, we must not use any information from the test data when preprocessing.

However, your code applies fit_transform() to both the train and the test data.

This means the train data can contain information from the test data, which affects the accuracy.
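
To illustrate what I mean, here is a minimal sketch, not the repository's actual code. The toy series and the 7/3 split are made up, and I am assuming the leak comes from fitting the scaler on rows that include the test period. The idea is to call fit() on the train data only and then reuse that fitted scaler with transform() on the test data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy series; the repository's real data is not used here.
series = np.arange(10, dtype=float).reshape(-1, 1)

# Chronological split: the first 7 points are train, the last 3 are test.
train, test = series[:7], series[7:]

# Leaky variant: the scaler sees the test rows, so the test maximum (9)
# determines how the train rows are scaled.
leaky_scaler = MinMaxScaler().fit(series)
train_leaky = leaky_scaler.transform(train)

# Leak-free variant: fit on train only, then reuse the same scaler on test.
scaler = MinMaxScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)

print(train_leaky.max())   # below 1, because the test maximum set the range
print(train_scaled.max())  # the train maximum maps to the top of the range
print(test_scaled.max())   # above 1, because test values exceed the train range
```

In the leak-free variant the test values can fall outside [0, 1]. As far as I understand, that is expected: the model should only know the range it saw during training.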

Please correct me if I'm wrong. Thank you.

@shure-dev shure-dev changed the title Leakage: Must not use test data when fitting MinMaxScaler() Data Leakage: Must not use test data when fitting MinMaxScaler() Apr 19, 2023
@shure-dev shure-dev changed the title Data Leakage: Must not use test data when fitting MinMaxScaler() [[Data Leakage]]: Must not use test data when fitting MinMaxScaler() Apr 19, 2023
@shure-dev shure-dev changed the title [[Data Leakage]]: Must not use test data when fitting MinMaxScaler() [[Data Leakage !!!]]: Must not use test data when fitting MinMaxScaler() Apr 19, 2023
@shure-dev shure-dev changed the title [[Data Leakage !!!]]: Must not use test data when fitting MinMaxScaler() Data Leakage !!!: Must not use test data when fitting MinMaxScaler() Apr 19, 2023
@shure-dev shure-dev changed the title Data Leakage !!!: Must not use test data when fitting MinMaxScaler() ⚠️ Data Leakage: Must not use test data when fitting MinMaxScaler() Apr 19, 2023
@shure-dev
Author

This answer seems to work well for this issue:

https://stackoverflow.com/questions/70923839/sklearn-preprocessing-with-a-rolling-window
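
I am not reproducing the linked answer here. The sketch below is only my own assumption of what rolling-window scaling could look like: each point is scaled with the min and max of the `window` observations strictly before it, so no future value is used. The function name `rolling_minmax` and the sample prices are hypothetical.

```python
import numpy as np

def rolling_minmax(values, window):
    """Scale each point with the min/max of the `window` points before it."""
    values = np.asarray(values, dtype=float)
    # The first `window` points have no full history, so they stay NaN.
    scaled = np.full(values.shape, np.nan)
    for i in range(window, len(values)):
        past = values[i - window:i]  # strictly earlier observations only
        lo, hi = past.min(), past.max()
        if hi > lo:  # a flat window would divide by zero; leave it as NaN
            scaled[i] = (values[i] - lo) / (hi - lo)
    return scaled

# Hypothetical sample data.
prices = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]
result = rolling_minmax(prices, window=3)
print(result)
```

With this approach a scaled value can go above 1 or below 0 whenever the current point leaves the range of its window. That seems to be the price of not looking ahead.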

@shure-dev
Author

We probably also have to take care of stationarity when working with time series data.
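
As a rough illustration of why this matters, here is a small sketch using a made-up random walk with drift (not the repository's data). The mean of the raw series shifts over time, while first differencing, which is one common way to reduce this kind of non-stationarity, gives a series whose mean is roughly stable. Whether differencing is the right choice for this project is an assumption on my side.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical random walk with drift: its mean keeps growing over time.
steps = rng.normal(loc=0.5, scale=1.0, size=500)
walk = np.cumsum(steps)

# First differencing: work with the changes instead of the levels.
diffed = np.diff(walk)

def half_mean_gap(x):
    """Absolute difference between the means of the two halves of x."""
    mid = len(x) // 2
    return abs(x[:mid].mean() - x[mid:].mean())

print(half_mean_gap(walk))    # large gap: the level drifts upward
print(half_mean_gap(diffed))  # small gap: the differences look stable
```

A fixed MinMaxScaler fitted on the early part of a drifting series will see later values far outside its range, so scaling and stationarity are connected problems.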
