Low_Memory Warning #129

Open
AnzorGozalishvili opened this issue Feb 22, 2020 · 0 comments


Brief Description

When reading a DataFrame (for example with `read_csv`), there is a parameter called `low_memory`, and it is set to True by default. Its function seems to be to decide the minimal data type required to fit the values of each column, apparently for memory optimization. To detect the correct data type, all values in a column need to be considered, which doesn't seem optimal for a big DataFrame for two reasons, I guess: memory and data loading time. My assumption is that pandas is optimizing both, and that's why this parameter is True by default. I didn't dig into the implementation of that optimized version or how it detects data types (maybe it reads some chunk of the DataFrame and takes the minimal requirement).
The problem is that it sometimes gives unexpected results. I once spent a week on heavy calculations over chunks of data, hoping to assemble the results back using an index that was definitely unique. But I missed one specific detail: the index was 8 digits at the beginning of the data and later became 16 digits (it was taken from a database whose primary key differed between versions). While reading chunks of data, I was actually getting only the first 8 digits of the 16-digit index, since `low_memory` was set to True by default and not all index values were checked. I ended up with calculations that I had no hope of assembling and merging back into the original data.
I told such a long and dramatic story because the `low_memory` option is very strange: nobody takes it seriously, but it becomes critical in some cases.
So please consider this case and add a warning about it to dovpanda.
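
For reference, here is a minimal sketch of one way to guard against this kind of surprise. It is not what dovpanda does, just an illustration. The pandas documentation describes `low_memory` as processing the file in chunks internally, which can lead to mixed type inference, and suggests either setting it to False or specifying the type with `dtype`. The column names (`id`, `value`) and the sample values below are made up; only `dtype` and `low_memory` are real `read_csv` parameters.

```python
import io

import pandas as pd

# Hypothetical data: keys start as 8-digit values and later become 16-digit values.
csv_text = "id,value\n12345678,1.0\n1234567812345678,2.0\n"

# Reading the key column as a string keeps every digit,
# so the result does not depend on per-chunk type inference.
df = pd.read_csv(io.StringIO(csv_text), dtype={"id": str}, low_memory=False)

# Sanity checks before relying on the key for a later merge.
assert df["id"].is_unique
id_lengths = sorted(df["id"].str.len().unique().tolist())
print(id_lengths)  # [8, 16]
```

Checking the key's uniqueness and the spread of its lengths right after loading would have exposed the 8-digit versus 16-digit mix before any heavy computation started.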
