This repository has been archived by the owner on Aug 13, 2020. It is now read-only.

support memory = FALSE like in spark #5

Open

randomgambit opened this issue Sep 19, 2019 · 4 comments

Comments

@randomgambit

Hi,

Assuming this is technically possible, it would be useful to have the data indexed but not yet loaded into RAM, as sparklyr does (see https://www.rdocumentation.org/packages/sparklyr/versions/1.0.2/topics/spark_read_parquet).

That would let the user open very large parquet files but pay only for what is actually used, similar to what vroom does (https://github.com/r-lib/vroom).

What do you think?
Thanks!
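The vroom-style "pay only for what you use" idea boils down to keeping an index of where each column's bytes live, then reading only the requested column on demand. A minimal sketch in Python; the file layout and helper names here are invented for illustration and are not miniparquet's or parquet's actual format:

```python
# Sketch of column-pruned reads: a header maps each column name to a
# (offset, length) pair, so reading one column touches only its bytes.
# This is a toy format, NOT the parquet layout.
import json
import struct


def write_columnar(path, columns):
    """Write columns as a length-prefixed JSON header plus raw payloads."""
    payloads = {name: json.dumps(vals).encode() for name, vals in columns.items()}
    header, offset = {}, 0
    for name, data in payloads.items():
        header[name] = (offset, len(data))
        offset += len(data)
    head = json.dumps(header).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(head)))  # 4-byte header length
        f.write(head)
        for data in payloads.values():
            f.write(data)


def read_column(path, name):
    """Read only the named column: parse the header, then one seek + read."""
    with open(path, "rb") as f:
        (hlen,) = struct.unpack("<I", f.read(4))
        header = json.loads(f.read(hlen))
        off, length = header[name]
        f.seek(4 + hlen + off)  # jump straight to this column's payload
        return json.loads(f.read(length))
```

A real reader would use parquet's footer metadata the same way: locate the column chunk, then read just that byte range.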

@hannes (Owner) commented Sep 20, 2019

Yes, I plan to implement ALTREP features for the parquet reader as well, similar to vroom.

@hannes hannes closed this as completed Sep 20, 2019
@hannes hannes reopened this Sep 20, 2019
@randomgambit (Author)

Great idea! Maybe you should work with Jim Hester (@jimhester, the vroom author) to build a single package that handles both CSV and parquet super fast? That would be a killer package in my opinion, and more devs are needed to fix bugs and other inefficiencies. What do you think?

@hannes (Owner) commented Sep 23, 2019

Check out the altrep branch in this repo... for now, it materialises everything at once, but things like this should no longer read any unrelated payload data:

a <- miniparquet::read_parquet("...")
names(a)
mean(a$col)
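The behaviour described above (column names available without touching payload data, and each column materialised only on first access) can be sketched outside of R. ALTREP itself is R-internal, so this is only a hedged conceptual illustration in Python; `load_column` stands in for whatever per-column parquet reader a real implementation would call:

```python
# Conceptual sketch of ALTREP-style deferred materialisation: metadata
# queries cost nothing, payload reads happen lazily and are cached.
class LazyFrame:
    def __init__(self, column_names, load_column):
        self._names = list(column_names)
        self._load = load_column   # stand-in for a real per-column reader
        self._cache = {}
        self.loads = 0             # instrumentation: count actual reads

    def names(self):
        # Answered from metadata alone; no column data is read.
        return list(self._names)

    def __getitem__(self, name):
        # First access triggers the real read; later accesses hit the cache.
        if name not in self._cache:
            self._cache[name] = self._load(name)
            self.loads += 1
        return self._cache[name]
```

With this shape, the R session above maps to: `names(a)` performs zero reads, and `mean(a$col)` pays for exactly one column.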

@hannes (Owner) commented Sep 24, 2019