
speed-up read.snapshot #101

Merged
2 commits merged on Oct 8, 2024
Conversation

orichters
Contributor

  • using data.table::fread
> t <- Sys.time(); d <- as.quitte("/p/projects/remind/users/oliverr/data/NGFS5-S14_2024-07-18-snapshot_R5.csv"); print(Sys.time() - t)
Time difference of 1.512139 mins
> t <- Sys.time(); d <- read.snapshot("/p/projects/remind/users/oliverr/data/NGFS5-S14_2024-07-18-snapshot_R5.csv"); print(Sys.time() - t)
Time difference of 1.474763 secs
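A minimal sketch of the fread-based approach, under the assumption that the snapshot is a wide IAMC-style CSV (one column per year). The function name `read_snapshot_sketch` and the demo file are hypothetical illustrations, not the actual `read.snapshot()` implementation merged in this PR:

```r
# Hypothetical sketch of a fread-based snapshot reader; the real
# read.snapshot() in quitte may differ in details.
library(data.table)

read_snapshot_sketch <- function(file) {
  dt <- data.table::fread(file)  # fast, multi-threaded CSV parser
  # IAMC snapshots are wide (one column per year); melt to long format
  id_cols <- intersect(c("Model", "Scenario", "Region", "Variable", "Unit"),
                       names(dt))
  long <- data.table::melt(dt, id.vars = id_cols,
                           variable.name = "period", value.name = "value")
  long[!is.na(value)]  # drop missing values, as drop.na = TRUE would
}

# demo on a tiny in-memory snapshot
f <- tempfile(fileext = ".csv")
writeLines(c("Model,Scenario,Region,Variable,Unit,2020,2030",
             "REMIND,Base,World,GDP,billion US$2005/yr,100,120"), f)
d <- read_snapshot_sketch(f)
print(d)
```

The speed-up comes mostly from `fread`'s C-level parser; the melt to long format is comparatively cheap.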

@laurinks left a comment

Thanks a lot, Oliver! The time improvements look very promising. I have not run separate tests myself, but will be happy to check performance once this is merged.

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q (Member) left a comment

You do not need to include a new dependency for that.

t <- Sys.time(); d <- read.quitte("/p/projects/remind/users/oliverr/data/NGFS5-S14_2024-07-18-snapshot_R5.csv", check.duplicates = FALSE, drop.na = TRUE); print(Sys.time() - t)
Time difference of 2.771989 secs   

@orichters
Contributor Author

I wasn't aware that the duplicate check was so time-consuming. Still, when reading larger snapshots, this setup seems to be much faster:

> t <- Sys.time(); d <- read.quitte("/p/projects/piam/scenariomip/scenario_explorer/data/scenarios_scenariomip_2024-10-02.csv", check.duplicates=F); print(Sys.time() - t)                   
|==================================================================| 100% 437 MB
Time difference of 1.458424 mins
> t <- Sys.time(); d <- read.snapshot("/p/projects/piam/scenariomip/scenario_explorer/data/scenarios_scenariomip_2024-10-02.csv"); print(Sys.time() - t)
Time difference of 10.31954 secs

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q (Member) commented Oct 8, 2024

> devtools::load_all()
ℹ Loading quitte
> f <- '/p/projects/piam/scenariomip/scenario_explorer/data/scenarios_scenariomip_2024-10-02.csv'
> bench::mark(
+ `read.snapshot` = { read.snapshot(f); TRUE },
+ `read.quitte`   = { read.quitte(f, check.duplicates = FALSE, drop.na = TRUE); TRUE })
# A tibble: 2 × 13
  expression      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr>    <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 read.snapshot 11.3s  11.3s    0.0885      12GB    0.708     1     8      11.3s
2 read.quitte   23.6s  23.6s    0.0424    17.2GB    0.170     1     4      23.6s
# ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>
Warning message:
Some expressions had a GC in every iteration; so filtering is disabled. 

Pretty much a constant factor (if you use all the relevant arguments).
And I would expect these files to be read once and then cached.
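The caching mentioned above could be as simple as a per-session memoization wrapper. This is a hypothetical sketch (not part of quitte); `read_snapshot_cached` and `.snapshot_cache` are illustrative names, and in practice one would pass `read.snapshot` or `read.quitte` as the `reader`:

```r
# Hypothetical per-session cache: parse each snapshot file once,
# serve repeated calls from memory.
.snapshot_cache <- new.env(parent = emptyenv())

read_snapshot_cached <- function(file, reader = read.csv) {
  key <- normalizePath(file)
  if (!exists(key, envir = .snapshot_cache)) {
    # first call for this file: actually read and parse it
    assign(key, reader(file), envir = .snapshot_cache)
  }
  # subsequent calls: return the cached object
  get(key, envir = .snapshot_cache)
}
```

With such a wrapper, even the slower `read.quitte` path is paid only once per session per file.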

@orichters orichters merged commit cffc818 into pik-piam:master Oct 8, 2024
2 checks passed
@orichters
Contributor Author

OK, I hope that is sufficiently fast for the ScenarioMIP people. Thanks for your intervention.
