collector: always consider all monomorphic functions to be 'mentioned' #122862

Closed
wants to merge 2 commits into from


RalfJung
Member

@RalfJung RalfJung commented Mar 22, 2024

This would fix #122814. But it's probably not going to be cheap...

Ideally we'd avoid building the optimized MIR for these new roots, and only request mir_drops_elaborated_and_const_checked -- but that MIR is often getting stolen so I don't see a way to do that. (Zulip)
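For readers unfamiliar with "stolen" MIR, here is a toy model of the problem. It is a sketch under stated assumptions, not the compiler's actual code: the `Steal` type below is a simplified stand-in written for this example, and `optimize` and the string "statements" are hypothetical. The point it illustrates is that once a later stage has taken ownership of an earlier stage's result, that earlier result can no longer be read.

```rust
use std::cell::RefCell;

/// Toy stand-in for a value that can be read until it is consumed ("stolen").
struct Steal<T> {
    value: RefCell<Option<T>>,
}

impl<T> Steal<T> {
    fn new(value: T) -> Self {
        Steal { value: RefCell::new(Some(value)) }
    }

    /// Read access; returns `None` once the value has been stolen.
    fn try_with<R>(&self, f: impl FnOnce(&T) -> R) -> Option<R> {
        self.value.borrow().as_ref().map(f)
    }

    /// Take ownership of the value; later reads will find nothing.
    fn steal(&self) -> Option<T> {
        self.value.borrow_mut().take()
    }
}

/// Hypothetical later stage: consumes the earlier MIR to build the optimized one.
fn optimize(earlier: &Steal<Vec<&'static str>>) -> Vec<&'static str> {
    let mut body = earlier.steal().expect("earlier MIR already stolen");
    // Pretend an optimization removes a call that is never executed.
    body.retain(|stmt| *stmt != "call never_used()");
    body
}

fn main() {
    let elaborated = Steal::new(vec!["call used()", "call never_used()"]);
    // Before optimization the earlier MIR is still readable.
    assert_eq!(elaborated.try_with(|b| b.len()), Some(2));

    let optimized = optimize(&elaborated);
    assert_eq!(optimized, vec!["call used()"]);

    // Afterwards only the optimized body exists; the earlier one is gone.
    assert_eq!(elaborated.try_with(|b| b.len()), None);
}
```

In this model, anything that wants the pre-optimization body has to run before `optimize` does, which is the ordering constraint that makes requesting only the earlier MIR difficult.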

r? @oli-obk @tmiasko

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 22, 2024
@RalfJung
Member Author

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 22, 2024
@bors
Contributor

bors commented Mar 22, 2024

⌛ Trying commit 59803ef with merge d0df954...

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 22, 2024
collector: always consider all monomorphic functions to be 'mentioned'

This would fix rust-lang#122814. But it's probably not going to be cheap...

Ideally we'd avoid building the optimized MIR for these new roots, and only request `mir_drops_elaborated_and_const_checked` -- but that MIR is often getting stolen so I don't see a way to do that.

TODO before landing:
- [ ] Figure out if there is a testcase [here](rust-lang#122814 (comment)).

r? `@oli-obk` `@tmiasko`
@bors
Contributor

bors commented Mar 22, 2024

☀️ Try build successful - checks-actions
Build commit: d0df954 (d0df954d8bedc6b4baa80485170b02fda0e0042f)

@rust-timer
Collaborator

Finished benchmarking commit (d0df954): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.1% | [0.2%, 4.0%] | 66 |
| Regressions ❌ (secondary) | 1.3% | [0.3%, 4.3%] | 23 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.1% | [0.2%, 4.0%] | 66 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.9% | [1.9%, 1.9%] | 1 |
| Regressions ❌ (secondary) | 4.8% | [2.9%, 8.4%] | 14 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.9% | [1.9%, 1.9%] | 1 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.0% | [1.5%, 7.2%] | 16 |
| Regressions ❌ (secondary) | 2.2% | [1.4%, 2.7%] | 4 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.2% | [-2.2%, -2.2%] | 1 |
| All ❌✅ (primary) | 3.0% | [1.5%, 7.2%] | 16 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 667.777s -> 669.759s (0.30%)
Artifact size: 315.07 MiB -> 315.10 MiB (0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Mar 22, 2024
@RalfJung
Member Author

RalfJung commented Mar 22, 2024

It's again mostly incr builds which are affected -- I guess that makes sense as then the collector represents a larger fraction of the total rustc execution time than for full builds.
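The arithmetic behind that guess can be sketched as follows. This assumes, purely for illustration, that the PR adds roughly the same absolute amount of collector work to both kinds of build; all numbers are made up and are not measurements from this perf run, and `regression_percent` is a hypothetical helper.

```rust
/// Relative slowdown (in percent) caused by adding `extra` work to a run costing `baseline`.
fn regression_percent(baseline: f64, extra: f64) -> f64 {
    100.0 * extra / baseline
}

fn main() {
    // Made-up cost units, not measurements from this PR.
    let extra = 35.0; // assumed extra collector work, identical in both scenarios

    // Incremental build: most work is reused, so the total is small.
    let incremental = regression_percent(1_000.0, extra);
    // Full build: type checking, borrow checking, codegen etc. dominate the total.
    let full = regression_percent(10_000.0, extra);

    println!("incremental: {incremental:.2}%, full: {full:.2}%");
    assert!(incremental > full);
}
```

Under that assumption the same absolute cost shows up as a tenfold larger percentage in the run whose total is ten times smaller.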

This seems to affect different benchmarks than #122568.

Would be interesting to figure out where the extra time is spent; this time it can't be metadata (de)serialization. I wonder if skipping MIR opts would help or if the actual cost is elsewhere. @saethlin I think you had a working setup for getting cachegrind diffs?

@RalfJung
Member Author

Oli made a suggestion for how to go about the MIR opts here; that should probably be the next step. I have to put this on hold for now though.

@RalfJung RalfJung added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 22, 2024
@Kobzol
Contributor

Kobzol commented Mar 22, 2024

FWIW, if you click on a row with a specific benchmark result, it will show you a command that you can copy-paste to get a cachegrind diff.

@RalfJung
Member Author

RalfJung commented Mar 22, 2024 via email

@saethlin
Member

Here's the top of the most-regressed primary benchmark (unicode-normalization debug incr-unchanged)

35,048,416  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir          file:function
--------------------------------------------------------------------------------
-3,749,950  ???:<hashbrown::raw::RawTable<(rustc_ast::node_id::NodeId, rustc_hir::hir_id::ItemLocalId)>>::reserve_rehash::<hashbrown::map::make_hasher<rustc_ast::node_id::NodeId, rustc_hir::hir_id::ItemLocalId, core::hash::BuildHasherDefault<rustc_hash::FxHasher>>::{closure#0}>
 3,749,950  ???:<hashbrown::raw::RawTable<(rustc_span::def_id::LocalDefId, rustc_hir::hir_id::ItemLocalId)>>::reserve_rehash::<hashbrown::map::make_hasher<rustc_span::def_id::LocalDefId, rustc_hir::hir_id::ItemLocalId, core::hash::BuildHasherDefault<rustc_hash::FxHasher>>::{closure#0}>
 2,790,508  ???:<rustc_query_system::dep_graph::graph::DepGraphData<rustc_middle::dep_graph::DepsType>>::try_mark_previous_green::<rustc_query_impl::plumbing::QueryCtxt>
 2,640,064  ???:<rustc_metadata::rmeta::decoder::DecodeContext as rustc_span::SpanDecoder>::decode_span
 1,781,794  <all-jemalloc-files>:<all-jemalloc-functions>
 1,627,701  ???:<rustc_middle::mir::BasicBlockData as rustc_data_structures::stable_hasher::HashStable<rustc_query_system::ich::hcx::StableHashingContext>>::hash_stable
 1,575,493  ???:<rustc_middle::mir::interpret::AllocId as rustc_data_structures::stable_hasher::HashStable<rustc_query_system::ich::hcx::StableHashingContext>>::hash_stable::{closure#0}
-1,515,559  ???:<rustc_middle::mir::interpret::AllocId as rustc_data_structures::stable_hasher::HashStable<rustc_query_system::ich::hcx::StableHashingContext>>::hash_stable
 1,248,878  ???:<rustc_span::caching_source_map_view::CachingSourceMapView>::span_data_to_lines_and_cols
 1,148,391  ???:<rustc_middle::mir::Body as rustc_data_structures::stable_hasher::HashStable<rustc_query_system::ich::hcx::StableHashingContext>>::hash_stable
 1,129,590  ???:<rustc_data_structures::sip128::SipHasher128>::short_write_process_buffer::<8>
   971,146  ???:<rustc_middle::ty::context::CtxtInterners>::intern_ty
   920,975  ???:<rustc_middle::mir::Body as rustc_serialize::serialize::Decodable<rustc_metadata::rmeta::decoder::DecodeContext>>::decode
   848,301  ???:<rustc_middle::ty::Ty as rustc_serialize::serialize::Decodable<rustc_metadata::rmeta::decoder::DecodeContext>>::decode
   846,228  ???:rustc_incremental::persist::load::setup_dep_graph
   807,165  ???:rustc_monomorphize::collector::collect_items_rec
   526,864  ???:<rustc_data_structures::sip128::SipHasher128>::finish128
   465,911  ???:<rustc_middle::query::on_disk_cache::CacheDecoder as rustc_span::SpanDecoder>::decode_def_id
   425,077  ???:rustc_query_system::query::plumbing::try_execute_query::<rustc_query_impl::DynamicConfig<rustc_query_system::query::caches::DefIdCache<rustc_middle::query::erase::Erased<[u8; 8]>>, false, false, false>, rustc_query_impl::plumbing::QueryCtxt, true>
   380,687  ???:rustc_data_structures::unord::hash_iter_order_independent::<rustc_query_system::ich::hcx::StableHashingContext, (&rustc_span::def_id::DefId, &rustc_span::def_id::DefId), std::collections::hash::map::Iter<rustc_span::def_id::DefId, rustc_span::def_id::DefId>>
   360,997  ???:<rustc_middle::ty::generic_args::ArgFolder as rustc_type_ir::fold::TypeFolder<rustc_middle::ty::context::TyCtxt>>::fold_ty
   342,871  /usr/src/debug/glibc/glibc/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms
   333,648  ???:rustc_monomorphize::collector::visit_instance_use

I looked at a few other of the top primary regressions and they all look almost exactly the same in cachegrind

@RalfJung
Member Author

RalfJung commented Mar 22, 2024 via email

@saethlin
Member

@RalfJung
Member Author

I do see metadata_decode_entry_optimized_mir there. So... all the extra time is spent loading the MIR...?

@saethlin
Member

Yup, that would be my first guess.

@RalfJung
Member Author

RalfJung commented Mar 22, 2024

In that case maybe a dedicated query for "mentioned and required items" would indeed help as it would not have to load the entire MIR for that. (This was proposed by Oli.)
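A rough sketch of why such a query could help, written as a self-contained toy model rather than actual rustc code. Everything here is hypothetical: `Metadata`, `FullBody`, `mentioned_via_body`, `mentioned_via_query` and the `decode_cost` counter are names invented for the example, and the "cost" is just the number of pretend statements decoded.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a function's full MIR body: large and costly to decode.
struct FullBody {
    statements: Vec<String>,
    mentioned: Vec<&'static str>,
}

/// Toy "crate metadata" that tracks how much body data has been decoded.
struct Metadata {
    bodies: HashMap<&'static str, FullBody>,
    /// Hypothetical side table: only the mentioned items of each function.
    mentioned_table: HashMap<&'static str, Vec<&'static str>>,
    decode_cost: usize,
}

impl Metadata {
    fn new(fns: Vec<(&'static str, Vec<&'static str>)>) -> Self {
        let mut bodies = HashMap::new();
        let mut mentioned_table = HashMap::new();
        for (name, mentioned) in fns {
            mentioned_table.insert(name, mentioned.clone());
            // Pretend every body has 100 statements that must be decoded.
            let statements = vec![format!("stmt in {name}"); 100];
            bodies.insert(name, FullBody { statements, mentioned });
        }
        Metadata { bodies, mentioned_table, decode_cost: 0 }
    }

    /// Modeled current approach: decode the whole body to learn what it mentions.
    fn mentioned_via_body(&mut self, f: &str) -> Vec<&'static str> {
        match self.bodies.get(f) {
            Some(body) => {
                self.decode_cost += body.statements.len();
                body.mentioned.clone()
            }
            None => Vec::new(),
        }
    }

    /// Modeled alternative: a dedicated lookup that never touches the body.
    fn mentioned_via_query(&self, f: &str) -> Vec<&'static str> {
        self.mentioned_table.get(f).cloned().unwrap_or_default()
    }
}

/// Walk everything transitively mentioned from `root`, using either lookup.
fn collect(meta: &mut Metadata, root: &'static str, use_query: bool) -> HashSet<&'static str> {
    let mut seen = HashSet::new();
    let mut worklist = vec![root];
    while let Some(f) = worklist.pop() {
        if !seen.insert(f) {
            continue; // already visited
        }
        let next = if use_query {
            meta.mentioned_via_query(f)
        } else {
            meta.mentioned_via_body(f)
        };
        worklist.extend(next);
    }
    seen
}

fn main() {
    let fns = vec![("main", vec!["a", "b"]), ("a", vec!["b"]), ("b", vec![])];
    let mut meta = Metadata::new(fns);

    let via_body = collect(&mut meta, "main", false);
    assert_eq!(meta.decode_cost, 300); // three bodies of 100 statements each

    let via_query = collect(&mut meta, "main", true);
    assert_eq!(via_body, via_query); // same set of items found
    assert_eq!(meta.decode_cost, 300); // no additional body decoding
}
```

Both walks find the same items; only the body-based one pays the decoding cost. Whether the real metadata format allows such a cheap side table is exactly the open question in this thread.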

@RalfJung RalfJung marked this pull request as draft March 24, 2024 08:48
@RalfJung
Member Author

RalfJung commented Apr 3, 2024

> In that case maybe a dedicated query for "mentioned and required items" would indeed help as it would not have to load the entire MIR for that. (This was proposed by Oli.)

I have zero knowledge about the crate metadata handling and I'm unlikely to have the time to learn about it any time soon -- so if anyone wants to pick this up, please feel free to do so. Meanwhile I will close this PR as I'm not currently working on this.

@RalfJung RalfJung closed this Apr 3, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 5, 2024
Create a separate query for required and mentioned items instead of tracking them in the MIR body

implements rust-lang#122862 (comment)

May permit further improvements without sacrificing perf... iff this PR isn't horrible for perf 🙃
Labels
perf-regression Performance regression. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
7 participants