Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrumentation for delayed UB stemming from uninitialized memory #3374

Merged
merged 64 commits into from
Aug 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
64 commits
Select commit Hold shift + click to select a range
b2e6d8f
Instrumentation for delayed UB stemming from uninitialized memory
artemagvanian Jul 23, 2024
9d3783e
Merge branch 'main' into delayed-ub
artemagvanian Jul 23, 2024
40c8047
Add minimal statics support, clean up atomics
artemagvanian Jul 24, 2024
6c9989d
Merge branch 'main' into delayed-ub
artemagvanian Jul 24, 2024
9df425a
Add tests for different ways to trigger delayed UB
artemagvanian Jul 24, 2024
ee0f863
Merge branch 'main' into delayed-ub
artemagvanian Jul 24, 2024
268885f
Make sure statics can also be analysis targets
artemagvanian Jul 24, 2024
2d99ffa
Add support for closures, fix function call handling
artemagvanian Jul 24, 2024
3a0c190
Merge branch 'main' into delayed-ub
artemagvanian Jul 24, 2024
7d69b02
Formatting nit
artemagvanian Jul 24, 2024
6810535
Fix bugs with inter-function analysis, add more tests
artemagvanian Jul 25, 2024
cf13e0f
Merge branch 'main' into delayed-ub
artemagvanian Jul 25, 2024
a508e62
Fix methods changed by merge
artemagvanian Jul 25, 2024
bf1ffc6
Mitigate issues with non-deterministic memory initialization instrume…
artemagvanian Jul 26, 2024
85a550a
Correctness fixes and performance improvements
artemagvanian Jul 26, 2024
6d4bf1c
Merge branch 'main' into delayed-ub
artemagvanian Jul 26, 2024
e597689
Add more tests, remove unnecessary nodes from the graph
artemagvanian Jul 26, 2024
afb9df7
Apply clippy suggestion
artemagvanian Jul 26, 2024
08494ed
Fix an issue with terminator handling in `InstrumentationVisitor`
artemagvanian Jul 26, 2024
0f11c95
Fix expected test output files
artemagvanian Jul 29, 2024
ba56f12
Merge branch 'main' into delayed-ub
artemagvanian Jul 29, 2024
c5d25ed
More efficient intersection handling
artemagvanian Jul 29, 2024
e7bb5b8
Add a better explanation for running reachability analysis twice
artemagvanian Jul 29, 2024
1792a68
Move points-to analysis into a separate module, organize imports
artemagvanian Jul 29, 2024
f4601c1
Use StaticDef as analysis target
artemagvanian Jul 29, 2024
1186426
Add a comment in `initial_target_visitor.rs`
artemagvanian Jul 29, 2024
cf877b1
Fix small bug in delayed UB instrumentation visitor
artemagvanian Jul 29, 2024
0a6be41
Systematic ptr/ref transmute handling
artemagvanian Jul 29, 2024
f527f2e
Changes from code review
artemagvanian Jul 29, 2024
5d6beee
Merge branch 'main' into delayed-ub
artemagvanian Jul 29, 2024
50ef746
Formatting change
artemagvanian Jul 29, 2024
bd7e960
Link to the closed StableMIR PR
artemagvanian Jul 29, 2024
5fe54f2
Merge branch 'main' into delayed-ub
artemagvanian Jul 30, 2024
076f5ad
Handle spurious failures caused by delayed UB instrumentation
artemagvanian Jul 30, 2024
8014b42
Merge branch 'main' into delayed-ub
artemagvanian Jul 31, 2024
b5b7660
Add more comments to the points_to_analysis
artemagvanian Jul 31, 2024
6812fce
Merge branch 'main' into delayed-ub
artemagvanian Jul 31, 2024
ac6591d
Add more comments to the analysis
artemagvanian Jul 31, 2024
c6187b3
Merge branch 'main' into delayed-ub
artemagvanian Aug 1, 2024
dcdc019
Merge branch 'main' into delayed-ub
artemagvanian Aug 1, 2024
bedaa46
Wrap `PointsToAnalysis::run` into a separate function
artemagvanian Aug 1, 2024
421b136
Remove unnecessary body copy
artemagvanian Aug 1, 2024
3b3fdda
Update the documentation for `PointsToGraph`
artemagvanian Aug 1, 2024
a2e03c6
Rename `GlobalMemLoc -> MemLoc`
artemagvanian Aug 1, 2024
186abe3
Rename `MemLoc::Global` -> `MemLoc::Static`
artemagvanian Aug 1, 2024
6de9bfd
Expand on the doc comment of `LocalMemLoc::Alloc`
artemagvanian Aug 1, 2024
a0ef351
More changes to `PointsToGraph`
artemagvanian Aug 1, 2024
5114525
Remove confusing references to "scalar places"
artemagvanian Aug 1, 2024
759a540
Accept suggestion from code review
artemagvanian Aug 1, 2024
0354c4e
Merge branch 'main' into delayed-ub
artemagvanian Aug 1, 2024
f1172d6
Merge branch 'main' into delayed-ub
artemagvanian Aug 2, 2024
41bcfb0
Merge branch 'main' into delayed-ub
artemagvanian Aug 2, 2024
c7eed14
Formatting change
artemagvanian Aug 2, 2024
136d5a2
Remove `merge` method duplicate
artemagvanian Aug 2, 2024
7cd2f80
Accept suggestion from clippy
artemagvanian Aug 2, 2024
078d38a
Merge branch 'main' into delayed-ub
artemagvanian Aug 2, 2024
54bea65
Documentation changes / renaming
artemagvanian Aug 2, 2024
3d43def
Accept suggestions for error messages
artemagvanian Aug 2, 2024
9deb5a3
Move `successors_for_rvalue` into a separate function
artemagvanian Aug 2, 2024
700033e
Add a comment about intrinsics
artemagvanian Aug 2, 2024
0197e2a
Merge branch 'main' into delayed-ub
artemagvanian Aug 2, 2024
4cba01e
Restrict use of all internal MIR types to `points_to` module
artemagvanian Aug 2, 2024
8671a38
Small comment update
artemagvanian Aug 5, 2024
52daa97
Merge branch 'main' into delayed-ub
artemagvanian Aug 5, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 16 additions & 2 deletions kani-compiler/src/codegen_cprover_gotoc/compiler_interface.rs
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,13 @@ impl GotocCodegenBackend {
check_contract: Option<InternalDefId>,
mut transformer: BodyTransformation,
) -> (GotocCtx<'tcx>, Vec<MonoItem>, Option<AssignsContract>) {
// This runs reachability analysis before global passes are applied.
//
// Alternatively, we could run reachability only once after the global passes are applied
// and resolve the necessary dependencies inside the passes on the fly. This, however, has a
// disadvantage of not having a precomputed call graph for the global passes to use. The
// call graph could be used, for example, in resolving function pointer or vtable calls for
// global passes that need this.
let (items, call_graph) = with_timer(
|| collect_reachable_items(tcx, &mut transformer, starting_items),
"codegen reachability analysis",
Expand Down Expand Up @@ -115,6 +122,13 @@ impl GotocCodegenBackend {
call_graph,
);

// Re-collect reachable items after global transformations were applied. This is necessary
artemagvanian marked this conversation as resolved.
Show resolved Hide resolved
// since global pass could add extra calls to instrumentation.
celinval marked this conversation as resolved.
Show resolved Hide resolved
let (items, _) = with_timer(
|| collect_reachable_items(tcx, &mut transformer, starting_items),
"codegen reachability analysis (second pass)",
);

// Follow rustc naming convention (cx is abbrev for context).
// https://rustc-dev-guide.rust-lang.org/conventions.html#naming-conventions
let mut gcx =
Expand Down Expand Up @@ -260,8 +274,8 @@ impl CodegenBackend for GotocCodegenBackend {
for unit in units.iter() {
// We reset the body cache for now because each codegen unit has different
// configurations that affect how we transform the instance body.
let mut transformer = BodyTransformation::new(&queries, tcx, &unit);
for harness in &unit.harnesses {
let transformer = BodyTransformation::new(&queries, tcx, &unit);
zhassan-aws marked this conversation as resolved.
Show resolved Hide resolved
let model_path = units.harness_model_path(*harness).unwrap();
let contract_metadata =
contract_metadata_for_harness(tcx, harness.def.def_id());
Expand All @@ -273,7 +287,7 @@ impl CodegenBackend for GotocCodegenBackend {
contract_metadata,
transformer,
);
transformer = results.extend(gcx, items, None);
results.extend(gcx, items, None);
if let Some(assigns_contract) = contract_info {
modifies_instances.push((*harness, assigns_contract));
}
Expand Down
1 change: 1 addition & 0 deletions kani-compiler/src/kani_middle/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ pub mod codegen_units;
pub mod coercion;
mod intrinsics;
pub mod metadata;
pub mod points_to;
pub mod provide;
pub mod reachability;
pub mod resolve;
Expand Down
11 changes: 11 additions & 0 deletions kani-compiler/src/kani_middle/points_to/mod.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
// Copyright Kani Contributors
// SPDX-License-Identifier: Apache-2.0 OR MIT

//! This module contains points-to analysis primitives, such as the graph and types representing its
celinval marked this conversation as resolved.
Show resolved Hide resolved
//! nodes, and the analysis itself.

mod points_to_analysis;
mod points_to_graph;

pub use points_to_analysis::run_points_to_analysis;
pub use points_to_graph::{MemLoc, PointsToGraph};
654 changes: 654 additions & 0 deletions kani-compiler/src/kani_middle/points_to/points_to_analysis.rs

Large diffs are not rendered by default.

222 changes: 222 additions & 0 deletions kani-compiler/src/kani_middle/points_to/points_to_graph.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,222 @@
// Copyright Kani Contributors
// SPDX-License-Identifier: Apache-2.0 OR MIT

//! Graph data structure to store the results of points-to analysis.

use rustc_hir::def_id::DefId;
use rustc_middle::{
mir::{Location, Place, ProjectionElem},
ty::{Instance, List, TyCtxt},
};
use rustc_mir_dataflow::{fmt::DebugWithContext, JoinSemiLattice};
use rustc_smir::rustc_internal;
use stable_mir::mir::{
mono::{Instance as StableInstance, StaticDef},
Place as StablePlace,
};
use std::collections::{HashMap, HashSet, VecDeque};

/// A node in the points-to graph, which could be a place on the stack, a heap allocation, or a static.
#[derive(Clone, Copy, Debug, Hash, PartialEq, Eq)]
pub enum MemLoc<'tcx> {
/// Notice that the type of `Place` here is not restricted to references or pointers. For
/// example, we propagate aliasing information for values derived from casting a pointer to a
/// usize in order to ensure soundness, as it could later be casted back to a pointer.
Stack(Instance<'tcx>, Place<'tcx>),
/// Using a combination of the instance of the function where the allocation took place and the
/// location of the allocation inside this function implements allocation-site abstraction.
Heap(Instance<'tcx>, Location),
Static(DefId),
}

impl<'tcx> MemLoc<'tcx> {
/// Create a memory location representing a new heap allocation site.
pub fn new_heap_allocation(instance: Instance<'tcx>, location: Location) -> Self {
MemLoc::Heap(instance, location)
}

/// Create a memory location representing a new stack allocation.
pub fn new_stack_allocation(instance: Instance<'tcx>, place: Place<'tcx>) -> Self {
MemLoc::Stack(instance, place)
}

/// Create a memory location representing a new static allocation.
pub fn new_static_allocation(static_def: DefId) -> Self {
MemLoc::Static(static_def)
}

/// Create a memory location representing a new stack allocation from StableMIR values.
pub fn from_stable_stack_allocation(
instance: StableInstance,
place: StablePlace,
tcx: TyCtxt<'tcx>,
) -> Self {
let internal_instance = rustc_internal::internal(tcx, instance);
let internal_place = rustc_internal::internal(tcx, place);
Self::new_stack_allocation(internal_instance, internal_place)
}

/// Create a memory location representing a new static allocation from StableMIR values.
pub fn from_stable_static_allocation(static_def: StaticDef, tcx: TyCtxt<'tcx>) -> Self {
let static_def_id = rustc_internal::internal(tcx, static_def);
Self::new_static_allocation(static_def_id)
}
}

/// Graph data structure that stores the current results of the point-to analysis. The graph is
/// directed, so having an edge between two places means that one is pointing to the other.
///
/// For example:
/// - `a = &b` would translate to `a --> b`
/// - `a = b` would translate to `a --> {all pointees of b}` (if `a` and `b` are pointers /
/// references)
///
/// Note that the aliasing is not field-sensitive, since the nodes in the graph are places with no
/// projections, which is sound but can be imprecise.
///
/// For example:
/// ```
/// let ref_pair = (&a, &b); // Will add `ref_pair --> (a | b)` edges into the graph.
/// let first = ref_pair.0; // Will add `first -> (a | b)`, which is an overapproximation.
/// ```
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct PointsToGraph<'tcx> {
/// A hash map of node --> {nodes} edges.
edges: HashMap<MemLoc<'tcx>, HashSet<MemLoc<'tcx>>>,
}

impl<'tcx> PointsToGraph<'tcx> {
pub fn empty() -> Self {
Self { edges: HashMap::new() }
}

/// Collect all nodes which have incoming edges from `nodes`.
pub fn successors(&self, nodes: &HashSet<MemLoc<'tcx>>) -> HashSet<MemLoc<'tcx>> {
nodes.iter().flat_map(|node| self.edges.get(node).cloned().unwrap_or_default()).collect()
}

/// For each node in `from`, add an edge to each node in `to`.
pub fn extend(&mut self, from: &HashSet<MemLoc<'tcx>>, to: &HashSet<MemLoc<'tcx>>) {
for node in from.iter() {
let node_pointees = self.edges.entry(*node).or_default();
node_pointees.extend(to.iter());
}
}

/// Collect all places to which a given place can alias.
///
/// We automatically resolve dereference projections here (by finding successors for each
/// dereference projection we encounter), which is valid as long as we do it for every place we
/// add to the graph.
pub fn resolve_place(
&self,
place: Place<'tcx>,
instance: Instance<'tcx>,
) -> HashSet<MemLoc<'tcx>> {
let place_without_projections = Place { local: place.local, projection: List::empty() };
let mut node_set =
HashSet::from([MemLoc::new_stack_allocation(instance, place_without_projections)]);
for projection in place.projection {
match projection {
ProjectionElem::Deref => {
node_set = self.successors(&node_set);
}
ProjectionElem::Field(..)
| ProjectionElem::Index(..)
| ProjectionElem::ConstantIndex { .. }
| ProjectionElem::Subslice { .. }
| ProjectionElem::Downcast(..)
| ProjectionElem::OpaqueCast(..)
| ProjectionElem::Subtype(..) => {
/* There operations are no-ops w.r.t aliasing since we are tracking it on per-object basis. */
}
}
}
node_set
}

/// Stable interface for `resolve_place`.
pub fn resolve_place_stable(
&self,
place: StablePlace,
instance: StableInstance,
tcx: TyCtxt<'tcx>,
) -> HashSet<MemLoc<'tcx>> {
let internal_place = rustc_internal::internal(tcx, place);
let internal_instance = rustc_internal::internal(tcx, instance);
self.resolve_place(internal_place, internal_instance)
}

/// Dump the graph into a file using the graphviz format for later visualization.
pub fn dump(&self, file_path: &str) {
artemagvanian marked this conversation as resolved.
Show resolved Hide resolved
let mut nodes: Vec<String> =
self.edges.keys().map(|from| format!("\t\"{:?}\"", from)).collect();
nodes.sort();
let nodes_str = nodes.join("\n");

let mut edges: Vec<String> = self
.edges
.iter()
.flat_map(|(from, to)| {
let from = format!("\"{:?}\"", from);
to.iter().map(move |to| {
let to = format!("\"{:?}\"", to);
format!("\t{} -> {}", from.clone(), to)
})
})
.collect();
edges.sort();
let edges_str = edges.join("\n");

std::fs::write(file_path, format!("digraph {{\n{}\n{}\n}}", nodes_str, edges_str)).unwrap();
}

/// Find a transitive closure of the graph starting from a set of given locations; this also
/// includes statics.
pub fn transitive_closure(&self, targets: HashSet<MemLoc<'tcx>>) -> PointsToGraph<'tcx> {
let mut result = PointsToGraph::empty();
// Working queue.
let mut queue = VecDeque::from_iter(targets);
// Add all statics, as they can be accessed at any point.
let statics = self.edges.keys().filter(|node| matches!(node, MemLoc::Static(_)));
queue.extend(statics);
// Add all entries.
while let Some(next_target) = queue.pop_front() {
result.edges.entry(next_target).or_insert_with(|| {
let outgoing_edges =
self.edges.get(&next_target).cloned().unwrap_or(HashSet::new());
artemagvanian marked this conversation as resolved.
Show resolved Hide resolved
queue.extend(outgoing_edges.iter());
outgoing_edges.clone()
});
}
result
}

/// Retrieve all places to which a given place is pointing to.
pub fn pointees_of(&self, target: &MemLoc<'tcx>) -> HashSet<MemLoc<'tcx>> {
self.edges.get(&target).unwrap_or(&HashSet::new()).clone()
}
}

/// Since we are performing the analysis using a dataflow, we need to implement a proper monotonous
/// join operation. In our case, this is a simple union of two graphs. This "lattice" is finite,
/// because in the worst case all places will alias to all places, in which case the join will be a
/// no-op.
impl<'tcx> JoinSemiLattice for PointsToGraph<'tcx> {
fn join(&mut self, other: &Self) -> bool {
artemagvanian marked this conversation as resolved.
Show resolved Hide resolved
let mut updated = false;
// Check every node in the other graph.
for (from, to) in other.edges.iter() {
let existing_to = self.edges.entry(*from).or_default();
let initial_size = existing_to.len();
existing_to.extend(to);
let new_size = existing_to.len();
updated |= initial_size != new_size;
}
updated
}
}

/// This is a requirement for the fixpoint solver, and there is no derive macro for this, so
/// implement it manually.
impl<'tcx, C> DebugWithContext<C> for PointsToGraph<'tcx> {}
Loading
Loading