`macaw-symbolic`: Fine-grained tracking of machine code–specific bad behavior (à la `HasLLVMAnn`) #429

RyanGlScott · 2024-08-21T14:45:40Z

macaw-symbolic is built on top of the crucible-llvm memory model, which means that it has the ability to report instances of C-oriented bad behavior (assuming that the underlying machine code adheres to C's memory model conventions). These instances of bad behavior are tracked via the HasLLVMAnn constraint that is threaded throughout macaw-symbolic.

In addition to C memory model checks, macaw-symbolic also adds a variety of additional assertions that are specific to machine code. These include (but are likely not limited to):

Checking if reads from and writes to the global address space are within bounds:

macaw/symbolic/src/Data/Macaw/Symbolic/Memory/Common.hs

Lines 172 to 175 in 186c35d

    
           p <- mkPred ptrOff 
        
           let msg = printf "%s outside of static memory range (known BlockID 0): %s" (show (MS.pointerUseTag puse)) (show (WI.printSymExpr ptrOff)) 
        
           let loc = MS.pointerUseLocation puse 
        
           let assertion = CB.LabeledPred p (CS.SimError loc (CS.GenericSimError msg))

macaw/symbolic/src/Data/Macaw/Symbolic/Memory/Common.hs

Lines 181 to 187 in 186c35d

    
           p <- mkPred ptrOff 
        
           zeroNat <- WI.natLit sym 0 
        
           isZeroBase <- WI.natEq sym zeroNat ptrBase 
        
           p' <- WI.itePred sym isZeroBase p (WI.truePred sym) 
        
           let msg = printf "%s outside of static memory range (unknown BlockID): %s" (show (MS.pointerUseTag puse)) (show (WI.printSymExpr ptrOff)) 
        
           let loc = MS.pointerUseLocation puse 
        
           let assertion = CB.LabeledPred p' (CS.SimError loc (CS.GenericSimError msg))

Using a default value when a conditional read's condition does not hold:

macaw/symbolic/src/Data/Macaw/Symbolic/MemOps.hs

Lines 1095 to 1099 in 186c35d

    
           let useDefault msg = 
        
                 do notC <- notPred sym cond 
        
                    let errMsg = "[doCondReadMem] " ++ msg 
        
                    assert bak notC (AssertFailureSimError errMsg errMsg) 
        
                    return def

A variety of checks when performing memory operations involving pointers:

macaw/symbolic/src/Data/Macaw/Symbolic/MemOps.hs

Lines 502 to 506 in 186c35d

    
           check :: (IsSymBackend sym bak) => bak -> Pred sym -> String -> String -> IO () 
        
           check bak valid name msg = assert bak valid 
        
                               $ AssertFailureSimError errMsg errMsg 
        
             where 
        
               errMsg = "[" ++ name ++ "] " ++ msg

The attn signal (in macaw-ppc-symbolic):

macaw/macaw-ppc-symbolic/src/Data/Macaw/PPC/Symbolic/Functions.hs

Lines 82 to 84 in 186c35d

    
           MP.Attn -> do 
        
             let reason = C.GenericSimError "ppc_attn" 
        
             C.addFailedAssertion bak reason

Loss of floating-point precision (in macaw-ppc-symbolic):

macaw/macaw-ppc-symbolic/src/Data/Macaw/PPC/Symbolic/Functions.hs

Lines 235 to 246 in 186c35d

    
           cond <- C.iFloatEq @_ @(MS.ToCrucibleFloatInfo fi') sym x' 
        
             =<< C.iFloatCast @_ @_ @(MS.ToCrucibleFloatInfo fi) 
        
                   sym 
        
                   (MS.floatInfoToCrucible fi') 
        
                   C.RNE 
        
                   res 
        
           C.assert bak cond $ C.GenericSimError $ 
        
             "float precision loss: coercing from " 
        
             ++ show (MS.floatInfoToCrucible fi') 
        
             ++ " to " 
        
             ++ show (MS.floatInfoToCrucible fi) 
        
           return res

Dividing by zero in a division-related instruction (in macaw-x86-symbolic):

macaw/x86_symbolic/src/Data/Macaw/X86/Crucible.hs

Lines 725 to 729 in 186c35d

    
           -- Check denominator is not 0 
        
           do let bvZ = app (BVLit dw (BV.zero dw)) 
        
              denNotZero <- evalApp sym $ Not (app (BVEq dw den bvZ)) 
        
              let errMsg = "denominator not zero" 
        
              assert bak denNotZero (C.AssertFailureSimError errMsg (errMsg ++ " in Data.Macaw.X86.Crucible.getDenominator"))

A quotient overflowing in a division-related instruction (in macaw-x86-symbolic):

macaw/x86_symbolic/src/Data/Macaw/X86/Crucible.hs

Lines 767 to 769 in 186c35d

    
           qNoOverflow <- evalApp sym $ BVEq nw (ValBV nw qExt) qExt' 
        
           let errMsg = "quotient no overflow" 
        
           assert bak qNoOverflow (C.AssertFailureSimError errMsg (errMsg ++ " in Data.Macaw.X86.Crucible.uDivRem"))

macaw/x86_symbolic/src/Data/Macaw/X86/Crucible.hs

Lines 807 to 809 in 186c35d

    
           qNoOverflow <- evalApp sym $ BVEq nw (ValBV nw qExt) qExt' 
        
           let errMsg = "quotient no overflow" 
        
           assert bak qNoOverflow (C.AssertFailureSimError errMsg (errMsg ++ " in Data.Macaw.X86.Crucible.sDivRem"))

Unlike the checks in the crucible-llvm memory model, however, these assertions are all performed via GenericSimError or AssertFailureSimError. As a result, it is not straightforward to catch these assertions and perform subsequent analysis on them after simulation fails.

I propose that macaw-symbolic add a constraint similar to HasLLVMAnn (perhaps HasMacawAnn, for lack of a better name) and use it to track the What4 annotations of each of the terms that give rise to these assertion failures. That way, one can consult the Map of bad macaw-symbolic behaviors afterwards and match the annotations to the corresponding terms. This would require a fair bit of API churn in order to thread the new HasMacawAnn constraint through to the relevant functions, however.

The text was updated successfully, but these errors were encountered:

langston-barrett · 2024-08-21T17:29:12Z

For reference, here is the definition of HasLLVMAnn:

https://github.com/GaloisInc/crucible/blob/93bfa7f7858eb458f83eb3e0f030bf600ef72365/crucible-llvm/src/Lang/Crucible/LLVM/MemModel/Partial.hs#L121

The parameter ?recordLLVMAnnotation is usually used in the following way:

A Pred sym, representing a safety condition, is annotated with What4.Interface.annotateTerm
An instance of BadBehavior is constructed that represents the meaning of the safety condition
?recordLLVMAnnotation is called with these arguments, it generally stashes the (annotation, BadBehavior) pair into a MapF that lives in an IORef, where it can later be retrieved for analysis
The predicate is eventually asserted
During processing of goals, if an assertion fails and its predicate was annotated and that annotation appears in the map, tools can look up the BadBehavior from the map and analyze it.

I would actually suggest we consider supporting an API in Macaw that diverges slightly from this scheme, to potentially support more different use-cases. I have a particular proposal below, but just generally think we should consider alternatives.

We could create an enumeration (perhaps MacawError) of all (or at first, most) of the assertions that Macaw makes, and then provide an interface like so:

data MacawError sym where
  divByZero :: forall w. (1 <= w) => SymExpr (BVType w) -> MacawError sym
  -- more constructors here ...

-- | Given a safety predicate and a description of the error it represents,
-- return a new predicate (and possibly perform additional side-effects, such as
-- recording information about the predicate).
type MacawProcessAssertion sym
  = (?processAssert :: sym -> Pred sym -> MacawError sym -> IO (Pred sym))

With this scheme, the MacawProcessAssertion sym could still support the existing use-case of recording information about Pred syms to be used in later analysis, but it has more power:

It has access to the Pred sym itself, not just an annotation on it, so
- it has the ability to not annotate the predicate if it doesn't want to
- it can observe, e.g., if the Pred is a constant before deciding what to do next
It can change the predicate, e.g., to skip making certain classes of assertions by returning truePred sym

... and potentially other use-cases not yet envisioned here. A default implementation might be:

defaultMacawProcessAssertion _sym p e = pure p

Or, to support the existing use-case (not 100% sure this type-checks):

defaultMacawProcessAssertion mapRef sym p e = do
  (n, p') <- annotateTerm sym p
  modifyIORef mapRef (MapF.insert n e)
  pure p'

RyanGlScott added enhancement symbolic-execution Issues relating to macaw-symbolic and symbolic execution labels Aug 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`macaw-symbolic`: Fine-grained tracking of machine code–specific bad behavior (à la `HasLLVMAnn`) #429

`macaw-symbolic`: Fine-grained tracking of machine code–specific bad behavior (à la `HasLLVMAnn`) #429

RyanGlScott commented Aug 21, 2024

langston-barrett commented Aug 21, 2024

macaw-symbolic: Fine-grained tracking of machine code–specific bad behavior (à la HasLLVMAnn) #429

macaw-symbolic: Fine-grained tracking of machine code–specific bad behavior (à la HasLLVMAnn) #429

Comments

RyanGlScott commented Aug 21, 2024

langston-barrett commented Aug 21, 2024

`macaw-symbolic`: Fine-grained tracking of machine code–specific bad behavior (à la `HasLLVMAnn`) #429

`macaw-symbolic`: Fine-grained tracking of machine code–specific bad behavior (à la `HasLLVMAnn`) #429