Does SOK support inline data? #9
Comments
Hi, assembly code is a problem for our tools when collecting ground truth, because compilers do not emit basic block information for it. There are two categories of assembly code: (1) assembly files and (2) assembly code in C files. Our solution is to wrap these regions with specific labels and perform recursive disassembly along the control flow to identify the code and data regions inside them. In this example, below is the assembly result of
.bbInfo_INLINEB
#APP
# 6 "test.c" 1
leaq _filter(%rip), %rax
jmp _out
.global _filter
.type _filter,@object
_filter:
.ascii "\040\000\000\000\000\000\000\000\025\000\000\005\015\000\000\000\040\000\000\000\020\000\000\000\025\000\004\000\005\000\000\000\025\000\003\000\012\000\000\000\025\000\002\000\013\000\000\000\025\000\001\000\004\000\000\000\006\000\000\000\000\000\377\177\006\000\000\000\000\000\005\000"
_out:
# 0 "" 2
#NO_APP
.bbInfo_INLINEE
We use |
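The idea of following control flow inside a label-wrapped region can be illustrated with a small sketch. This is not SOK's implementation: the `Insn` record, the `TOY_REGION` table, and the instruction sizes (7 bytes for the `leaq`, 2 bytes for a short `jmp`, a 1-byte placeholder at `_out`) are assumptions made for illustration, with offsets relative to the start of the wrapped region. A real tool would obtain them from a disassembler.

```python
from typing import Callable, List, NamedTuple, Optional, Set, Tuple


class Insn(NamedTuple):
    size: int               # encoded length in bytes
    falls_through: bool     # True if control can continue to the next instruction
    target: Optional[int]   # offset of a direct branch target, if any


def reachable_code_bytes(
    decode: Callable[[int], Optional[Insn]], entry: int, end: int
) -> Set[int]:
    """Follow control flow from `entry` and collect every byte offset
    that belongs to an instruction reached this way."""
    code: Set[int] = set()
    seen_starts: Set[int] = set()
    worklist = [entry]
    while worklist:
        off = worklist.pop()
        if off in seen_starts or not 0 <= off < end:
            continue
        insn = decode(off)
        if insn is None:  # nothing decodable here in the toy model
            continue
        seen_starts.add(off)
        code.update(range(off, off + insn.size))
        if insn.target is not None:
            worklist.append(insn.target)
        if insn.falls_through:
            worklist.append(off + insn.size)
    return code


def data_ranges(code: Set[int], end: int) -> List[Tuple[int, int]]:
    """Return half-open [start, stop) ranges of bytes never reached as code."""
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for off in range(end):
        if off not in code:
            if start is None:
                start = off
        elif start is not None:
            ranges.append((start, off))
            start = None
    if start is not None:
        ranges.append((start, end))
    return ranges


# Hypothetical model of the wrapped region shown above (sizes are assumptions).
TOY_REGION = {
    0: Insn(size=7, falls_through=True, target=None),    # leaq _filter(%rip), %rax
    7: Insn(size=2, falls_through=False, target=81),     # jmp _out
    81: Insn(size=1, falls_through=False, target=None),  # placeholder at _out
}
REGION_END = 82

code = reachable_code_bytes(TOY_REGION.get, 0, REGION_END)
print(data_ranges(code, REGION_END))  # prints [(9, 81)]
```

Under these assumptions, the 72 bytes between the `jmp` and `_out` are never reached by the traversal, so they are reported as data instead of being decoded as instructions.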
Hi @bin2415, thanks for your prompt reply. I am curious why we need recursive disassembly to distinguish code from data. Based on my understanding, all the data in the assembly code would have some labels like I do agree that we need to use recursive disassembly to get the basic block information, by the way 😆 |
Hi @ZhangZhuoSJTU, that is a good observation, and most cases follow this rule. But as far as I know, there are some corner cases that do not obey it. For example, here (link1, link2) are the examples that |
I see. I guess it means that if we follow the rule, we would get a sound result for data identification (i.e., without false negatives but with false positives). So I am wondering whether we can first follow the rule to get a superset of such inline-assembly data (i.e., the regions following I prefer linear disassembly rather than recursive disassembly. My observation here is that these specific instruction(s) represented by |
I agree with that.
This should work. By the way, |
Hi, I'm curious whether SOK can handle inline data.
Though gcc and clang won't place any jump tables or constants in .text, there are invariably some occasions in real-world projects where data and code are interleaved in the .text section. I tried to embed data into the gaps between instructions using inline assembly. What I found is that SOK misidentifies those inline data bytes (from 0x40055f to 0x4005a7) as instructions. For the attached program, compiled with gcc -O0, SOK even throws an error. The root of this problem is that SOK wrongly takes data bytes as instructions.
For your convenience, I post the source code here. Log file and executable file are in attachments.
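As a quick sanity check on the reported address range, the sketch below decodes the octal escapes of the `.ascii` literal quoted in the maintainer's reply and compares the byte count with the size of the range 0x40055f to 0x4005a7. It assumes that the quoted literal corresponds to the same binary and that the reported end address is exclusive; the helper name `decode_octal_escapes` is made up for illustration.

```python
import re

# The .ascii payload quoted in the thread, kept as raw text so that the
# backslashes are not interpreted by Python.
ASCII_LITERAL = (
    r"\040\000\000\000\000\000\000\000\025\000\000\005\015\000\000\000"
    r"\040\000\000\000\020\000\000\000\025\000\004\000\005\000\000\000"
    r"\025\000\003\000\012\000\000\000\025\000\002\000\013\000\000\000"
    r"\025\000\001\000\004\000\000\000\006\000\000\000\000\000\377\177"
    r"\006\000\000\000\000\000\005\000"
)


def decode_octal_escapes(literal: str) -> bytes:
    """Turn a sequence of three-digit octal escapes into raw bytes."""
    return bytes(int(digits, 8) for digits in re.findall(r"\\([0-7]{3})", literal))


DATA = decode_octal_escapes(ASCII_LITERAL)

REPORTED_START = 0x40055F
REPORTED_END = 0x4005A7  # assumed exclusive

print(len(DATA), REPORTED_END - REPORTED_START)
```

If both numbers agree, the range given in the report covers exactly the embedded bytes, which is consistent with the data region being the part that gets decoded as instructions.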
Even leaving the former problem aside, there may be potential problems when handling overlapping instructions.
No matter what, thanks so much for your amazing work!