Wrong instruction boundary labels in the dataset #28

Open
5c4lar opened this issue Mar 14, 2024 · 2 comments

5c4lar commented Mar 14, 2024

I hope this message finds you well. I am currently working on processing the dataset for a downstream task and have encountered what appears to be an inconsistency with some of the instruction boundary labels. Upon thorough review, it seems that certain labels may be incorrect.

Given that the correctness of these labels is one of the main contributions of your work, could you kindly allocate some time to double-check them? The affected binaries are:

../data/x86/x86_dataset/linux/servers/gcc_Os/nginx
../data/x86/x86_dataset/linux/servers/gcc_Of/nginx
../data/x86/x86_dataset/linux/libs/gcc_O3/libc-2.27.so
../data/x86/x86_dataset/linux/libs/clang_O3/libv8.so
../data/x86/x86_dataset/linux/libs/gcc_O2/libc-2.27.so
../data/x86/x86_dataset/linux/libs/clang_m32_Os/libxml2.so
../data/x86/x86_dataset/linux/libs/gcc_O1/libc-2.27.so
../data/x86/x86_dataset/linux/libs/clang_Of/libv8.so
../data/x86/x86_dataset/linux/libs/clang_m32_Of/libtiff.so.5
../data/x86/x86_dataset/linux/libs/gcc_Os/libc-2.27.so
../data/x86/x86_dataset/linux/clients/gcc_O2/openssl
../data/x86/x86_dataset/linux/clients/gcc_O0/openssl
../data/x86/x86_dataset/linux/clients/gcc_O1/openssl
../data/x86/x86_dataset/linux/clients/gcc_Os/openssl
../data/x86/x86_dataset/linux/clients/gcc_Of/openssl

5c4lar commented Mar 16, 2024

Some of the cases seem to be the corner cases mentioned in the paper, such as those from openssl, but the labels point to the middle of an instruction instead of its start.

For example, in x86_dataset/linux/clients/gcc_Os/openssl, address 0x54521a is labeled as an instruction start, but it is not.
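One way to flag this kind of label is to compare the labeled addresses against a trusted reference disassembly and report any label that lands strictly inside an instruction. The sketch below is only an illustration: the function name `mid_instruction_labels`, the `(start, length)` input format, and the toy addresses are all assumptions, not part of the dataset or its tooling.

```python
def mid_instruction_labels(instructions, labels):
    """Return labeled addresses that fall inside an instruction rather than at its start.

    instructions: iterable of (start, length) pairs from a trusted reference
                  disassembly (hypothetical input format).
    labels:       iterable of addresses the dataset marks as instruction starts.
    """
    starts = set()
    interior = set()
    for start, length in instructions:
        starts.add(start)
        # Every byte after the first belongs to the interior of the instruction.
        interior.update(range(start + 1, start + length))
    return sorted(a for a in set(labels) if a in interior and a not in starts)


# Toy data, not taken from the openssl binary: a 1-byte, a 6-byte and a 3-byte instruction.
reference = [(0x1000, 1), (0x1001, 6), (0x1007, 3)]
labels = [0x1000, 0x1001, 0x1002, 0x1007]

bad = mid_instruction_labels(reference, labels)
print([hex(a) for a in bad])
```

Here 0x1002 is reported because it lies inside the 6-byte instruction that starts at 0x1001.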

(Screenshot: Screenshot_20240316_185201)


bin2415 commented Mar 16, 2024

Our tool identifies basic-block boundaries at the compiler level and uses capstone to disassemble the instructions within each basic block. I have reproduced the case and confirmed that our tool correctly identifies the boundary of the basic block. Here is the log:

 BBL#60010 (256B) @0x00545200 - 0x00545300, BaseOff: 0x145200, SecOff:0x141200, Fixups: 0 , Type: BBL, Padding: 0x4, Fallthrough: N
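As a quick sanity check, the log line can be parsed to confirm that the reported size matches the address range and that the disputed address 0x54521a lies inside this block. This is a minimal sketch: the regular expression is inferred from this single log line, and treating the end address as exclusive is an assumption.

```python
import re

# Log line copied from the comment above.
LOG_LINE = (
    " BBL#60010 (256B) @0x00545200 - 0x00545300, BaseOff: 0x145200, "
    "SecOff:0x141200, Fixups: 0 , Type: BBL, Padding: 0x4, Fallthrough: N"
)

# Assumed layout "BBL#<id> (<size>B) @<start> - <end>", inferred from one line only.
BBL_RE = re.compile(
    r"BBL#(?P<id>\d+)\s+\((?P<size>\d+)B\)\s+"
    r"@(?P<start>0x[0-9a-fA-F]+)\s+-\s+(?P<end>0x[0-9a-fA-F]+)"
)


def parse_bbl(line):
    """Extract the id, size and address range of a basic block from a log line."""
    m = BBL_RE.search(line)
    if m is None:
        raise ValueError("not a BBL log line")
    return {
        "id": int(m["id"]),
        "size": int(m["size"]),
        "start": int(m["start"], 16),
        "end": int(m["end"], 16),
    }


def contains(bbl, addr):
    """True if addr lies in [start, end); the exclusive end is an assumption."""
    return bbl["start"] <= addr < bbl["end"]


bbl = parse_bbl(LOG_LINE)
size_matches = bbl["end"] - bbl["start"] == bbl["size"]  # 0x100 bytes == 256
label_in_block = contains(bbl, 0x54521A)
print(size_matches, label_in_block)
```

Both checks hold for this line: 0x545300 - 0x545200 is 0x100 (256 bytes), and 0x54521a falls within the block.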

However, capstone fails to correctly disassemble the instruction at address 0x545219, which causes the subsequent instructions to be misinterpreted. Here is the error log:

ERROR:Instructions that capstone can't handled. 0x545219
ERROR:Instructions that capstone can't handled. 0x54521a
ERROR:Instructions that capstone can't handled. 0x545222
ERROR:Instructions that capstone can't handled. 0x545228
ERROR:Instructions that capstone can't handled. 0x54522d
ERROR:Instructions that capstone can't handled. 0x545247
ERROR:Instructions that capstone can't handled. 0x54524e
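For downstream users, one conservative way to use this error log is to collect the failing addresses and treat every label from the first failure to the end of the enclosing basic block as untrusted. The sketch below assumes the block bounds from the BBL#60010 log line; the helper names and the "untrusted until block end" policy are assumptions made for illustration, not the tool's actual behaviour.

```python
import re

# Error lines copied verbatim from the comment above.
ERROR_LOG = """\
ERROR:Instructions that capstone can't handled. 0x545219
ERROR:Instructions that capstone can't handled. 0x54521a
ERROR:Instructions that capstone can't handled. 0x545222
ERROR:Instructions that capstone can't handled. 0x545228
ERROR:Instructions that capstone can't handled. 0x54522d
ERROR:Instructions that capstone can't handled. 0x545247
ERROR:Instructions that capstone can't handled. 0x54524e
"""

# Bounds of basic block BBL#60010, taken from the log line quoted earlier.
BLOCK_START, BLOCK_END = 0x545200, 0x545300


def failed_addresses(log_text):
    """Return the addresses reported in the error lines, in log order."""
    pattern = r"can't handled\.\s+(0x[0-9a-fA-F]+)"
    return [int(m.group(1), 16) for m in re.finditer(pattern, log_text)]


def untrusted_range(addrs, block_start, block_end):
    """Hypothetical policy: distrust labels from the first failure to the block end."""
    inside = [a for a in addrs if block_start <= a < block_end]
    if not inside:
        return None
    return (min(inside), block_end)


addrs = failed_addresses(ERROR_LOG)
rng = untrusted_range(addrs, BLOCK_START, BLOCK_END)
print([hex(a) for a in addrs], rng)
```

All seven reported addresses fall inside the block, and the address 0x54521a from the original report is among them.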

In summary, this is a bug in capstone. We will verify whether the latest version of capstone correctly disassembles the vbroadcasti128 instruction.
