Several reassembly error cases on Ddisasm #54

Open
witbring opened this issue Oct 1, 2022 · 9 comments

@witbring

witbring commented Oct 1, 2022

When I tested Ddisasm v1.5.3 (docker image digests a803c9, Apr. 2022) for my research, I found several interesting bug cases. 
 
First, I observed that Ddisasm symbolizes jump tables incorrectly. For example, for the jump table entry ‘.long .L4895-.L4896’ found in addr2line.tar.gz of Binutils, Ddisasm recognized the value as a jump table entry but misidentified the label values.

  • Compiler-generated assembly 
.L3326:                       ; this jump table will be placed at 0x131830
    .long	.L3323-.L3326 ; the entry can be represented as .L_103630 - .L_131830
     #...
  • Binary
 131830: 001efdff
  • Reassembler-generated assembly 
.L_131830:
# data_access(4, 4, 1034ee), data_access(4, 4, 10360e), preferred_data_access(131830)
          131830: .long .L_10360c-.L_13180c
# preferred_data_access(131830)
          131834: .long .L_103660-.L_131830
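
The expected label for the first entry can be checked from the raw bytes shown above. This is a minimal sketch, assuming the dump `001efdff` lists the four bytes in memory order (little-endian) and that each entry is a signed 32-bit offset relative to the table base at 0x131830. The helper name `jump_table_target` is hypothetical and not part of Ddisasm.

```python
import struct


def jump_table_target(table_base: int, raw_entry: bytes) -> int:
    """Resolve a relative jump table entry of the form `.long target - table_base`."""
    # Assumption: entries are signed 32-bit little-endian offsets.
    (offset,) = struct.unpack("<i", raw_entry)
    return table_base + offset


# Bytes at 0x131830 as shown in the binary dump above.
entry = bytes.fromhex("001efdff")
print(hex(jump_table_target(0x131830, entry)))  # prints 0x103630
```

Under these assumptions the entry resolves to 0x103630, which matches the `.L_103630 - .L_131830` form noted in the compiler-generated assembly.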

Second, I found that Ddisasm omits some label definitions. For example, given the instruction ‘movl $default_quoting_options, %eax’ found in true.tar.gz (x64 non-pie binary) of Coreutils, Ddisasm reassembled the instruction as ‘mov EAX,OFFSET .L_40b2e0’. However, Ddisasm never emits a definition for the label ‘.L_40b2e0’, which causes a compilation error.

  • Reassembler-generated assembly 
  402bb5:   mov EAX,OFFSET .L_40b2e0

Third, I observed that Ddisasm generates wrong symbolic expressions, so some recompiled binaries refer to incorrect addresses. For example, given the assembly line ‘.long .L1543@GOTOFF’ found in nm_new.tar.gz (x86 pie binary) of Binutils, Ddisasm symbolized the pointer as ‘.long .L_e4b5-.L_785f1’.

  • Compiler-generated assembly
    .long	.L1543@GOTOFF       ; this entry can be represented as .L_95eb8
  • Reassembler-generated assembly
0xc5fe4 : .long .L_e4b5-.L_785f1

Also, I observed that Ddisasm makes mistakes when it generates GOT-relative labels. For example, given the instruction ‘addl $yydefgoto@GOTOFF, %eax’ found in date.tar.gz (x86 pie binary) of Coreutils, Ddisasm symbolized the immediate value as ‘.L_11eca@GOTOFF’. However, ‘yydefgoto’ is placed at 0x11ee6, not 0x11eca. I also calculated the GOT-relative address by hand, which confirms that Ddisasm misidentified the label value.

  • Compiler-generated assembly
      addl $yydefgoto@GOTOFF, %eax     
  • Binary (non-stripped version)
$ readelf -s date | grep yydefgoto
   179: 00011ee6    26 OBJECT  LOCAL  DEFAULT   16 yydefgoto

$ objdump -d -M intel date | grep 6395
    6395:    81 c0 e6 be ff ff        add    eax,0xffffbee6

$ readelf -S date | grep got.plt
  [24] .got.plt          PROGBITS        00016000 015000 000128 04  WA  0   0  4

$ python3 -c 'print(hex(0xffffbee6 + 0x0016000 & 0xffffffff))'
0x11ee6
  • Reassembler-generated assembly
          6395:   add EAX,OFFSET .L_11eca@GOTOFF

Lastly, I observed that Ddisasm fails at symbolization when it handles a large binary. For example, Ddisasm failed to symbolize RIP-relative addressing when it reassembled 416.gamess.tar.gz (delete link) of SPEC CPU 2006. As a result, it produces a very large number of false negatives.

  • Reassembler-generated assembly
main:
            sub RSP,8
            call FUN_13f0

            lea RSI,QWORD PTR [RIP+6413305]     ; fails on symbolization
@aeflores
Collaborator

aeflores commented Oct 3, 2022

Hi @witbring. Thanks for the report!

issue 54.1: addr2line

The first issue, the jump table in addr2line, seems to be solved in the current master fa15bff.

.L_131830:
# data_access(4, 4, 10360e), preferred_data_access(4, 131830)
          131830: .long .L_103630-.L_131830

I would suggest trying the latest version.

issue 54.2: true

This seems to cause a different problem in the current ddisasm version. I will investigate further.

issue 54.3: nm_new

There seems to be something wrong with the tar file that you uploaded. Can you upload it again?

issue 54.4: date

I see what you are saying. The disassembled instruction in the broader context is:

.L_6390:
          6390:   add EAX,-28
          6393:   add EAX,EBX
          6395:   add EAX,OFFSET .L_11eca@GOTOFF

That -28 at 6390 is precisely the difference between 11eca and 11ee6. So at 6395, ddisasm assumes that EAX holds the GOT address minus 28. If this is wrong, it is because -28 is actually an offset into the yydefgoto data structure. Could you also provide the compiler-generated assembly for this example? This would help us make sure we implement the right fix.
I think the problem is in this rule: https://github.com/GrammaTech/ddisasm/blob/main/src/datalog/binary/elf/elf_binaries.dl#L217
I will look into this further.
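
The arithmetic behind the two candidate labels can be sketched as follows. This is only an illustration using the values quoted in this thread (the `.got.plt` address from `readelf -S` and the immediate from `objdump`). The helper name `gotoff_target` is hypothetical, and the sketch does not reproduce the actual Datalog rule.

```python
MASK32 = 0xFFFFFFFF


def gotoff_target(got_base: int, imm32: int, folded_addend: int = 0) -> int:
    """Address reached by got_base + imm32 + folded_addend in 32-bit arithmetic."""
    return (got_base + imm32 + folded_addend) & MASK32


GOT_PLT = 0x16000        # .got.plt address reported by `readelf -S date`
IMM_6395 = 0xFFFFBEE6    # immediate of the `add` at 0x6395

# Treating the immediate alone as the GOT-relative displacement gives yydefgoto.
print(hex(gotoff_target(GOT_PLT, IMM_6395)))        # prints 0x11ee6

# Folding the earlier `add EAX,-28` into the base gives the label ddisasm chose.
print(hex(gotoff_target(GOT_PLT, IMM_6395, -28)))   # prints 0x11eca
```

Both results address the same bytes at run time in the original layout. The difference matters only for which symbol the reassembled code is tied to.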

issue 54.5: 416.gamess

I can't open this .tar.gz file either. Can you re-upload it?

@witbring
Author

witbring commented Oct 4, 2022

Thank you for your reply.

I checked the uploaded files, but I had no problem unpacking the tar files.
Just in case, I have compressed them in a different format.
I have also uploaded the assembly file you asked for.
I hope it helps.

issue 54.3: nm_new

nm_new.zip

issue 54.4: date

parse-datetime.s.txt is the relevant compiler-generated assembly file.
Please see line 4474.

issue 54.5: 416.gamess

416.gamess.zip (delete link)

@aeflores
Collaborator

aeflores commented Oct 5, 2022

issue 54.3: nm_new

For nm_new, I can successfully untar or unzip, but the file inside is only 29 bytes and does not seem to have a binary format.

issue 54.4: date

Thanks, this is useful. I'll let you know once a fix is in.

issue 54.5: 416.gamess

This also seems to work fine on the current main branch fa15bff

main:
            subq $8,%rsp
            callq _gfortran_set_args@PLT

            leaq .L_61f110(%rip),%rsi
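
For reference, a RIP-relative operand resolves to the address of the next instruction plus the displacement. The sketch below cross-checks the two listings under two assumptions that the thread does not confirm: that both show the same `lea`, and that the instruction is encoded in 7 bytes (REX.W prefix, opcode, ModRM, 32-bit displacement). The instruction address is derived here, not taken from the binary, and the helper name is hypothetical.

```python
def rip_relative_target(next_insn_addr: int, disp: int) -> int:
    """Target of a RIP-relative operand: address of the next instruction + disp."""
    return next_insn_addr + disp


DISP = 6413305      # from `lea RSI,QWORD PTR [RIP+6413305]`
LABEL = 0x61F110    # from `leaq .L_61f110(%rip),%rsi`
LEA_LENGTH = 7      # assumption: REX.W + opcode + ModRM + disp32

# Address of the instruction following the lea, implied by the two listings.
implied_next = LABEL - DISP
print(hex(implied_next))               # prints 0x1517
print(hex(implied_next - LEA_LENGTH))  # implied address of the lea: 0x1510

# Sanity check: the displacement and the label agree.
assert rip_relative_target(implied_next, DISP) == LABEL
```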

@witbring
Author

witbring commented Oct 5, 2022

Sorry, I have re-uploaded nm_new.
nm-new.zip

@aeflores
Collaborator

aeflores commented Oct 5, 2022

Hi @witbring, thanks!
I can confirm that nm_new is still an issue. We will work on it.

@aeflores
Collaborator

aeflores commented Oct 6, 2022

Alright, issue 54.3: nm_new should be solved by 10d66da

@adamjseitz
Contributor

Hi @witbring, I am looking at resolving the remaining issues.

I think I have a fix for the date binary. I am trying to generate some additional, smaller examples, but I am having trouble getting a compiler to produce code similar to that assembly output.

Can you provide any information about your build environment and how you build coreutils to generate this code? I believe from the artifacts you have attached that you're using clang 12 to build coreutils-8.30 (x86 pie). Because the C file parse-datetime.c is generated, maybe your yacc/bison version is relevant? The output of running the ./configure command on coreutils might be helpful.

Thanks!

@witbring
Author

Hi @adamjseitz,

I'm glad to hear that you fixed the error.
I compiled the date binary with the -O1 -pie -fPIE -m32 options.
You can also find my build environment and other configuration details in config.log.
I hope this helps.

Thank you.

@adamjseitz
Contributor

The true binary should be fixed by 06fe6fa.
