Some directories are not reported #554

Open
kukovecz opened this issue Apr 7, 2023 · 5 comments · May be fixed by #891
Labels: bug, help wanted, python

Comments

@kukovecz
Contributor

kukovecz commented Apr 7, 2023

When extracting chunks, there is logic for handling whole chunks differently (here). As a result, in some cases some directories are not reported.

Reproduce this with this test file: test.zip. It is actually from the integration test suite, but I had to zip it so that GitHub would let me attach it.

If I run this file with unblob and check the report, I get the following item:

Part of the generated JSON report:
{
  "task": {
    "path": "/tmp/fruits.lvl1.lzh",
    "depth": 0,
    "chunk_id": "",
    "__typename__": "Task"
  },
  "reports": [
    {
      "path": "/tmp/fruits.lvl1.lzh",
      "size": 146,
      "is_dir": false,
      "is_file": true,
      "is_link": false,
      "link_target": null,
      "__typename__": "StatReport"
    },
    {
      "magic": "  LHarc 1.x/ARX archive data  [lh0], 0x0 OS, with \"apple.txt\"\\012- data",
      "mime_type": "application/x-lzh-compressed",
      "__typename__": "FileMagicReport"
    },
    {
      "md5": "cf71709694cd2f3e98fcf87524194beb",
      "sha1": "701248bfd7dd7a7360ce237754a82425d1d13346",
      "sha256": "e016f42094b088058e7fa5d9c3f98bafaeac87899205192d95b8001f72058a0f",
      "__typename__": "HashReport"
    },
    {
      "chunk_id": "47941:3",
      "handler_name": "lzh",
      "start_offset": 96,
      "end_offset": 146,
      "size": 50,
      "is_encrypted": false,
      "extraction_reports": [],
      "__typename__": "ChunkReport"
    },
    {
      "chunk_id": "47941:2",
      "handler_name": "lzh",
      "start_offset": 47,
      "end_offset": 96,
      "size": 49,
      "is_encrypted": false,
      "extraction_reports": [],
      "__typename__": "ChunkReport"
    },
    {
      "chunk_id": "47941:1",
      "handler_name": "lzh",
      "start_offset": 0,
      "end_offset": 47,
      "size": 47,
      "is_encrypted": false,
      "extraction_reports": [],
      "__typename__": "ChunkReport"
    }
  ],
  "subtasks": [
    {
      "path": "/tmp/unblob/fruits.lvl1.lzh_extract/96-146.lzh_extract",
      "depth": 1,
      "chunk_id": "47941:3",
      "__typename__": "Task"
    },
    {
      "path": "/tmp/unblob/fruits.lvl1.lzh_extract/47-96.lzh_extract",
      "depth": 1,
      "chunk_id": "47941:2",
      "__typename__": "Task"
    },
    {
      "path": "/tmp/unblob/fruits.lvl1.lzh_extract/0-47.lzh_extract",
      "depth": 1,
      "chunk_id": "47941:1",
      "__typename__": "Task"
    }
  ],
  "__typename__": "TaskResult"
}

This means that when unblob handles /tmp/fruits.lvl1.lzh, it creates 3 subtasks:

  • /tmp/unblob/fruits.lvl1.lzh_extract/96-146.lzh_extract
  • /tmp/unblob/fruits.lvl1.lzh_extract/47-96.lzh_extract
  • /tmp/unblob/fruits.lvl1.lzh_extract/0-47.lzh_extract

It then continues to run for those (sub)tasks. However, a task for the /tmp/unblob/fruits.lvl1.lzh_extract directory is never created, so that directory exists in the file system without appearing in the generated report.
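To make the gap concrete, here is a minimal sketch that scans TaskResult entries shaped like the JSON above and lists parent directories of subtasks that never show up as a task path. The helper name `unreported_parents` is hypothetical and not part of unblob; only the `task`, `subtasks`, and `path` keys are taken from the report shown above.

```python
from pathlib import PurePosixPath

# Trimmed copy of the TaskResult from the report above (only the fields used here).
task_result = {
    "task": {"path": "/tmp/fruits.lvl1.lzh", "depth": 0},
    "subtasks": [
        {"path": "/tmp/unblob/fruits.lvl1.lzh_extract/96-146.lzh_extract", "depth": 1},
        {"path": "/tmp/unblob/fruits.lvl1.lzh_extract/47-96.lzh_extract", "depth": 1},
        {"path": "/tmp/unblob/fruits.lvl1.lzh_extract/0-47.lzh_extract", "depth": 1},
    ],
}


def unreported_parents(task_results):
    """Hypothetical helper: parent dirs of subtasks that are never a task themselves."""
    task_paths = set()
    parents = set()
    for result in task_results:
        task_paths.add(result["task"]["path"])
        for subtask in result["subtasks"]:
            task_paths.add(subtask["path"])
            parents.add(str(PurePosixPath(subtask["path"]).parent))
    return sorted(parents - task_paths)


missing = unreported_parents([task_result])
print(missing)
```

For this trimmed input the only entry in `missing` is `/tmp/unblob/fruits.lvl1.lzh_extract`. On a full report the same check would probably also flag the extraction root itself, so treat it as a diagnostic aid rather than a precise test.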

@kukovecz added the bug label on Apr 7, 2023
@e3krisztian
Contributor

The directory that is not reported/processed as a Task is an auxiliary directory that is used only to carve chunks into. We have not assigned any report to it yet because it was not necessary so far.

If it is really needed, a new report type on chunks (CarveReport?) could resolve this.

@e3krisztian
Contributor

Related: #326.

I am not sure we need to do anything with it, though.

@martonilles
Contributor

One option would be to move the carved files out of the extraction tree and store them separately. In most cases we delete the carves anyway, and carves are easily reproducible.

This way we could use the following extraction tree structure:

  • /tmp/unblob/fruits.lvl1.lzh_96-146_extract/
  • /tmp/unblob/fruits.lvl1.lzh_47-96_extract/
  • /tmp/unblob/fruits.lvl1.lzh_0-47_extract/
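As a rough illustration of that flattened layout, the sketch below builds the directory names from the blob name and chunk offsets. `flat_extract_dir` is a hypothetical helper, not an existing unblob function, and the naming pattern is simply copied from the three paths listed above.

```python
from pathlib import PurePosixPath


def flat_extract_dir(extract_root, blob_name, start_offset, end_offset):
    """Hypothetical helper: one extraction dir per chunk, directly under the root."""
    name = f"{blob_name}_{start_offset}-{end_offset}_extract"
    return PurePosixPath(extract_root) / name


# Chunk offsets taken from the ChunkReports earlier in this issue.
chunks = [(96, 146), (47, 96), (0, 47)]
dirs = [str(flat_extract_dir("/tmp/unblob", "fruits.lvl1.lzh", s, e)) for s, e in chunks]
for d in dirs:
    print(d)
```

With this scheme there is no intermediate `fruits.lvl1.lzh_extract` directory left that would need its own report.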

@qkaiser
Contributor

qkaiser commented Jun 17, 2024

This issue is causing problems for people who want to do nice things with the unblob API from Python. See #878.

@qkaiser added the help wanted and python labels on Jun 17, 2024
@qkaiser self-assigned this on Jun 17, 2024
@AndrewFasano linked a pull request on Jul 2, 2024 that will close this issue
@AndrewFasano
Contributor

This was blocking my ability to map between extraction directories and the blobs they were derived from with the API, so I took a stab at it in #891. I didn't figure out how to add a new task/subtask for carving; instead, I just added a new report type that logs the source and destination of each carve.

With the example fruits.lvl1 file, the following new outputs are produced in the log, which allow a consumer of the log to map between the fruits.lvl1.lzh file and the 3 carved files: fruits.lvl1.lzh_extract/96-146.lzh, fruits.lvl1.lzh_extract/47-96.lzh, and fruits.lvl1.lzh_extract/0-47.lzh.

      {
        "carved_from": "/tmp/unblob/fruits.lvl1.lzh",
        "carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/96-146.lzh",
        "start_offset": 96,
        "end_offset": 146,
        "handler_name": "lzh",
        "__typename__": "CarveReport"
      },
      {
        "carved_from": "/tmp/unblob/fruits.lvl1.lzh",
        "carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/47-96.lzh",
        "start_offset": 47,
        "end_offset": 96,
        "handler_name": "lzh",
        "__typename__": "CarveReport"
      },
      {
        "carved_from": "/tmp/unblob/fruits.lvl1.lzh",
        "carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/0-47.lzh",
        "start_offset": 0,
        "end_offset": 47,
        "handler_name": "lzh",
        "__typename__": "CarveReport"
      },
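A consumer of the log could use these entries roughly as follows. This is a sketch under two assumptions: the field names are the ones in the sample output above (they come from a proposed change and may differ in whatever is eventually merged), and the extraction directory of a carved file is the carved path plus an `_extract` suffix, which is inferred from the subtask paths earlier in this issue. `index_carves` is a hypothetical helper.

```python
# The three CarveReport entries from the sample output above.
carve_reports = [
    {"carved_from": "/tmp/unblob/fruits.lvl1.lzh",
     "carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/96-146.lzh",
     "start_offset": 96, "end_offset": 146,
     "handler_name": "lzh", "__typename__": "CarveReport"},
    {"carved_from": "/tmp/unblob/fruits.lvl1.lzh",
     "carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/47-96.lzh",
     "start_offset": 47, "end_offset": 96,
     "handler_name": "lzh", "__typename__": "CarveReport"},
    {"carved_from": "/tmp/unblob/fruits.lvl1.lzh",
     "carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/0-47.lzh",
     "start_offset": 0, "end_offset": 47,
     "handler_name": "lzh", "__typename__": "CarveReport"},
]


def index_carves(reports):
    """Hypothetical helper: map each extraction dir to (source blob, start, end)."""
    index = {}
    for report in reports:
        if report.get("__typename__") != "CarveReport":
            continue  # ignore StatReport, ChunkReport, and other report types
        # Assumption: the extraction dir is the carved file path plus "_extract".
        extract_dir = report["carved_to"] + "_extract"
        index[extract_dir] = (
            report["carved_from"],
            report["start_offset"],
            report["end_offset"],
        )
    return index


carve_index = index_carves(carve_reports)
source = carve_index["/tmp/unblob/fruits.lvl1.lzh_extract/0-47.lzh_extract"]
print(source)
```

Looking up an extraction directory then gives back the blob it was carved from together with the byte range, which is the mapping described above.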
