Report carve #891

Open
wants to merge 2 commits into
base: main

AndrewFasano
Contributor

I really wanted to change carving to generate a subtask where there's a task for the original file being processed that has subtasks for each carved file. Then each carved file is another task that generates subtasks for the extraction. But I couldn't find a sane way to do this - would love any feedback or help with it if y'all think that's a better way to approach this.

@e3krisztian
Contributor

The addition of CarveReport looks clean enough (will have a second look to verify).

However the change from _extract to _carve directory suffix is backward incompatible, and while generally I like the idea (I have opened the linked issue it solves), I think it would be better for the default for ExtractionConfig.carve_suffix to be _extract (=no change to current output), and make it a command line option to override it.
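A minimal sketch of the backward-compatible default suggested above, assuming nothing about the project's real code: `SketchConfig` is a hypothetical stand-in for `ExtractionConfig`, the `--carve-suffix` flag name is invented for illustration, and `argparse` is used only to keep the example self-contained.

```python
import argparse
from dataclasses import dataclass
from pathlib import Path

# Current behaviour: carve directories share the "_extract" suffix.
DEFAULT_SUFFIX = "_extract"


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for ExtractionConfig (not the real class)."""

    extract_suffix: str = DEFAULT_SUFFIX
    carve_suffix: str = DEFAULT_SUFFIX  # default keeps today's output layout

    def carve_dir(self, path: Path) -> Path:
        # Directory that would hold the chunks carved out of `path`.
        return path.with_name(path.name + self.carve_suffix)


def parse_config(argv):
    parser = argparse.ArgumentParser()
    # Hypothetical flag name: an opt-in override, so existing users see no change.
    parser.add_argument("--carve-suffix", default=DEFAULT_SUFFIX)
    args = parser.parse_args(argv)
    return SketchConfig(carve_suffix=args.carve_suffix)
```

With no arguments the carve directory for `out/fw.bin` stays `out/fw.bin_extract`; passing `--carve-suffix _carve` would give `out/fw.bin_carve`.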


I had about the same idea of simplifying the two-step extraction (carve, then extract chunks). I thought that if a file is not fully recognized by any handler (=has multiple chunks), it should be categorized as "unknown" (or "composite") and handled by a "default" handler, which would recognize and extract (carve) chunks, and could also assign handlers to them. This would be exactly what you wanted: one task for each file. I went so far as to make an experimental refactor to work like this 2 years ago, but it became too big of a change, with untidy commits to review and probably some bad decisions, and thus was abandoned without much consideration. One of the problems to solve is how to pass the handler between processes to avoid duplicating the expensive handler selection. With the current solution this is not needed: handler selection and extraction happen in the same process.
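One way the handler hand-off could look, as a rough sketch only: the task carries the handler's name rather than the handler object, so it holds nothing but plain strings and can cross a process boundary, and the receiving side skips selection when the name is already set. Every name here (`Task`, `HANDLERS`, `select_handler`, the two handler functions) is hypothetical and does not reflect unblob's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


# Hypothetical handlers; each returns a description of what it would do.
def extract_zip(path: str) -> str:
    return f"unzip {path}"


def carve_chunks(path: str) -> str:
    return f"carve {path}"


# Registry keyed by name, so a task only needs to carry a string.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "zip": extract_zip,
    "composite": carve_chunks,  # the "default" handler for multi-chunk files
}


@dataclass(frozen=True)
class Task:
    path: str
    handler_name: Optional[str] = None  # None means "not identified yet"


def select_handler(path: str) -> str:
    # Stand-in for the expensive selection step.
    return "zip" if path.endswith(".zip") else "composite"


def run(task: Task) -> str:
    # Reuse the name chosen by the parent task when there is one.
    name = task.handler_name or select_handler(task.path)
    return HANDLERS[name](task.path)
```

A carving step that already identified a chunk would enqueue `Task("0-100.zip", "zip")`, and the worker would go straight to extraction.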

I do think understanding and reasoning about a flattened extraction process would be easier, but it would be a lot of work to rewrite the code now.

Could you explain why you would like to handle chunk-files by separate (sub-)tasks? Maybe we can come up with a solution for that problem.

@AndrewFasano
Contributor Author

Thanks for the feedback. I updated the PR to set the default carve_suffix to be _extract. I also fixed a type issue pyright caught, hopefully it will pass the CI checks now.

Thanks for pointing me at #464 - I can now see how complex that change would be and will avoid going down that path! What I'm trying to accomplish here is described in #878, but the short version is that I want to collect a clean version of each extraction produced by unblob. Specifically I'm trying to recover multiple partitions from firmware and package them up into archives for subsequent analysis. I don't want to have any *_extract directories or the ####-####.<type> files created by unblob in the packaged archives.

I've managed to do this using a terrible incantation of find to identify *_extract directories and then run tar on each with various --exclude flags to filter out unblob artifacts within the directory, but I figure there must be a better way either using the unblob API or by parsing the extraction report. If you have any tips, please let me know.
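For comparison, the find-plus-tar approach can be sketched with the Python standard library alone. This is not an unblob API. It assumes extraction directories end in `_extract` and that carved chunk files match `<digits>-<digits>.<type>`, as described above; `package_extractions` and `is_unblob_artifact` are names made up for this example.

```python
import re
import tarfile
from pathlib import Path

# Assumption: carved chunk files are named "<start>-<end>.<type>".
CHUNK_NAME = re.compile(r"^\d+-\d+\.\w+$")


def is_unblob_artifact(name: str, suffix: str = "_extract") -> bool:
    """True for names that look like by-products rather than extracted content."""
    return name.endswith(suffix) or CHUNK_NAME.match(name) is not None


def package_extractions(root: Path, out_dir: Path, suffix: str = "_extract"):
    """Write one .tar.gz per extraction directory, leaving artifacts out."""
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    # Materialize the list first so new archives cannot affect the walk.
    directories = sorted(p for p in root.rglob(f"*{suffix}") if p.is_dir())
    for directory in directories:
        top = directory.name[: -len(suffix)] or "root"

        def keep(info, top=top):
            # Always keep the archive's top-level entry; drop nested
            # extraction directories and chunk files (returning None also
            # stops tarfile from descending into a directory).
            if info.name != top and is_unblob_artifact(Path(info.name).name, suffix):
                return None
            return info

        label = "__".join(directory.relative_to(root).parts)
        archive = out_dir / f"{label}.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(directory, arcname=top, filter=keep)
        archives.append(archive)
    return archives
```

Nested extraction directories are skipped inside their parent's archive because each one gets an archive of its own. A directory that only holds carved chunks therefore yields a nearly empty archive, which is where a distinct carve suffix would help.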

@qkaiser
Contributor

qkaiser commented Sep 24, 2024

I'm back ! Did anything happen with this @e3krisztian ?

@e3krisztian
Contributor

> I'm back ! Did anything happen with this @e3krisztian ?

Welcome back @qkaiser !
There was no progress on it, unfortunately.

I think this carve suffix needs to be configurable from the CLI, and I did not get around to it.

Successfully merging this pull request may close these issues.

Some directories are not reported
distinguish between _carve and _extract directory
3 participants