add support for WEVT_TEMPLATE evtx template structure parsing #103

Open
williballenthin opened this issue Jun 12, 2020 · 4 comments

@williballenthin

williballenthin commented Jun 12, 2020

In a recent discussion, it became clear to me that there's a desire for evtx tooling that supports an offline database of templates. Here's some relevant background on the topic:

The forensikblog.de post describes exactly my goal: to process the resource directory of PE files and collect evtx templates for subsequent use. For example, to put the templates in a sqlite database, carve evtx records from unallocated space, and render the records using the templates from the database. Going forward, I expect to use this evtx library over python-evtx, for many reasons :-).

However, getting this to work may take some changes to this evtx library. I'll describe what I find in this thread. I hope that we can work together to support all these use cases!

Incidentally, I've been chatting with @forensicmatt, who's also interested in working with evtx templates, so he may chime in too.


wevt_template is my work-in-progress project for extracting evtx templates from PE files.

Here is the services.exe (renamed with a .gif extension) that I'll reference below.


In the attached services.exe, the embedded instrumentation manifest that includes the evtx templates is at offset 0xA3020, with length 0x4b7e:

00000000:  43 52 49 4d 7c 4b 00 00 05 00 01 00 03 00 00 00   CRIM|K..........
00000010:  5b 71 63 00 da ee 07 40 94 29 ad 52 6f 62 69 6e   [qc....@.).Robin
00000020:  4c 00 00 00 97 4c 18 06 01 52 0e 48 92 af 3a 36   L....L...R.H..:6
00000030:  26 c5 b1 40 f4 08 00 00 d1 08 59 55 d7 a6 95 46   &..@......YU...F
00000040:  8e 1e 26 93 1d 20 12 f4 48 0c 00 00 57 45 56 54   ..&.. ..H...WEVT
00000050:  a8 08 00 00 01 00 00 90 08 00 00 00 05 00 00 00   ................
00000060:  9c 00 00 00 07 00 00 00 08 01 00 00 0d 00 00 00   ................
00000070:  d8 04 00 00 02 00 00 00 24 05 00 00 00 00 00 00   ........$.......
00000080:  a4 05 00 00 01 00 00 00 e4 05 00 00 03 00 00 00   ................
00000090:  f4 06 00 00 04 00 00 00 68 07 00 00 43 48 41 4e   ........h...CHAN
000000a0:  6c 00 00 00 01 00 00 00 00 00 00 00 b8 00 00 00   l...............
000000b0:  10 00 00 00 ff ff ff ff 50 00 00 00 4d 00 69 00   ........P...M.i.
000000c0:  63 00 72 00 6f 00 73 00 6f 00 66 00 74 00 2d 00   c.r.o.s.o.f.t.-.
000000d0:  57 00 69 00 6e 00 64 00 6f 00 77 00 73 00 2d 00   W.i.n.d.o.w.s.-.
000000e0:  53 00 65 00 72 00 76 00 69 00 63 00 65 00 73 00   S.e.r.v.i.c.e.s.
000000f0:  2f 00 44 00 69 00 61 00 67 00 6e 00 6f 00 73 00   /.D.i.a.g.n.o.s.
00000100:  74 00 69 00 63 00 00 00 54 54 42 4c d0 03 00 00   t.i.c...TTBL....
00000110:  02 00 00 00 54 45 4d 50 c0 00 00 00 01 00 00 00   ....TEMP........
00000120:  01 00 00 00 a8 01 00 00 01 00 00 00 fe be 19 ab   ................
00000130:  f0 23 65 5f 2f fd 44 4c 0b e7 4f 99 0f 01 01 00   .#e_/.DL..O.....
00000140:  01 ff ff 5e 00 00 00 44 82 09 00 45 00 76 00 65   ...^...D...E.v.e
00000150:  00 6e 00 74 00 44 00 61 00 74 00 61 00 00 00 02   .n.t.D.a.t.a....
...

Notably, this services.exe from Win10 2020H1 uses CRIM version 5.1 (in contrast to version 3.1, which libexe describes). We'll see why this matters in a moment.
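Since the version apparently matters here, it helps to read it programmatically. The sketch below decodes the start of the CRIM blob using a layout read off the dump above: "CRIM", u32 size, u16 major and u16 minor version, u32 provider count, then one 20-byte record per provider (16-byte GUID, u32 offset from the CRIM start). That layout is an assumption, but the first provider's offset, 0x4c, lands exactly on the WEVT signature at 0xA306C. `parse_crim` and `CrimHeader` are hypothetical names, not part of the evtx crate.

```rust
use std::convert::TryInto;

/// Read a little-endian u16 at `off`, or None if the buffer is too short.
fn u16_at(b: &[u8], off: usize) -> Option<u16> {
    Some(u16::from_le_bytes(b.get(off..off + 2)?.try_into().ok()?))
}

/// Read a little-endian u32 at `off`, or None if the buffer is too short.
fn u32_at(b: &[u8], off: usize) -> Option<u32> {
    Some(u32::from_le_bytes(b.get(off..off + 4)?.try_into().ok()?))
}

/// CRIM header fields as read off the services.exe dump (field names are our own).
#[derive(Debug)]
struct CrimHeader {
    size: u32,
    major: u16,
    minor: u16,
    /// (provider GUID bytes, offset of that provider's WEVT structure from the CRIM start)
    providers: Vec<([u8; 16], u32)>,
}

fn parse_crim(b: &[u8]) -> Option<CrimHeader> {
    if b.get(0..4)? != &b"CRIM"[..] {
        return None;
    }
    let count = u32_at(b, 12)? as usize;
    let mut providers = Vec::new();
    for i in 0..count {
        // Assumed: a 16-byte GUID followed by a u32 offset, 20 bytes per provider.
        let base = 16 + i * 20;
        let guid: [u8; 16] = b.get(base..base + 16)?.try_into().ok()?;
        providers.push((guid, u32_at(b, base + 16)?));
    }
    Some(CrimHeader {
        size: u32_at(b, 4)?,
        major: u16_at(b, 8)?,
        minor: u16_at(b, 10)?,
        providers,
    })
}

/// The first 0x4c bytes of the CRIM blob at 0xA3020, copied from the dump above.
const CRIM_HEAD: [u8; 76] = [
    0x43, 0x52, 0x49, 0x4d, 0x7c, 0x4b, 0x00, 0x00, 0x05, 0x00, 0x01, 0x00, 0x03, 0x00, 0x00, 0x00,
    0x5b, 0x71, 0x63, 0x00, 0xda, 0xee, 0x07, 0x40, 0x94, 0x29, 0xad, 0x52, 0x6f, 0x62, 0x69, 0x6e,
    0x4c, 0x00, 0x00, 0x00, 0x97, 0x4c, 0x18, 0x06, 0x01, 0x52, 0x0e, 0x48, 0x92, 0xaf, 0x3a, 0x36,
    0x26, 0xc5, 0xb1, 0x40, 0xf4, 0x08, 0x00, 0x00, 0xd1, 0x08, 0x59, 0x55, 0xd7, 0xa6, 0x95, 0x46,
    0x8e, 0x1e, 0x26, 0x93, 0x1d, 0x20, 0x12, 0xf4, 0x48, 0x0c, 0x00, 0x00,
];

fn main() {
    let hdr = parse_crim(&CRIM_HEAD).expect("CRIM signature");
    println!("CRIM v{}.{} size={:#x}", hdr.major, hdr.minor, hdr.size);
    for (guid, off) in &hdr.providers {
        println!("provider {:02x?} -> WEVT at CRIM+{:#x}", guid, off);
    }
}
```

Run against those bytes, it reports version 5.1 and three providers, at offsets 0x4c, 0x8f4 and 0xc48.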

At 0xA306C is the start of an event provider structure (WEVT) for Microsoft-Windows-Services/Diagnostic:

00000000  57 45 56 54 a8 08 00 00 01 00 00 90 08 00 00 00  |WEVT¨...........|
00000010  05 00 00 00 9c 00 00 00 07 00 00 00 08 01 00 00  |................|
00000020  0d 00 00 00 d8 04 00 00 02 00 00 00 24 05 00 00  |....Ø.......$...|
00000030  00 00 00 00 a4 05 00 00 01 00 00 00 e4 05 00 00  |....¤.......ä...|
00000040  03 00 00 00 f4 06 00 00 04 00 00 00 68 07 00 00  |....ô.......h...|
...

At 0xA3128 is the template table (TTBL) and finally at 0xA315C is a binary XML template structure. Ideally, we'd be able to parse the data using this evtx library. I'm currently using the following to parse the data:

        let de = evtx::binxml::deserializer::BinXmlDeserializer::init(
            &buf,
            0x0,
            None,
            false,
            encoding::all::WINDOWS_1252,
        );

        let iterator = de.iter_tokens(None)?;

        // Log every token the deserializer yields until the iterator is exhausted.
        for token in iterator {
            debug!("token: {:#x?}", token);
        }
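The `buf` handed to `init` above is a single template's BinXML, which first has to be cut out of the TTBL. Here is a minimal sketch, assuming the layout suggested by the dump. TTBL is a signature, a u32 size and a u32 template count, 12 bytes in all, which is why the first TEMP sits at 0x114. Each TEMP is a signature, a u32 size, 16 bytes not interpreted here, 16 bytes at +0x18 that look like a GUID, and BinXML from +0x28 to the end of the entry. That +0x28 offset is what puts the first template at 0xA3134 + 0x28 = 0xA315C. The sketch also assumes TEMP entries are packed back to back. `parse_ttbl`, `TemplateEntry` and `sample_ttbl` are hypothetical names.

```rust
use std::convert::TryInto;

fn u32_at(b: &[u8], off: usize) -> Option<u32> {
    Some(u32::from_le_bytes(b.get(off..off + 4)?.try_into().ok()?))
}

/// One TEMP entry (field names are guesses).
#[derive(Debug)]
struct TemplateEntry<'a> {
    guid: [u8; 16],   // the 16 bytes at +0x18, which look like a GUID
    binxml: &'a [u8], // +0x28 up to the entry's size: the bytes to hand to the deserializer
}

/// Walk a TTBL; `ttbl` must start at the "TTBL" signature.
fn parse_ttbl(ttbl: &[u8]) -> Option<Vec<TemplateEntry<'_>>> {
    if ttbl.get(0..4)? != &b"TTBL"[..] {
        return None;
    }
    let count = u32_at(ttbl, 8)? as usize;
    let mut pos = 12; // the first TEMP follows the 12-byte TTBL header
    let mut out = Vec::new();
    for _ in 0..count {
        let temp = ttbl.get(pos..)?;
        let size = u32_at(temp, 4)? as usize;
        if temp.get(0..4)? != &b"TEMP"[..] || size < 0x28 {
            return None;
        }
        let guid: [u8; 16] = temp.get(0x18..0x28)?.try_into().ok()?;
        out.push(TemplateEntry { guid, binxml: temp.get(0x28..size)? });
        pos += size; // assumption: entries are packed back to back
    }
    Some(out)
}

/// A synthetic one-entry TTBL: TEMP header fields and GUID copied from the dump,
/// with the entry shrunk to 0x30 bytes so it carries only 8 bytes of BinXML.
fn sample_ttbl() -> Vec<u8> {
    let mut temp: Vec<u8> = Vec::new();
    temp.extend_from_slice(b"TEMP");
    temp.extend_from_slice(&0x30u32.to_le_bytes());
    temp.extend_from_slice(&[1, 0, 0, 0, 1, 0, 0, 0, 0xa8, 1, 0, 0, 1, 0, 0, 0]);
    temp.extend_from_slice(&[
        0xfe, 0xbe, 0x19, 0xab, 0xf0, 0x23, 0x65, 0x5f, 0x2f, 0xfd, 0x44, 0x4c, 0x0b, 0xe7, 0x4f, 0x99,
    ]);
    temp.extend_from_slice(&[0x0f, 0x01, 0x01, 0x00, 0x01, 0xff, 0xff, 0x5e]);
    let mut ttbl: Vec<u8> = Vec::new();
    ttbl.extend_from_slice(b"TTBL");
    ttbl.extend_from_slice(&((12 + temp.len()) as u32).to_le_bytes());
    ttbl.extend_from_slice(&1u32.to_le_bytes());
    ttbl.extend_from_slice(&temp);
    ttbl
}

fn main() {
    let buf = sample_ttbl();
    for t in parse_ttbl(&buf).expect("well-formed TTBL") {
        println!("template {:02x?}: {:02x?}", t.guid, t.binxml);
    }
}
```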

Anyway, here is the binary template:

00000000  0f 01 01 00 01 ff ff 5e 00 00 00 44 82 09 00 45  |.....ÿÿ^...D...E|
00000010  00 76 00 65 00 6e 00 74 00 44 00 61 00 74 00 61  |.v.e.n.t.D.a.t.a|
00000020  00 00 00 02 41 ff ff 3d 00 00 00 8a 6f 04 00 44  |....Aÿÿ=....o..D|
00000030  00 61 00 74 00 61 00 00 00 25 00 00 00 06 4b 95  |.a.t.a...%....K.|
00000040  04 00 4e 00 61 00 6d 00 65 00 00 00 05 01 09 00  |..N.a.m.e.......|
00000050  47 00 72 00 6f 00 75 00 70 00 4e 00 61 00 6d 00  |G.r.o.u.p.N.a.m.|
00000060  65 00 02 0d 00 00 01 04 04 00 00 00 00 00 00 00  |e...............|
...

Unfortunately, this doesn't parse well with the code from this library. Let me explain what I see:

00000000  0f 01 01 00 BinXmlFragmentHeader{version 1.1, flags: 0x0}
                      01 OpenStartElement
                         ff ff dependency identifier
                               5e 00 00 00 data size=0x5E
                                           44 82 <<< hash???
                                                 09 00 number of characters in following wstring
                                                       45 wstring="EventData"
00000010  00 76 00 65 00 6e 00 74 00 44 00 61 00 74 00 61
00000020  00 00 00 end of wstring="EventData" (last byte of 'a', then NUL terminator)
                   02 CloseStartElement
                      41 OpenStartElement with Attributes
                         ff ff 3d ...

My guess is that in (at least) format version 5.1 (or 4+???), strings are stored inline rather than as references. I think the structure for tag 0x01 may be:

struct OpenStartElementNoAttributes {
  tag: u8,                             // == 0x01
  dependency_identifier: Option<u16>,  // 0xFFFF -> None
  data_size: u32,
  name_hash: u16,                      // unknown algorithm
  name_character_count: u16,
  name: String,                        // UTF-16LE on disk: name_character_count chars + trailing NUL
}

This inline string strategy seems to be used in other parts of the template, too.
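The guessed layout can be checked against the dump directly. A hedged sketch, assuming the field order from the struct above (tag, u16 dependency id, u32 data size, u16 name hash, u16 character count, UTF-16LE name, NUL), and treating 0x41 as 0x01 with the 0x40 "attribute list follows" bit set. `read_inline_open_start_element` and `OpenStartElement` are hypothetical names, not the evtx crate's types.

```rust
use std::convert::TryInto;

fn u16_at(b: &[u8], off: usize) -> Option<u16> {
    Some(u16::from_le_bytes(b.get(off..off + 2)?.try_into().ok()?))
}

fn u32_at(b: &[u8], off: usize) -> Option<u32> {
    Some(u32::from_le_bytes(b.get(off..off + 4)?.try_into().ok()?))
}

/// Our reading of an inline-name OpenStartElement.
#[derive(Debug, PartialEq)]
struct OpenStartElement {
    has_attributes: bool,               // 0x41 = 0x01 with the 0x40 bit set
    dependency_identifier: Option<u16>, // 0xFFFF -> None
    data_size: u32,
    name_hash: u16,
    name: String,
}

/// Parse an OpenStartElement at `pos` whose name is stored inline.
/// Returns the element and the offset just past the name's NUL terminator.
fn read_inline_open_start_element(b: &[u8], pos: usize) -> Option<(OpenStartElement, usize)> {
    let tag = *b.get(pos)?;
    if tag & 0xbf != 0x01 {
        return None; // neither 0x01 nor 0x41
    }
    let dep = u16_at(b, pos + 1)?;
    let data_size = u32_at(b, pos + 3)?;
    let name_hash = u16_at(b, pos + 7)?;
    let count = u16_at(b, pos + 9)? as usize;
    let start = pos + 11;
    let units = (0..count)
        .map(|i| u16_at(b, start + 2 * i))
        .collect::<Option<Vec<u16>>>()?;
    let end = start + 2 * count;
    if u16_at(b, end)? != 0 {
        return None; // expected a UTF-16 NUL after the characters
    }
    let element = OpenStartElement {
        has_attributes: tag & 0x40 != 0,
        dependency_identifier: if dep == 0xffff { None } else { Some(dep) },
        data_size,
        name_hash,
        name: String::from_utf16(&units).ok()?,
    };
    Some((element, end + 2))
}

/// The first 0x40 bytes of the template, including the 0f 01 01 00 fragment header.
const TEMPLATE_HEAD: &[u8] = &[
    0x0f, 0x01, 0x01, 0x00, 0x01, 0xff, 0xff, 0x5e, 0x00, 0x00, 0x00, 0x44, 0x82, 0x09, 0x00, 0x45,
    0x00, 0x76, 0x00, 0x65, 0x00, 0x6e, 0x00, 0x74, 0x00, 0x44, 0x00, 0x61, 0x00, 0x74, 0x00, 0x61,
    0x00, 0x00, 0x00, 0x02, 0x41, 0xff, 0xff, 0x3d, 0x00, 0x00, 0x00, 0x8a, 0x6f, 0x04, 0x00, 0x44,
    0x00, 0x61, 0x00, 0x74, 0x00, 0x61, 0x00, 0x00, 0x00, 0x25, 0x00, 0x00, 0x00, 0x06, 0x4b, 0x95,
];

fn main() {
    // Skip the 4-byte fragment header, then read <EventData>.
    let (event_data, next) = read_inline_open_start_element(TEMPLATE_HEAD, 4).unwrap();
    println!("{:x?}, next token {:#04x} at {:#x}", event_data, TEMPLATE_HEAD[next], next);
    // The 0x41 element right after the 0x02 CloseStartElement.
    let (data, next) = read_inline_open_start_element(TEMPLATE_HEAD, next + 1).unwrap();
    println!("{:x?}, continues at {:#x}", data, next);
}
```

Both names decode, and after EventData the cursor lands on the 0x02 CloseStartElement at 0x23, which fits the guessed layout. The attribute name `Name` at 0x3e (hash 4b 95, count 04 00) follows the same pattern.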

I think these strings share a structure with the BinXmlName described by libevtx:

Offset  Size  Description
0       4     Unknown
4       2     Name hash (which hash algorithm?)
6       2     Number of characters
8       …     UTF-16 little-endian string with an end-of-string character
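On the hash question, one candidate reproduces all three name hashes visible in the template dump: EventData = 0x8244, Data = 0x6f8a and Name = 0x954b. It is the classic sdbm-style recurrence h = h * 65599 + c over the UTF-16 code units, keeping the low 16 bits. Three matches are evidence rather than proof, so treat `name_hash` (a hypothetical helper) as a candidate to check against more names.

```rust
/// Candidate BinXML name hash: h = h * 65599 + c over UTF-16 code units,
/// with wrapping 32-bit arithmetic, truncated to the low 16 bits.
fn name_hash(name: &str) -> u16 {
    let h = name
        .encode_utf16()
        .fold(0u32, |h, c| h.wrapping_mul(65599).wrapping_add(c as u32));
    (h & 0xffff) as u16
}

fn main() {
    // Hashes as they appear (little-endian) in the template dump.
    for (name, from_dump) in [("EventData", 0x8244u16), ("Data", 0x6f8a), ("Name", 0x954b)] {
        let h = name_hash(name);
        println!("{:<10} computed {:#06x}, dump {:#06x}", name, h, from_dump);
        assert_eq!(h, from_dump);
    }
}
```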

So, I wonder if it's reasonable to extend read_open_start_element to support this variant of the format, and if so, how to manage the set of features that each variant may support (evtx-file-mode vs WEVT_MODE vs ....).

In a subsequent discussion, assuming we can parse out these templates, we can chat about how to apply the templates to data carved from unallocated space. But I haven't gotten that far yet :-)

@omerbenamram
Owner

Hi @williballenthin, thanks for your work on this, it looks really cool 😄

It sounds reasonable to extend read_open_start_element - if we can pass it a flag from the parser telling it how to read the string (if it is indeed determined by the evtx version).
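As a minimal sketch of that flag, an enum can stand in for a bare bool. Each call site then states which variant it expects, and a third variant later costs one match arm. The sketch assumes that in .evtx chunks the name field is a u32 offset to a BinXmlName elsewhere in the chunk, and that in WEVT templates it is hash, count and string inline. `NameEncoding`, `ElementName` and `read_element_name` are hypothetical names, not part of the evtx crate.

```rust
use std::convert::TryInto;

fn u16_at(b: &[u8], off: usize) -> Option<u16> {
    Some(u16::from_le_bytes(b.get(off..off + 2)?.try_into().ok()?))
}

fn u32_at(b: &[u8], off: usize) -> Option<u32> {
    Some(u32::from_le_bytes(b.get(off..off + 4)?.try_into().ok()?))
}

/// How element names are encoded in the input being parsed.
#[derive(Clone, Copy, Debug, PartialEq)]
enum NameEncoding {
    /// .evtx chunk: a u32 offset to a BinXmlName elsewhere in the chunk.
    ChunkOffset,
    /// WEVT_TEMPLATE: u16 hash, u16 count and UTF-16LE string stored inline.
    Inline,
}

#[derive(Debug, PartialEq)]
enum ElementName {
    Offset(u32),                       // to be resolved against the chunk later
    Inline { hash: u16, name: String },
}

/// Read the name field that starts at `pos`; return it and the position after it.
fn read_element_name(b: &[u8], pos: usize, enc: NameEncoding) -> Option<(ElementName, usize)> {
    match enc {
        NameEncoding::ChunkOffset => Some((ElementName::Offset(u32_at(b, pos)?), pos + 4)),
        NameEncoding::Inline => {
            let hash = u16_at(b, pos)?;
            let count = u16_at(b, pos + 2)? as usize;
            let units = (0..count)
                .map(|i| u16_at(b, pos + 4 + 2 * i))
                .collect::<Option<Vec<u16>>>()?;
            let name = String::from_utf16(&units).ok()?;
            // Skip the characters plus the trailing UTF-16 NUL.
            Some((ElementName::Inline { hash, name }, pos + 4 + 2 * count + 2))
        }
    }
}

fn main() {
    // "Data" as it appears inline in the WEVT template dump.
    let inline: [u8; 14] = [0x8a, 0x6f, 0x04, 0x00, b'D', 0, b'a', 0, b't', 0, b'a', 0, 0, 0];
    println!("{:x?}", read_element_name(&inline, 0, NameEncoding::Inline));
    // A made-up chunk offset, 0x210, as a .evtx chunk would store it.
    println!("{:x?}", read_element_name(&[0x10, 0x02, 0x00, 0x00], 0, NameEncoding::ChunkOffset));
}
```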

I'll need some time to look into this properly - and I'm a little constrained right now since this isn't something I can spend time at work on.

I'll try to get to this in some upcoming weekend.

@williballenthin
Author

Yup, I totally understand. To be clear, I hate to open issues that I won't put effort towards myself, so I hope you don't feel that this creates a burden on you.

For me, I think the biggest question is how to express, construct, and document the code that can parse lots of flavors of the evtx format (there's this immediate issue, and then potentially different versions of the evtx format, etc.). The obvious thing to do is have lots of flags and lots of if/else statements, though it starts to get difficult to track, test, etc. So, before I go opening up a PR that adds a new boolean that's passed all around, I wondered if you had any great ideas here.

@omerbenamram
Owner

omerbenamram commented Jun 22, 2020

I agree that adding a lot of if-else branches can get cumbersome, but I think if it's just this small bit of behavior we could probably let it slide.

In general I believe that "duplication is better than the wrong abstraction".

But if we do need to abstract over it, we would probably need to create some sort of visitor abstraction over the node types: an EVTX visitor that behaves like the code we have at the moment, a WEVT visitor that can behave differently, and a BinXmlDeserializer that is generic over the visitor.

So we would have:

trait BinXmlVisitor {
    // we would need to consider passing a reference to data instead of cursor here, since this can be painful to abstract using cursors
    fn read_open_start_element(data: &[u8], chunk: &Chunk) {
        ...
    }
    fn read_entity_ref_start_element(...) {
        ...
    }
   ...
}

and:

pub struct BinXmlDeserializer<'a, V: BinXmlVisitor> {
    data: &'a [u8],
    offset: u64,
    chunk: Option<&'a EvtxChunk<'a>>,
    // if called from substitution token with value type: Binary XML (0x21)
    is_inside_substitution: bool,
    ansi_codec: EncodingRef,
    deserializer: V
}

I think this would require some refactoring though, and it's probably only worth pursuing if WEVT and EVTX differ by more than a few bits of state.
Is there any spec for where we could reason about the differences between EVTX and WEVT (other than this)?
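For comparison, here is a runnable reduction of the generic approach, shrunk to the one behaviour known to differ, reading an element name. All names (`NameReader`, `ChunkNames`, `InlineNames`, `Deserializer`) are hypothetical. Unlike the trait sketched above, `read_name` takes `&self`, so the .evtx implementation can carry a reference to its chunk. The sketch assumes the chunk-mode name field is a u32 chunk offset to a BinXmlName laid out as in the libevtx table earlier in the thread.

```rust
use std::convert::TryInto;

fn u16_at(b: &[u8], off: usize) -> Option<u16> {
    Some(u16::from_le_bytes(b.get(off..off + 2)?.try_into().ok()?))
}

fn u32_at(b: &[u8], off: usize) -> Option<u32> {
    Some(u32::from_le_bytes(b.get(off..off + 4)?.try_into().ok()?))
}

/// Decode `count` UTF-16LE code units starting at `pos`.
fn utf16_at(b: &[u8], pos: usize, count: usize) -> Option<String> {
    let units = (0..count)
        .map(|i| u16_at(b, pos + 2 * i))
        .collect::<Option<Vec<u16>>>()?;
    String::from_utf16(&units).ok()
}

/// The one behaviour that differs between the two modes.
trait NameReader {
    /// Read the name field at `pos` in `data`; return the name and the position after it.
    fn read_name(&self, data: &[u8], pos: usize) -> Option<(String, usize)>;
}

/// .evtx mode: a u32 chunk offset to a BinXmlName
/// (4 unknown bytes, u16 hash, u16 character count, UTF-16LE string).
struct ChunkNames<'a> {
    chunk: &'a [u8],
}

impl NameReader for ChunkNames<'_> {
    fn read_name(&self, data: &[u8], pos: usize) -> Option<(String, usize)> {
        let off = u32_at(data, pos)? as usize;
        let count = u16_at(self.chunk, off + 6)? as usize;
        Some((utf16_at(self.chunk, off + 8, count)?, pos + 4))
    }
}

/// WEVT_TEMPLATE mode: u16 hash, u16 count, UTF-16LE string and a NUL, all inline.
struct InlineNames;

impl NameReader for InlineNames {
    fn read_name(&self, data: &[u8], pos: usize) -> Option<(String, usize)> {
        let count = u16_at(data, pos + 2)? as usize;
        Some((utf16_at(data, pos + 4, count)?, pos + 4 + 2 * count + 2))
    }
}

/// A deserializer that is generic over how names are read.
struct Deserializer<'a, R: NameReader> {
    data: &'a [u8],
    names: R,
}

impl<'a, R: NameReader> Deserializer<'a, R> {
    /// OpenStartElement (0x01 or 0x41): tag, u16 dependency id, u32 data size, then the name.
    fn read_open_start_element(&self, pos: usize) -> Option<(String, usize)> {
        let tag = *self.data.get(pos)?;
        if tag & 0xbf != 0x01 {
            return None;
        }
        self.names.read_name(self.data, pos + 7)
    }
}

fn main() {
    // WEVT: the <Data> element header with its name inline, as in the template dump.
    let wevt_bytes: &[u8] = &[
        0x41, 0xff, 0xff, 0x3d, 0, 0, 0, 0x8a, 0x6f, 0x04, 0, b'D', 0, b'a', 0, b't', 0, b'a', 0, 0, 0,
    ];
    let wevt = Deserializer { data: wevt_bytes, names: InlineNames };
    println!("{:?}", wevt.read_open_start_element(0));

    // .evtx: the same element, with the name stored at a made-up chunk offset 8.
    let chunk: &[u8] = &[
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x8a, 0x6f, 0x04, 0, b'D', 0, b'a', 0, b't', 0, b'a', 0, 0, 0,
    ];
    let evtx_bytes: &[u8] = &[0x41, 0xff, 0xff, 0x3d, 0, 0, 0, 0x08, 0, 0, 0];
    let evtx = Deserializer { data: evtx_bytes, names: ChunkNames { chunk } };
    println!("{:?}", evtx.read_open_start_element(0));
}
```

Because `Deserializer` is generic over `R`, each mode gets its own monomorphised copy and the branch is resolved at compile time. The cost is that everything holding a deserializer grows a type parameter, which is part of the refactoring anticipated above.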

@truekonrads

My approach with kpulp was to "raid" the target system for DLLs that contain expansion strings and then use those. I've looked into parsing those out of the PE structure of the associated DLLs that are registered as message template providers, which seems quite feasible.
From experience, compiling a "master" database of expansion/template strings is error-prone, as it is heavily version-specific. A template you got from Win2k3 won't work on Win10, and then there are regional language issues to address.

It's a tarpit.
