
Sorting of source array in the fingerprint filter prevents building bidirectional IP flows #7

Open
jordansissel opened this issue May 18, 2015 · 2 comments


@jordansissel
Contributor

(This issue was originally filed by @nicholas-marshall at elastic/logstash#2396)


Good Day,

I am working on creating hash values for the 5-tuples of src_ip, src_port, dest_ip, dest_port, proto and then dest_ip, dest_port, src_ip, src_port, proto, in order to use these two fingerprints to build bidirectional flows out of the flow data I am collecting. However, with the following fingerprint filter:

Fingerprint the communications flow by creating source and destination hashes over the IPs and ports of the source and destination. The src_hash will be over src_ip, src_port, dest_ip, dest_port and the dest_hash will be over dest_ip, dest_port, src_ip, src_port. Then joining duplex flows becomes possible.

  if [src_ip] and [dest_ip] {
    fingerprint {
      concatenate_sources => true
      method => "SHA1"
      key => "KEYKEYKEY"
      source => [ "src_ip", "src_port", "dest_ip", "dest_port", "proto" ]
      target => "src_fingerprint"
    }

    fingerprint {
      concatenate_sources => true
      method => "SHA1"
      key => "KEYKEYKEY"
      source => [ "dest_ip", "dest_port", "src_ip", "src_port", "proto" ]
      target => "dest_fingerprint"
    }
  }

Both src_fingerprint and dest_fingerprint come out the same. I find this very confusing, as a fingerprint should be unique and hashes of two different strings should differ. Digging into the Ruby code, line 63 of fingerprint.rb has @source.sort.each do |k|, which sorts the source fields before concatenating them. Because both of my source arrays contain the same field names, sorting them yields the same ordered list, so the concatenated strings (and therefore the hashes) collide instead of being unique.

I fixed it for my use case by changing @source.sort.each do |k| to @source.each do |k|; however, I suggest adding an option to the fingerprint filter along the lines of unsorted_source => true. Simply removing the sort at this point would break backwards compatibility, since existing fingerprints would suddenly change.
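In the meantime, a possible workaround that avoids patching the plugin is to build the ordered string yourself and fingerprint that single field, so the filter's sorting has nothing to reorder. This is only a rough, untested sketch; src_flow_key and dest_flow_key are placeholder field names, not anything the plugin defines:

  # Build direction-aware keys by hand (placeholder field names)
  mutate {
    add_field => {
      "src_flow_key"  => "%{src_ip}|%{src_port}|%{dest_ip}|%{dest_port}|%{proto}"
      "dest_flow_key" => "%{dest_ip}|%{dest_port}|%{src_ip}|%{src_port}|%{proto}"
    }
  }

  # Hash each pre-built key as a single source field
  fingerprint {
    method => "SHA1"
    key    => "KEYKEYKEY"
    source => "src_flow_key"
    target => "src_fingerprint"
  }
  fingerprint {
    method => "SHA1"
    key    => "KEYKEYKEY"
    source => "dest_flow_key"
    target => "dest_fingerprint"
  }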

Sincerely,

Nicholas Marshall

@sliddjur

This problem still exists for me, and it doesn't seem like 7292935 has fixed it.
I am running v3.2.2 of the Logstash fingerprint plugin.

Sample data:

"fw": { "talkers": [ "222.222.222.222", "111.111.111.111"  ] }
"fw": { "talkers": [  "111.111.111.111", "222.222.222.222" ] }

Now I run fingerprint on this value to produce a hash:

  fingerprint {
    method => "MURMUR3"
    source => "[fw][talkers]"
    target => "[fw][talkers_hash]"
    concatenate_sources => true
  }

And they don't produce the same result.

This also doesn't sort before fingerprinting; both source fields are strings containing an IPv4 address.

      fingerprint {
        method => "MURMUR3"
        source => [ "[fw][src_ip]", "[fw][dst_ip]" ]
        target => "[fw][talkers_hash2]"
        concatenate_sources => true
      }

For me, the workaround is to use a ruby filter to sort the array before fingerprinting it:

  ruby { code => 'event.set("[fw][talkers]", event.get("[fw][talkers]").sort)' }
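Put together, the full workaround looks roughly like this (a sketch assembled from the snippets above, untested):

  filter {
    # Sort the talkers array so both directions produce the same ordering
    ruby { code => 'event.set("[fw][talkers]", event.get("[fw][talkers]").sort)' }

    # Hash the now order-independent array
    fingerprint {
      method => "MURMUR3"
      source => "[fw][talkers]"
      target => "[fw][talkers_hash]"
      concatenate_sources => true
    }
  }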

@rahulsinghai

Hi @sliddjur, sorting of the fields specified in source has been removed by 7292935.
It is now up to the end user to specify the order of the fields to be considered when calculating the hash.
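For example, with a plugin version that includes 7292935, the two fingerprint blocks from the original report should produce different values, because the fields are concatenated in exactly the order they are listed. A minimal sketch, assuming the field names from the original issue:

  # Direction-sensitive: field order is preserved, so these two fingerprints differ
  fingerprint {
    concatenate_sources => true
    method => "SHA1"
    key => "KEYKEYKEY"
    source => [ "src_ip", "src_port", "dest_ip", "dest_port", "proto" ]
    target => "src_fingerprint"
  }
  fingerprint {
    concatenate_sources => true
    method => "SHA1"
    key => "KEYKEYKEY"
    source => [ "dest_ip", "dest_port", "src_ip", "src_port", "proto" ]
    target => "dest_fingerprint"
  }

If an order-independent hash is wanted instead (as in the [fw][talkers] case), the array has to be sorted explicitly first, for example with the ruby filter shown above.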
