Update dissect and user_agent readme (opensearch-project#4100)
* Update dissect and user_agent readme

Signed-off-by: Hai Yan <[email protected]>

* Fix format issue

Signed-off-by: Hai Yan <[email protected]>

---------

Signed-off-by: Hai Yan <[email protected]>
oeyh authored Feb 22, 2024
1 parent cad5e3a commit ef5d5e4
Showing 2 changed files with 3 additions and 170 deletions.
128 changes: 2 additions & 126 deletions data-prepper-plugins/dissect-processor/README.md
@@ -1,129 +1,5 @@
# Dissect Processor

The Dissect processor is useful when dealing with log files or messages that have a known pattern or structure. It extracts specific pieces of information from the text and maps them to individual fields based on the defined Dissect patterns.
The dissect processor extracts values from an event and maps them to individual fields based on user-defined dissect patterns.


## Basic Usage

To get started with the dissect processor in Data Prepper, create the following `pipeline.yaml`.
```yaml
dissect-pipeline:
  source:
    file:
      path: "/full/path/to/dissect_logs_json.log"
      record_type: "event"
      format: "json"
  processor:
    - dissect:
        map:
          log: "%{Date} %{Time} %{Log_Type}: %{Message}"
  sink:
    - stdout:
```
Create the following file named `dissect_logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` with the path of this file.

```
{"log": "07-25-2023 10:00:00 ERROR: Some error"}
```
The Dissect processor extracts the fields `Date`, `Time`, `Log_Type`, and `Message` from the `log` message using the pattern `%{Date} %{Time} %{Log_Type}: %{Message}` configured in the pipeline.
When you run Data Prepper with this `pipeline.yaml` passed in, you should see the following standard output.
```
{
  "log" : "07-25-2023 10:00:00 ERROR: Some error",
  "Date" : "07-25-2023",
  "Time" : "10:00:00",
  "Log_Type" : "ERROR",
  "Message" : "Some error"
}
```
The fields `Date`, `Time`, `Log_Type`, and `Message` have been extracted from the `log` value.
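
As a rough illustration of the extraction above, the following Python sketch converts a dissect pattern that contains only normal fields into a regular expression and applies it to the `log` value. The helper name `dissect` is hypothetical, and this is not how the processor is implemented; it only mirrors the documented behavior for this one example.

```python
import re


def dissect(pattern: str, text: str) -> dict:
    """Hypothetical helper: supports only normal fields such as %{name}."""
    regex_parts = []
    pos = 0
    for field in re.finditer(r"%\{(\w+)\}", pattern):
        # Literal text between fields acts as a delimiter and must match exactly.
        regex_parts.append(re.escape(pattern[pos:field.start()]))
        # Each field lazily captures everything up to the next delimiter.
        regex_parts.append(f"(?P<{field.group(1)}>.*?)")
        pos = field.end()
    regex_parts.append(re.escape(pattern[pos:]))
    match = re.fullmatch("".join(regex_parts), text)
    return match.groupdict() if match else {}


event = {"log": "07-25-2023 10:00:00 ERROR: Some error"}
event.update(dissect("%{Date} %{Time} %{Log_Type}: %{Message}", event["log"]))
print(event)
```

The sketch returns an empty dictionary when the text does not match the pattern; how the real processor treats a non-matching event is not covered here.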
## Configuration
* `map` (Required): Specifies the dissect patterns. It takes a `Map<String, String>` with fields as keys and their respective dissect patterns as values.
* `target_types` (Optional): A `Map<String, String>` that specifies the target type of each listed field. Valid options are `integer`, `double`, `string`, and `boolean`. By default, all values are `string`. Target types are applied after the dissection process.
* `dissect_when` (Optional): A Data Prepper Expression string following the [Data Prepper Expression syntax](../../docs/expression_syntax.md). When configured, the processor will evaluate the expression before proceeding with the dissection process and perform the dissection if the expression evaluates to `true`.
## Field Notations
Symbols such as `?`, `+`, `->`, `/`, and `&` can be used to control how data is extracted.
* **Normal Field**: A field without a suffix or prefix. The field is added directly to the output Event.
  Ex: `%{field_name}`
* **Skip Field**: `?` can be used as a prefix to a key to skip that field in the output JSON.
  * Skip field: `%{}`
  * Named skip field: `%{?field_name}`
* **Append Field**: To append multiple values and put the final value in one field, use `+` before the field name in the dissect pattern.
  * **Usage**:
    * Pattern: `"%{+field_name}, %{+field_name}"`
    * Text: `"foo, bar"`
    * Output: `{"field_name" : "foobar"}`

  The order of concatenation can also be defined with the suffix `/<digits>`.
  * **Usage**:
    * Pattern: `"%{+field_name/2}, %{+field_name/1}"`
    * Text: `"foo, bar"`
    * Output: `{"field_name" : "barfoo"}`

  If no order is given, values are appended in the order in which the fields appear in the dissect pattern.
* **Indirect Field**: Prefix a field with `&` to assign its captured value to a key taken from the value captured by another field with the same name.
  * **Usage**:
    * Pattern: `"%{?field_name}, %{&field_name}"`
    * Text: `"foo, bar"`
    * Output: `{"foo" : "bar"}`

  Here, `foo`, which was captured by the skip field `%{?field_name}`, becomes the key for the value captured by the field `%{&field_name}`.
  * **Usage**:
    * Pattern: `"%{field_name}, %{&field_name}"`
    * Text: `"foo, bar"`
    * Output: `{"field_name" : "foo", "foo" : "bar"}`

  We can also indirectly assign the value to an appended field, in addition to a normal field or a skip field.
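
The append-field ordering described above can be sketched in Python as follows. The function `join_appended` and its `(order, value)` input format are assumptions made for illustration only; they are not part of the processor.

```python
def join_appended(captures):
    """Hypothetical helper: concatenate values captured by %{+field} entries.

    `captures` is a list of (order, value) pairs in the order the fields
    appear in the pattern; `order` is the /<digits> suffix, or None if absent.
    """
    if all(order is None for order, _ in captures):
        # No explicit order: keep the order of appearance in the pattern.
        ordered = captures
    else:
        # Assumes every capture carries an explicit order in this case.
        ordered = sorted(captures, key=lambda capture: capture[0])
    return "".join(value for _, value in ordered)


# "%{+field_name}, %{+field_name}" applied to "foo, bar"
print(join_appended([(None, "foo"), (None, "bar")]))
# "%{+field_name/2}, %{+field_name/1}" applied to "foo, bar"
print(join_appended([(2, "foo"), (1, "bar")]))
```

Mixing ordered and unordered append fields is left out of the sketch because the text above does not say how that case is resolved.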
### Padding
* The `->` operator can be used as a suffix to a field to indicate that white space after this field can be ignored.
  * **Usage**:
    * Pattern: `%{field1->} %{field2}`
    * Text: `"firstname lastname"`
    * Output: `{"field1" : "firstname", "field2" : "lastname"}`
* This operator should be used as the rightmost suffix.
  * **Usage**:
    * Pattern: `%{fieldname/1->} %{fieldname/2}`

  If `->` is placed before `/<digit>`, the `->` operator is treated as part of the field name.
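
The effect of padding can be approximated with a regular expression. The Python sketch below assumes that `%{field1->} %{field2}` behaves as "capture `field1`, then skip one or more spaces"; the function name and the multi-space input are hypothetical and are not taken from the processor.

```python
import re


def split_with_padding(text: str) -> dict:
    """Hypothetical equivalent of the pattern "%{field1->} %{field2}"."""
    # " +" consumes every space after field1, not just a single delimiter.
    match = re.fullmatch(r"(?P<field1>\S*?) +(?P<field2>.*)", text)
    return match.groupdict() if match else {}


print(split_with_padding("firstname     lastname"))
```

Without the padding operator, the extra spaces would be expected to end up inside one of the captured values instead of being discarded.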
## Developer Guide
This plugin is compatible with Java 14. See
- [CONTRIBUTING](https://github.com/opensearch-project/data-prepper/blob/main/CONTRIBUTING.md)
- [monitoring](https://github.com/opensearch-project/data-prepper/blob/main/docs/monitoring.md)
See the [`dissect` processor documentation](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/dissect/).
45 changes: 1 addition & 44 deletions data-prepper-plugins/user-agent-processor/README.md
@@ -1,47 +1,4 @@
# User Agent Processor
This processor parses the User-Agent (UA) string in an event and adds the parsing result to the event.

## Basic Example
An example configuration for the processor is as follows:
```yaml
...
  processor:
    - user_agent:
        source: "ua"
        target: "user_agent"
...
```

Assume the event contains the following user agent string:
```json
{
  "ua": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Mobile/15E148 Safari/604.1"
}
```

The processor will parse the "ua" field and add the result to the specified target in the following format compatible with Elastic Common Schema (ECS):
```
{
  "user_agent": {
    "original": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Mobile/15E148 Safari/604.1",
    "os": {
      "version": "13.5.1",
      "full": "iOS 13.5.1",
      "name": "iOS"
    },
    "name": "Mobile Safari",
    "version": "13.1.1",
    "device": {
      "name": "iPhone"
    }
  },
  "ua": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Mobile/15E148 Safari/604.1"
}
```

## Configuration
* `source` (Required) - The key to the user agent string in the Event that will be parsed.
* `target` (Optional) - The key to put the parsing result in the Event. Defaults to `user_agent`.
* `exclude_original` (Optional) - Whether to exclude the original user agent string from the parsing result. Defaults to `false`.
* `cache_size` (Optional) - Cache size to use in the parser. Should be a positive integer. Defaults to 1000.
* `tags_on_parse_failure` (Optional) - Tags to add to an event if the processor fails to parse the user agent string.
See the [`user_agent` processor documentation](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/user-agent/).
