docs: Fixed noise_date.md and refactored noise_float.md
wwoytenko committed May 15, 2024
1 parent d5340ce commit 0914064
Showing 2 changed files with 33 additions and 11 deletions.
@@ -22,8 +22,8 @@ Randomly add or subtract a duration within the provided `ratio` interval to the
## Description

The `NoiseDate` transformer randomly generates a duration between the `min_ratio` and `max_ratio` parameters and adds it to or
-subtracts it from the original date value. The `ratio` parameter must be written in
-the [PostgreSQL interval format](https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT).
+subtracts it from the original date value. The `min_ratio` and `max_ratio` parameters must be written in the
+[PostgreSQL interval format](https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT).
You can also truncate the resulting date up to a specified part by setting the `truncate` parameter.

In case you have constraints on the date range, you can set the `min` and `max` parameters to specify the threshold
@@ -59,7 +59,7 @@ to `1 year 2 months 3 days 4 hours 5 minutes 6 seconds and 7 milliseconds` with
In the following example, the original `timestamp` value of `hiredate` will be noised up
to `1 year 2 months 3 days 4 hours 5 minutes 6 seconds and 7 milliseconds` with truncation up to the `month` part.
-The `max` threshold is set to `2020-01-01 00:00:00`, and the `min` threshold is set to the `birthdate` column. If the
+The `max` threshold is set to `2020-01-01 00:00:00`, and the `min` threshold is set to the `birthdate` column. If the
`birthdate` column is `NULL`, the default value `1990-01-01` will be used. The hash engine is used for deterministic
generation - the same input will always produce the same output.
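
The behaviour described in this example can be approximated in a few lines of Python. This is only an illustrative sketch, not Greenmask's implementation: the function name `noise_date`, its arguments, the use of `timedelta` in place of PostgreSQL intervals, the SHA-256 seeding that stands in for the hash engine, and the order of truncation and clamping are all assumptions made for the illustration.

```python
import hashlib
import random
from datetime import datetime, timedelta


def noise_date(value, min_ratio, max_ratio, min_value=None, max_value=None,
               truncate_to_month=False, deterministic=False):
    """Hypothetical sketch: shift `value` by a random duration, then clamp it."""
    if min_ratio > max_ratio:
        raise ValueError("min_ratio must not exceed max_ratio")
    if deterministic:
        # Assumed stand-in for the hash engine: seed the generator from the input,
        # so the same input always yields the same output.
        digest = hashlib.sha256(value.isoformat().encode()).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
    else:
        rng = random.Random()

    # Pick a duration between min_ratio and max_ratio, then add or subtract it.
    seconds = rng.uniform(min_ratio.total_seconds(), max_ratio.total_seconds())
    delta = timedelta(seconds=seconds)
    noised = value + delta if rng.random() < 0.5 else value - delta

    if truncate_to_month:
        noised = noised.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

    # Clamp to the thresholds (assumed here to happen after truncation).
    if max_value is not None and noised > max_value:
        noised = max_value
    if min_value is not None and noised < min_value:
        noised = min_value
    return noised


hired = datetime(2015, 6, 15, 12, 30)
bounds = dict(min_value=datetime(1990, 1, 1), max_value=datetime(2020, 1, 1))
first = noise_date(hired, timedelta(days=1), timedelta(days=400),
                   truncate_to_month=True, deterministic=True, **bounds)
second = noise_date(hired, timedelta(days=1), timedelta(days=400),
                    truncate_to_month=True, deterministic=True, **bounds)
```

With `deterministic=True`, `first` and `second` are equal because the generator is seeded from the input value alone.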

38 changes: 30 additions & 8 deletions docs/built_in_transformers/standard_transformers/noise_float.md
@@ -2,16 +2,35 @@ Add or subtract a random fraction to the original float value.

## Parameters

-| Name | Description | Default | Required | Supported DB types |
-|-----------|----------------------------------------------------------------------------------------------------------|---------|----------|---------------------------------------------------|
-| column | The name of the column to be affected | | Yes | float4 (real), float8 (double precision), numeric |
-| ratio | The maximum random percentage for noise, from `0` to `1`, e. g. `0.1` means "add noise up to 10%" | | Yes | - |
-| precision | The precision of the noised float value (number of digits after the decimal point) | `4` | No | - |
+| Name | Description | Default | Required | Supported DB types |
+|-----------|---------------------------------------------------------------------------------------------------|----------|----------|--------------------|
+| column | The name of the column to be affected | | Yes | float4, float8 |
+| precision | The precision of the noised float value (number of digits after the decimal point) | `4` | No | - |
+| min_ratio | The minimum random percentage for noise, from `0` to `1`, e. g. `0.1` means "add noise up to 10%" | `0.05` | No | - |
+| max_ratio | The maximum random percentage for noise, from `0` to `1`, e. g. `0.1` means "add noise up to 10%" | | Yes | - |
+| min | Min threshold of noised value | | No | - |
+| max | Max threshold of noised value | | No | - |
+| engine | The engine used for generating the values [random, hash]. Use hash for deterministic generation | `random` | No | - |

## Description

The `NoiseFloat` transformer multiplies the original float value by a provided random value that is not higher than
-the `ratio` parameter and adds it to or subtracts it from the original value. Additionally, you can specify the number of decimal digits by using the `precision` parameter.
+the `max_ratio` parameter and not less than the `min_ratio` parameter and adds it to or subtracts it from the original
+value. Additionally, you can specify the number of decimal digits by using the `precision` parameter. In case you have
+constraints on the float range, you can set the `min` and `max` parameters to specify the threshold values. The values
+for `min` and `max` must have the same format as the `column` parameter. The `min` and `max` parameters support dynamic mode.
+
+!!! info
+
+    If the noised value exceeds the `max` threshold, the transformer will set the value to `max`. If the noised value
+    is lower than the `min` threshold, the transformer will set the value to `min`.
+
+## Dynamic parameters
+
+| Parameter | Supported types |
+|-----------|----------------------------------|
+| min | float4, float8, int2, int4, int8 |
+| max | float4, float8, int2, int4, int8 |
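
The noising rule in the updated description can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the transformer's actual code: the function name `noise_float`, the `rng` argument, the uniform choice of ratio and the 50/50 choice between adding and subtracting are hypothetical; only the parameter names and the clamping behaviour come from the documentation text.

```python
import random


def noise_float(value, max_ratio, min_ratio=0.05, precision=4,
                min_value=None, max_value=None, rng=None):
    """Hypothetical sketch of the documented NoiseFloat behaviour."""
    if not 0 <= min_ratio <= max_ratio <= 1:
        raise ValueError("expected 0 <= min_ratio <= max_ratio <= 1")
    rng = rng or random.Random()

    # Draw a ratio between min_ratio and max_ratio (uniform draw is an assumption).
    ratio = rng.uniform(min_ratio, max_ratio)
    delta = value * ratio

    # Add or subtract the noise with equal probability (also an assumption).
    noised = value + delta if rng.random() < 0.5 else value - delta

    # Values outside the thresholds are set to the threshold itself.
    if max_value is not None and noised > max_value:
        noised = max_value
    if min_value is not None and noised < min_value:
        noised = min_value

    # Keep `precision` digits after the decimal point.
    return round(noised, precision)


# Noise a price by 5% to 15% and never let it drop below 95.0.
price = noise_float(100.0, max_ratio=0.15, precision=2, min_value=95.0)
```

In dynamic mode the `min_value` argument would be read from another column of the same row rather than passed as a constant.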

## Example: Adding noise to the purchase price

@@ -23,7 +42,10 @@ In this example, the original value of `standardprice` will be noised up to `50%
 transformers:
   - name: "NoiseFloat"
     params:
-      column: "standardprice"
-      ratio: 0.5
+      column: "lastreceiptcost"
+      max_ratio: 0.15
+      precision: 2
+    dynamic_params:
+      min:
+        column: "standardprice"
```
