docs: First doc review for v0.2 beta #109

Merged
merged 27 commits into from
May 17, 2024
a6ea6d1
docs: Removed deprecated getting-started.md added transformation engi…
wwoytenko May 11, 2024
6e04dca
[docs] Added two points in key features list. Removed old artifacts
wwoytenko May 12, 2024
666f4b6
Merge remote-tracking branch 'origin_public/main' into docs/v0_2_revi…
wwoytenko May 12, 2024
3f2b70c
[docs] Removed old artifacts
wwoytenko May 12, 2024
d5340ce
docs: Edited documentation for NoiseDate transformer. Added info abou…
wwoytenko May 13, 2024
f98e69e
Merge remote-tracking branch 'origin_public/main' into docs/v0_2_revi…
wwoytenko May 15, 2024
0914064
docs: Fixed noise_date.md and refactored noise_float.md
wwoytenko May 15, 2024
6d53c2c
doc: Reviewed documentation for NoiseInt and NoiseFloat transformers
wwoytenko May 15, 2024
1def11f
doc: Reviewed RandomBool transformer
wwoytenko May 15, 2024
e0ff104
doc: Reviewed RandomChoice transformer
wwoytenko May 15, 2024
febd114
doc: Reviewed RandomUuid transformer
wwoytenko May 15, 2024
7cf838f
doc: Reviewed RandomString transformer
wwoytenko May 15, 2024
5d13609
doc: Added NoiseNumeric transformer documentation
wwoytenko May 15, 2024
fcee37f
Merge remote-tracking branch 'origin_public/main' into docs/v0_2_revi…
wwoytenko May 16, 2024
8b8a0a7
fix: renamed precision parameter to decimal in docs
wwoytenko May 16, 2024
7b5c2fb
Merge branch 'docs/v0_2_review_1' of github.com:GreenmaskIO/greenmask…
wwoytenko May 16, 2024
2b52460
Merge remote-tracking branch 'origin_public/main' into docs/v0_2_revi…
wwoytenko May 16, 2024
db0a35e
Merge remote-tracking branch 'origin_public/main' into docs/v0_2_revi…
wwoytenko May 16, 2024
48250d3
doc: Revised doc for RandomInt transformer
wwoytenko May 16, 2024
a085b15
doc: Added transformation result in some transformers doc
wwoytenko May 16, 2024
c5e4a57
Merge remote-tracking branch 'origin_public/main' into docs/v0_2_revi…
wwoytenko May 16, 2024
9db7fe7
doc: Refactored docs
wwoytenko May 17, 2024
49d45a1
doc: Added doc for RandomEmail transformer
wwoytenko May 17, 2024
98f7c01
doc: Added transformation_engines.md
wwoytenko May 17, 2024
43248f6
doc: Fixed description
wwoytenko May 17, 2024
b0aa2b4
doc: Added tsModify documentation
wwoytenko May 17, 2024
87d5455
doc: Roadmap on the main page and hash func info
wwoytenko May 17, 2024
13 changes: 6 additions & 7 deletions README.md
@@ -10,6 +10,12 @@ backward-compatible with existing PostgreSQL utilities.

# Features

* **Deterministic transformers** - a deterministic approach to data transformation based on hash
functions. It ensures that the same input data always produces the same output data. Almost every transformer
supports either the `random` or the `hash` engine, which makes it suitable for a wide range of use cases.
* **Dynamic parameters** - almost every transformer supports dynamic parameters, which let you parametrize the
transformer dynamically from a table column value. This helps resolve functional dependencies
between columns and satisfy constraints.
* **Cross-platform** - Can be easily built and executed on any platform, thanks to its Go-based architecture,
which eliminates platform dependencies.
* **Database type safe** - Ensures data integrity by validating data and utilizing the database driver for
@@ -52,9 +58,6 @@ solution for managing obfuscation procedures. We recognize the challenges of mai
throughout the software lifecycle. Greenmask is dedicated to providing valuable tools and features that ensure the
obfuscation process remains fresh, predictable, and transparent.

## [Getting started](./getting_started.md)


### General Information

It is evident that the most appropriate approach for executing logical backup dumping and restoration is by leveraging
@@ -98,10 +101,6 @@ Greenmask introduces the concept of **Storages**.
various cloud-based storage solutions.
* **directory** - This is the standard choice, representing the ordinary filesystem directory for local storage.

!!! note
If you have suggestions for additional storage options that would be valuable to implement, please feel free to
share your ideas. Greenmask aims to accommodate a wide range of storage preferences to suit diverse backup needs.

## Restoration Process

In the restoration process, Greenmask combines the capabilities of different tools:
172 changes: 63 additions & 109 deletions config.yml.example
@@ -25,137 +25,91 @@ dump:
load-via-partition-root: true

transformation:
- schema: "bookings"
name: "flights"
query: "select * from bookings.flights limit 100"
columns_type_override:
post_code: "int4"
- schema: "public"
name: "account"
transformers:
- name: "RandomDate"
params:
min: "2023-01-01 00:00:00.0+03"
max: "2023-01-02 00:00:00.0+03"
column: "scheduled_departure"

- name: "NoiseDate"
- name: "RandomInt"
params:
ratio: "1 day"
column: "scheduled_arrival"
column: "id"
engine: hash
min: 1
max: 2147483647

- name: "RegexpReplace"
- name: "RandomChoice"
params:
column: "departure_airport"
regexp: "DME"
replace: "SVO"
column: "gender"
values:
- "M"
- "F"

- name: "RegexpReplace"
- name: "RandomPerson"
params:
column: "status"
regexp: "On Time"
replace: "Delayed"
columns:
- name: "first_name"
template: "{{ .FirstName }}"
- name: "last_name"
template: "{{ .LastName }}"
dynamic_params:
gender:
column: gender

- name: "Email"
params:
column: "email"
engine: "hash"
keep_original_domain: true
keep_null: false
local_part_template: "{{ first_name | lower }}.{{ last_name | lower }}"

- name: "RandomDate"
params:
column: "actual_departure"
min: "2023-01-03 01:00:00.0+03"
max: "2023-01-04 00:00:00.0+03"
column: "birth_date"
min: '{{ now | tsModify "-30 years" | .EncodeValue }}' # 1994
max: '{{ now | tsModify "-18 years" | .EncodeValue }}' # 2006

- name: "RandomDate"
params:
column: "actual_arrival"
min: "2023-01-04 01:00:00.0+03"
max: "2023-01-05 00:00:00.0+03"
column: "created_at"
max: "{{ now | .EncodeValue }}"
truncate: "day"
dynamic_params:
min:
column: "birth_date"
template: '{{ .GetValue | tsModify "18 years" | .EncodeValue }}'

- name: "RandomInt"
params:
column: "post_code"
min: "11"
max: "99"

- name: "Replace"
params:
column: "post_code"
value: "54321"
- schema: "public"
name: "orders"
transformers:

- name: "TwoDatesGen"
- name: "RandomInt"
params:
column_a: "scheduled_arrival"
column_b: "actual_arrival"
column: "account_id"
engine: hash
min: 1
max: 2147483647

- name: "TestTransformer"
- name: "NoiseNumeric"
params:
column: "actual_arrival"
column: "total_price"
decimal: 2
min_ratio: 0.1
max_ratio: 0.9

- name: "Cmd"
params:
executable: "cmd_test.sh"
driver:
name: "json"
params:
format: "bytes"
timeout: "60s"
validate_output: true
expected_exit_code: -1
skip_on_behaviour: "any"
columns:
- name: "actual_arrival"
skip_original_data: true
skip_on_null_input: true
- name: "scheduled_arrival"
skip_original_data: true
#
- name: "TestTransformer"
- name: "NoiseDate"
params:
column: "scheduled_arrival"
column: "created_at"
max_ratio: "6 day"
min_ratio: "1 day"
truncate: "day"

- schema: "bookings"
name: "measurement"
apply_for_inherited: True
transformers:
- name: "RandomDate"
params:
column: "logdate"
min: "2023-01-03"
max: "2023-01-30"

- name: "TemplateRecord"
params:
validate: false
columns:
- "scheduled_departure"
template: >
{{- $val := .GetValue "scheduled_departure" -}}
{{- if isNull $val -}}
{{ now | dateModify "24h" | .SetValue "scheduled_departure" }}
{{ else }}
{{ now | dateModify "48h" | .SetValue "scheduled_departure" }}
{{ end }}


- schema: "bookings"
name: "aircrafts_data"
transformers:
- name: "Json"
params:
column: "model"
operations:
- operation: "set"
path: "en"
value: "Boeing 777-300-2023"
- operation: "set"
path: "crewSize"
value: 10

- name: "NoiseInt"
params:
ratio: 0.9
column: "range"

- name: "NoiseFloat"
params:
ratio: 0.1
column: "test_float"
precision: 2
column: "paid_at"
max: '{{ now | .EncodeValue }}'
truncate: "day"
dynamic_params:
min:
column: "created_at"

restore:
pg_restore_options:
4 changes: 0 additions & 4 deletions docs/architecture.md
@@ -30,10 +30,6 @@ Greenmask introduces the concept of storages.
* `s3` — this option supports any S3-like storage system, including AWS S3, which makes it versatile and adaptable to various cloud-based storage solutions.
* `directory` — this is the standard choice, representing the ordinary filesystem directory for local storage.

!!! note
If you have suggestions for additional storage options that would be valuable to implement, feel free to
share your ideas with us. Greenmask aims to accommodate a wide range of storage preferences to suit diverse backup needs.

## Restoration process

In the restoration process, Greenmask combines the capabilities of different tools:
Binary file added docs/assets/built_in_transformers/img.png
Binary file removed docs/assets/getting_started/list-dumps.png
Binary file not shown.
Binary file not shown.
Binary file removed docs/assets/getting_started/validate-result.png
Binary file not shown.
@@ -44,7 +44,8 @@ Below you can find custom core functions which are divided into categories based
### masking

Replaces characters with asterisk `*` symbols depending on the provided masking rule. If the
value is `NULL`, it is kept unchanged. This function is based on [ggwhite/go-masker](https://github.com/ggwhite/go-masker).
value is `NULL`, it is kept unchanged. This function is based
on [ggwhite/go-masker](https://github.com/ggwhite/go-masker).

=== "Masking rules"

@@ -113,16 +114,18 @@ Adds or subtracts a random duration in the provided `interval` to or from the or

### noiseFloat

Adds or subtracts a random fraction to or from the original float value. Multiplies the original float value by a provided random value that is not higher than the `ratio` parameter and adds it to the original value with the option to specify the precision via the `precision` parameter.
Adds or subtracts a random fraction to or from the original float value. Multiplies the original float value by a
random value that is not higher than the `ratio` parameter and adds the result to the original value. The `decimal`
parameter sets the number of decimal places in the result.

=== "Signature"

`noiseFloat(ratio float, precision int, value float) (res float64, err error)`
`noiseFloat(ratio float, decimal int, value float) (res float64, err error)`

=== "Parameters"

* `ratio` — the maximum multiplier value in the interval (0:1). The value will be randomly generated up to `ratio`, multiplied by the original value, and the result will be added to the original value.
* `precision` — the precision of the resulted value
* `decimal` — the number of decimal places in the resulting value
* `value` — the original value

=== "Return values"
@@ -132,7 +135,8 @@ Adds or subtracts a random fraction to or from the original float value. Multipl

### noiseInt

Adds or subtracts a random fraction to or from the original integer value. Multiplies the original integer value by a provided random value that is not higher than the `ratio` parameter and adds it to the original value.
Adds or subtracts a random fraction to or from the original integer value. Multiplies the original integer value by a
random value that is not higher than the `ratio` parameter and adds the result to the original value.

=== "Signature"

@@ -176,13 +180,13 @@ Generates a random float value within the provided interval.

=== "Signature"

`randomFloat(min any, max any, precision int) (res float, err error)`
`randomFloat(min any, max any, decimal int) (res float, err error)`

=== "Parameters"

* `min` — the minimum random value threshold
* `max` — the maximum random value threshold
* `precision` — the precision of the resulted value
* `decimal` — the number of decimal places in the resulting value

=== "Return values"

@@ -229,18 +233,37 @@ Generates a random string using the provided characters within the specified len

### roundFloat

Rounds a float value up to provided precision.
Rounds a float value to the provided number of decimal places.

=== "Signature"

`roundFloat(precision int, original float) (res float, err error)`
`roundFloat(decimal int, original float) (res float, err error)`

=== "Parameters"

* `precision` — the precision of the value
* `decimal` — the number of decimal places to round to
* `original` — the original float value

=== "Return values"

* `res` — a rounded float value
* `err` — an error if there is an issue
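
As a rough illustration of what `roundFloat` describes, the sketch below rounds a float to a number of decimal places in Go. The name `roundFloatSketch` is hypothetical, and it is an assumption that rounding goes to the nearest value (halves away from zero, as `math.Round` does) rather than always upward.

```go
package main

import (
	"fmt"
	"math"
)

// roundFloatSketch is a hypothetical stand-in for roundFloat. It scales the
// value by 10^decimal, rounds to the nearest integer, and scales back.
func roundFloatSketch(decimal int, original float64) (float64, error) {
	if decimal < 0 {
		return 0, fmt.Errorf("decimal must not be negative, got %d", decimal)
	}
	p := math.Pow(10, float64(decimal))
	return math.Round(original*p) / p, nil
}

func main() {
	res, err := roundFloatSketch(2, 3.14159)
	if err != nil {
		panic(err)
	}
	fmt.Println(res) // prints 3.14
}
```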

### tsModify

Modifies the original time value by adding or subtracting the provided interval. The interval is a string in
the [PostgreSQL interval](https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT) format.

=== "Signature"

`tsModify(interval string, val time.Time) (time.Time, error)`

=== "Parameters"

* `interval` — the interval that is added to or subtracted from the original value. The format is the same as the [PostgreSQL interval format](https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT).
* `val` — the original time value

=== "Return values"

* `res` — a modified date
* `err` — an error if there is an issue
@@ -10,7 +10,7 @@ Modify records using a Go template and apply changes by using the PostgreSQL dri

## Description

`TemplateRecord` uses [Go templates](https://pkg.go.dev/text/template) to change data. However, while the [Template transformer](/template.md) operates with a single column and automatically applies results, the `TemplateRecord` transformer can make changes to a set of columns in the string, and using driver functions `.SetValue` or `.SetRawValue` is mandatory to do that.
`TemplateRecord` uses [Go templates](https://pkg.go.dev/text/template) to change data. However, while the [Template transformer](./template.md) operates with a single column and automatically applies results, the `TemplateRecord` transformer can make changes to a set of columns in the string, and using driver functions `.SetValue` or `.SetRawValue` is mandatory to do that.

With the `TemplateRecord` transformer, you can implement complicated transformation logic using basic or custom template functions. Below you can get familiar with the basic template functions for the `TemplateRecord` transformer. For more information about available custom template functions, see [Custom functions](custom_functions/index.md).
