
feat: influx inspect export parquet #25047

Open · wants to merge 16 commits into base: master-1.x
Conversation

@alespour (Contributor) commented Jun 7, 2024

Closes https://github.com/influxdata/edge/issues/672

This PR extends the influx_inspect export command with Parquet output. The code in the cmd/influx_inspect/export/parquet folder is the Parquet exporter and related code ported from idpe (the snapshot store tsm export command); only the minimal subset required for the export was ported.

New influx_inspect options:

  • -measurement - selects the measurement to be exported (required)
  • -parquet - selects Parquet output instead of line protocol
  • -chunk-size - size, in bytes, at which Parquet files are partitioned (default 100000000)

Output file(s) are created in the folder specified via the existing -out option. The limitations are:

  • -database, -retention and -measurement must be specified
  • only TSM files are exported (not WAL files, unlike when exporting to line protocol); WAL export can easily be added if requested
  • export to Parquet file(s) is done per TSM file. The reading code apparently does not process TSM files in time order of the data they contain, so the output files are not time-ordered either: for example, table-00001.parquet may contain older data than table-00000.parquet. This seems irrelevant for import.

  • I've read the contributing section of the project README.

@alespour alespour marked this pull request as ready for review June 11, 2024 08:34
@alespour alespour marked this pull request as draft June 12, 2024 09:04
@alespour (Contributor, Author) commented Jun 12, 2024

Export example cmd:

influx_inspect export -datadir /var/lib/influxdb/data/ -waldir /var/lib/influxdb/wal/ -out /bigdata/export/ -database benchmark_db -retention autogen -measurement cpu -parquet

Import via telegraf:

[[inputs.file]]
   files = ["/bigdata/export/table-*.parquet"]
   name_override = "cpu"
   data_format = "parquet"
   tag_columns = ["datacenter","hostname","os","rack","region","service","team"]
   timestamp_column = "time"
   timestamp_format = "unix_ns"

telegraf --once

@alespour alespour marked this pull request as ready for review June 12, 2024 09:41
@powersj (Contributor) commented Jun 12, 2024

@davidby-influx could we get Stuart's review on this PR? While not urgent, it would be nice to keep up the momentum on this.

@alespour I have two comments:

  • In the README I'd rather see some examples of running with this new option + the required params
  • Is there a reason you are using arrow v14 and not v16? I assume that is copied over as well?

@alespour (Contributor, Author) commented Jun 12, 2024

@powersj Yes, arrow v14 is used in the v2 exporter and it was just copied. I'll update the dependency to v16 and add some examples of running the tool with Parquet output.

@davidby-influx (Contributor) left a comment

I am not familiar with Parquet, so I leave the correctness of the algorithms to other reviewers like @stuartcarnie. This review focuses on code robustness and debuggability when errors occur from things like bad input.

cmd/influx_inspect/export/export.go (review comment; outdated, resolved)
cmd/influx_inspect/export/export.go (review comment; outdated, resolved)
// since code from v2 exporter is used, we need v2 series key
seriesKeyEx := append([]byte(models.MeasurementTagKey+"="), []byte(key)...)
// get tags
tags := models.ParseTags(models.Escaped{B: seriesKeyEx})

Is there a possibility of an error if tags comes back zero-length or nil? Does that generate legal Parquet?

Not a request for change, just a question.

cmd/influx_inspect/export/export_parquet.go (review comment; outdated, resolved)
cmd/influx_inspect/export/export_parquet.go (review comment; outdated, resolved)
cmd/influx_inspect/export/parquet/models/points.go (review comment; outdated, resolved)
cmd/influx_inspect/export/parquet/models/points.go (review comment; outdated, resolved)
@stuartcarnie (Contributor) left a comment

As noted in the comments, I have identified a few issues that must be addressed, as the data is not exporting correctly.


The following are some observations to document my understanding of the implementation, including some potential downsides. I don't know the goals of the Parquet export, so the downsides may not be a concern. @powersj and @garylfowler this is for your reference too.

Parquet file output

For each measurement, this implementation generates 1 parquet file for each TSM file, and once addressed, it will also generate a parquet file for each shard that has a WAL with data for the measurement.

For example, say there are 10 shards, each shard contains 10 TSM files, and the user exports 10 measurements; the result may be up to 1,100 Parquet files¹ (10 measurements × 10 shards × 10 TSM files = 1,000 from TSM data, plus 10 measurements × 10 shards × 1 WAL = 100 from WAL data).

Note

It is expected that each measurement must be in its own Parquet file, as the schema across measurements is potentially different.

Potential pitfalls

A disadvantage of this approach is that the schema for each Parquet file is determined by a single TSM file. This becomes an issue if the schema for a measurement varies across TSM files. Within a single shard, the field types cannot change, but the tag set may vary.

For example, if a user writes some data, such as:

m0,tag00=val00,tag01=val00 fieldF=1.3
m0,tag00=val01,tag01=val00 fieldF=1.3
m0,tag00=val00,tag01=val01 fieldF=1.3

The columns look like this:

tag00,tag01,fieldF

If a later TSM file contains writes such as:

m0,tag00=val00,tag01=val00,tag02=val00 fieldF=1.3
m0,tag00=val00,tag01=val00,tag02=val00 fieldF=1.3
m0,tag00=val00,tag01=val00,tag02=val00 fieldF=1.3

The schema now looks like this:

tag00,tag01,tag02,fieldF

In practice, that might mean that querying all the tables using tools like duckdb is more difficult, as the schema varies across files. I verified this, and whilst the data was still queryable, duckdb dropped columns that didn't exist in all Parquet files.


Note

The source implementation from Cloud 2 contains logic to merge multiple TSM files together and to generate a merged schema. If that approach is desired, then the current 1.x implementation will need to be refactored to use a streaming approach, where TSM data is read and merged from disk, and written in chunks to Parquet, much like the Cloud 2 implementation.

Footnotes

  1. Assuming each TSM file contains all measurements, and every shard contains a WAL with all measurements.

Comment on lines +93 to +95
if err := cmd.writeToParquet(newResultSet(vc), schema); err != nil {
	return err
}
Exporting a measurement that does not exist causes this line to panic.
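
One way this could be guarded (a minimal sketch against the quoted call site, not the PR's actual fix; the emptiness check and the error message are illustrative):

// Illustrative guard: return a descriptive error instead of panicking when
// the requested measurement produced no data in the value cache.
if len(vc) == 0 {
	return fmt.Errorf("measurement %q not found in TSM data, nothing to export", cmd.measurement)
}
if err := cmd.writeToParquet(newResultSet(vc), schema); err != nil {
	return err
}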


var schema *index.MeasurementSchema

for key, fields := range vc {
@stuartcarnie (Contributor) commented Jul 24, 2024

This loop does not produce a valid schema if the user inserts data with mixed tag sets.

Using influx to create some test data:

> create database mixed_schema
> use mixed_schema
Using database mixed_schema
> insert test,tag0=val0 fieldF=1.2
> insert test,tag0=val0,tag1=val0 fieldF=3.1

And then running the command to export some data for the test measurement¹, we can view the data using duckdb:

duckdb -box -s "from 'parquet/*.parquet' select *"
time                        tag0  fieldF
--------------------------  ----  ------
2024-07-24 02:57:41.285306        1.2
2024-07-24 02:57:45.308777        3.1

Note

The tag1 column is missing.

We'll address the missing data in the tag0 column separately.

To produce the correct schema, all series keys must be enumerated and the tag keys merged. An example implementation is the KeyMerger type:

// tagsKeyMerger is responsible for determining a merged set of tag keys
type KeyMerger struct {
	i    int
	tmp  [][]byte
	keys [2][][]byte
}
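
For illustration only (not the actual KeyMerger API), a minimal sketch of the same idea, assuming the tags of every series have already been parsed with models.ParseTags; it collects the tag keys of all series and returns their sorted union for the table schema:

// mergeTagKeys returns the sorted union of tag keys across all series of a
// measurement. Illustrative sketch; a real implementation should reuse KeyMerger.
func mergeTagKeys(seriesTags []models.Tags) []string {
	set := make(map[string]struct{})
	for _, tags := range seriesTags {
		for _, t := range tags {
			set[string(t.Key)] = struct{}{}
		}
	}
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}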

Footnotes

  1. ranging over a map produces keys in a non-deterministic order, so you may need to run the export multiple times to achieve the result above

Comment on lines +405 to +407
if cmd.measurement != "" && cmd.measurement != strings.Split(string(measurement), ",")[0] {
	continue
}
@stuartcarnie (Contributor) commented Jul 24, 2024

Measurements can have escaped , values, so it is not possible to export all measurements with this function. Example:

> create database measurement_with_comma
> insert cols\,bad,tag0=tag0_0,tag1=tag1_0 fieldF=3.2
> insert cols\,bad,tag0=tag0_1,tag1=tag1_0 fieldF=1.2
> select * from "cols,bad"
name: cols,bad
time                fieldF tag0   tag1
----                ------ ----   ----
1721793491299269000 3.2    tag0_0 tag1_0
1721793494708317000 1.2    tag0_1 tag1_0

Use models.ParseName to extract the measurement correctly.
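
To make the escaping issue concrete, here is a hypothetical helper showing the kind of comparison that is needed; measurementFromKey is illustrative only, and the real code should simply call models.ParseName rather than re-implement the unescaping:

// measurementFromKey returns the measurement portion of a series key,
// treating a backslash-escaped comma as part of the name rather than as the
// tag separator. Hypothetical helper; prefer models.ParseName in real code.
func measurementFromKey(key []byte) string {
	var name []byte
	for i := 0; i < len(key); i++ {
		if key[i] == '\\' && i+1 < len(key) {
			name = append(name, key[i+1]) // keep the escaped character
			i++
			continue
		}
		if key[i] == ',' {
			break // first unescaped comma ends the measurement name
		}
		name = append(name, key[i])
	}
	return string(name)
}

With such a helper (or models.ParseName), the filter compares cmd.measurement against the properly parsed name instead of splitting on the first comma.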

Comment on lines +478 to +481
if cmd.measurement != "" && cmd.measurement != strings.Split(string(measurement), ",")[0] {
	continue
}


As noted previously, measurements can have escaped , values, so it is not possible to export with this function.


Data in WAL files is not exported to parquet, as there is no call to exportDone.

Comment on lines +337 to +338
_, t.buf.tags = models.ParseKeyBytesWithTags(groupKey, t.buf.tags)
tagSet := t.buf.tags[1:] // skip measurement, which is always the first tag key (\x00)

This results in the first tag key column dropping all its data, as this code assumes a V2 series key, which it is not.

For example, using the following data:

> drop measurement cols
> insert cols,tag0=tag0_0,tag1=tag1_0 fieldF=3.2
> insert cols,tag0=tag0_0,tag1=tag1_1 fieldF=1.2
> insert cols,tag0=tag0_1,tag1=tag1_0 fieldF=1.3
> insert cols,tag0=tag0_2,tag1=tag1_1 fieldF=1.3
> insert cols,tag0=tag0_2,tag1=tag1_2 fieldF=4.3

The exported parquet is missing all data for column tag0:

duckdb -column -s "from 'parquet/*.parquet' select *"
time                        tag0  tag1    fieldF
--------------------------  ----  ------  ------
2024-07-24 06:04:38.273453        tag1_2  4.3
2024-07-24 06:04:07.819015        tag1_0  3.2
2024-07-24 06:04:13.952354        tag1_1  1.2
2024-07-24 06:04:21.444498        tag1_0  1.3
2024-07-24 06:04:34.224291        tag1_1  1.3

As noted earlier, given this code comes from V2, it assumes it was parsed from a V2 TSM series key. Structurally, the series keys are the same, taking the form:

<measurement>[,tag0=val,...]#!~#fieldkey

Semantically, they are different, as V2 series keys are always:

orgid+bucketid,\x00=<measurement>[,tag0=val,...]\xff=<fieldkey>#!~#fieldkey

Note

Both are ordered the same when stored in a TSM file.


One approach for situations like this is to introduce new types to encapsulate what type of TSM series key is represented by the []byte. For example, the following struct:

type SeriesKeyV1 struct {
    B []byte
}

is a series key read from disk in InfluxDB v1.x, and may look like the following:

my_measurement,mytag=val#!~#field_key

Whereas:

type SeriesKeyV2 struct {
    B []byte
}

is a V2 style series key. Given there is no org / bucket ID in InfluxDB 1.x, a default value could be assigned in place of the org and bucket IDs, such as NULL:

NULL,\x00=my_measurement,mytag=val\xff=field_key#!~#field_key

When these keys are passed around, they should never be passed as a raw []byte, but rather in their container struct.
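
To make that concrete, a sketch of what such container types and a conversion might look like, assuming the V2 key layout quoted above (the NULL prefix and the constructor are illustrative, not an existing API):

// SeriesKeyV1 wraps a series key as read from a 1.x TSM file, e.g.
// my_measurement,mytag=val#!~#field_key
type SeriesKeyV1 struct{ B []byte }

// SeriesKeyV2 wraps a Cloud 2-style series key, e.g.
// orgid+bucketid,\x00=my_measurement,mytag=val\xff=field_key#!~#field_key
type SeriesKeyV2 struct{ B []byte }

// newSeriesKeyV2 builds a V2-shaped key from already-parsed 1.x parts, using
// the placeholder "NULL" prefix in place of the org/bucket IDs. Illustrative only.
func newSeriesKeyV2(measurement string, tags models.Tags, field string) SeriesKeyV2 {
	b := []byte("NULL,\x00=" + measurement)
	for _, t := range tags {
		b = append(b, ',')
		b = append(b, t.Key...)
		b = append(b, '=')
		b = append(b, t.Value...)
	}
	b = append(b, "\xff="...)
	b = append(b, field...)
	b = append(b, "#!~#"...)
	b = append(b, field...)
	return SeriesKeyV2{B: b}
}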

Comment on lines +236 to +243
// resultSet implements resultset.ResultSet over exported TSM data
type resultSet struct {
	x          map[string]map[string][]tsm1.Value
	keys       []string
	keyIndex   int
	fields     []string
	fieldIndex int
}

This resultSet implementation returns data non-deterministically. As a result, the resulting Parquet files will be different every time. Not a bug per se, but something to bear in mind.
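
If byte-identical output ever mattered, one simple option (a sketch, not something the PR needs to do) is to iterate the map in sorted key order rather than ranging over it directly:

// sortedKeys returns the series keys of the exported value map in a stable
// order, so repeated exports of the same data produce identical files.
func sortedKeys(x map[string]map[string][]tsm1.Value) []string {
	keys := make([]string, 0, len(x))
	for k := range x {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}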

)

//
// Export to Parquet file(s) is done per each TSM file. The files are apparently not sorted.

Data within a single TSM file is sorted in ascending order by the series key, and the data within a single block is sorted by timestamp.

There is no guaranteed ordering of data over multiple TSM files. This may occur if a user is backfilling data and writes the data in no specific order.

@stuartcarnie (Contributor) left a comment

Added a comment about how the schema does vary within a single TSM file.

@@ -87,7 +91,7 @@ func (cmd *Command) exportDoneParquet(_ string) error {
	TagSet:   tagSet,
	FieldSet: fieldSet,
}
// schema does not change in a table
// schema does not change in a table in one tsm file

The tag set schema can change within a single TSM file from one series key to the next.

If a user writes the following point:

m0,tag0=val0 f0=1.3

The schema for the previous line is:

col type
tag0 string (tag)
f0 float (field)

If the next write is:

m0,tag1=val0,tag2=val1 f1=false

The schema for that line is:

col type
tag1 string (tag)
tag2 string (tag)
f1 bool (field)

Therefore, the schema must be the union of all series keys, resulting in a table schema of:

col type
tag0 string (tag)
tag1 string (tag)
tag2 string (tag)
f0 float (field)
f1 bool (field)

@alespour (Contributor, Author) replied:

I assume then that the export has to iterate over the TSM files twice. In the first pass, the complete table schema would be gathered, and in the second pass the actual data exported, correct?

@alespour (Contributor, Author) commented Jul 25, 2024

Thank you very much for your input, @stuartcarnie. Given the apparent need for better insight into TSM and the exporter code itself, I begin to wonder whether a more feasible approach would be to use the tool's existing capability to export the data to line protocol, then scan that output to extract the table schemas (1st pass), and then parse it again and write it to Parquet format (2nd pass).

@stuartcarnie (Contributor) replied, quoting the above:

> I begin to wonder whether a more feasible approach would be to use the tool's existing capability to export the data to line protocol, then scan that output to extract the table schemas (1st pass), and then parse it again and write it to Parquet format (2nd pass).

That would be very inefficient for exporting large databases. If you have access to the code I wrote for Cloud 2, that could be made to work with OSS, and it should be very efficient.

@stuartcarnie (Contributor) commented Aug 2, 2024

@alespour I'd like to recommend you consider an alternate approach using some higher-level APIs, rather than a TSMReader.

I would study the influx_tools export command:

https://github.com/influxdata/influxdb/blob/cc26b7653c7d9c383b855c3765b5deb3ec803c51/cmd/influx_tools/export

and consider adding a new command to influx_tools, called export-parquet, unless the team has strong feelings otherwise.


At a high level, I suggest processing the export per shard. That will mean using the Shard type:

influxdb/tsdb/shard.go

Lines 161 to 162 in e484c4d

// NewShard returns a new initialized Shard. walPath doesn't apply to the b1 type index
func NewShard(id uint64, path string, walPath string, sfile *SeriesFile, opt EngineOptions) *Shard {

You'll use a combination of the CreateSeriesCursor API:

func (s *Shard) CreateSeriesCursor(ctx context.Context, req SeriesCursorRequest, cond influxql.Expr) (SeriesCursor, error) {

which is responsible for iterating over all the series keys of a shard. The series keys are produced in order.

This, in combination with the CreateCursorIterator API:

func (s *Shard) CreateCursorIterator(ctx context.Context) (CursorIterator, error) {

Is used to produce data for each field.

You can see these APIs being used in the cmd/influx_tools/internal/storage package, starting with a call to the Read API:

// Read creates a ResultSet that reads all points with a timestamp ts, such that start ≤ ts < end.
func (s *Store) Read(ctx context.Context, req *ReadRequest) (*ResultSet, error) {

which returns a ResultSet:

type ResultSet struct {
	cur seriesCursor
	row seriesRow
	ci  CursorIterator
}

The existing code wraps the list of shards in a ShardGroup and so it obtains the list of fields as follows:

var itr query.Iterator
var fi query.FloatIterator
var opt = query.IteratorOptions{
	Aux:        []influxql.VarRef{{Val: "key"}},
	Authorizer: query.OpenAuthorizer,
	Ascending:  true,
	Ordered:    true,
}
if itr, err = sg.CreateIterator(ctx, &influxql.Measurement{SystemIterator: "_fieldKeys"}, opt); itr != nil && err == nil {
	if fi, err = toFloatIterator(itr); err != nil {
		goto CLEANUP
	}
	p.fields = extractFields(fi)
	fi.Close()
	return p, nil
}

As you will be working with a Shard (not a ShardGroup), you can obtain all the fields for the measurement you are processing via the MeasurementFields API directly, which takes the name of a measurement:

influxdb/tsdb/shard.go

Lines 853 to 854 in e484c4d

// MeasurementFields returns fields for a measurement.
func (s *Shard) MeasurementFields(name []byte) *MeasurementFields {

You then use the FieldKeys API to retrieve all the fields for the measurement; importantly, these are returned sorted, and you must maintain that sort order.

You can then get the type information for each field returned by the FieldKeys API using the Field API:

influxdb/tsdb/shard.go

Lines 1639 to 1640 in e484c4d

return false
}

You then treat the ResultSet as an iterator, calling the Next API to iterate over each series key and field:

// Next moves to the result set forward to the next series key.
func (r *ResultSet) Next() bool {

Note

As stated in previous comments, you will need to iterate over all the series keys first, to determine all the tag keys, to ensure the Parquet table schema is complete.

Ultimately, your goal is to replace:

// resultSet implements resultset.ResultSet over exported TSM data
type resultSet struct {
	x          map[string]map[string][]tsm1.Value
	keys       []string
	keyIndex   int
	fields     []string
	fieldIndex int
}

with a version that consumes a Shard directly.
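
Pulling those pieces together, a rough sketch of the suggested per-shard flow (using only the APIs quoted above; the schema pre-pass, error handling, and the Parquet writer are elided, and the exact ReadRequest fields are an assumption):

import (
	"context"
	"fmt"

	"github.com/influxdata/influxdb/cmd/influx_tools/internal/storage"
	"github.com/influxdata/influxdb/tsdb"
)

// exportShardToParquet sketches the suggested flow for one shard and one
// measurement. Illustrative only; see cmd/influx_tools/internal/storage for
// the real ResultSet plumbing.
func exportShardToParquet(ctx context.Context, sh *tsdb.Shard, store *storage.Store, measurement string) error {
	// Field schema: MeasurementFields returns the fields for the measurement;
	// FieldKeys returns them sorted, and that sort order must be preserved.
	mf := sh.MeasurementFields([]byte(measurement))
	if mf == nil {
		return fmt.Errorf("measurement %q not found in shard", measurement)
	}
	fieldKeys := mf.FieldKeys()
	_ = fieldKeys // these drive the Parquet column layout

	// Tag schema: enumerate every series key first (CreateSeriesCursor) and
	// merge the tag keys, so the Parquet table schema is complete.

	// Data: Read returns a ResultSet; Next advances to the next series key
	// and field, whose values are then written out in Parquet row groups.
	rs, err := store.Read(ctx, &storage.ReadRequest{ /* shard, measurement, time range */ })
	if err != nil {
		return err
	}
	for rs.Next() {
		// write the current series key / field cursor to the Parquet file
	}
	return nil
}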

@bednar (Contributor) commented Aug 14, 2024

Just a quick update from us: We are on track to use influx_tools export as a base tool to convert TSM data into Parquet format. @alespour has successfully customized the tool to iterate over shards for data access during the export phase, and we are now exploring how to integrate influx_tools export with the existing code for the Parquet exporter.

@bednar (Contributor) commented Aug 21, 2024

Status update from Bonitoo: We've prepared a refactored version of the exporter based on influx_tools, detailed in PR #25253. We are currently waiting for feedback on how to correctly create the series key in the exported file. For more information, please check out this Slack conversation: https://influxdata.slack.com/archives/C5BSZ026L/p1724142258571929?thread_ts=1721781280.449769&cid=C5BSZ026L
