Switch field in the influxdb3_wal crate to use ColumnId instead of a Arc<str> #25461

mgattozzi · 2024-10-11T15:11:44Z

This is a follow up to #25388. We want to use ColumnIds instead of Arc. This is the relevant line to change, with any tests and code that need to be changed as a result.

influxdb/influxdb3_wal/src/lib.rs

Line 574 in 219fc16

pub name: Arc<str>,

mgattozzi · 2024-10-11T18:03:51Z

@pauldix this is turning out to not be a simple change and has far reaching implications. For instance:

The BufferIndex depends on the name of the column, especially when working with Exprs which are from Datafusion
A lot of the times we need names, it's because of Datafusion
LastCache expects names for lookups, especially when interfacing with, you guessed it, the Datafusion Schema type

I think we'd spend more time doing a conversion lookup then just having an Arc everywhere. I'm not sure the ColumnId makes as much sense compared to just using the name of the field. We'd end up having to pass the ColumnId and the name around in many places.

We could still try to make it work, but this is not a single day task like I had originally hoped for and I'll likely need to pick this up after vacation or we reconsider the value of ColumnId at all. Even the wrapper TableSchema is proving hard to pass around, when so much of the code really only cares about a schema::Schema type. I'm worried we're adding a lot of overhead for very little gain here.

pauldix · 2024-10-11T18:13:21Z

I think it's worth doing mainly because that goes into the wal constantly, many more times than you'd do the lookup for queries. Ideally the last cache and queryable buffer would keep the data indexed by ID too. Just doing conversion on the read side when pulling data out.

Definitely not urgent so it can wait a while.

mgattozzi · 2024-10-11T18:22:26Z

Okay I'll pick this up when I get back from vacation then!

mgattozzi self-assigned this Oct 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch field in the influxdb3_wal crate to use ColumnId instead of a Arc<str> #25461

Switch field in the influxdb3_wal crate to use ColumnId instead of a Arc<str> #25461

mgattozzi commented Oct 11, 2024

mgattozzi commented Oct 11, 2024

pauldix commented Oct 11, 2024

mgattozzi commented Oct 11, 2024

Switch field in the influxdb3_wal crate to use ColumnId instead of a Arc<str> #25461

Switch field in the influxdb3_wal crate to use ColumnId instead of a Arc<str> #25461

Comments

mgattozzi commented Oct 11, 2024

mgattozzi commented Oct 11, 2024

pauldix commented Oct 11, 2024

mgattozzi commented Oct 11, 2024