Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch field in the influxdb3_wal crate to use ColumnId instead of a Arc<str> #25461

Open
mgattozzi opened this issue Oct 11, 2024 · 3 comments
Assignees

Comments

@mgattozzi
Copy link
Contributor

This is a follow up to #25388. We want to use ColumnIds instead of Arc. This is the relevant line to change, with any tests and code that need to be changed as a result.

pub name: Arc<str>,

@mgattozzi mgattozzi self-assigned this Oct 11, 2024
@mgattozzi
Copy link
Contributor Author

@pauldix this is turning out to not be a simple change and has far reaching implications. For instance:

  • The BufferIndex depends on the name of the column, especially when working with Exprs which are from Datafusion
  • A lot of the times we need names, it's because of Datafusion
  • LastCache expects names for lookups, especially when interfacing with, you guessed it, the Datafusion Schema type

I think we'd spend more time doing a conversion lookup then just having an Arc everywhere. I'm not sure the ColumnId makes as much sense compared to just using the name of the field. We'd end up having to pass the ColumnId and the name around in many places.

We could still try to make it work, but this is not a single day task like I had originally hoped for and I'll likely need to pick this up after vacation or we reconsider the value of ColumnId at all. Even the wrapper TableSchema is proving hard to pass around, when so much of the code really only cares about a schema::Schema type. I'm worried we're adding a lot of overhead for very little gain here.

@pauldix
Copy link
Member

pauldix commented Oct 11, 2024

I think it's worth doing mainly because that goes into the wal constantly, many more times than you'd do the lookup for queries. Ideally the last cache and queryable buffer would keep the data indexed by ID too. Just doing conversion on the read side when pulling data out.

Definitely not urgent so it can wait a while.

@mgattozzi
Copy link
Contributor Author

Okay I'll pick this up when I get back from vacation then!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants