Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add references table hint #1925

Open
wants to merge 5 commits into
base: devel
Choose a base branch
from
Open

Add references table hint #1925

wants to merge 5 commits into from

Conversation

steinitzu
Copy link
Collaborator

@steinitzu steinitzu commented Oct 3, 2024

Description

Adds new references table hint. Can be added via @resource decorator and apply_hints
Takes a list of "foreign key" references.

SQLAlchemy database source automatically generates the hint from tabel foreign keys

Related Issues

#1713

Copy link

netlify bot commented Oct 3, 2024

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit bb49c49
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/67085386c44a00000847af9c

Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is good! the procedure to get references from sql alchemy looks pretty invasive. ie. we get foreign keys to tables in other schemas and those we do not even sync.

my take: we add another feature flag to let users enable it on demand.

`columns` corresponds to the `referenced_columns` in the referenced table and their order should match.
"""

columns: Sequence[str]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sh-rp this is correct? corresponds to the one used in dbt-gen?

@@ -559,6 +560,18 @@ def normalize_table_identifiers(table: TTableSchema, naming: NamingConvention) -
else:
new_columns[new_col_name] = c
table["columns"] = new_columns
reference = table.get("references")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is good. but we need to change the table diff functions above. the questions is: should we replace references or merge them ie by making sure duplicates are not endlessly inserted. could you check it out and test?

dlt/extract/hints.py Show resolved Hide resolved
if reflection_level == "minimal":
return None
result: List[TTableReference] = []
for fk_constraint in table.foreign_key_constraints:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if table is reflected, we have foreign keys available, or here a call to database is made to populate the constraints?

return None
result: List[TTableReference] = []
for fk_constraint in table.foreign_key_constraints:
referenced_table = fk_constraint.referred_table.name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if the tables is in a separate schema in the database? is the other schema reflected lazily? should we add reference to tables in schemas different from the table? WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate schemas should work the same. But I didn't consider it. The reference hint only has table name so a reference to a different schema doesn't make sense, unless we add a dataset_name property?
Probably best to just filter out different schemas.

We have always been using metadata.reflect(resolve_fks=True) (default) so there are no extra db calls from before, referenced tables are always reflected whether they're synced or not so we have the referred_table property.
But should handle the case when referred_table is not available, may be when using custom metadata.

+ reflect references from foreign keys in sqlalchemy source
@steinitzu steinitzu force-pushed the table-reference-hint branch 2 times, most recently from 33dbea8 to 195f88a Compare October 10, 2024 17:55
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code is good. from your comment is pretty clear that we should get the references in sql alchemy only when requested by the user so

  1. a new flag to sql_database and sql_table
  2. do not reflect foreign keys if flag disabled (should be default)
  3. do not create references if the above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants