BigQuery Write API cannot use BigQuery CDC feature #1086

Open
jster1357 opened this issue Oct 11, 2023 · 0 comments
Labels
enhancement New feature or request

jster1357 commented Oct 11, 2023

I have a Spark application from which I want to stream messages into BigQuery. Instead of having to run a merge operation manually on the BQ side, I'd like to use the native CDC functionality that allows for UPSERTs and DELETEs. I created a table in BQ with a PK and clustering, so the prerequisites are met there.
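
For reference, the table prerequisites described above (a primary key plus clustering) could be expressed roughly as follows. This is only a sketch: the helper `build_cdc_table_ddl` is a hypothetical name, and the column types for `word` and `word_count` are assumptions based on the example query, not something confirmed in this issue.

```python
def build_cdc_table_ddl(table, columns, key_columns):
    """Return a CREATE TABLE statement with a primary key and clustering.

    `columns` is a list of (name, sql_type) pairs; `key_columns` is used
    both as the (non-enforced) primary key and as the clustering columns.
    """
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    keys = ", ".join(key_columns)
    return (
        f"CREATE TABLE `{table}` ({column_defs}, "
        f"PRIMARY KEY ({keys}) NOT ENFORCED) "
        f"CLUSTER BY {keys}"
    )


# Assumed column types; the real table definition is not shown in the issue.
ddl = build_cdc_table_ddl(
    "demo_data.cdc_wordcount",
    [("word", "STRING"), ("word_count", "INT64")],
    ["word"],
)
print(ddl)
```

Note that the table itself has no `_CHANGE_TYPE` column; that field only travels with the rows being written.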

I added the field _CHANGE_TYPE to the data I wanted to load and omitted it from the BQ table because it's a pseudo column. When I try to use the Storage Write API, I get errors because the connector tries to write that pseudo column and there is a schema mismatch between the DF and the BQ schema.

It seems like this CDC feature isn't supported as part of the connector. Is this something that can be done? If not, is this in the planning stages?

Simple example:

spark.conf.set('spark.datasource.bigquery.enableModeCheckForSchemaFields', False)
spark.conf.set('spark.datasource.bigquery.writeAtLeastOnce', True)

# Tag every aggregated row as an UPSERT for BigQuery CDC
word_count = spark.sql(
    "SELECT word, SUM(word_count) AS word_count, 'UPSERT' AS _CHANGE_TYPE FROM words GROUP BY word")

# Save the data to BigQuery
(word_count.write.format('bigquery')
    .option('writeMethod', 'direct')
    .mode('append')
    .save('demo_data.cdc_wordcount'))
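
For comparison, the manual merge that native CDC would replace can be sketched roughly as below. This is a sketch under assumptions, not connector behavior: it supposes the DataFrame is first appended to a hypothetical staging table `demo_data.cdc_wordcount_staging` that carries the change type in an ordinary column named `change_type`, and `build_merge_sql` is an illustrative helper, not an existing API.

```python
def build_merge_sql(target, staging, key, value_columns):
    """Build a BigQuery MERGE that applies staged UPSERT/DELETE rows.

    Assumes the staging table has the same columns as the target plus a
    regular `change_type` column holding 'UPSERT' or 'DELETE'.
    """
    all_columns = [key] + list(value_columns)
    set_clause = ", ".join(f"{c} = S.{c}" for c in value_columns)
    insert_cols = ", ".join(all_columns)
    insert_vals = ", ".join(f"S.{c}" for c in all_columns)
    return "\n".join([
        f"MERGE `{target}` T",
        f"USING `{staging}` S",
        f"ON T.{key} = S.{key}",
        # Matched rows flagged DELETE are removed; other matches are updated.
        "WHEN MATCHED AND S.change_type = 'DELETE' THEN DELETE",
        f"WHEN MATCHED THEN UPDATE SET {set_clause}",
        # Unmatched UPSERT rows are inserted; unmatched DELETEs are ignored.
        "WHEN NOT MATCHED AND S.change_type = 'UPSERT' THEN",
        f"  INSERT ({insert_cols}) VALUES ({insert_vals})",
    ])


merge_sql = build_merge_sql(
    "demo_data.cdc_wordcount",
    "demo_data.cdc_wordcount_staging",  # hypothetical staging table
    "word",
    ["word_count"],
)
print(merge_sql)
```

The generated statement would have to be run on the BQ side after each load, which is exactly the extra step the CDC pseudo column is meant to avoid.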

jster1357 changed the title from "BigQuery Write API cannot" to "BigQuery Write API cannot use BigQuery CDC feature" on Oct 11, 2023
isha97 added the enhancement (New feature or request) label on May 13, 2024