from pyspark.sql import Row

# Create a list of Rows
data = [
    Row(col1="value1_1", col2="value1_2", col3="value1_3"),
    Row(col1="value2_1", col2="value2_2", col3="value2_3"),
    Row(col1="value3_1", col2="value3_2", col3="value3_3"),
    Row(col1="value4_1", col2="value4_2", col3="value4_3"),
    Row(col1="value5_1", col2="value5_2", col3="value5_3"),
    Row(col1="value6_1", col2="value6_2", col3="value6_3"),
    Row(col1="value7_1", col2="value7_2", col3="value7_3"),
    Row(col1="value8_1", col2="value8_2", col3="value8_3"),
    Row(col1="value9_1", col2="value9_2", col3="value9_3"),
    Row(col1="value10_1", col2="value10_2", col3="value10_3"),
]

# Create a DataFrame (schema is inferred from the Rows)
df = spark.createDataFrame(data)
# Count the rows currently in the table
(spark.read
    .format("bigquery")
    .option("table", "mydataset.test_read_write_table")
    .option("project", "myproject")
    .load()
).count()
### prints 50

# Append the 10 rows to the table
(df.write
    .mode("append")
    .format("bigquery")
    .option("project", "myproject")
    .option("writeMethod", "direct")
    .save("mydataset.test_read_write_table")
)

# Check the count again
(spark.read
    .format("bigquery")
    .option("table", "mydataset.test_read_write_table")
    .option("project", "myproject")
    .load()
).count()
### still prints 50, but should be 60
When I check the BQ table directly, the new rows are there, but they are not reflected in my DataFrame. If I use the query option instead of the table option and run "select count(1) from mydataset.test_read_write_table", the count is accurate. This looks like a caching problem: I tried setting the cacheExpirationTimeInMinutes option to 0, but that does not seem to disable the cache. However, if I set it to a positive integer, the count becomes correct once that interval has elapsed.
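For reference, a sketch of the two workarounds described above, using the same placeholder project/dataset names as the example (requires an active Databricks/Spark session with the connector installed, so this is not runnable standalone):

```python
# Workaround 1: read via the query option instead of the table option;
# this returned an accurate count in my testing.
fresh_count = (spark.read
    .format("bigquery")
    .option("query", "select count(1) as n from mydataset.test_read_write_table")
    .option("project", "myproject")
    .load()
)

# Workaround 2: keep the table read but lower the read-session cache expiry.
# Setting it to 0 did not help; a positive value works after that interval.
fresh_read = (spark.read
    .format("bigquery")
    .option("table", "mydataset.test_read_write_table")
    .option("project", "myproject")
    .option("cacheExpirationTimeInMinutes", "1")
    .load()
)
```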
For context: I am having an issue getting accurate counts when reading from and writing to BigQuery from Databricks after installing the connector. The code above replicates the issue.

Connector version: spark-3.5-bigquery-0.39.1.jar
Apache Spark 3.5.0
Scala 2.12
Databricks 14.3 LTS