You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When using DuckDB SQL (against Views of parquet files), for TPC-H query 9 at scale factor 1 the ordering is off. USA and UK are interleaved, when they should actually be UK then USA.
result vs answer
nation o_year sum_profit
result[152, ] SAUDI ARABIA 1994 48181673
result[153, ] SAUDI ARABIA 1993 45692556
result[154, ] SAUDI ARABIA 1992 48924913
- result[155, ] UNITED STATES 1998 27826594
+ answer[155, ] UNITED KINGDOM 1998 26366683
- result[156, ] UNITED KINGDOM 1998 26366683
+ answer[156, ] UNITED KINGDOM 1997 44518130
- result[157, ] UNITED KINGDOM 1997 44518130
+ answer[157, ] UNITED KINGDOM 1996 45539730
- result[158, ] UNITED STATES 1997 46638572
+ answer[158, ] UNITED KINGDOM 1995 46845879
- result[159, ] UNITED STATES 1996 46688281
+ answer[159, ] UNITED KINGDOM 1994 43081610
- result[160, ] UNITED KINGDOM 1996 45539730
+ answer[160, ] UNITED KINGDOM 1993 44770147
- result[161, ] UNITED STATES 1995 48951592
+ answer[161, ] UNITED KINGDOM 1992 44123403
- result[162, ] UNITED KINGDOM 1995 46845879
+ answer[162, ] UNITED STATES 1998 27826594
- result[163, ] UNITED STATES 1994 45099092
+ answer[163, ] UNITED STATES 1997 46638572
- result[164, ] UNITED KINGDOM 1994 43081610
+ answer[164, ] UNITED STATES 1996 46688281
- result[165, ] UNITED KINGDOM 1993 44770147
+ answer[165, ] UNITED STATES 1995 48951592
- result[166, ] UNITED STATES 1993 46181601
+ answer[166, ] UNITED STATES 1994 45099092
- result[167, ] UNITED STATES 1992 46168214
+ answer[167, ] UNITED STATES 1993 46181601
- result[168, ] UNITED KINGDOM 1992 44123403
+ answer[168, ] UNITED STATES 1992 46168214
result[169, ] VIETNAM 1998 27281931
result[170, ] VIETNAM 1997 48735914
result[171, ] VIETNAM 1996 47824596
library(duckdb)
library(arrowbench)
query_id <- 9
format <- "parquet"
scale_factor <- 1
Sys.setenv(ARROWBENCH_LOCAL_DIR = "~/repos/ab_store/het")
tpch_files <- ensure_source("tpch", scale_factor = scale_factor)
tpch_tables_needed <- tables_refed(tpc_h_queries[[query_id]])
con <- dbConnect(duckdb())
# set parallelism for duckdb
DBI::dbExecute(con, paste0("PRAGMA threads=10"))
for (name in tpch_tables_needed) {
file <- path.expand(tpch_files[[name]])
# have to create a VIEW in order to reference it by name
# This view is the most accurate comparison to Arrow, however it will
# penalize duckdb since AFAICT `parquet_scan` is not parallelized and
# ends up being the bottleneck
if (format == "parquet") {
sql_query <- paste0("CREATE OR REPLACE VIEW ", name, " AS SELECT * FROM parquet_scan('", file, "');")
} else if (format == "native") {
sql_query <- paste0("CREATE TABLE IF NOT EXISTS ", name, " AS SELECT * FROM parquet_scan('", file, "');")
}
DBI::dbExecute(con, sql_query)
}
tpc_h_queries <- list()
# TODO: should this actually just be query_id?
queries <- 1:22
# get the queries
tpc_h_queries <- lapply(queries, get_sql_query_func, con = con)
names(tpc_h_queries) <- queries
result <- tpc_h_queries[[query_id]](con = con)
result <- dplyr::as_tibble(result)
answer <- tpch_answer(scale_factor, query_id)
waldo::compare(result, answer, x_arg = "result", y_arg = "answer")
The text was updated successfully, but these errors were encountered:
When using DuckDB SQL (against Views of parquet files), for TPC-H query 9 at scale factor 1 the ordering is off. USA and UK are interleaved, when they should actually be UK then USA.
The text was updated successfully, but these errors were encountered: