
Bulk Ingestion into Postgresql Numeric with scale #2177

Open
ibnubay opened this issue Sep 20, 2024 · 6 comments
Labels
Type: question (Usage question)

Comments

@ibnubay commented Sep 20, 2024

What would you like help with?

Env:

  • macOS Sonoma (14.5)
  • Python 3.11.9
  • pyarrow 17.0.0
  • adbc_driver_postgresql 1.2.0

Background:
I want to load data into PostgreSQL from a Parquet file using bulk ingestion (the COPY command).
I have a pyarrow "amount" column of type pa.float64() that needs to be inserted into a numeric-with-scale column in PostgreSQL, "amount numeric(18, 2)".

Then I got this error:
Exception: ProgrammingError('INVALID_ARGUMENT: [libpq] Failed to execute COPY statement: PGRES_FATAL_ERROR ERROR: insufficient data left in message\nCONTEXT: COPY invoice_new, line 1, column amount\n. SQLSTATE: 08P01')

I also tried changing the pyarrow schema for the "amount" column to decimal256(18, 2), and got this error:
Exception: ProgrammingError('INVALID_ARGUMENT: [libpq] Failed to execute COPY statement: PGRES_FATAL_ERROR ERROR: invalid scale in external "numeric" value\nCONTEXT: COPY invoice_new, line 1, column amount\n. SQLSTATE: 22P03')

To fix this error, do I need to cast the pyarrow "amount" column to another type?

Ty

ibnubay added the Type: question label on Sep 20, 2024
@WillAyd (Contributor) commented Sep 20, 2024

Does it matter if you use decimal128 versus decimal256? The former should work just fine with the provided precision/scale.
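
For illustration, here is a minimal sketch of that cast before ingesting; the sample data and connection URI are placeholders, and the target table name "invoice_new" is taken from the error message above, so treat this as an untested sketch rather than verified code:

import pyarrow as pa
import pyarrow.compute as pc
import adbc_driver_postgresql.dbapi

# Stand-in for the table read from the Parquet file.
tbl = pa.table({"amount": pa.array([12222.00, 19.99], type=pa.float64())})

# Cast float64 -> decimal128 with the same precision/scale as the target
# column "amount numeric(18, 2)".
amount_idx = tbl.schema.get_field_index("amount")
tbl = tbl.set_column(amount_idx, "amount", pc.cast(tbl["amount"], pa.decimal128(18, 2)))

# Bulk-ingest into the existing table (connection URI is a placeholder).
with adbc_driver_postgresql.dbapi.connect("postgresql://localhost/mydb") as conn:
    with conn.cursor() as cur:
        cur.adbc_ingest("invoice_new", tbl, mode="append")
    conn.commit()

As the next comment notes, the reporter still saw the COPY error with decimal128/decimal256 on this driver version, so this sketch only shows the cast itself, not a confirmed fix.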

@ibnubay (Author) commented Sep 20, 2024

Sorry @WillAyd, the error happens when the "amount" value is something like "12222.00", i.e. a decimal whose fractional part is zero.
I already tried decimal128 and decimal256 and still get the same errors as above.
Any suggestions?

@paleolimbot (Member) commented
I think this is because the bulk insert feature requires that the schemas match exactly, and because we don't take into account the existing column types when performing a bulk insert (we should!). Specifically, that would mean that we should add a PostgresType argument here:

static inline ArrowErrorCode MakeCopyFieldWriter(
    struct ArrowSchema* schema, struct ArrowArrayView* array_view,
    const PostgresTypeResolver& type_resolver,
    std::unique_ptr<PostgresCopyFieldWriter>* out, ArrowError* error) {

...and return our field writers based on a many-to-many mapping (so that we can generate valid COPY output for a numeric column, for example, from a variety of Arrow input types).

In the meantime, you should be able to use a parameterized INSERT as a workaround (i.e., INSERT INTO some_table VALUES (?)). (I forget exactly how to access the bulk bind via dbapi/ADBC in Python.) I believe there was another issue where somebody did a bulk insert into a temporary table and used SQL to do the type casting/insert.

@WillAyd (Contributor) commented Sep 20, 2024

In the meantime, you should be able to use a parameterized INSERT as a workaround (i.e., INSERT INTO some_table VALUES (?)). (I forget exactly how to access the bulk bind via dbapi/ADBC in Python.)

I think you are looking for something like this:

tbl = pa.table(...)  # the pyarrow table to insert
with adbc_driver_postgresql.dbapi.connect(conn_args) as conn, conn.cursor() as cur:
    cur.adbc_prepare("INSERT INTO some_table VALUES (?)")
    cur.executemany("INSERT INTO some_table VALUES (?)", tbl)
    conn.commit()

@lidavidm (Member) commented
Just to be clear, this won't be as fast as the bulk insert (though I think it should still be a bit better than other drivers, since we can at least use the binary format; it's just that we have to do a server round trip for each row in this case).

@ibnubay (Author) commented Sep 21, 2024

I tried another approach to handle inserting decimal data into PostgreSQL.
As in another issue about retrieving decimal data from PostgreSQL, I bulk-load the decimal data into a "staging" table as a string column, and then, when upserting into the "main" table, I convert it from VARCHAR to DECIMAL using SQL.

I think that's enough for me, even though it adds an extra decimal-transformation step.
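
For reference, a rough sketch of what that staging approach can look like from Python; the staging table name, sample data, and connection URI below are illustrative placeholders, not taken from this issue:

import pyarrow as pa
import pyarrow.compute as pc
import adbc_driver_postgresql.dbapi

# Stand-in for the table read from the Parquet file.
tbl = pa.table({"amount": pa.array([12222.00, 19.99], type=pa.float64())})

# Keep the decimal values as strings so the bulk COPY into the staging
# table does not depend on the numeric scale of the target column.
amount_idx = tbl.schema.get_field_index("amount")
tbl = tbl.set_column(amount_idx, "amount", pc.cast(tbl["amount"], pa.string()))

with adbc_driver_postgresql.dbapi.connect("postgresql://localhost/mydb") as conn:
    with conn.cursor() as cur:
        # Bulk-ingest into a staging table whose "amount" column is text.
        cur.adbc_ingest("invoice_staging", tbl, mode="create")
        # Cast back to numeric(18, 2) when moving rows into the main table.
        cur.execute(
            "INSERT INTO invoice_new (amount) "
            "SELECT amount::numeric(18, 2) FROM invoice_staging"
        )
    conn.commit()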

Thanks for the help, guys.
