
Bulk Ingestion into Postgresql Numeric with scale #2177

Open
ibnubay opened this issue Sep 20, 2024 · 6 comments
Labels
Type: question (Usage question)

Comments

@ibnubay commented Sep 20, 2024

What would you like help with?

Env:

  • macOS Sonoma (14.5)
  • Python 3.11.9
  • pyarrow 17.0.0
  • adbc_driver_postgresql 1.2.0

Background:
I want to load data into PostgreSQL from a Parquet file using bulk ingestion (the COPY command).
I have a pyarrow "amount" column of type pa.float64() that needs to be inserted into a numeric-with-scale column in PostgreSQL, "amount numeric(18, 2)".

Then I got this error:
Exception: ProgrammingError('INVALID_ARGUMENT: [libpq] Failed to execute COPY statement: PGRES_FATAL_ERROR ERROR: insufficient data left in message\nCONTEXT: COPY invoice_new, line 1, column amount\n. SQLSTATE: 08P01')

I also tried changing the pyarrow schema for the "amount" column to decimal256(18, 2), and got this error:
Exception: ProgrammingError('INVALID_ARGUMENT: [libpq] Failed to execute COPY statement: PGRES_FATAL_ERROR ERROR: invalid scale in external "numeric" value\nCONTEXT: COPY invoice_new, line 1, column amount\n. SQLSTATE: 22P03')

To fix this error, do I need to cast the pyarrow "amount" column to another type?

Ty

ibnubay added the Type: question label on Sep 20, 2024
@WillAyd (Contributor) commented Sep 20, 2024

Does it matter if you use decimal128 versus decimal256? The former should work just fine with the provided precision/scale.
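
For illustration, here is a minimal sketch of that cast before ingesting; the sample data and connection URI are placeholders, and the target table name "invoice_new" is taken from the error message above, so treat this as an untested sketch rather than verified code:

import pyarrow as pa
import pyarrow.compute as pc
import adbc_driver_postgresql.dbapi

# Stand-in for the table read from the Parquet file.
tbl = pa.table({"amount": pa.array([12222.00, 19.99], type=pa.float64())})

# Cast float64 -> decimal128 with the same precision/scale as the target
# column "amount numeric(18, 2)".
amount_idx = tbl.schema.get_field_index("amount")
tbl = tbl.set_column(amount_idx, "amount", pc.cast(tbl["amount"], pa.decimal128(18, 2)))

# Bulk-ingest into the existing table (connection URI is a placeholder).
with adbc_driver_postgresql.dbapi.connect("postgresql://localhost/mydb") as conn:
    with conn.cursor() as cur:
        cur.adbc_ingest("invoice_new", tbl, mode="append")
    conn.commit()

As the next comment notes, the reporter still saw the COPY error with decimal128/decimal256 on this driver version, so this sketch only shows the cast itself, not a confirmed fix.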

@ibnubay (Author) commented Sep 20, 2024

Sorry @WillAyd, the error happens when the "amount" value is something like "12222.00", i.e. a decimal whose fractional part is zero.
I already tried decimal128 and decimal256 and still get the same errors as above.
Any suggestions?

@paleolimbot (Member) commented
I think this is because the bulk insert feature requires that the schemas match exactly, and because we don't take into account the existing column types when performing a bulk insert (we should!). Specifically, that would mean that we should add a PostgresType argument here:

static inline ArrowErrorCode MakeCopyFieldWriter(
    struct ArrowSchema* schema, struct ArrowArrayView* array_view,
    const PostgresTypeResolver& type_resolver,
    std::unique_ptr<PostgresCopyFieldWriter>* out, ArrowError* error) {

...and return our field writers based on a many-to-many mapping (so that we can generate valid COPY output for a numeric column, for example, from a variety of Arrow input types).

In the meantime, you should be able to use a parameterized INSERT as a workaround (i.e., INSERT INTO some_table VALUES (?)). (I forget exactly how to access the bulk bind via dbapi/ADBC in Python.) I believe there was another issue where somebody did a bulk insert into a temporary table and used SQL to do the type casting/insert.

@WillAyd (Contributor) commented Sep 20, 2024

In the meantime, you should be able to use a parameterized INSERT as a workaround (i.e., INSERT INTO some_table VALUES (?)). (I forget exactly how to access the bulk bind via dbapi/ADBC in Python.)

I think you are looking for something like this:

tbl = pa.table(...)  # the pyarrow table to insert
with adbc_driver_postgresql.dbapi.connect(conn_args) as conn, conn.cursor() as cur:
    cur.adbc_prepare("INSERT INTO some_table VALUES (?)")
    cur.executemany("INSERT INTO some_table VALUES (?)", tbl)
    conn.commit()

@lidavidm (Member) commented
Just to be clear, this won't be as fast as the bulk insert (though I think it should still be a bit better than other drivers, since we can at least use the binary format; it's just that we have to do a server round trip for each row in this case).

@ibnubay (Author) commented Sep 21, 2024

I tried another approach to handle inserting decimal data into PostgreSQL.
As in another issue about retrieving decimal data from PostgreSQL, I bulk-load the decimal data into a "staging" table as a string column, and then, when upserting into the "main" table, I convert it from VARCHAR to DECIMAL using SQL.

I think that's enough for me, even though it adds an extra decimal-transformation step.
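
For reference, a rough sketch of what that staging approach can look like from Python; the staging table name, sample data, and connection URI below are illustrative placeholders, not taken from this issue:

import pyarrow as pa
import pyarrow.compute as pc
import adbc_driver_postgresql.dbapi

# Stand-in for the table read from the Parquet file.
tbl = pa.table({"amount": pa.array([12222.00, 19.99], type=pa.float64())})

# Keep the decimal values as strings so the bulk COPY into the staging
# table does not depend on the numeric scale of the target column.
amount_idx = tbl.schema.get_field_index("amount")
tbl = tbl.set_column(amount_idx, "amount", pc.cast(tbl["amount"], pa.string()))

with adbc_driver_postgresql.dbapi.connect("postgresql://localhost/mydb") as conn:
    with conn.cursor() as cur:
        # Bulk-ingest into a staging table whose "amount" column is text.
        cur.adbc_ingest("invoice_staging", tbl, mode="create")
        # Cast back to numeric(18, 2) when moving rows into the main table.
        cur.execute(
            "INSERT INTO invoice_new (amount) "
            "SELECT amount::numeric(18, 2) FROM invoice_staging"
        )
    conn.commit()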

Thanks for the help, guys.
