Column names with uppercase characters break key_properties #15

Open

laurentS opened this issue Mar 26, 2021 · 2 comments

@laurentS (Contributor)

I'm not 100% sure whose bug this is, but I think it belongs here:
If I load a CSV with a column name containing uppercase characters (e.g. `MyKeyCol`) and want to use it as the key column, I run into a problem when target-postgres (the datamill-co variant) is the loader. Somewhere in the pipeline the column name is lowercased, but the key property is not, which produces the `KeyError` below.
If I change `key_properties` to a column whose name is already lowercase, it works fine.
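My best guess at the mechanism, as a minimal sketch (the schema values here are hypothetical, but the lookup mirrors `target_postgres/denest.py` in the traceback below):

```python
# Hypothetical illustration of the casing mismatch:
schema = {"properties": {"mykeycol": {"type": ["null", "string"]}}}  # name lowercased by the tap
key_properties = ["MyKeyCol"]  # casing preserved from the config

key_prop_schemas = {}
for key in key_properties:
    # Effectively the lookup denest.py performs in the traceback:
    key_prop_schemas[key] = schema["properties"][key]  # KeyError: 'MyKeyCol'
```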
I looked at the extras in Meltano, but couldn't figure out a solution. It almost feels like `key_properties` should support an alias syntax such as `MyKeyCol as mykeycol`.
I'm guessing the `field_names` config value could be used to force the column names, but it doesn't look like I can skip the first row, so I'm back to unhappiness again :) This could actually be an extra setting, e.g. `skip_rows` (integer, default 0), to make the tap ignore the first N rows of the file; see the sketch below.
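A rough sketch of how such a `skip_rows` option could behave, assuming `field_names` overrides the header row (the helper below is hypothetical; only `field_names` is an existing tap setting):

```python
import csv

def read_csv_rows(path, field_names, skip_rows=0):
    # Hypothetical helper: force the column names via field_names and
    # ignore the first N physical rows of the file.
    with open(path, newline="") as f:
        reader = csv.DictReader(f, fieldnames=field_names)
        for i, row in enumerate(reader):
            if i < skip_rows:
                continue  # e.g. skip_rows=1 drops the original header row
            yield row

# With skip_rows=1, the original "MyKeyCol,..." header line is skipped and
# the lowercase names supplied in field_names are used instead:
# rows = list(read_csv_rows("myfile.csv", ["mykeycol", "othercol"], skip_rows=1))
```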

Here are the logs I'm getting with an example that fails:

tap-spreadsheets-anywhere | INFO Using supplied catalog .meltano/run/elt/load_csv_base_data/3bce9632-9df0-4752-b19d-1705bab4d11d/tap.properties.json.
tap-spreadsheets-anywhere | INFO Syncing stream:us_sec_country_codes
tap-spreadsheets-anywhere | INFO Walking /static_data/.
tap-spreadsheets-anywhere | INFO Found 13 files.
tap-spreadsheets-anywhere | INFO Checking 9 resolved objects for any that match regular expression "myfile.*" and were modified since 2011-01-01 00:00:00+00:00
tap-spreadsheets-anywhere | INFO Processing 1 resolved objects that met our criteria. Enable debug verbosity logging for more details.
tap-spreadsheets-anywhere | INFO Syncing file "myfile.csv".
tap-spreadsheets-anywhere | INFO Wrote 309 records for stream "mystream".
target-postgres           | DEBUG MillisLoggingConnection: 0 millis spent executing: b'BEGIN;'
target-postgres           | DEBUG MillisLoggingConnection: 0 millis spent executing: b"\n            SELECT c.relname, obj_description(c.oid, 'pg_class')\n            FROM pg_namespace AS n\n                INNER JOIN pg_class AS c ON n.oid = c.relnamespace\n            WHERE n.nspname = 'meltano';\n        "
target-postgres           | INFO Mapping: mytable to ['mytable']
target-postgres           | INFO Mapping: tp_mytable_mycol__sdc_sequence_idx to None
target-postgres           | DEBUG MillisLoggingConnection: 7 millis spent executing: b"\n            SELECT column_name, data_type, is_nullable FROM information_schema.columns\n            WHERE table_schema = 'meltano' and table_name = 'mytable';\n        "
target-postgres           | DEBUG MillisLoggingConnection: 1 millis spent executing: b"\n            SELECT EXISTS (\n                SELECT 1 FROM pg_tables\n                WHERE schemaname = 'meltano' AND\n                      tablename = 'mytable');\n        "
target-postgres           | INFO Stream mytable (mytable) with max_version None targetting None
target-postgres           | INFO Root table name mytable
target-postgres           | INFO Writing batch with 309 records for `mytable` with `key_properties`: `['MyKeyCol']`
target-postgres           | INFO METRIC: {"type": "counter", "metric": "record_count", "value": 0, "tags": {"count_type": "batch_rows_persisted", "path": ["mytable"], "database": "mydb", "schema": "meltano"}}
target-postgres           | INFO METRIC: {"type": "timer", "metric": "job_duration", "value": 0.0010831356048583984, "tags": {"job_type": "batch", "path": ["mytable"], "database": "mydb", "schema": "meltano", "status": "failed"}}
target-postgres           | ERROR Exception writing records
target-postgres           | Traceback (most recent call last):
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/postgres.py", line 295, in write_batch
target-postgres           |     written_batches_details = self.write_batch_helper(cur,
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/sql_base.py", line 808, in write_batch_helper
target-postgres           |     for table_batch in denest.to_table_batches(schema, key_properties, records):
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/denest.py", line 20, in to_table_batches
target-postgres           |     table_schemas = _get_streamed_table_schemas(schema,
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/denest.py", line 47, in _get_streamed_table_schemas
target-postgres           |     key_prop_schemas[key] = schema['properties'][key]
target-postgres           | KeyError: 'MyKeyCol'
target-postgres           | CRITICAL ('Exception writing records', KeyError('MyKeyCol'))
target-postgres           | Traceback (most recent call last):
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/postgres.py", line 295, in write_batch
target-postgres           |     written_batches_details = self.write_batch_helper(cur,
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/sql_base.py", line 808, in write_batch_helper
target-postgres           |     for table_batch in denest.to_table_batches(schema, key_properties, records):
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/denest.py", line 20, in to_table_batches
target-postgres           |     table_schemas = _get_streamed_table_schemas(schema,
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/denest.py", line 47, in _get_streamed_table_schemas
target-postgres           |     key_prop_schemas[key] = schema['properties'][key]
target-postgres           | KeyError: 'MyKeyCol'
target-postgres           |
target-postgres           | During handling of the above exception, another exception occurred:
target-postgres           |
target-postgres           | Traceback (most recent call last):
target-postgres           |   File ".meltano/loaders/target-postgres/venv/bin/target-postgres", line 8, in <module>
target-postgres           |     sys.exit(cli())
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/__init__.py", line 45, in cli
target-postgres           |     main(args.config)
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/__init__.py", line 39, in main
target-postgres           |     target_tools.main(postgres_target)
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 28, in main
target-postgres           |     stream_to_target(input_stream, target, config=config)
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 77, in stream_to_target
target-postgres           |     raise e
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 70, in stream_to_target
target-postgres           |     state_tracker.flush_streams(force=True)
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/stream_tracker.py", line 47, in flush_streams
target-postgres           |     self._write_batch_and_update_watermarks(stream)
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/stream_tracker.py", line 67, in _write_batch_and_update_watermarks
target-postgres           |     self.target.write_batch(stream_buffer)
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/postgres.py", line 309, in write_batch
target-postgres           |     raise PostgresError(message, ex)
target-postgres           | target_postgres.exceptions.PostgresError: ('Exception writing records', KeyError('MyKeyCol'))
meltano                   | Loading failed (1): target_postgres.exceptions.PostgresError: ('Exception writing records', KeyError('MyKeyCol'))
meltano                   | ELT could not be completed: Loader failed
@nickolasclarke commented Feb 16, 2024

@laurentS did you ever resolve this? I'm running into the same problem.

@laurentS (Contributor, Author)
I did not, sorry. I moved on to other solutions for loading CSV files, outside the Singer/Meltano ecosystem.
