Column names with uppercase characters break key_properties #15

Open

laurentS opened this issue Mar 26, 2021 · 2 comments

@laurentS (Contributor)

I'm not 100% sure whose bug this is, but I think it belongs here:
If I load a CSV with a column name containing uppercase characters (e.g. `MyKeyCol`) and want to use it as the key column, I run into a problem when target-postgres (the datamill-co variant) is the loader. Somewhere in the pipeline the column name is lowercased, but the key property is not, which produces the `KeyError` below.
If I change `key_properties` to a column whose name is already lowercase, it works fine.
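My best guess at the mechanism, as a minimal sketch (the schema values here are hypothetical, but the lookup mirrors `target_postgres/denest.py` in the traceback below):

```python
# Hypothetical illustration of the casing mismatch:
schema = {"properties": {"mykeycol": {"type": ["null", "string"]}}}  # name lowercased by the tap
key_properties = ["MyKeyCol"]  # casing preserved from the config

key_prop_schemas = {}
for key in key_properties:
    # Effectively the lookup denest.py performs in the traceback:
    key_prop_schemas[key] = schema["properties"][key]  # KeyError: 'MyKeyCol'
```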
I looked at the extras in Meltano, but couldn't figure out a solution. It almost feels like `key_properties` should support an alias syntax such as `MyKeyCol as mykeycol`.
I'm guessing the `field_names` config value could be used to force the column names, but it doesn't look like I can skip the first row, so I'm back to unhappiness again :) This could actually be an extra setting, e.g. `skip_rows` (integer, default 0), to make the tap ignore the first N rows of the file; see the sketch below.
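A rough sketch of how such a `skip_rows` option could behave, assuming `field_names` overrides the header row (the helper below is hypothetical; only `field_names` is an existing tap setting):

```python
import csv

def read_csv_rows(path, field_names, skip_rows=0):
    # Hypothetical helper: force the column names via field_names and
    # ignore the first N physical rows of the file.
    with open(path, newline="") as f:
        reader = csv.DictReader(f, fieldnames=field_names)
        for i, row in enumerate(reader):
            if i < skip_rows:
                continue  # e.g. skip_rows=1 drops the original header row
            yield row

# With skip_rows=1, the original "MyKeyCol,..." header line is skipped and
# the lowercase names supplied in field_names are used instead:
# rows = list(read_csv_rows("myfile.csv", ["mykeycol", "othercol"], skip_rows=1))
```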

Here are the logs I'm getting with an example that fails:

tap-spreadsheets-anywhere | INFO Using supplied catalog .meltano/run/elt/load_csv_base_data/3bce9632-9df0-4752-b19d-1705bab4d11d/tap.properties.json.
tap-spreadsheets-anywhere | INFO Syncing stream:us_sec_country_codes
tap-spreadsheets-anywhere | INFO Walking /static_data/.
tap-spreadsheets-anywhere | INFO Found 13 files.
tap-spreadsheets-anywhere | INFO Checking 9 resolved objects for any that match regular expression "myfile.*" and were modified since 2011-01-01 00:00:00+00:00
tap-spreadsheets-anywhere | INFO Processing 1 resolved objects that met our criteria. Enable debug verbosity logging for more details.
tap-spreadsheets-anywhere | INFO Syncing file "myfile.csv".
tap-spreadsheets-anywhere | INFO Wrote 309 records for stream "mystream".
target-postgres           | DEBUG MillisLoggingConnection: 0 millis spent executing: b'BEGIN;'
target-postgres           | DEBUG MillisLoggingConnection: 0 millis spent executing: b"\n            SELECT c.relname, obj_description(c.oid, 'pg_class')\n            FROM pg_namespace AS n\n                INNER JOIN pg_class AS c ON n.oid = c.relnamespace\n            WHERE n.nspname = 'meltano';\n        "
target-postgres           | INFO Mapping: mytable to ['mytable']
target-postgres           | INFO Mapping: tp_mytable_mycol__sdc_sequence_idx to None
target-postgres           | DEBUG MillisLoggingConnection: 7 millis spent executing: b"\n            SELECT column_name, data_type, is_nullable FROM information_schema.columns\n            WHERE table_schema = 'meltano' and table_name = 'mytable';\n        "
target-postgres           | DEBUG MillisLoggingConnection: 1 millis spent executing: b"\n            SELECT EXISTS (\n                SELECT 1 FROM pg_tables\n                WHERE schemaname = 'meltano' AND\n                      tablename = 'mytable');\n        "
target-postgres           | INFO Stream mytable (mytable) with max_version None targetting None
target-postgres           | INFO Root table name mytable
target-postgres           | INFO Writing batch with 309 records for `mytable` with `key_properties`: `['MyKeyCol']`
target-postgres           | INFO METRIC: {"type": "counter", "metric": "record_count", "value": 0, "tags": {"count_type": "batch_rows_persisted", "path": ["mytable"], "database": "mydb", "schema": "meltano"}}
target-postgres           | INFO METRIC: {"type": "timer", "metric": "job_duration", "value": 0.0010831356048583984, "tags": {"job_type": "batch", "path": ["mytable"], "database": "mydb", "schema": "meltano", "status": "failed"}}
target-postgres           | ERROR Exception writing records
target-postgres           | Traceback (most recent call last):
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/postgres.py", line 295, in write_batch
target-postgres           |     written_batches_details = self.write_batch_helper(cur,
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/sql_base.py", line 808, in write_batch_helper
target-postgres           |     for table_batch in denest.to_table_batches(schema, key_properties, records):
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/denest.py", line 20, in to_table_batches
target-postgres           |     table_schemas = _get_streamed_table_schemas(schema,
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/denest.py", line 47, in _get_streamed_table_schemas
target-postgres           |     key_prop_schemas[key] = schema['properties'][key]
target-postgres           | KeyError: 'MyKeyCol'
target-postgres           | CRITICAL ('Exception writing records', KeyError('MyKeyCol'))
target-postgres           | Traceback (most recent call last):
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/postgres.py", line 295, in write_batch
target-postgres           |     written_batches_details = self.write_batch_helper(cur,
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/sql_base.py", line 808, in write_batch_helper
target-postgres           |     for table_batch in denest.to_table_batches(schema, key_properties, records):
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/denest.py", line 20, in to_table_batches
target-postgres           |     table_schemas = _get_streamed_table_schemas(schema,
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/denest.py", line 47, in _get_streamed_table_schemas
target-postgres           |     key_prop_schemas[key] = schema['properties'][key]
target-postgres           | KeyError: 'MyKeyCol'
target-postgres           |
target-postgres           | During handling of the above exception, another exception occurred:
target-postgres           |
target-postgres           | Traceback (most recent call last):
target-postgres           |   File ".meltano/loaders/target-postgres/venv/bin/target-postgres", line 8, in <module>
target-postgres           |     sys.exit(cli())
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/__init__.py", line 45, in cli
target-postgres           |     main(args.config)
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/__init__.py", line 39, in main
target-postgres           |     target_tools.main(postgres_target)
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 28, in main
target-postgres           |     stream_to_target(input_stream, target, config=config)
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 77, in stream_to_target
target-postgres           |     raise e
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 70, in stream_to_target
target-postgres           |     state_tracker.flush_streams(force=True)
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/stream_tracker.py", line 47, in flush_streams
target-postgres           |     self._write_batch_and_update_watermarks(stream)
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/stream_tracker.py", line 67, in _write_batch_and_update_watermarks
target-postgres           |     self.target.write_batch(stream_buffer)
target-postgres           |   File ".meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/postgres.py", line 309, in write_batch
target-postgres           |     raise PostgresError(message, ex)
target-postgres           | target_postgres.exceptions.PostgresError: ('Exception writing records', KeyError('MyKeyCol'))
meltano                   | Loading failed (1): target_postgres.exceptions.PostgresError: ('Exception writing records', KeyError('MyKeyCol'))
meltano                   | ELT could not be completed: Loader failed
@nickolasclarke commented Feb 16, 2024

@laurentS did you ever resolve this? I'm running into the same problem.

@laurentS (Contributor, Author)
I did not, sorry. I moved on to other solutions for loading CSV files, outside the Singer/Meltano ecosystem.
