First-class COPY support #62
Comments
Hey, I'm adding my notes here on COPY support. I think some of the points here could relate to #61. @jasonmp85, if you see items in here that are relevant for #61, could you copy and paste them there?
On the last item, this has been heavily discussed in the context of PostgreSQL too: https://wiki.postgresql.org/wiki/Error_logging_in_COPY
Proprietary databases that extend PostgreSQL usually set a threshold for COPY errors. For example, if the COPY observes 5 errors in one file (or errors in 1% of rows), it stops altogether. Otherwise, COPY tries to continue loading data.
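To make the threshold idea concrete, here is a minimal Python sketch of ignore-and-continue loading with an error cap. It is not how PostgreSQL, pg_shard, or any particular product implements this; the names `load_with_error_threshold`, `parse_int_row`, `max_errors`, and `max_error_ratio` are hypothetical, and checking the ratio only after the whole file has been read is an assumption made here because the row count is not known earlier.

```python
from typing import Callable, Iterable, List, Optional, Tuple


class TooManyErrors(Exception):
    """Raised when rejected rows reach the configured threshold."""


def parse_int_row(line: str) -> tuple:
    # Hypothetical row parser: one integer per line; raises ValueError on bad input.
    return (int(line),)


def load_with_error_threshold(
    lines: Iterable[str],
    parse_row: Callable[[str], tuple] = parse_int_row,
    max_errors: Optional[int] = 5,
    max_error_ratio: Optional[float] = None,
) -> Tuple[List[tuple], List[Tuple[int, str, str]]]:
    """Parse rows, skipping bad ones until a threshold is hit.

    Returns (accepted_rows, rejected) where each rejected entry is
    (line_number, raw_line, error_message).
    """
    accepted: List[tuple] = []
    rejected: List[Tuple[int, str, str]] = []
    total = 0
    for lineno, line in enumerate(lines, start=1):
        total += 1
        try:
            accepted.append(parse_row(line))
        except ValueError as exc:
            rejected.append((lineno, line, str(exc)))
            # Absolute cap: stop altogether once this many rows have failed.
            if max_errors is not None and len(rejected) >= max_errors:
                raise TooManyErrors(f"{len(rejected)} bad rows by line {lineno}")
    # Relative cap (assumption: evaluated once the total row count is known).
    if max_error_ratio is not None and total > 0:
        if len(rejected) / total > max_error_ratio:
            raise TooManyErrors(f"{len(rejected)} of {total} rows were rejected")
    return accepted, rejected
```

A caller would typically log the `rejected` list somewhere the user can inspect it, which is roughly what the error-logging proposals on the PostgreSQL wiki page discuss.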
I think the most difficult problem to overcome will be "what happens when a replica fails partway through", not "what happens when you can't send data to any replica". Do we roll back entirely, or do we support partial data loads (which is a feature not directly supported by the existing …)?
We can mark a shard as bad if it has a failure, but what about the other shards? Do we finish ingesting the data to them all? If so, how does the user fill in the missing data while omitting the shards that have already been processed? These are questions we'll need to answer for any usable implementation.
@marcocitus mentioned pgloader the other day… maybe we can look at it for inspiration re: partial failures or ignore-and-continue semantics.
Hi, with the current version of pg_shard, we're planning to …
What makes recovering from a failed …
We couldn't safely parallelize different instances of the …
This answer is related to my comments on #61 last week. Maybe it'd be more on-topic there?
Yeah, let's move there.
+1. I wanted to combine bdr with pg_shard to have a multi-cluster setup, but bdr uses COPY for at least the initial data dump, which prevents me from setting this up.
This ticket is to track full support for the COPY command. Unlike the trigger implementation in #61, this would mean supporting a bulk method for data ingestion. Issues like consistency and isolation will show up, as well as failure modes.