-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-2990: Only call hsync() and hflush() on supported filesystems #2991
base: master
Are you sure you want to change the base?
Conversation
Instead of log the unsupported call check capabilities and call only on supported filesystems - e.g. S3A does not.
@@ -51,7 +51,9 @@ public void write(byte[] b, int off, int len) throws IOException { | |||
} | |||
|
|||
public void sync() throws IOException { | |||
wrapped.hsync(); | |||
if (wrapped.hasCapability("hsync")) { | |||
wrapped.hsync(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@steveloughran Does this make sense to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hsync is actually really expensive. even on hdfs you shouldn't be calling it as it doesn't return until the data has been replicated and persisted.
S3A tells you off because some apps (hbase etc) really rely on sync to commit work. there's switch to actually fail...turning that on let us find which bits of code actually expected it to work.
Ideally code shouldn't be using outside of work whey want to be 100% confident data is written.
My IDE doesn't show any uses of the method...is there some through reflection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code LGTM -though I'd question the need to explicitly call them.
hsync() is more expensive than hflush(); it's the one which guarantees the data is persisted. I'd only consider it in close(), and swallow any UnsupportedOperationException everywhere ... in case there's an external implementation which fails or someone has turned s3a failure on.
@@ -51,7 +51,9 @@ public void write(byte[] b, int off, int len) throws IOException { | |||
} | |||
|
|||
public void sync() throws IOException { | |||
wrapped.hsync(); | |||
if (wrapped.hasCapability("hsync")) { | |||
wrapped.hsync(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hsync is actually really expensive. even on hdfs you shouldn't be calling it as it doesn't return until the data has been replicated and persisted.
S3A tells you off because some apps (hbase etc) really rely on sync to commit work. there's switch to actually fail...turning that on let us find which bits of code actually expected it to work.
Ideally code shouldn't be using outside of work whey want to be 100% confident data is written.
My IDE doesn't show any uses of the method...is there some through reflection?
@@ -62,7 +64,9 @@ public void flush() throws IOException { | |||
@Override | |||
public void close() throws IOException { | |||
try (FSDataOutputStream fdos = wrapped) { | |||
fdos.hflush(); | |||
if (fdos.hasCapability("hflush")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- adding a delegating
hasCapability()
might be handy - consider what to do on a sync failure. close() will sill be invoked but any exception thrown would probably be from the hflush() failure.
The CI failures are related:
|
that compiler failure means you are running against a very old version of hadoop, 2.8 or earlier as the change is from https://issues.apache.org/jira/browse/HDFS-11644 Keeping the entire hadoop-2/2.7.3 is really preventing the library from using the modern, especially cloud-friendlier APIs -including hadoop 2.9 APIs to probe for capabilities. Compare with spark which is on 3.4.0. Cut it and everyone's life will be much better. Doesn't have to be 3.4.x, but the latest 3.3.x release (3.3.x) |
@@ -62,7 +64,9 @@ public void flush() throws IOException { | |||
@Override | |||
public void close() throws IOException { | |||
try (FSDataOutputStream fdos = wrapped) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at this more, while it does guarantee wrapped.close() is called and is very pretty, there's a risk that if hflush() raises an exception then anything raised by wrapped.close() is lost.
when working to s3, it's that close() where the write to S3 takes place and is where the more important failures are likely to be raised.
It might be best to catch and log the hflush failure and always call wrapped.close()
commented on this again. that warning is only printed once per process, though it is potentially a sign of a dangerous mismatch between application code and the apps (hbase, some streaming logs) what we could do (and I'll take a hadoop PR) to give that warning message a new log name which is only used for this message. org.apache.hadoop.fs.s3a.needless for example. 😀 you can have it in hadoop 3.4.1 if you do a timely PR otherwise, #2944 will fix the build problems |
Instead of log the unsupported call check capabilities and call only on supported filesystems - e.g. S3A does not.
Rationale for this change
When stream into an HadoopOutputFile on S3A a waring gets logged: Application invoked the Syncable API against stream writing to XXX. This is Unsupported
https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/troubleshooting_s3a.html#UnsupportedOperationException_.E2.80.9CS3A_streams_are_not_Syncable._See_HADOOP-17597..E2.80.9D
What changes are included in this PR?
Instead of log the unsupported call (hflush() and hsync()) check capabilities and call only on supported filesystems - whereas S3A is not.
Are these changes tested?
Yes
Are there any user-facing changes?
No
Closes: #2990