#! [euphoria-flink] Avoid extra shuffle when windowing on streaming #52

xitep · 2017-03-21T08:44:02Z

Moves the map function into the operator chain executed before .keyBy, avoiding an extra rebalancing operation.
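To see why moving the map helps, here is a minimal toy model (not Flink code) of Flink's default DataStream behaviour as we understand it. When two operators are connected without an explicitly requested partitioning, Flink uses a forward edge if their parallelisms match and a rebalance (round-robin network shuffle) otherwise. The class EdgeModel and its method are hypothetical and only illustrate that rule.

```java
/**
 * Toy model (hypothetical, not part of Flink or Euphoria) of the partitioning
 * Flink's DataStream API falls back to on an edge when no partitioner was requested.
 */
final class EdgeModel {

    /** The two default outcomes modelled here. */
    enum Partitioning { FORWARD, REBALANCE }

    private EdgeModel() {}

    /**
     * Equal parallelism: records are forwarded locally, so the two operators
     * can (subject to Flink's other chaining rules) run in one operator chain.
     * Different parallelism: records are redistributed round-robin over the
     * network, i.e. an extra shuffle.
     */
    static Partitioning defaultEdge(int upstreamParallelism, int downstreamParallelism) {
        if (upstreamParallelism < 1 || downstreamParallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        return upstreamParallelism == downstreamParallelism
                ? Partitioning.FORWARD
                : Partitioning.REBALANCE;
    }
}
```

For instance, defaultEdge(4, 120) yields REBALANCE: that is the extra step that appears when the map runs with the RSBK's parallelism instead of the input's.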

xitep · 2017-03-21T08:47:30Z

I'm sorry, I didn't spot this yesterday as part of the review for #50.

Here's the execution graph with the extra re-balancing:

Here's the exec graph of the same program with the fix applied:

je-ik · 2017-03-21T09:04:03Z

euphoria-flink/src/main/java/cz/seznam/euphoria/flink/streaming/ReduceStateByKeyTranslator.java

-            .setParallelism(operator.getParallelism());
+            // ~ execute in the same chain of the input's processing
+            // so far, thereby, avoiding an unnecessary shuffle
+            .setParallelism(input.getParallelism());


What is the difference between input.getParallelism() and operator.getParallelism()? Shouldn't they be the same, provided that the operator doesn't define its own parallelism?

input.getParallelism() returns the parallelism of the corresponding Flink dataset.

operator.getParallelism() returns the target parallelism intended for the operator (in this case, RSBK, i.e. ReduceStateByKey).

Yes, okay, but if the user code does not define a specific parallelism (for RSBK), shouldn't they be the same?

Ah, yes, that's correct! In that case input.getParallelism() will equal operator.getParallelism().
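The point agreed on here can be stated as a small rule. The sketch below assumes, hypothetically, that an operator's parallelism is either user-supplied or otherwise inherited from its input; ParallelismResolution and its methods are illustrative names, not Euphoria's API.

```java
import java.util.OptionalInt;

/** Hypothetical helper illustrating the rule; not Euphoria's actual API. */
final class ParallelismResolution {

    private ParallelismResolution() {}

    /**
     * Parallelism the operator (here: RSBK) ends up with: the user-supplied
     * value if there is one, otherwise the parallelism of its input.
     */
    static int operatorParallelism(OptionalInt userDefined, int inputParallelism) {
        return userDefined.orElse(inputParallelism);
    }

    /** True when input and operator parallelism coincide under this rule. */
    static boolean sameAsInput(OptionalInt userDefined, int inputParallelism) {
        return operatorParallelism(userDefined, inputParallelism) == inputParallelism;
    }
}
```

Under this assumption the two values diverge only when a parallelism is supplied explicitly, which is the case where running the map at the operator's parallelism forces a rebalance.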

Okay, I didn't look into the details of this PR, but will the RSBK respect the user-supplied parallelism (if any) after this change?

Yes, it will. The attached execution plan shows a program where the RSBK is configured to produce 120 output partitions.

Yes, user-supplied parallelism is correctly applied to the operation after shuffle. It's basically map -> keyBy -> setParallelism -> reduceByKey.
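The map -> keyBy -> setParallelism -> reduceByKey pipeline can be used to count network exchanges before and after the change. This is a simplified, hypothetical model (RsbkPlanModel and its methods are illustrative names): it assumes the keyBy hash exchange always happens, and that the input-to-map edge costs an extra rebalance only when the two parallelisms differ.

```java
/** Hypothetical model of the network exchanges in the RSBK translation. */
final class RsbkPlanModel {

    private RsbkPlanModel() {}

    /**
     * Exchanges in input(pIn) -> map(pMap) -> keyBy -> reduce.
     * The reducer's parallelism does not change the count: keyBy always
     * hash-partitions records by key over the network.
     */
    static int networkExchanges(int inputParallelism, int mapParallelism) {
        int exchanges = 1; // the keyBy hash exchange, needed in both versions
        if (inputParallelism != mapParallelism) {
            exchanges++; // default rebalance between input and map
        }
        return exchanges;
    }

    /** Old translator: the map ran with operator.getParallelism(). */
    static int beforeFix(int inputParallelism, int operatorParallelism) {
        return networkExchanges(inputParallelism, operatorParallelism);
    }

    /** New translator: the map runs with input.getParallelism(). */
    static int afterFix(int inputParallelism, int operatorParallelism) {
        // operatorParallelism now only applies after the keyBy exchange,
        // so it no longer affects the input -> map edge
        return networkExchanges(inputParallelism, inputParallelism);
    }
}
```

With a hypothetical input parallelism of 4 and the RSBK set to 120, beforeFix(4, 120) returns 2 and afterFix(4, 120) returns 1; when the parallelisms already match, both return 1.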

vanekjar · 2017-03-21T11:34:42Z

Good catch! Thanks 👍

#! [euphoria-flink] Avoid extra shuffle when windowing on streaming

b1a1f9b

xitep requested a review from vanekjar March 21, 2017 08:44

je-ik reviewed Mar 21, 2017

vanekjar merged commit 4312951 into master Mar 21, 2017

vanekjar deleted the pete/align-parallelism branch March 21, 2017 11:35
