Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[euphoria-core] #259 Hints are not runtime specific. #268

Merged
merged 2 commits into from
Mar 8, 2018

Conversation

mareksimunek
Copy link
Contributor

#259 first part of unifying hints

implemented

public Dataset<OUT> output(Hint... hints)

Its breaking change because withHints on Join operator no longer exists

Copy link
Contributor

@je-ik je-ik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I din't walk though the test, I think that they will need more tuning and adding more generic tests after the core part is done. Now I think that the most important change would be to move the hints from Dataset to operator.


/**
* A dataset abstraction.
*
* @param <T> type of elements of this data set
*/
@Audience(Audience.Type.CLIENT)
public interface Dataset<T> extends Serializable {
public interface Dataset<T> extends OutputHintAware<OutputHint>, Serializable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand purpose of this interface. It looks odd in the context of client API. We should definitely keep this set on operator, not dataset, because operator is not (directly) part of client API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interface removed. Hints are now on operator

@@ -39,9 +42,9 @@
* @return a dataset representing the output of the given operator
*/
public static <IN, OUT> Dataset<OUT> createOutputFor(
Flow flow, Dataset<IN> input, Operator<IN, OUT> op) {
Flow flow, Dataset<IN> input, Operator<IN, OUT> op, Set<OutputHint> outputHints) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we set output hints on operator, there is no need for additional parameter here. On Beam branch, I already further simplified this method, because the parameter flow is redundant, because it can be retrieved from both input and op.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hints are now on operator

import javax.annotation.Nullable;
import java.util.Collection;
import java.util.Collections;
import java.util.Set;

/**
* {@code PCollection} that is input of a {@code Flow}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this is not part of this PR, can we fix the comment here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in #271

@Override
public Set<OutputHint> getHints() {
return Collections.emptySet();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another +1 for moving hints to operator, this useless method will disappear. Moreover, the hints are called OutputHint, and Dataset has no "output". Operator does.

@@ -106,24 +105,13 @@
*
* @return the dataset representing the new operator's output
*/
default Dataset<V> outputValues() {
default Dataset<V> outputValues(OutputHint... outputHints) {
return MapElements
.named("extract-values")
.of(output())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hints should be set here I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't, output Dataset is with values. There would hint get lost. (last operator is MapElements). I added test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would the hint get lost? The graph can be arbitrarily complicated and as a general rule, this should yeild the same runtime behavior:

 Dataset input = ...;
 Dataset<Pair> output = ReduceByKey.of(input)
   ...
   .output(hints);
 MapElements.of(input)
   .using(Pair::getSecond)
   .output()
   .persist(...)

it should behave the same way as

 Dataset input = ...;
 ReduceByKey.of(input)
    ...
    .outputValues(hints)
    .persist(...)

That is because non-shuffle operators should forward hints to downstream Datasets (unless overridden). That is due to the construction of the hints. The operator that the hint applies to is really RBK, not the MapElements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion, for now Hints doesn't propagate downstream in flow. Because we don't know if Hint will be still valid in next operator.

Flow flow = input.getFlow();
FlatMap<IN, OUT> map = new FlatMap<>(name, flow, input, functor, evtTimeFn);
FlatMap<IN, OUT> map = new FlatMap<>(name, flow, input, functor, evtTimeFn, Sets.newHashSet
(outputHints));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opening parentheses placement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -426,6 +443,11 @@ public boolean isCombinable() {
return valueExtractor;
}

@Override
public Dataset<Pair<KEY, OUT>> output(OutputHint... outputHints) {
return output();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks odd. Why two methods, when the varargs version can handle the zero argument version as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed, because it cannot be reached (we are using Builders instead)

import java.util.Set;

@Audience(Audience.Type.INTERNAL)
public interface HintAware<HINT extends Hint> {
public interface OutputHintAware<HINT extends OutputHint> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would drop this whole interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

/**
* Extra information for runner about Dataset size
*/
public enum SizeHint implements OutputHint {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use @Audience annotation here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

assertTrue(join.listInputs().stream().anyMatch(input -> ((Dataset) input).getHints().contains(new
TestHint())));
assertTrue(join.listInputs().stream().anyMatch(input -> ((Dataset) input).getHints().contains(new
TestHint2())));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two lines wrap strangely.

@mareksimunek mareksimunek force-pushed the simunek/259/hintsUnified branch 2 times, most recently from 0e18e81 to 9482640 Compare March 7, 2018 13:12
Copy link
Contributor

@dmvk dmvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I have just few minor comments, good job 👍

}

@SuppressWarnings("unchecked")
ReduceByKey(String name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would prefer sticking with two constructors

Collections.emptySet());
}

ReduceStateByKey(String name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

((Dataset) input).getProducer().getHints().contains(new Util.TestHint())));
assertTrue(join.listInputs().stream().anyMatch(input ->
((Dataset) input).getProducer().getHints().contains(new Util.TestHint2())));
assertEquals(2, ((Dataset) join.listInputs().stream().findFirst().get()).getProducer().getHints().size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reckon multilining this would be beneficial :)

this.valueExtractor = valueExtractor;
}

@Override
public DAG<Operator<?, ?>> getBasicOps() {
ReduceByKey<IN, KEY, Long, Long, W> reduceByKey =
new ReduceByKey<>(getName(), input.getFlow(), input,
keyExtractor, valueExtractor, windowing, Sums.ofLongs());
keyExtractor, valueExtractor, windowing, Sums.ofLongs(), this.getHints());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should stick with the same coding style :) Can we remove this.?

Copy link
Contributor

@je-ik je-ik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

fixed tests

[euphoria-core] hints in getBasicOps are only on last operator

[euphoria-core] code style corrections

[euphoria-core] code style corrections 2
@dmvk dmvk merged commit 6c3f3db into master Mar 8, 2018
@dmvk dmvk deleted the simunek/259/hintsUnified branch March 8, 2018 15:23
mareksimunek pushed a commit that referenced this pull request May 4, 2018
[euphoria-core] #259 Hints are not runtime specific.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants