Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port To.safe macro to Scala 3 #10

Open
wants to merge 31 commits into
base: scala3-main
Choose a base branch
from
Open

Port To.safe macro to Scala 3 #10

wants to merge 31 commits into from

Conversation

MaximeKjaer
Copy link

Companion PR to spotify#3767

else Some(ls.collect { case Some(x) => x })

private def interpret[T: scala.quoted.Type](using Quotes): Option[Schema[T]] =
Type.of[T] match {
Copy link

@jto jto Apr 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something this means that the possible set of supported schema types is effectively limited to those already defined in scio.

If a user adds a given instance of Schema for type that isn't supported (say for example a Java class), the derivation will simply ignore it. This could be a problem for aliaswa support in Schema but probably something we can live with. We don't really expect users to define their own instances of Schema.

I'd be curious to see how the implementation looks like for Java beans and Avro's SpecificRecord.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something this means that the possible set of supported schema types is effectively limited to those already defined in scio.

Yes, that's the assumption here.

If a user adds a given instance of Schema for type that isn't supported (say for example a Java class), the derivation will simply ignore it. This could be a problem for aliaswa support in Schema but probably something we can live with. We don't really expect users to define their own instances of Schema.

It's still possible to support those use cases by registering custom interpreters to be tried as a last resort. Indeed, it would complicate things and can be subtle: think the case where the user defines the schema and use it in the same project.

This could be a problem for aliaswa support in Schema

It's not clear what you mean above. Could you please clarify?

I'd be curious to see how the implementation looks like for Java beans and Avro's SpecificRecord.

@vincenzobaz and @MaximeKjaer are going to work on it. I'll let them report progress on it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey. Yeah sorry I wanted to talk about LogicalType actually but got the exact term used in beam wrong... I think that's the only case for which a user might want to define a custom type. Last time I checked the implementation was broken in beam and fixing it did not seem to be the priority so I guess we can live without it.

The most common use case for us would be to convert from avro's SpecificRecord to case classes and vice-versa so this one needs to be working really well.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was checking out how we could handle java beans and I hit an important blocker I think.
It is really easy to work around the implicit instance of IsJavaBean[T] because an instance of such type it is just an empty evidence, moreover the code to instantiate it is already ported to macro. It is straightforward to add a case to the pattern match which could like this:

 case _ if TypeRepr.of[T].typeSymbol.flags.is(Flags.JavaDefined) && Try(IsJavaBean.isJavaBeanImpl[T]).isSuccess => 
    ???

I also remarked that we only need a BSchema for the compatibility check, so we might avoid altogether reaching for a RawRecord[T].

The major pain point is that JavaBeanSchema.schemaFor which is used to create the BSchema in question requires a Class[T] (which explains the implicit requirement of a ClassTag[T]). Class[T] is available only at runtime, so it is not possible to have instances of it in the macro.

I am not sure how we can handle this problem besides maybe implementing our own schema derivation for java beans.
What do you think?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This blocker might affect AvroInstances.avroSchema as well given that it relies on an implicit ClassTag

Copy link

@jto jto Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However this might result in a compile time Schema which diverges from the one generated at runtime by the beam

This might be the case but seems unlikely. I think r-eimplementing it is an acceptable solution. We can always have tests to validate the the Schema is the same as what beam would derive.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC for Avro we actually only need to access the avro Schema. There's a schema field in the generated code so it should be possible to access it at compile time since it's just a String ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For avro I see two methods:

  • implicit def avroSchema[T <: SpecificRecord: ClassTag]: Schema[T] which relies on the ClassTag to feed a Class to TypeDescriptor and therefore suffers from the same problem as the Java bean discussed here.
  • def fromAvroSchema(schema: org.apache.avro.Schema): Schema[GenericRecord] which is not generic nor implicit.

I imagine that you refer to the second one. In this case we can pattern match on the call of fromAvroSchema but we have not compile time information about schema: avro.Schema, so I am not sure about how to obtain the Schema[GenericRecord]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jto commit 93de09c proposes an implementation of compile time schema derivation for java beans similar to the one used for case classes. It seems to be accepted by (simple) tests.
Does it look reasonable to you?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Seems reasonable :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants