This repository has been archived by the owner on Aug 27, 2019. It is now read-only.

Service to Task Config Mapper #1

Open
kkasravi opened this issue Oct 6, 2015 · 5 comments

Comments

@kkasravi

kkasravi commented Oct 6, 2015

GearPump Service to Task Config Mapper

Overview

When launching a GearPump application on TAP, the application may require a number of TAP services across its DAG of Tasks. For example, a DAG that is a pipeline reading from Kafka and eventually writing to HBase will require configuration information about where the Kafka and HBase servers are. Additionally, TAP-related services are Kerberos-protected, which will require Kerberos tokens. Finally, OAuth credentials will be required in cases where a service needs to be created. Both OAuth credentials and Kerberos tokens have short TTLs, so this information needs to be provided to the GearPump application shortly before its DAG of Tasks is submitted for execution.
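To make the timing constraint concrete, here is a minimal sketch that checks whether short-lived credentials will still be valid for a safety margin beyond the submission time, and lists the ones that would need to be re-fetched first. `Credential`, `SubmissionGuard`, and the margin parameter are hypothetical names, not part of TAP or GearPump.

```scala
import java.time.{Duration, Instant}

// Hypothetical model of a short-lived credential (Kerberos token or OAuth token)
// obtained from TAP; the field names are assumptions, not a TAP API.
final case class Credential(kind: String, expiresAt: Instant)

object SubmissionGuard {
  /** True when the credential is still valid `margin` after `now`,
    * leaving time to submit the DAG and start its Tasks. */
  def freshEnough(c: Credential, now: Instant, margin: Duration): Boolean =
    c.expiresAt.isAfter(now.plus(margin))

  /** The credentials that should be re-fetched before submitting. */
  def stale(creds: Seq[Credential], now: Instant, margin: Duration): Seq[Credential] =
    creds.filterNot(c => freshEnough(c, now, margin))
}
```

The margin is a design choice: it accounts for the time between fetching TAP credentials and the Tasks actually using them on the executors.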

Requirements

GearPump Tasks often require input parameters described by a UserConfig object that is passed to the Task when it is reified on the GearPump executor. To inject TAP-related service information into Task-specific configurations, we envision a utility that parses TAP service data and returns the Task-specific data object to be added to the Task's UserConfig.

  1. There should be a way to convert a TAP JSON service object to a Task-specific data object that is passed in that Task's UserConfig.
  2. The JSON object that holds the TAP service information may describe configuration for a new type of Task being defined in the GearPump application. The utility should handle this case by returning a new Task-specific data object built from the TAP JSON service object.
  3. Each TAP-related JSON object has a serviceName that uniquely identifies its type.
  4. The Service to Task Config Mapper is a utility that parses the TAP service information and provides an API that takes a serviceName and returns an object, with the TAP service information injected, that can be added to the Task's UserConfig. Examples will be provided in the use cases below.
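Requirements 1 to 4 can be sketched as a small registry keyed by serviceName. This is a minimal sketch under stated assumptions: the TAP JSON is taken as already parsed into a serviceName plus a flat map of fields, and `TapService`, `TaskServiceConfig`, `KafkaTaskConfig`, `HBaseTaskConfig`, the `"kafka"`/`"hbase"` service names, and the field keys are all hypothetical, not TAP or GearPump APIs. The returned object is what would be stored in the Task's UserConfig.

```scala
/** A parsed TAP service entry (assumed shape). */
final case class TapService(serviceName: String, fields: Map[String, String])

/** Marker for Task-specific config objects; left open so new Task types can add their own. */
trait TaskServiceConfig

// Hypothetical Task-specific data objects for two common services.
final case class KafkaTaskConfig(zookeeperConnect: String, topic: String) extends TaskServiceConfig
final case class HBaseTaskConfig(zookeeperQuorum: String, tableName: String) extends TaskServiceConfig

/** Looks up a converter by serviceName and applies it to the TAP service entry. */
final class ServiceToTaskConfigMapper private (
    converters: Map[String, TapService => Option[TaskServiceConfig]]) {

  /** Returns a new mapper that can also convert `serviceName` (requirement 2). */
  def register(serviceName: String)(
      f: TapService => Option[TaskServiceConfig]): ServiceToTaskConfigMapper =
    new ServiceToTaskConfigMapper(converters + (serviceName -> f))

  /** Requirement 4: serviceName in, Task config out, or an error message. */
  def toTaskConfig(service: TapService): Either[String, TaskServiceConfig] =
    converters.get(service.serviceName) match {
      case None =>
        Left(s"no converter registered for serviceName '${service.serviceName}'")
      case Some(f) =>
        f(service).toRight(s"missing required fields for serviceName '${service.serviceName}'")
    }
}

object ServiceToTaskConfigMapper {
  val empty: ServiceToTaskConfigMapper = new ServiceToTaskConfigMapper(Map.empty)

  // The serviceName values and field keys below are hypothetical placeholders.
  val default: ServiceToTaskConfigMapper = empty
    .register("kafka") { s =>
      for {
        zk    <- s.fields.get("zookeeper.connect")
        topic <- s.fields.get("topic")
      } yield KafkaTaskConfig(zk, topic)
    }
    .register("hbase") { s =>
      for {
        quorum <- s.fields.get("hbase.zookeeper.quorum")
        table  <- s.fields.get("table")
      } yield HBaseTaskConfig(quorum, table)
    }
}
```

A new Task type would define its own `TaskServiceConfig` subclass and call `register` with its serviceName. Because `register` returns a new mapper, the shared `default` instance is never mutated.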

Use Cases

Within the GearPump Services UI we will need a way to submit an application to the GearPump master, which will allocate and run the Tasks using TAP service information where necessary.

1) Providing Task configuration information via the user's clipboard

The user would copy the TAP service information from the TAP service window and paste it into the GearPump UI that submits an application. The copied information will be a JSON object as described in requirement 3 above. When the JSON is pasted into the GearPump Services UI form field, the UI would convert it into one or more Task UserConfigs and add this information to the GearPump manifest.
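The paste flow can be sketched as follows, assuming the pasted JSON has already been parsed into `serviceName -> fields` pairs and that the UI knows which serviceName each Task needs. `PasteFlow`, the `Manifest` alias (a map from Task name to that Task's UserConfig entries), and the key-prefixing scheme are hypothetical stand-ins, not the GearPump manifest format.

```scala
object PasteFlow {
  type Fields   = Map[String, String]
  type Manifest = Map[String, Fields] // hypothetical: taskName -> UserConfig entries

  /** For each Task bound to a serviceName, copy that service's fields into the
    * Task's entries, prefixing keys with the serviceName. Also returns the names
    * of Tasks whose service was not present in the pasted payload. */
  def applyPaste(
      pasted: Map[String, Fields],   // serviceName -> fields from the clipboard
      bindings: Map[String, String], // taskName -> serviceName it needs
      manifest: Manifest): (Manifest, Seq[String]) = {
    val missing =
      bindings.collect { case (task, svc) if !pasted.contains(svc) => task }.toSeq.sorted
    val updated = bindings.foldLeft(manifest) { case (acc, (task, svc)) =>
      pasted.get(svc) match {
        case Some(fields) =>
          val prefixed = fields.map { case (k, v) => s"$svc.$k" -> v }
          acc.updated(task, acc.getOrElse(task, Map.empty[String, String]) ++ prefixed)
        case None => acc
      }
    }
    (updated, missing)
  }
}
```

Returning the missing Tasks instead of failing silently lets the UI tell the user which services still need to be pasted before submission.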

2) Providing Task configuration information via an exported HOCON file

The user would browse to the TAP services dashboard and export the TAP services required by the application to a HOCON (.conf) file saved locally on the user's computer. The GearPump Services UI would include this .conf file as part of the jar submission. This would require no modifications to the Services UI.
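The export step could produce a fragment like the one rendered by this minimal sketch, which writes one block per service. The top-level path `tap.services` and the field names are assumptions, not a format defined by TAP. Keys are quoted because HOCON treats quoted keys as single path elements, so a key such as `zookeeper.connect` is not split on its dots.

```scala
object HoconExport {
  /** Quote a value as a HOCON string, escaping backslashes and double quotes. */
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  /** Render services (serviceName -> fields) under a hypothetical `tap.services` path.
    * Services and fields are sorted so the output is deterministic. */
  def render(services: Map[String, Map[String, String]]): String = {
    val blocks = services.toSeq.sortBy(_._1).map { case (name, fields) =>
      val header = s"  ${quote(name)} {"
      val body = fields.toSeq.sortBy(_._1).map { case (k, v) => s"    ${quote(k)} = ${quote(v)}" }
      ((header +: body) :+ "  }").mkString("\n")
    }
    (("tap.services {" +: blocks) :+ "}").mkString("\n")
  }
}
```

For a single Kafka service this yields a `tap.services { "kafka" { "zookeeper.connect" = "..." } }` block that a Task could read from the bundled .conf file.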

Approve (quorum 2+)

Design

Design for Use Case 2.
Gearpump_TAP_integration_design.docx

Approve (quorum 2+)

@kkasravi
Author

kkasravi commented Oct 6, 2015

@clockfly - Don't we need a new issue opened in GearPump that describes the JavaScript or Scala.js object that accepts the pasted information and modifies the GearPump manifest?

@whjiang

whjiang commented Oct 6, 2015

hi @kkasravi, from my understanding, what you actually need is:

  1. A source/sink that can understand TAP configuration (JSON).
  2. A TAP GP broker that understands TAP configuration and uses it for GP application submission.

I don't see any requirement for a general GP processor to understand such JSON configuration. Is this understanding correct?

If so, I think another solution is to add a general TAPSource and a general TAPSink that understand the TAP-exported JSON configuration. In this way, the TAP JSON configuration is just like the hbase-site.xml file that HBaseSink needs: it is specific to a particular source/sink.

@kkasravi
Author

kkasravi commented Oct 6, 2015

I believe @clockfly's design outlines a solution where a utility takes the JSON and translates it into what HBaseSink or KafkaSource expects. Would a general TAPSource or TAPSink derive from KafkaSource or HBaseSink, or be an adapter that calls the actual source or sink? If the former, this may not be the best design, since we would have a source or sink that differs only in its config object. If the latter, then yes, this would be an option, but a utility might be less invasive to the core types in the streaming component.

@kkasravi kkasravi added the design label Oct 6, 2015
@whjiang

whjiang commented Oct 6, 2015

What I am talking about with TAPSource/TAPSink is the adapter. I don't want to reimplement all the connectors.
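The adapter idea can be sketched as a TAP-aware sink that holds only the serializable TAP fields and builds the real connector when the Task opens it, so no connector is reimplemented. `Sink` is a hypothetical minimal stand-in, not GearPump's `DataSink` API, and `TapSink`/`BufferSink` are illustrative names.

```scala
/** Hypothetical minimal sink interface standing in for a real connector API. */
trait Sink extends Serializable {
  def open(): Unit
  def write(record: String): Unit
  def close(): Unit
}

/** Adapter: carries TAP fields and delegates to whatever sink `build` creates from them. */
final class TapSink(
    tapFields: Map[String, String],
    build: Map[String, String] => Sink) extends Sink {

  // Not serialized; created on the executor when the Task opens the sink.
  @transient private var underlying: Sink = null

  override def open(): Unit = {
    underlying = build(tapFields) // translate TAP config into the real sink here
    underlying.open()
  }
  override def write(record: String): Unit = underlying.write(record)
  override def close(): Unit = if (underlying != null) underlying.close()
}

/** Toy sink used only to demonstrate the adapter. */
final class BufferSink(val prefix: String) extends Sink {
  val written = scala.collection.mutable.ArrayBuffer.empty[String]
  def open(): Unit = ()
  def write(record: String): Unit = written += s"$prefix$record"
  def close(): Unit = ()
}
```

The trade-off raised in the thread shows up here: the wrapped connector is untouched, but every connector still needs a `build` function that knows how to turn TAP fields into its own constructor arguments.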

@kkasravi
Author

kkasravi commented Oct 7, 2015

@whjiang ok. In practice this will be somewhat challenging. Let's take a look at HBaseSink and KafkaSource. Both require configuration information, and both use special data objects, passed into UserConfig, that store this information.

```scala
class HBaseSink(tableName: String, @transient var configuration: Configuration) extends DataSink
```

Creating an HBaseSink from a Configuration object will not work, because the field is transient and will not be serialized when the DataSinkTask reifies the sink object from UserConfig.
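The transient-field problem can be reproduced with plain Java serialization. In this sketch `FakeConfiguration` stands in for Hadoop's `Configuration`, and `RebuildingSink` shows one possible workaround (an assumption, not GearPump's fix): keep the settings in a serializable `Map` and rebuild the configuration lazily after deserialization.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Stand-in for Hadoop's Configuration; deliberately not Serializable.
final class FakeConfiguration(val settings: Map[String, String])

// Mirrors the HBaseSink signature: the transient field is skipped during serialization.
final class TransientSink(val tableName: String, @transient var configuration: FakeConfiguration)
  extends Serializable

// Possible workaround: serialize the settings and rebuild the configuration on first use.
final class RebuildingSink(val tableName: String, settings: Map[String, String]) extends Serializable {
  @transient lazy val configuration: FakeConfiguration = new FakeConfiguration(settings)
}

object RoundTrip {
  /** Serialize and deserialize `obj`, as happens when a Task is shipped to an executor. */
  def apply[T <: Serializable](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

After the round trip, `TransientSink.configuration` is `null`, while `RebuildingSink.configuration` is recreated from the serialized settings on first access.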

```scala
class KafkaSource(
    config: KafkaSourceConfig,
    offsetStorageFactory: OffsetStorageFactory,
    messageDecoder: MessageDecoder = new DefaultMessageDecoder,
    timestampFilter: TimeStampFilter = new DefaultTimeStampFilter,
    private var fetchThread: Option[FetchThread] = None,
    private var offsetManagers: Map[TopicAndPartition, KafkaOffsetManager] =
      Map.empty[TopicAndPartition, KafkaOffsetManager])
  extends TimeReplayableSource
```

KafkaSource uses the KafkaSourceConfig object to store the ZooKeeper connection and other information, such as the consumer topic. KafkaSourceConfig is constructed from a Properties object that holds this information.
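A utility for the Kafka case would mainly translate TAP fields into the `java.util.Properties` that KafkaSourceConfig is built from. In this minimal sketch the TAP field names (`zookeeperUri`, `consumerGroup`, `topic`) are hypothetical; `zookeeper.connect` and `group.id` are standard Kafka 0.8 consumer properties, while `consumer.topic` is a placeholder for whatever key KafkaSourceConfig actually reads.

```scala
import java.util.Properties

object KafkaPropsFromTap {
  // TAP field name (hypothetical) -> Kafka property key.
  private val keyMapping: Map[String, String] = Map(
    "zookeeperUri"  -> "zookeeper.connect",
    "consumerGroup" -> "group.id",
    "topic"         -> "consumer.topic" // hypothetical key
  )

  /** Copies known TAP fields into Properties; returns the required ones that are absent. */
  def toProperties(tapFields: Map[String, String]): Either[Seq[String], Properties] = {
    val missing = Seq("zookeeperUri", "topic").filterNot(tapFields.contains)
    if (missing.nonEmpty) Left(missing)
    else {
      val props = new Properties()
      for ((tapKey, kafkaKey) <- keyMapping; value <- tapFields.get(tapKey))
        props.setProperty(kafkaKey, value)
      Right(props)
    }
  }
}
```

Optional fields such as the consumer group are copied only when present, so Kafka's own defaults apply otherwise.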

But I understand your objective and will explore this approach.
