PTH_10: DPL to Apache Spark Translator

Translates Data Processing Language (DPL) commands to Apache Spark actions and transformations. Uses ANTLR visitors to generate a list of step objects, which contain the actual implementations of the commands using the Apache Spark API.

Features

Translates a string-based DPL command using the parse tree generated by the PTH_03 ANTLR-based parser to Apache Spark actions and transformations.
Fetches data from a datasource provider (by default, the PTH_06 datasource provider) and filters it with the filters specified in the DPL command.
Applies various transformations and actions to the data through simple, easy-to-understand commands.
Supports parallel and sequential modes, chosen by the kind of commands used. If a command requires batch-based processing, sequential mode is used. Otherwise, processing stays in parallel mode, which allows stream processing.
Spark API implementations are enclosed in Step objects, which take a Dataset as input and return the transformed Dataset, making these objects easy to reuse.
The ANTLR-based visitor functions only gather the parameters these objects need; they contain none of the implementation logic of the commands themselves.
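
The step and mode ideas described above can be illustrated with a minimal, Spark-free sketch. Everything in it is hypothetical and is not taken from the PTH_10 source: the `Step` interface, the `requiresBatch()` flag, and the `ContainsStep` and `SortStep` classes are invented for illustration, and a plain `List<String>` stands in for a Spark Dataset so that the example runs without any dependencies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StepSketch {

    // Hypothetical stand-in for a step: takes a dataset, returns the transformed dataset.
    // A List<String> replaces Spark's Dataset so the sketch runs without Spark.
    interface Step {
        List<String> apply(List<String> dataset);

        // Assumption: each step reports whether it needs batch-based processing.
        boolean requiresBatch();
    }

    // Row-wise filter; it looks at one row at a time, so it does not force sequential mode.
    static final class ContainsStep implements Step {
        private final String needle;

        ContainsStep(String needle) {
            this.needle = needle;
        }

        @Override
        public List<String> apply(List<String> dataset) {
            return dataset.stream()
                    .filter(row -> row.contains(needle))
                    .collect(Collectors.toList());
        }

        @Override
        public boolean requiresBatch() {
            return false;
        }
    }

    // Sorting needs the whole dataset at once, so this step is marked as batch-based.
    static final class SortStep implements Step {
        @Override
        public List<String> apply(List<String> dataset) {
            return dataset.stream().sorted().collect(Collectors.toList());
        }

        @Override
        public boolean requiresBatch() {
            return true;
        }
    }

    // Sequential mode is chosen as soon as any step requires batch processing.
    static boolean isSequential(List<Step> steps) {
        return steps.stream().anyMatch(Step::requiresBatch);
    }

    // Applies the steps in order, feeding each step's output into the next one.
    static List<String> run(List<Step> steps, List<String> input) {
        List<String> current = input;
        for (Step step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // In this sketch the steps are built by hand; a visitor would instead
        // gather the parameters ("error") from the parse tree.
        List<Step> steps = Arrays.asList(new ContainsStep("error"), new SortStep());
        List<String> rows = Arrays.asList("error b", "info a", "error a");

        System.out.println(run(steps, rows));    // prints [error a, error b]
        System.out.println(isSequential(steps)); // prints true
    }
}
```

Because each step only maps a dataset to a dataset, the same step instance can be reused in different command chains, and the code that builds the list needs no knowledge of how a step does its work.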

Documentation

See the official documentation on docs.teragrep.com.

Limitations

Not all commands in the Data Processing Language are implemented yet.

How to

Use:

Create a new DPLParserCatalystContext. It requires a SparkSession object and a com.typesafe.config.Config. The config is usually provided by the Zeppelin component.

DPLParserCatalystContext catCtx = new DPLParserCatalystContext(sparkSession, config);

Create a new DPLParserCatalystVisitor and pass it the DPLParserCatalystContext.

DPLParserCatalystVisitor catVisitor = new DPLParserCatalystVisitor(catCtx);

Visit the parse tree generated by PTH_03 by calling the DPLParserCatalystVisitor.visit() function.

CatalystNode n = (CatalystNode) catVisitor.visit(tree);

The result of that function is a CatalystNode. It contains a DataStreamWriter; starting the writer begins the execution.

n.getDataStreamWriter();

Set the visitor’s Consumer to a function of your liking to view or move the resulting Dataset to the desired component.

catVisitor.setConsumer((ds, id) -> {
    ds.show();
});

For a more concrete example, check out the PTH_07 Zeppelin DPL Interpreter project.

Compile:

mvn clean install -Pbuild

Contributing

You can involve yourself with our project by opening an issue or submitting a pull request.

Contribution requirements:

All changes must be accompanied by a new or changed test. If you think testing is not required in your pull request, include a sufficient explanation of why you think so.
Security checks must pass.
Pull requests must align with the principles and values of extreme programming.
Pull requests must follow the principles of Object Thinking and Elegant Objects (EO).

Read more in our Contributing Guideline.

Contributor License Agreement

Contributors must sign the Teragrep Contributor License Agreement before a pull request is accepted into the organization’s repositories.

You need to submit the CLA only once. After submitting the CLA, you can contribute to all of Teragrep’s repositories.
