madgik/ExaSpark

ExaSpark

ExaSpark is an extension of Apache Spark that adds support for virtual tables. User-defined functions and user-defined aggregate functions are already supported by Apache Spark itself.

How to register a virtual table

  • Create a Java class named after the desired virtual table.
  • Give the class a constructor that takes the same arguments as the virtual table.
  • Add a mapReduce function to the class. This is where you write the code that implements the virtual table: it should create a view or table (preferably a temporary one) and return its name.
  • Finally, place the class in madgik/exaSpark/vtFunctions.
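
The steps above might look roughly like the following sketch. It is illustrative only: this README does not show the real signature of mapReduce or how ExaSpark gives the class access to Spark, so ViewRegistry is a hypothetical in-memory stand-in for Spark's temporary views, and the class name Foo, its two arguments, and the view-naming scheme are assumptions chosen to match the FOO(',','/pathOfFile.txt') example further down.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for Spark's temporary views, so the sketch runs without Spark.
class ViewRegistry {
    private static final Map<String, List<String[]>> VIEWS = new HashMap<>();

    static void createOrReplaceTempView(String name, List<String[]> rows) {
        VIEWS.put(name, rows);
    }

    static List<String[]> rows(String name) {
        return VIEWS.get(name);
    }
}

// The class name matches the virtual table name used in SQL: FOO(...)
class Foo {
    private final String delimiter;
    private final String path;

    // Constructor arguments mirror the virtual table arguments: FOO(',', '/pathOfFile.txt')
    Foo(String delimiter, String path) {
        this.delimiter = delimiter;
        this.path = path;
    }

    // Assumed signature: builds the rows, registers a temporary view, returns its name.
    String mapReduce() {
        List<String[]> rows = new ArrayList<>();
        try {
            for (String line : Files.readAllLines(Paths.get(path))) {
                // Pattern.quote treats the delimiter literally; -1 keeps trailing empty fields
                rows.add(line.split(Pattern.quote(delimiter), -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String viewName = "foo_" + Integer.toHexString(System.identityHashCode(this));
        ViewRegistry.createOrReplaceTempView(viewName, rows);
        return viewName;
    }
}
```

In a real virtual table the body of mapReduce would presumably build a Spark Dataset and register it as a temporary view instead of writing to ViewRegistry; the existing classes in madgik/exaSpark/vtFunctions are the authoritative reference for the actual signature.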

How to run the application from the terminal

  • Compile the Maven project with mvn clean compile assembly:single
  • This produces a jar file
  • Run the jar with java -jar NameOfJar.jar
  • A console appears, in which you can write SQL queries

How to write a SQL query with a virtual table

Example

There are some virtual table functions in /madgik/exaSpark/vtFunctions that you can use to test the application or as a basis for writing your own.

$ SELECT * FROM FOO(',','/pathOfFile.txt')
$ SELECT * FROM BOO(',',(SELECT * FROM FOO(',','/pathOfFile.txt')))

Built-in virtual table functions

Apachelogsplit

Breaks a single Apache log row into multiple fields.

$ select * from apachelogsplit('/path/of/access_log')
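
To give an idea of what splitting a log row involves, here is a minimal sketch that parses one line in Apache's Common Log Format with a regular expression. It is not the apachelogsplit implementation: the README does not list the columns that function returns, so the class name ApacheLogLine and the field names below are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ApacheLogLine {
    // Common Log Format: host ident user [time] "request" status bytes
    private static final Pattern CLF = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)");

    // Hypothetical field names, listed in the order they appear in the log line.
    private static final String[] NAMES =
        {"host", "ident", "user", "time", "request", "status", "bytes"};

    // Returns field name -> value in log order, or an empty map if the line does not match.
    static Map<String, String> split(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = CLF.matcher(line);
        if (!m.find()) {
            return fields;
        }
        for (int i = 0; i < NAMES.length; i++) {
            fields.put(NAMES[i], m.group(i + 1));
        }
        return fields;
    }
}
```

Because the pattern is anchored only at the start of the line, lines in the longer combined format (with referer and user agent appended) still match; the extra trailing fields are simply ignored in this sketch.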

Sample

Returns a random set of rows of the requested size (HowMany in the query below).

$ select * from sample(HowMany,(select * from apachelogsplit('/path/of/access_log')))
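
One common way to draw a fixed-size random sample in a single pass over the rows is reservoir sampling. The sketch below shows that technique only as an illustration; the README does not say how sample is implemented, and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class ReservoirSample {
    // Picks up to howMany rows uniformly at random in one pass (reservoir sampling, Algorithm R).
    static <T> List<T> sample(Iterable<T> rows, int howMany, Random rng) {
        List<T> reservoir = new ArrayList<>();
        if (howMany <= 0) {
            return reservoir;
        }
        long seen = 0;
        for (T row : rows) {
            seen++;
            if (reservoir.size() < howMany) {
                // Fill the reservoir with the first howMany rows
                reservoir.add(row);
            } else {
                // Draw a slot in [0, seen); the row is kept with probability howMany / seen
                long slot = (long) (rng.nextDouble() * seen);
                if (slot < howMany) {
                    reservoir.set((int) slot, row);
                }
            }
        }
        return reservoir;
    }
}
```

If the input has fewer rows than howMany, this sketch returns all of them.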

Features

  • Improved console (auto-complete, command history, new design)
  • The ReservedWords.txt file contains the reserved SQL words used by auto-complete
  • A "show virtual tables" command has been added
  • ExaremeSparkSession (an extension of SparkSession) has been added to support SQL queries with virtual tables without the console

New Features

REST API

Through our REST API a user is able to:

  • submit queries

Settings

A POST request is used to submit a query.

  • The ExaSpark REST API listens on port 9090 (this can be configured in the application.properties file)
  • Declare the Accept request HTTP header to:
    • application/json (for json responses)
    • text/csv (for csv responses)
  • Every request should contain a form with the following value:
    • query : the ExaSpark query
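
As a rough sketch of these settings, the following Java 11+ client builds a POST request with a single form field named query and an Accept header. Several details are assumptions not stated in this README: the host (http://localhost:9090 in the usage note), the use of URL-encoded form data rather than multipart, and the class and method names.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class ExaSparkRestClient {
    // Builds a POST to <baseUrl>/query/ carrying one form field named "query".
    // Assumption: the server accepts application/x-www-form-urlencoded bodies.
    static HttpRequest buildQueryRequest(String baseUrl, String sql, String accept) {
        String form = "query=" + URLEncoder.encode(sql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/query/"))
                .header("Accept", accept)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }

    // Sends the query and returns the raw response body as text.
    static String runQuery(String baseUrl, String sql, String accept) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildQueryRequest(baseUrl, sql, accept),
                      HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

With a server running locally, a call such as ExaSparkRestClient.runQuery("http://localhost:9090", "select * from apachelogsplit('/path/of/access_log')", "application/json") would submit the query and return the response body; pass the Accept value for the response format you want.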

Swagger UI

Visualization of and interaction with the API's resources

[Screenshot of the Swagger UI, 2018-03-02]

Endpoints

  • http://:9090/query/ : to perform a query
  • http://:9090/swagger-ui.html : to visualize the API's resources
