
Gradoop HBase Store

Christopher Rost edited this page Nov 1, 2018 · 4 revisions

Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.

With this adapter implementation, you can use Apache HBase as a DataSource and DataSink for your graph data.

Getting started

Add gradoop-hbase to your dependencies

Compile gradoop-hbase with

mvn clean install -DskipTests=true

Copy gradoop-store/gradoop-hbase/target/gradoop-hbase-<ver>.jar into your client lib.

Alternatively, add the Maven dependency to your pom.xml:

<!-- Maven Gradoop HBase -->
<dependency>
    <groupId>org.gradoop</groupId>
    <artifactId>gradoop-hbase</artifactId>
    <version>${gradoop.version}</version>
</dependency>

Creating an HBase-based graph store

// create gradoop HBase configuration
GradoopHBaseConfig config = GradoopHBaseConfig.getDefaultConfig();

// create HBase configuration (HBaseConfiguration.create() returns a Hadoop Configuration)
Configuration hbconfig = HBaseConfiguration.create();

// create store
HBaseEPGMStore graphStore = HBaseEPGMStoreFactory
    .createOrOpenEPGMStore(hbconfig, config);

Now let's add some graph elements

graphStore.writeGraphHead(graphHead);
graphStore.writeVertex(vertex);
graphStore.writeEdge(edge);

graphStore.flush();

Accessing Data

Example for DataSink & DataSource

Read data from store

// data source
GradoopFlinkConfig flinkConfig = GradoopFlinkConfig
    .createConfig(getExecutionEnvironment());
DataSource hbaseDataSource = new HBaseDataSource(graphStore, flinkConfig);
// run the Cypher query on the logical graph read from the store
GraphCollection result = hbaseDataSource.getLogicalGraph().cypher(
    "MATCH (u1:Person)<-[:hasModerator]-(f:Forum), " +
    "(u2:Person)<-[:hasMember]-(f) " +
    "WHERE u1.name = \"Alice\"");
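
When a Cypher query is concatenated from several string fragments, it is easy to lose the comma between two patterns or the space before WHERE. The following is a minimal sketch of a helper that joins the pieces explicitly. The class `CypherQuerySketch` and its method `buildQuery` are hypothetical names for illustration and are not part of Gradoop.

```java
import java.util.Arrays;
import java.util.List;

final class CypherQuerySketch {

    /** Joins MATCH patterns with ", " and appends an optional WHERE clause. */
    static String buildQuery(List<String> patterns, String whereClause) {
        StringBuilder query = new StringBuilder("MATCH ");
        query.append(String.join(", ", patterns));
        if (whereClause != null && !whereClause.isEmpty()) {
            query.append(" WHERE ").append(whereClause);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Same patterns and predicate as in the read example
        String query = buildQuery(
            Arrays.asList(
                "(u1:Person)<-[:hasModerator]-(f:Forum)",
                "(u2:Person)<-[:hasMember]-(f)"),
            "u1.name = \"Alice\"");
        System.out.println(query);
    }
}
```

The resulting string can then be passed to the query call in place of the hand-concatenated literal.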

Write data to store

// data sink
GradoopFlinkConfig flinkConfig = GradoopFlinkConfig
    .createConfig(getExecutionEnvironment());
DataSink hbaseSink = new HBaseDataSink(graphStore, flinkConfig);
hbaseSink.write(result);

Store Layout

GraphData (table 'graph_heads')

----------*-------------*-----------------------*------------*-----------------------
  row     |     cf      |           cq          |  timestamp |   value
----------*-------------*-----------------------*------------*-----------------------
          |     m       |           l           |            |  {label}
  {id}    *-------------*-----------------------*------------*-----------------------
          |   p_type    |      {property key}   |            |  {property type byte}
          *-------------*-----------------------*------------*-----------------------
          |   p_value   |      {property key}   |            |  {property value}
----------*-------------*-----------------------*------------*-----------------------
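
The layout stores each property in two cells that share the property key as column qualifier: a type byte in column family p_type and the serialized value in p_value. The sketch below illustrates that split and how a reader could reassemble a property from both cells. It uses plain Java maps instead of the HBase client, and the type codes `TYPE_INT` and `TYPE_STRING` as well as the byte encodings are assumptions made for this example, not Gradoop's actual values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class PropertyCellsSketch {

    // Hypothetical type codes for this sketch only; not Gradoop's actual values.
    static final byte TYPE_INT = 0x01;
    static final byte TYPE_STRING = 0x02;

    /** Builds the property cells of one row: column family -> (qualifier -> value). */
    static Map<String, Map<String, byte[]>> toCells(Map<String, Object> properties) {
        Map<String, byte[]> types = new LinkedHashMap<>();
        Map<String, byte[]> values = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            String key = entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Integer) {
                types.put(key, new byte[] {TYPE_INT});
                values.put(key, ByteBuffer.allocate(4).putInt((Integer) value).array());
            } else if (value instanceof String) {
                types.put(key, new byte[] {TYPE_STRING});
                values.put(key, ((String) value).getBytes(StandardCharsets.UTF_8));
            } else {
                throw new IllegalArgumentException("Unsupported type for key " + key);
            }
        }
        Map<String, Map<String, byte[]>> row = new LinkedHashMap<>();
        row.put("p_type", types);
        row.put("p_value", values);
        return row;
    }

    /** Reads the type byte first, then decodes the matching value cell. */
    static Map<String, Object> fromCells(Map<String, Map<String, byte[]>> row) {
        Map<String, Object> properties = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> entry : row.get("p_value").entrySet()) {
            String key = entry.getKey();
            byte type = row.get("p_type").get(key)[0];
            if (type == TYPE_INT) {
                properties.put(key, ByteBuffer.wrap(entry.getValue()).getInt());
            } else if (type == TYPE_STRING) {
                properties.put(key, new String(entry.getValue(), StandardCharsets.UTF_8));
            } else {
                throw new IllegalStateException("Unknown type byte for key " + key);
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("name", "Alice");
        properties.put("age", 42);
        Map<String, Object> restored = fromCells(toCells(properties));
        System.out.println(restored.equals(properties)); // prints "true"
    }
}
```

Keeping the type in its own cell means the value bytes carry no header, so a reader has to fetch both column families to decode a property.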


VertexData (table 'vertices')

----------*-------------*-----------------------*------------*-----------------------
  row     |     cf      |           cq          |  timestamp |   value
----------*-------------*-----------------------*------------*-----------------------
          |      m      |            l          |            |  {label}
  {id}    *-------------*-----------------------*------------*-----------------------
          |      m      |            g          |            |  {graph id}
          *-------------*-----------------------*------------*-----------------------
          |   p_type    |      {property key}   |            |  {property type byte}
          *-------------*-----------------------*------------*-----------------------
          |   p_value   |      {property key}   |            |  {property value}
----------*-------------*-----------------------*------------*-----------------------

EdgeData (table 'edges')

----------*-------------*-----------------------*------------*-----------------------
  row     |     cf      |           cq          |  timestamp |   value
----------*-------------*-----------------------*------------*-----------------------
          |      m      |            l          |            |  {label}
          *-------------*-----------------------*------------*-----------------------
          |      m      |            g          |            |  {graph id}
  {id}    *-------------*-----------------------*------------*-----------------------
          |      m      |            s          |            |  {source vertex id}
          *-------------*-----------------------*------------*-----------------------
          |      m      |            t          |            |  {target vertex id}
          *-------------*-----------------------*------------*-----------------------
          |   p_type    |      {property key}   |            |  {property type byte}
          *-------------*-----------------------*------------*-----------------------
          |   p_value   |      {property key}   |            |  {property value}
----------*-------------*-----------------------*------------*-----------------------
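
To make the meta column family of the edge table more concrete, the sketch below lists the cells one edge row would contain under the qualifiers l, g, s and t. It is only an illustration in plain Java: the `Cell` class and `metaCells` method are hypothetical, values are kept as readable strings rather than serialized bytes, and a single graph id is assumed.

```java
import java.util.ArrayList;
import java.util.List;

final class EdgeRowSketch {

    /** One cell of the layout: row key, column family, column qualifier, value. */
    static final class Cell {
        final String row;
        final String family;
        final String qualifier;
        final String value;

        Cell(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + " | " + family + ":" + qualifier + " | " + value;
        }
    }

    /** Builds the cells of column family "m" for one edge, in table order. */
    static List<Cell> metaCells(String edgeId, String label, String graphId,
                                String sourceId, String targetId) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell(edgeId, "m", "l", label));     // label
        cells.add(new Cell(edgeId, "m", "g", graphId));   // graph id
        cells.add(new Cell(edgeId, "m", "s", sourceId));  // source vertex id
        cells.add(new Cell(edgeId, "m", "t", targetId));  // target vertex id
        return cells;
    }

    public static void main(String[] args) {
        for (Cell cell : metaCells("e1", "hasMember", "g1", "v1", "v2")) {
            System.out.println(cell);
        }
    }
}
```

Vertex rows follow the same pattern without the s and t cells, and graph head rows keep only the label cell.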