
🐍 Change-based Offset-enabled Bidirectional RDF Archive


rdfostrich/cobra


COBRA

Change-based Offset-enabled Bidirectional RDF Archive.


COBRA is a bidirectional extension of OSTRICH, an RDF triple store that allows multiple versions of a dataset to be stored and queried at the same time.

Warning: this is experimental software

The store is a hybrid of snapshot, delta, and timestamp-based storage, which provides a good trade-off between storage size and query time. It provides several built-in algorithms for efficient iterator-based queries at a given version, between any two versions, and for the versions in which matching triples occur. These queries support limits and offsets for any triple pattern.

Insertion starts with a dataset snapshot, which is encoded in HDT. After that, deltas can be inserted; each delta contains additions and deletions relative to the last delta or snapshot.
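The snapshot-plus-deltas idea and the offset/limit queries described above can be sketched conceptually as follows. This is not COBRA's API or storage layout: the types `Triple`, `Delta`, and `Archive` and the functions `materialize` and `query` are hypothetical names made up for illustration, everything is kept in memory, and only a forward chain of deltas is modelled (no HDT encoding, no bidirectional chain, no indexes).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical in-memory types, for illustration only.
struct Triple {
    std::string s, p, o;
    bool operator<(const Triple& other) const {
        return std::tie(s, p, o) < std::tie(other.s, other.p, other.o);
    }
};

struct Delta {
    std::set<Triple> additions;
    std::set<Triple> deletions;
};

struct Archive {
    std::set<Triple> snapshot;  // version 0
    std::vector<Delta> deltas;  // deltas[i] turns version i into version i + 1
};

// Rebuild a version by applying the first `version` deltas to the snapshot.
std::set<Triple> materialize(const Archive& archive, std::size_t version) {
    assert(version <= archive.deltas.size());
    std::set<Triple> result = archive.snapshot;
    for (std::size_t i = 0; i < version; ++i) {
        for (const Triple& t : archive.deltas[i].deletions) result.erase(t);
        for (const Triple& t : archive.deltas[i].additions) result.insert(t);
    }
    return result;
}

// Match a triple pattern (an empty string acts as a wildcard), skip the
// first `offset` matches, and return at most `limit` of the remaining ones.
std::vector<Triple> query(const std::set<Triple>& triples, const Triple& pattern,
                          std::size_t offset, std::size_t limit) {
    std::vector<Triple> out;
    std::size_t skipped = 0;
    for (const Triple& t : triples) {
        const bool match = (pattern.s.empty() || pattern.s == t.s) &&
                           (pattern.p.empty() || pattern.p == t.p) &&
                           (pattern.o.empty() || pattern.o == t.o);
        if (!match) continue;
        if (skipped < offset) { ++skipped; continue; }
        if (out.size() == limit) break;
        out.push_back(t);
    }
    return out;
}
```

In this sketch, querying version 2 with a pattern amounts to `query(materialize(archive, 2), pattern, offset, limit)`. A real store avoids materializing the whole version first, which is where the iterator-based algorithms mentioned above come in.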

More details on COBRA can be found in our article. More details on OSTRICH can be found in our journal or demo articles.

Building

COBRA requires ZLib, Kyoto Cabinet and CMake (compilation only) to be installed.

Clone this repo with the --recurse-submodules option of git clone, so that the submodules are fetched as well.

Compile:

$ mkdir build
$ cd build
$ cmake ..
$ make

Running

The COBRA dataset is always loaded from the current directory.

For more information, please refer to the OSTRICH documentation.

Compiler variables

PATCH_INSERT_BUFFER_SIZE: The size of the triple parser buffer during patch insertion. (default 100)

FLUSH_POSITIONS_COUNT: The number of triples after which the patch positions should be flushed to disk, to avoid memory issues. (default 500000)

FLUSH_TRIPLES_COUNT: The number of triples after which the store should be flushed to disk, to avoid memory issues. (default 500000)

KC_MEMORY_MAP_SIZE: The KC memory map size per tree. (default 1LL << 27 = 128MB)

KC_PAGE_CACHE_SIZE: The KC page cache size per tree. (default 1LL << 25 = 32MB)

MIN_ADDITION_COUNT: The minimum number of addition triples for which the addition count is stored in the database. Changing this value only has an effect at insertion time; lookups are compatible with any value. (default 200)
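As a rough illustration of how such variables are commonly wired up, the sketch below assumes they are preprocessor macros with fallback defaults that a definition on the compiler command line can override. That mechanism is an assumption and is not copied from COBRA's sources; the helper `bytes_to_mib` is a hypothetical name. The sketch also checks the size arithmetic of the two Kyoto Cabinet defaults listed above.

```cpp
// Assumed pattern, not taken from COBRA's code: each default applies only
// when the macro has not already been defined at compile time.
#ifndef KC_MEMORY_MAP_SIZE
#define KC_MEMORY_MAP_SIZE (1LL << 27)
#endif

#ifndef KC_PAGE_CACHE_SIZE
#define KC_PAGE_CACHE_SIZE (1LL << 25)
#endif

// Hypothetical helper: convert a byte count to mebibytes (1 MiB = 2^20 bytes).
constexpr long long bytes_to_mib(long long bytes) {
    return bytes / (1024LL * 1024LL);
}
```

With the defaults, `bytes_to_mib(KC_MEMORY_MAP_SIZE)` evaluates to 128 and `bytes_to_mib(KC_PAGE_CACHE_SIZE)` to 32, matching the documented values.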

Reproducing Experiments

To reproduce the experiments and bring the results into this repo, see the subdirectories of the Experiments directory in this project.

The old Experiments folder is still available for illustration, as ExperimentsOld.

License

This software was written by Thibault Mahieu, Ruben Taelman, and colleagues.

This code is copyrighted by Ghent University – imec and released under the MIT license.