The Triple Pattern Fragment Profiler is used to study the performance, in terms of response time, of Triple Pattern Fragments (TPFs). The profiler samples a set of triples from a given TPF and derives a set of triple patterns from those triples by replacing RDF terms with variables. Thereafter, these triple patterns are used to measure the response time of the TPF and to record additional metadata.
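The idea of deriving triple patterns from a sampled triple can be sketched as follows. This is a minimal illustration, not the profiler's actual code: it enumerates all eight combinations of replacing the subject, predicate, and object with variables, whereas the profiler may select only a subset.

```python
from itertools import product

def derive_patterns(s, p, o):
    """Derive triple patterns from a concrete triple by replacing each
    RDF term with a variable, in every combination (2^3 = 8 patterns).
    Sketch only; the profiler's actual selection strategy may differ."""
    variables = ("?s", "?p", "?o")
    terms = (s, p, o)
    patterns = []
    for mask in product((False, True), repeat=3):
        pattern = tuple(variables[i] if mask[i] else terms[i]
                        for i in range(3))
        patterns.append(pattern)
    return patterns

# Example: a (prefixed) triple sampled from a TPF
for pattern in derive_patterns("dbr:Karlsruhe", "rdf:type", "dbo:City"):
    print(pattern)
```

The fully bound triple and the fully variable pattern are both included here; in practice the fully variable pattern matches the entire fragment and may be treated specially.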
Prerequisites:
- Unix-based OS (Linux / Mac OS)
- Python 2.7
- pip
Follow these steps to set up and run the profiler:
- Download or clone this git repository and change into its directory: `cd tpf_profiler`
- Install the virtual environment package: `[sudo] pip install virtualenv`
- Create and activate the virtual environment: `virtualenv venv`, then `. venv/bin/activate`
- Optional: Edit `sources.json` to specify the mappings for TPF server pairs (local and remote)
- See the command line tool options via `python run_study.py -h`
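The exact schema of `sources.json` is defined by the repository itself; as a purely hypothetical illustration of a local/remote server-pair mapping, it might pair a remote TPF URL with a locally hosted counterpart:

```json
{
  "http://data.linkeddatafragments.org/dbpedia": "http://localhost:3000/dbpedia"
}
```

Both the key/value layout and the local port shown here are assumptions; consult the file shipped with the repository for the actual format.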
Setting up the controlled environment:
- Find the installation guide for setting up a local TPF server using Node.js here
- HDT Files:
- HDT Tools for generating RDF files from HDT files
- Virtuoso SPARQL Endpoint
Run the profiler from the command line and use the options to specify the profiler settings.
Examples:
- DBLP TPF with 10 samples and 1 run:
`python run_study.py --url http://data.linkeddatafragments.org/dblp -s 10 -r 1`
- DBpedia TPF with 100 samples and 2 runs, writing the results to a CSV file:
`python run_study.py --url http://data.linkeddatafragments.org/dbpedia -s 100 -r 2 -w 1`
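At its core, profiling a TPF comes down to timing individual page retrievals. The sketch below shows the general shape of such a measurement; `fetch_page` is a hypothetical placeholder for the profiler's actual HTTP request logic, not part of its API.

```python
import time

def measure_response_time(fetch_page):
    """Time a single TPF page retrieval.

    `fetch_page` is any zero-argument callable that performs the request
    and returns the fetched page; here it is a hypothetical stand-in for
    the profiler's real request code."""
    start = time.time()
    page = fetch_page()
    elapsed = time.time() - start
    return page, elapsed

# Usage with a stub standing in for a real HTTP request:
page, elapsed = measure_response_time(lambda: "<tpf page>")
print("elapsed: %.6f s" % elapsed)
```

In a real run, the callable would issue an HTTP GET against the fragment URL for one of the derived triple patterns, and the elapsed times across samples and runs would be aggregated into the reported response-time statistics.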
Lars-H. (2018, April 6). Lars-H/tpf_profiler: Release v0.2 (Version 0.2). Zenodo. http://doi.org/10.5281/zenodo.1213694
- The raw data created for the evaluation of TPF servers in our study is freely available here as a raw CSV file.
- The statistical analysis providing the basis for our evaluation is available in the `notebooks` directory as a Jupyter Notebook.
- The visualizations for the publication are also provided in the `notebooks` directory.
This work is licensed under BSD-3-Clause.