LogPM Benchmark

License: GPL v3

Introduction

LogPM is a log parser benchmark that emphasizes precise in-message parameter detection rather than template-based message clustering. The original paper is titled "LogPM: A new benchmark" and published at .

LogPM introduces a new parsing output called the parameter mask: a binary sequence of the same length as the log message, where each element indicates whether the corresponding message character is part of a parameter. For instance, the log message:

User u_123 connected from 10.10.1.10

should produce the mask:

000001111100000000000000001111111111
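
For illustration, the mask above can be reproduced with a small helper that marks known parameter character spans (span_to_mask is a hypothetical function for this example, not part of the benchmark API):

def span_to_mask(message: str, spans):
    # Start with every character marked as constant text ('0').
    mask = ['0'] * len(message)
    # Flip the characters inside each parameter span to '1'.
    for start, end in spans:
        for i in range(start, end):
            mask[i] = '1'
    return ''.join(mask)

# "u_123" spans characters 5-10, "10.10.1.10" spans 26-36 (end-exclusive).
print(span_to_mask("User u_123 connected from 10.10.1.10", [(5, 10), (26, 36)]))
# -> 000001111100000000000000001111111111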

Traditional metrics, such as group accuracy, F1 score, and Rand index, are also available in this benchmark. You can modify the metrics in benchmark/baseline_benchmark.py.
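
Character-level mask metrics can be computed directly from the two binary strings; the mask_f1 function below is an illustrative sketch, not the benchmark's own implementation:

def mask_f1(true_mask: str, pred_mask: str) -> float:
    # Character-level true positives, false positives, and false negatives.
    tp = sum(t == '1' and p == '1' for t, p in zip(true_mask, pred_mask))
    fp = sum(t == '0' and p == '1' for t, p in zip(true_mask, pred_mask))
    fn = sum(t == '1' and p == '0' for t, p in zip(true_mask, pred_mask))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)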

Usage

Please install Python >= 3.9 if you haven't already. Then clone the repository:

git clone https://github.com/M3SOulu/LogPMBenchmark.git
cd LogPMBenchmark

Prepare the execution environment:

conda env create -f environment.yaml
conda activate LogPMBenchmark

Run the benchmark:

python main.py benchmark <parser_name> <dataset_name>

For example:

python main.py benchmark spell proxifier

benchmarks the SPELL parsing algorithm on the proxifier dataset.

To run the benchmark on all datasets, omit the dataset argument and pass only the parser:

python main.py benchmark <parser_name>

List the available datasets and parsers with:

python main.py list

Download a dataset without running a benchmark:

python main.py download <dataset_name>

Benchmark a new parser:

  1. Create a new class that inherits from BaseBenchmark.
from benchmark.base_classes import BaseBenchmark

class MyParserBenchmark(BaseBenchmark):
    pass
  2. Implement the fit, predict_mask, and predict_cluster methods (a complete toy implementation is sketched after this list).
from typing import Hashable, Sequence

class MyParserBenchmark(BaseBenchmark):
    def __init__(self):  # initialization (optional)
        ...

    def fit(self, x: Sequence[str]):  # learn the latent patterns given the messages
        ...

    def predict_mask(self, x: Sequence[str]) -> Sequence[str]:  # predict the parameter masks given the messages
        ...

    def predict_cluster(self, x: Sequence[str]) -> Sequence[Hashable]:  # predict the cluster IDs given the messages
        ...
  3. Add the class to the BENCHMARKS dictionary in benchmark/__init__.py:
BENCHMARKS = {
    'no_parameter': NoParameterBenchmark,
    'all_parameter': AllParameterBenchmark,
    'random_parameter': RandomParameterBenchmark,
    'drain': DrainBenchmark,
    'lenma': LenmaBenchmark,
    'spell': SpellBenchmark,
    'my_parser': MyParserBenchmark
}
  4. Check that your parser has been added successfully by running python main.py list:
❯ python .\main.py list

Parsers:
        no_parameter
        all_parameter
        random_parameter
        drain
        lenma
        spell
        my_parser

Datasets:
        hpc
        zookeeper
        android
        apache
        hadoop
        hdfs
        linux
        openstack
        proxifier
        ssh
  5. Benchmark it by running python main.py benchmark my_parser.
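
Putting the steps together, here is a minimal end-to-end sketch of a toy parser that marks every digit character as a parameter. DigitParserBenchmark is hypothetical and only illustrates the interface; the exact BaseBenchmark contract is defined in benchmark/base_classes.py:

from typing import Hashable, Sequence

from benchmark.base_classes import BaseBenchmark

class DigitParserBenchmark(BaseBenchmark):
    """Toy parser: every digit character is treated as a parameter."""

    def fit(self, x: Sequence[str]):
        # Nothing to learn for this rule-based toy parser.
        pass

    def predict_mask(self, x: Sequence[str]) -> Sequence[str]:
        # '1' for digit characters, '0' for everything else.
        return [''.join('1' if c.isdigit() else '0' for c in msg) for msg in x]

    def predict_cluster(self, x: Sequence[str]) -> Sequence[Hashable]:
        # Messages that share the same non-digit skeleton share a cluster ID.
        return [''.join(c for c in msg if not c.isdigit()) for msg in x]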

Please consider citing the paper if you use the code. BibTeX:

