Enable easier regex testing for contributors to the Charcoal project.

Andrew5057/sd-regex-testing

SmokeDetector Regex Testing


SmokeDetector Regex Testing (SDRT) provides functionality for testing regexes against metasmoke data and analyzing their results.

It relies heavily on the Polars library to store post data, test regexes, filter results, and more.

Installation

SDRT is available on PyPI under the name sd-regex-testing.

pip install sd-regex-testing

Usage

Python module

It's recommended to use the sdrt alias when importing sd_regex_testing:

import sd_regex_testing as sdrt

Any processing requires an initial call to sdrt.read_json. This function accepts the path to a metasmoke JSON file and returns a Polars DataFrame.

data = sdrt.read_json("path/to/file")

From there, the DataFrame can be tested against a regex. The sdrt Polars namespace provides one testing method per regex type:

title = data.sdrt.test_title("test")
username = data.sdrt.test_username("test")
keyword = data.sdrt.test_keyword("test")
website = data.sdrt.test_website("test")

Each of these methods also takes an optional case_sensitive parameter, which defaults to False.

case_sensitive = data.sdrt.test_keyword("test", case_sensitive=True)

The results of a given test can be filtered using the tp, fp, tn, and fn properties, which split the posts by how the just-tested regex performed: true positives, false positives, true negatives, and false negatives. Each of these properties is a DataFrame containing only the posts in that category.

tps = keyword.sdrt.tp
fps = keyword.sdrt.fp
tns = keyword.sdrt.tn
fns = keyword.sdrt.fn
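Conceptually, the four categories come from crossing each post's regex result with its ground-truth classification. The sketch below uses plain lists instead of Polars. It assumes each post carries a boolean `is_spam` label, which is a hypothetical key since real metasmoke exports may encode feedback differently. `split_results` is also illustrative, not SDRT code.

```python
def split_results(posts, matches):
    """Group posts into TP/FP/TN/FN given one regex match flag per post.

    Assumes a hypothetical boolean "is_spam" key holds the ground-truth label.
    """
    if len(posts) != len(matches):
        raise ValueError("need exactly one match flag per post")
    groups = {"tp": [], "fp": [], "tn": [], "fn": []}
    for post, matched in zip(posts, matches):
        spam = post["is_spam"]
        if matched and spam:
            key = "tp"  # regex caught a spam post
        elif matched:
            key = "fp"  # regex flagged a legitimate post
        elif spam:
            key = "fn"  # regex missed a spam post
        else:
            key = "tn"  # regex correctly ignored a legitimate post
        groups[key].append(post)
    return groups


posts = [
    {"id": 1, "is_spam": True},
    {"id": 2, "is_spam": False},
    {"id": 3, "is_spam": True},
    {"id": 4, "is_spam": False},
]
matches = [True, True, False, False]

groups = split_results(posts, matches)
print({k: [p["id"] for p in v] for k, v in groups.items()})
# {'tp': [1], 'fp': [2], 'tn': [4], 'fn': [3]}
```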

Command line tool

This package also installs an sdrt command line tool. It takes the path to a metasmoke (MS) JSON file as its argument:

sdrt path/to/file

This opens an interactive regex testing session. The test command tests a regex of the given type against the file and stores the result.

>>> test (title|username|keyword|website) regex
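The test command takes a regex type followed by the regex itself. Below is a minimal sketch of how such a line might be parsed. It assumes everything after the type is the regex, so patterns may contain spaces. `parse_test_command` is hypothetical and not taken from SDRT's source.

```python
VALID_TYPES = {"title", "username", "keyword", "website"}

def parse_test_command(line):
    """Split 'test <type> <regex>' into (type, regex). Hypothetical helper."""
    # maxsplit=2 keeps the rest of the line intact, including inner spaces.
    parts = line.strip().split(maxsplit=2)
    if len(parts) != 3 or parts[0] != "test":
        raise ValueError("usage: test (title|username|keyword|website) regex")
    _, regex_type, regex = parts
    if regex_type not in VALID_TYPES:
        raise ValueError(f"unknown regex type: {regex_type!r}")
    return regex_type, regex


print(parse_test_command("test keyword buy cheap pills"))  # ('keyword', 'buy cheap pills')
```

One drawback of this design is that leading and trailing spaces in the regex are lost. A real implementation might need quoting if those spaces matter.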

The tp, fp, tn, and fn commands report the number of posts in the corresponding result category.

>>> tp|fp|tn|fn

The summarize command pretty-prints the counts for all four result types and reports the last test.

>>> summarize
Regex [last regex] as a [last regex type] yielded [count] TP, [count] FP, [count] TN, and [count] FN.
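Given each post's result category, the summary line shown above comes down to counting and string formatting. The following is a sketch under that assumption. This `summarize` function is a hypothetical stand-in, not SDRT's implementation, and the real output wording may differ.

```python
from collections import Counter

def summarize(regex, regex_type, categories):
    """Format a one-line summary from a list of 'tp'/'fp'/'tn'/'fn' labels."""
    counts = Counter(categories)  # a category that never occurs counts as 0
    return (
        f"Regex {regex} as a {regex_type} yielded {counts['tp']} TP, "
        f"{counts['fp']} FP, {counts['tn']} TN, and {counts['fn']} FN."
    )


print(summarize("test", "keyword", ["tp", "tp", "fp", "tn", "tn", "tn"]))
# Regex test as a keyword yielded 2 TP, 1 FP, 3 TN, and 0 FN.
```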
