Use xmlbf for XML parsing #67

Merged 44 commits on Jun 3, 2019
44 commits
1a3b60e Hlint cleanups (jrp2014, Jan 23, 2019)
287b017 replace some data with newtype (jrp2014, Jan 23, 2019)
4fd2edf Merge pull request #63 from jrp2014/master (robstewart57, Jan 23, 2019)
6dc6379 Add support for Algebraic Graphs. Fix #59 (wismill, May 22, 2019)
79aa1f4 hlint (wismill, May 22, 2019)
c8f61f8 Replace `++` with `<>` (wismill, May 22, 2019)
a19f164 hlint fmap (wismill, May 22, 2019)
cd46e94 Fix algebraic-graphs version's bound (wismill, May 22, 2019)
8ab7a1d Travis: add lts-13 and drop lts-7 (wismill, May 22, 2019)
b447e4b Fix issue with Haddock (wismill, May 22, 2019)
fbf8431 Fix issue with Haddock (wismill, May 22, 2019)
d32650a Fix tests when path contains characters that must be escaped. (wismill, May 23, 2019)
bf7112c Fix Xeno (wismill, May 23, 2019)
51a013d Tidying (wismill, May 23, 2019)
721a1ff Make `pPredicateObject` easier to read (wismill, May 23, 2019)
896e21e Merge pull request #66 from wismill/wip/hlint (robstewart57, May 23, 2019)
56553dd Merge remote-tracking branch 'origin/master' into wip/Alga (wismill, May 23, 2019)
fcc08fb Merge pull request #65 from wismill/wip/Alga (robstewart57, May 23, 2019)
b30c7a3 Merge remote-tracking branch 'origin/master' into wip/xmlbf (wismill, May 23, 2019)
4d312a6 Tidying (wismill, May 24, 2019)
8fccd87 Use a monad transformer to manage the state (wismill, May 24, 2019)
acefeea Improvements (wismill, May 24, 2019)
d44b348 Move some definitions to ParserUtils (wismill, May 25, 2019)
7d318d3 Improvements (wismill, May 25, 2019)
bce4300 Improvements (wismill, May 25, 2019)
d3f3c73 Collection (wismill, May 25, 2019)
a348010 Fix typo in XML (wismill, May 27, 2019)
aa23978 Improvements (wismill, May 27, 2019)
84af8a9 Fix doc uri (wismill, May 27, 2019)
eb2159f Improvements (wismill, May 28, 2019)
7af4f69 Add Semigroup and Monoid instances to PrefixMappings (wismill, May 28, 2019)
137d0fe Improvements (wismill, May 28, 2019)
93a39c2 Improvements (wismill, May 30, 2019)
c04d8e4 Update W3C repository (wismill, May 30, 2019)
f6664cd Update stack to use the proper version of Xmlbf. (wismill, May 30, 2019)
d5ab655 Some more cleaning (wismill, May 30, 2019)
1f8bdb8 Fix test suite for Turtle (wismill, May 30, 2019)
b22f89c Fix test-00.ttl (wismill, May 30, 2019)
4052d1a Update latest changes of Xmlbf (wismill, May 31, 2019)
f1a04e9 Improve the Cabal file (wismill, Jun 1, 2019)
1ad58fc Documentation (wismill, Jun 3, 2019)
3ed61db Add benchmark for XML parsers (wismill, Jun 3, 2019)
6485b27 Update stack.yaml to the latest commit of the official repo (wismill, Jun 3, 2019)
8bf9f1f Documentation (wismill, Jun 3, 2019)
1 change: 1 addition & 0 deletions .gitignore
@@ -15,6 +15,7 @@ TAGS
 *.backup
 /.cabal-sandbox
 cabal.sandbox.config
+cabal.project.local
 countries.ttl
 *.prof
 bench/MainCriterion
65 changes: 65 additions & 0 deletions .hlint.yaml
@@ -0,0 +1,65 @@
+# HLint configuration file
+# https://github.com/ndmitchell/hlint
+##########################
+
+# This file contains a template configuration file, which is typically
+# placed as .hlint.yaml in the root of your project
+
+
+# Specify additional command line arguments
+#
+# - arguments: [--color, --cpp-simple, -XQuasiQuotes]
+
+
+# Control which extensions/flags/modules/functions can be used
+#
+# - extensions:
+# - default: false # all extension are banned by default
+# - name: [PatternGuards, ViewPatterns] # only these listed extensions can be used
+# - {name: CPP, within: CrossPlatform} # CPP can only be used in a given module
+#
+# - flags:
+# - {name: -w, within: []} # -w is allowed nowhere
+#
+# - modules:
+# - {name: [Data.Set, Data.HashSet], as: Set} # if you import Data.Set qualified, it must be as 'Set'
+# - {name: Control.Arrow, within: []} # Certain modules are banned entirely
+#
+# - functions:
+# - {name: unsafePerformIO, within: []} # unsafePerformIO can only appear in no modules
+
+
+# Add custom hints for this project
+#
+# Will suggest replacing "wibbleMany [myvar]" with "wibbleOne myvar"
+# - error: {lhs: "wibbleMany [x]", rhs: wibbleOne x}
+
+
+# Turn on hints that are off by default
+#
+# Ban "module X(module X) where", to require a real export list
+# - warn: {name: Use explicit module export list}
+#
+# Replace a $ b $ c with a . b $ c
+# - group: {name: dollar, enabled: true}
+#
+# Generalise map to fmap, ++ to <>
+- group: {name: generalise, enabled: true}
+
+
+# Ignore some builtin hints
+# - ignore: {name: Use let}
+# - ignore: {name: Use const, within: SpecialModule} # Only within certain modules
+
+
+# Define some custom infix operators
+# - fixity: infixr 3 ~^#^~
+
+
+# To generate a suitable file for HLint do:
+# $ hlint --default > .hlint.yaml
+
+- ignore: {name: Eta reduce}
+- ignore: {name: Redundant bracket}
+- ignore: {name: Reduce duplication}
+- ignore: {name: Use camelCase}
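The `generalise` group enabled in this configuration asks HLint to suggest `fmap` in place of `map` and `<>` in place of `++`. As a minimal sketch of why these rewrites preserve behaviour on lists and strings, the two helpers below build the same benchmark-style labels both ways. The names `labels` and `labelsOld` are hypothetical and do not come from this PR.

```haskell
module Main where

-- Needed for (<>) on older GHC/base versions; harmless on newer ones,
-- where the Prelude already exports it.
import Data.Semigroup (Semigroup(..))

-- Generalised style: fmap and (<>), as suggested by the HLint group.
labels :: String -> [String]
labels label = fmap (\suffix -> label <> " " <> suffix) ["SPO", "SP", "S"]

-- List-specific style: map and (++).
labelsOld :: String -> [String]
labelsOld label = map (\suffix -> label ++ " " ++ suffix) ["SPO", "SP", "S"]

main :: IO ()
main = do
  print (labels "TList")                      -- ["TList SPO","TList SP","TList S"]
  print (labels "TList" == labelsOld "TList") -- True
```

For lists, `fmap` is `map` and `<>` is `++`, so the generalised form changes only how polymorphic the code is, not what it computes.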
8 changes: 4 additions & 4 deletions .travis.yml
@@ -81,10 +81,6 @@ matrix:
 compiler: ": #stack 7.10.3"
 addons: {apt: {packages: [libgmp-dev]}}

-- env: BUILD=stack ARGS="--resolver lts-7"
-compiler: ": #stack 8.0.1"
-addons: {apt: {packages: [libgmp-dev]}}
-
 - env: BUILD=stack ARGS="--resolver lts-9"
 compiler: ": #stack 8.0.2"
 addons: {apt: {packages: [libgmp-dev]}}
@@ -97,6 +93,10 @@ matrix:
 compiler: ": #stack 8.4.3"
 addons: {apt: {packages: [libgmp-dev]}}

+- env: BUILD=stack ARGS="--resolver lts-13"
+compiler: ": #stack 8.6.5"
+addons: {apt: {packages: [libgmp-dev]}}
+
 # Nightly builds are allowed to fail
 # - env: BUILD=stack ARGS="--resolver nightly"
 # compiler: ": #stack nightly"
95 changes: 54 additions & 41 deletions bench/MainCriterion.hs
@@ -4,6 +4,7 @@
 module Main where

 import Prelude hiding (readFile)
+import Data.Semigroup (Semigroup(..))
 import Criterion
 import Criterion.Types
 import Criterion.Main
@@ -18,21 +19,15 @@ import Control.DeepSeq (NFData)
 -- $ gzip -d bills.099.actions.rdf.gz

 parseXmlRDF :: Rdf a => T.Text -> RDF a
-parseXmlRDF s =
-let (Right rdf) = parseString (XmlParser Nothing Nothing) s
-in rdf
+parseXmlRDF = either (error . show) id . parseString (XmlParser Nothing Nothing)
 {-# INLINE parseXmlRDF #-}

 parseNtRDF :: Rdf a => T.Text -> RDF a
-parseNtRDF s =
-let (Right rdf) = parseString NTriplesParser s
-in rdf
+parseNtRDF = either (error . show) id . parseString NTriplesParser
 {-# INLINE parseNtRDF #-}

 parseTtlRDF :: Rdf a => T.Text -> RDF a
-parseTtlRDF s =
-let (Right rdf) = parseString (TurtleParser Nothing Nothing) s
-in rdf
+parseTtlRDF = either (error . show) id . parseString (TurtleParser Nothing Nothing)
 {-# INLINE parseTtlRDF #-}

 queryGr :: Rdf a => (Maybe Node,Maybe Node,Maybe Node,RDF a) -> [Triple]
@@ -48,15 +43,19 @@ main :: IO ()
 main = defaultMainWith
 (defaultConfig {resamples = 100})
 [ env
+-- [FIXME] Do not rely on system's defaults to read files.
 (do fawltyContentTurtle <- readFile "data/ttl/fawlty1.ttl"
 fawltyContentNTriples <- readFile "data/nt/all-fawlty-towers.nt"
-rdf1' <- parseFile (XmlParser Nothing Nothing) xmlFile
-rdf2' <- parseFile (XmlParser Nothing Nothing) xmlFile
-let rdf1 = either (error . show) id rdf1' :: RDF TList
+xmlContent <- readFile xmlFile
+let rdf1' = parseString (XmlParser Nothing Nothing) xmlContent
+rdf2' = parseString (XmlParser Nothing Nothing) xmlContent
+rdf3' =parseString (XmlParser Nothing Nothing) xmlContent
+rdf1 = either (error . show) id rdf1' :: RDF TList
 rdf2 = either (error . show) id rdf2' :: RDF AdjHashMap
+rdf3 = either (error . show) id rdf3' :: RDF AlgebraicGraph
 triples = triplesOf rdf1
-return (rdf1, rdf2, triples, fawltyContentNTriples, fawltyContentTurtle)) $
-\ ~(triplesList, adjMap, triples, fawltyContentNTriples, fawltyContentTurtle) ->
+return (rdf1, rdf2, rdf3, triples, fawltyContentNTriples, fawltyContentTurtle, xmlContent)) $
+\ ~(triplesList, adjMap, algGraph, triples, fawltyContentNTriples, fawltyContentTurtle, xmlContent) ->
 bgroup
 "rdf4h"
 [ bgroup
@@ -81,42 +80,56 @@ main = defaultMainWith
 let res = parseString (TurtleParserCustom Nothing Nothing Attoparsec) t :: Either ParseFailure (RDF TList)
 in either (error . show) id res
 ) fawltyContentTurtle
+, bench "xml-xmlbf" $
+nf (\t ->
+let res = parseString (XmlParser Nothing Nothing) t :: Either ParseFailure (RDF TList)
+in either (error . show) id res
+) xmlContent
+, bench "xml-xht" $
+nf (\t ->
+let res = parseString (XmlParserHXT Nothing Nothing) t :: Either ParseFailure (RDF TList)
+in either (error . show) id res
+) xmlContent
 ]
 ,
 bgroup
 "query"
-(queryBench "TList" triplesList ++
-queryBench "AdjHashMap" adjMap
--- queryBench "SP" mapSP ++ queryBench "HashSP" hashMapSP
+(queryBench "TList" triplesList <>
+queryBench "AdjHashMap" adjMap <>
+queryBench "AlgebraicGraph" algGraph
+-- queryBench "SP" mapSP <> queryBench "HashSP" hashMapSP
 )
 , bgroup
 "select"
-(selectBench "TList" triplesList ++
-selectBench "AdjHashMap" adjMap
--- selectBench "SP" mapSP ++ selectBench "HashSP" hashMapSP
+(selectBench "TList" triplesList <>
+selectBench "AdjHashMap" adjMap <>
+selectBench "AlgebraicGraph" algGraph
+-- selectBench "SP" mapSP <> selectBench "HashSP" hashMapSP
 )
 , bgroup
 "add-remove-triples"
-(addRemoveTriples "TList" triples (empty :: RDF TList) triplesList
-++ addRemoveTriples "AdjHashMap" triples (empty :: RDF AdjHashMap) adjMap
+(addRemoveTriples "TList" triples (empty :: RDF TList) triplesList <>
+addRemoveTriples "AdjHashMap" triples (empty :: RDF AdjHashMap) adjMap <>
+addRemoveTriples "AlgebraicGraph" triples (empty :: RDF AlgebraicGraph) algGraph
 )
 , bgroup
 "count_triples"
 [ bench "TList" (nf (length . triplesOf) triplesList)
 , bench "AdjHashMap" (nf (length . triplesOf) adjMap)
+, bench "AlgebraicGraph" (nf (length . triplesOf) algGraph)
 ]
 ]
 ]

 selectBench :: Rdf a => String -> RDF a -> [Benchmark]
 selectBench label gr =
-[ bench (label ++ " SPO") $ nf selectGr (subjSelect,predSelect,objSelect,gr)
-, bench (label ++ " SP") $ nf selectGr (subjSelect,predSelect,selectNothing,gr)
-, bench (label ++ " S") $ nf selectGr (subjSelect,selectNothing,selectNothing,gr)
-, bench (label ++ " PO") $ nf selectGr (selectNothing,predSelect,objSelect,gr)
-, bench (label ++ " SO") $ nf selectGr (subjSelect,selectNothing,objSelect,gr)
-, bench (label ++ " P") $ nf selectGr (selectNothing,predSelect,selectNothing,gr)
-, bench (label ++ " O") $ nf selectGr (selectNothing,selectNothing,objSelect,gr)
+[ bench (label <> " SPO") $ nf selectGr (subjSelect,predSelect,objSelect,gr)
+, bench (label <> " SP") $ nf selectGr (subjSelect,predSelect,selectNothing,gr)
+, bench (label <> " S") $ nf selectGr (subjSelect,selectNothing,selectNothing,gr)
+, bench (label <> " PO") $ nf selectGr (selectNothing,predSelect,objSelect,gr)
+, bench (label <> " SO") $ nf selectGr (subjSelect,selectNothing,objSelect,gr)
+, bench (label <> " P") $ nf selectGr (selectNothing,predSelect,selectNothing,gr)
+, bench (label <> " O") $ nf selectGr (selectNothing,selectNothing,objSelect,gr)
 ]

 subjSelect, predSelect, objSelect, selectNothing :: Maybe (Node -> Bool)
@@ -133,25 +146,25 @@ queryNothing = Nothing

 queryBench :: Rdf a => String -> RDF a -> [Benchmark]
 queryBench label gr =
-[ bench (label ++ " SPO") $ nf queryGr (subjQuery,predQuery,objQuery,gr)
-, bench (label ++ " SP") $ nf queryGr (subjQuery,predQuery,queryNothing,gr)
-, bench (label ++ " S") $ nf queryGr (subjQuery,queryNothing,queryNothing,gr)
-, bench (label ++ " PO") $ nf queryGr (queryNothing,predQuery,objQuery,gr)
-, bench (label ++ " SO") $ nf queryGr (subjQuery,queryNothing,objQuery,gr)
-, bench (label ++ " P") $ nf queryGr (queryNothing,predQuery,queryNothing,gr)
-, bench (label ++ " O") $ nf queryGr (queryNothing,queryNothing,objQuery,gr)
+[ bench (label <> " SPO") $ nf queryGr (subjQuery,predQuery,objQuery,gr)
+, bench (label <> " SP") $ nf queryGr (subjQuery,predQuery,queryNothing,gr)
+, bench (label <> " S") $ nf queryGr (subjQuery,queryNothing,queryNothing,gr)
+, bench (label <> " PO") $ nf queryGr (queryNothing,predQuery,objQuery,gr)
+, bench (label <> " SO") $ nf queryGr (subjQuery,queryNothing,objQuery,gr)
+, bench (label <> " P") $ nf queryGr (queryNothing,predQuery,queryNothing,gr)
+, bench (label <> " O") $ nf queryGr (queryNothing,queryNothing,objQuery,gr)
 ]

-addRemoveTriples :: (NFData a,NFData (RDF a), Rdf a) => String -> Triples -> RDF a -> RDF a -> [Benchmark]
+addRemoveTriples :: (NFData (RDF a), Rdf a) => String -> Triples -> RDF a -> RDF a -> [Benchmark]
 addRemoveTriples lbl triples emptyGr populatedGr =
-[ bench (lbl ++ "-add-triples") $ nf addTriples (triples,emptyGr)
-, bench (lbl ++ "-remove-triples") $ nf removeTriples (triples,populatedGr)
+[ bench (lbl <> "-add-triples") $ nf addTriples (triples,emptyGr)
+, bench (lbl <> "-remove-triples") $ nf removeTriples (triples,populatedGr)
 ]

 addTriples :: Rdf a => (Triples,RDF a) -> RDF a
 addTriples (triples,emptyGr) =
-foldr (\t g -> addTriple g t) emptyGr triples
+foldr (flip addTriple) emptyGr triples

 removeTriples :: Rdf a => (Triples,RDF a) -> RDF a
 removeTriples (triples,populatedGr) =
-foldr (\t g -> removeTriple g t) populatedGr triples
+foldr (flip removeTriple) populatedGr triples
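Two idioms recur in the benchmark changes: the partial binding `let (Right rdf) = ...` becomes `either (error . show) id`, so a failed parse reports the underlying error rather than an opaque pattern-match failure, and `\t g -> addTriple g t` becomes `flip addTriple`. The sketch below illustrates both with hypothetical stand-ins (`ParseFailure`, `parseInts`, `unsafeParse`, `addItem`, `addAll`). It does not use the rdf4h API and only assumes the same argument order as `addTriple` (container first, element second).

```haskell
module Main where

-- Hypothetical stand-in for a parser's error type.
newtype ParseFailure = ParseFailure String deriving Show

-- Toy parser: whitespace-separated integers, failing on the first bad token.
parseInts :: String -> Either ParseFailure [Int]
parseInts s = traverse readOne (words s)
  where
    readOne w = case reads w of
      [(n, "")] -> Right n
      _         -> Left (ParseFailure ("not an integer: " ++ w))

-- Same shape as the rewritten parseXmlRDF/parseNtRDF/parseTtlRDF:
-- unwrap a Right, or abort with the shown error on a Left.
unsafeParse :: String -> [Int]
unsafeParse = either (error . show) id . parseInts

-- Toy insertion with the container as the first argument.
addItem :: [Int] -> Int -> [Int]
addItem acc x = x : acc

-- foldr passes the element first and the accumulator second,
-- so flip adapts addItem to the order foldr expects.
addAll :: [Int] -> [Int] -> [Int]
addAll items start = foldr (flip addItem) start items

main :: IO ()
main = do
  print (unsafeParse "1 2 3")  -- [1,2,3]
  print (addAll [1, 2, 3] [])  -- [1,2,3]
```

Calling `unsafeParse` on malformed input still aborts the program, which is acceptable in a benchmark harness; library code would normally keep the `Either` and let the caller decide.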
2 changes: 1 addition & 1 deletion data/ttl/conformance/test-00.out
@@ -1 +1 @@
-_:genid1 <http://www.w3.org/2001/sw/DataAccess/df1/tests/#x> <http://www.w3.org/2001/sw/DataAccess/df1/tests/#y> .
+_:genid1 <http://www.w3.org/2001/sw/DataAccess/df1/tests/test-00.ttl#x> <http://www.w3.org/2001/sw/DataAccess/df1/tests/test-00.ttl#y> .
6 changes: 3 additions & 3 deletions examples/ESWC.hs
@@ -13,16 +13,16 @@ heldByProp = "swc:heldBy"
 eswcCommitteeMembers :: RDF TList -> [T.Text]
 eswcCommitteeMembers graph =
 let triples = query graph (Just (unode eswcCommitteeURI)) (Just (unode heldByProp)) Nothing
-memberURIs = map objectOf triples
-in map
+memberURIs = fmap objectOf triples
+in fmap
 (\memberURI ->
 let (LNode (PlainL (firstName::T.Text))) =
 objectOf $ head $ query graph (Just memberURI) (Just (unode "foaf:firstName")) Nothing
 (LNode (PlainL lastName)) =
 objectOf $ head $ query graph (Just memberURI) (Just (unode "foaf:lastName")) Nothing
 in (T.append firstName (T.append (T.pack " ") lastName)))
 memberURIs

 main :: IO ()
 main = do
 result <- parseURL
2 changes: 1 addition & 1 deletion examples/ParseURLs.hs
@@ -11,7 +11,7 @@ timBernersLee :: IO ()
 timBernersLee = do
 Right (rdf::RDF TList) <- parseURL (XmlParser Nothing Nothing) "http://www.w3.org/People/Berners-Lee/card.rdf"
 let ts = query rdf (Just (UNode "http://www.w3.org/2011/Talks/0331-hyderabad-tbl/data#talk")) (Just (UNode "dct:title")) Nothing
-let talks = map (\(Triple _ _ (LNode (PlainL s))) -> s) ts
+let talks = fmap (\(Triple _ _ (LNode (PlainL s))) -> s) ts
 print talks

 main :: IO ()