Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Serialize instance for ByteString #12

Merged
merged 3 commits into from
Sep 26, 2023

Conversation

neeraj97
Copy link

No description provided.

Comment on lines 27 to 28
import Data.ByteString as StrictByteString hiding (count)
import Data.ByteString.Lazy as LazyByteString hiding (count)
Copy link
Member

@adithyaov adithyaov Sep 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import qualified

@@ -171,18 +173,27 @@ main = do
!lazyText <- do
testSList <- Stream.replicateM 20 (genStrictText 50) & Stream.toList
pure $ force $ TextL.fromChunks testSList
!strictByteString <- genStrictByteString 1000
!lazyByteString <- genLazyByteString 10 100 -- 100 Strict bytestrings each of len 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can keep the distribution the same as the lazy text so we can compare them fairly.

Comment on lines 185 to 188
unless (StrictByteString.length strictByteString == 1000)
(error "TextS.length strictText == 1000")
unless (LazyByteString.length lazyByteString == 1000)
(error "TextS.length strictText == 1000")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the error message accordingly.

import Streamly.Internal.Data.Unbox (newBytes, MutableByteArray)
import Streamly.Internal.Data.Serialize hiding (encode)

import qualified Streamly.Data.Stream as Stream
import qualified Data.Text as TextS
import qualified Data.Text.Lazy as TextL

import Data.ByteString as StrictByteString hiding (count)
import Data.ByteString.Lazy as LazyByteString hiding (count)
import Data.Word (Word8)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to the top. Keep this along with other explicit imports.


-- Benchmarks
defaultMain
[ bencher "[Int]" intList 100
, bencher "Strict.Text" strictText 100
, bencher "Lazy.Text" lazyText 100
, bencher "Strict.ByteString" strictByteString 100
, bencher "Strict.LazyByteString" lazyByteString 100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strict.LazyByteString -> Lazy.ByteString

& Stream.toList
& fmap (force . StrictByteString.pack)

genLazyByteString n m = do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even the lazy text generator can be separated out like this.

Comment on lines 7 to 15
import Streamly.Internal.Data.Serialize (Serialize(..))
import Data.ByteString.Internal as Strict
import Data.ByteString.Lazy as Lazy
import qualified Streamly.Internal.Data.Unbox as Unbox
import GHC.Exts
import Foreign.ForeignPtr (withForeignPtr)
import Foreign.Ptr (plusPtr)
import GHC.Base (IO(..))
import Control.Monad (foldM_)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the explicit imports first, qualified imports next, and the rest after that.
Import qualified where possible.


--------------------------------------------------------------------------------
-- Strict ByteString
--------------------------------------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can add an empty line here.

{-# INLINE deserialize #-}
deserialize off arr end = do
(off1, lenBytes) <- deserialize off arr end :: IO (Int, Int)
fp <- Strict.mallocByteString lenBytes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to use Strict.mallocByteString over Strict.create?
Strict.create seems to have some more functionality w.r.t safety that incurs no overhead.

Comment on lines 31 to 33
withForeignPtr fp $ \(Ptr addr#) -> IO $ \s# -> (# copyMutableByteArrayToAddr#
arrS# srcStartBytes# addr# lenBytes# s#
, () #)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent this properly. 80 characters is the line limit.

Comment on lines 42 to 45
withForeignPtr fp $ \srcPtr -> let !(Ptr addr#) = srcPtr `plusPtr` srcOffset
in IO $ \s# -> (# copyAddrToByteArray#
addr# arrD# dstStartBytes# lenBytes# s#
, () #)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent this properly, 80 characters is the line limit.

Comment on lines 42 to 45
withForeignPtr fp $ \srcPtr -> let !(Ptr addr#) = srcPtr `plusPtr` srcOffset
in IO $ \s# -> (# copyAddrToByteArray#
addr# arrD# dstStartBytes# lenBytes# s#
, () #)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
withForeignPtr fp $ \srcPtr -> let !(Ptr addr#) = srcPtr `plusPtr` srcOffset
in IO $ \s# -> (# copyAddrToByteArray#
addr# arrD# dstStartBytes# lenBytes# s#
, () #)
withForeignPtr fp $ \srcPtr ->
let !(Ptr addr#) = srcPtr `plusPtr` srcOffset
in IO $ \s# ->
(# copyAddrToByteArray# addr# arrD# dstStartBytes# lenBytes# s#
, () #)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion is still not perfect but this will do.


{-# INLINE serialize #-}
serialize off arr (Strict.PS fp srcOffset lenBytes) = do
off1 <- serialize off arr lenBytes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use machine-independent types while serializing. Use Int64 instead of Int


{-# INLINE deserialize #-}
deserialize off arr end = do
(off1, lenBytes) <- deserialize off arr end :: IO (Int, Int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Int64 over Int. Use machine-independent types.

--------------------------------------------------------------------------------
instance Serialize Strict.ByteString where
{-# INLINE size #-}
size i (Strict.PS _ _ l) = i + l + size 0 l
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use Unbox.sizeOf (Proxy :: Proxy Int64) or simply just use 8 instead of size 0 l.

i + l + 8

Or,

i + l + Unbox.sizeOf (Proxy :: Proxy Int64)

--------------------------------------------------------------------------------
-- Lazy ByteString
--------------------------------------------------------------------------------
instance Serialize Lazy.ByteString where
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the template haskell helper for deriving serialization for lazy bytestring.

$(Serialize.deriveSerialize ''Lazy.ByteString)

should work.

Although I'm not sure if it will be portable across bytestring upgrades.
Changing the name and order of constructors will change the deriving strategy.

We should have more config options in the template haskell helpers with which we can give an order to the constructors manually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -0,0 +1,84 @@
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE UnboxedTuples #-}
{-# LANGUAGE UndecidableInstances #-}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document why do you need this extension.

Copy link
Member

@adithyaov adithyaov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, some comments

@adithyaov
Copy link
Member

Add version constraints to bytestring in the cabal file.

@adithyaov adithyaov changed the base branch from master to add-bytestring September 26, 2023 07:15
@adithyaov adithyaov merged commit ab98492 into composewell:add-bytestring Sep 26, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants