refactor: Barrage stream reading into Chunks #5692

niloc132 · 2024-06-27T19:46:44Z

Like #5552, applies some after-the-fact review of the design for reading Barrage/Flight streams, in anticipation of sharing this code with JavaScript clients.

Removes unused BarrageChunkAppendingMarshaller
Adds more error detail when stream contents don't match metadata
Removes an unused parameter when parsing Flight messages
Inlines width of various types into their readers
Introduces an interface for reading data into chunks, and a factory interface to allow JS clients to supply their own implementations.
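
As a rough illustration of the last item, a reader/factory split could look something like the sketch below. The names `SimpleChunkReader`, `SimpleChunkReaderFactory`, and `DEFAULT_FACTORY` are hypothetical, plain arrays stand in for chunks, and `DataInputStream` stands in for the real wire format, so this is not the interface added in this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

final class ChunkReadingSketch {
    /** Hypothetical reader: pulls rowCount values from the stream into an array standing in for a chunk. */
    interface SimpleChunkReader {
        Object read(DataInput in, int rowCount) throws IOException;
    }

    /** Hypothetical factory: picks a reader per column type once, so the type lookup is not repeated per batch. */
    interface SimpleChunkReaderFactory {
        SimpleChunkReader readerFor(Class<?> columnType);
    }

    /** One possible default factory; another client could supply its own implementation instead. */
    static final SimpleChunkReaderFactory DEFAULT_FACTORY = columnType -> {
        if (columnType == int.class) {
            return (in, rowCount) -> {
                int[] out = new int[rowCount];
                for (int i = 0; i < rowCount; i++) {
                    out[i] = in.readInt();
                }
                return out;
            };
        }
        throw new UnsupportedOperationException("No reader for " + columnType);
    };

    /** Writes ints with DataOutputStream (big-endian); a simplification, not the Arrow wire format. */
    static byte[] encodeInts(int... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                out.writeInt(v);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Creation (factory lookup) and reading are separate steps.
        SimpleChunkReader reader = DEFAULT_FACTORY.readerFor(int.class);
        DataInput in = new DataInputStream(new ByteArrayInputStream(encodeInts(7, 8, 9)));
        int[] column = (int[]) reader.read(in, 3);
        System.out.println(Arrays.toString(column));
    }
}
```

Splitting creation from reading in this way means the per-type dispatch happens once per column rather than once per message.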

Partial #188

niloc132 · 2024-06-27T19:48:05Z

extensions/barrage/src/main/java/io/deephaven/extensions/barrage/chunk/ByteChunkReader.java

+        this.conversion = conversion;
+    }
+
+    public <T> ChunkReader transform(Function<Byte, T> transform) {


We may want to replace these Function<BoxedPrimitive, T> interfaces with some replicated version to avoid unnecessary boxing of values that can never be null.

In the interest of moving this patch along, I'm going to punt on this; the updated version is no more wrong than it previously was.

Although, we may want the transformers to "transform" null values, to give better control to the future feature of custom formatters. If that were the case, the boxing would be necessary.

I believe that these already have the DH null values as primitives, so a null input is impossible at this time. T could certainly be null as an output, though, depending on what kind of chunk is going to be written to (not controlled by this code).

I simply noticed that we do not call the transformer if the value is the null value; that is, a custom transformer can't react to any null values. I see your point that we could use the Deephaven null value, and that using primitives will make it very clear that null is being represented in a non-boxed way. You're probably right that we should avoid boxing here.
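
To make the trade-off in this thread concrete, here is a minimal sketch of a primitive-specialized transform. `ByteFunction` is a hypothetical interface, not something in this PR, and the local `NULL_BYTE` constant assumes the Deephaven null byte sentinel is `Byte.MIN_VALUE`. The transform receives the primitive directly, including the null sentinel, so a custom formatter could react to nulls without any boxing of inputs.

```java
import java.util.ArrayList;
import java.util.List;

final class ByteTransformSketch {
    /** Stand-in for the Deephaven null byte sentinel; assumed here to be Byte.MIN_VALUE. */
    static final byte NULL_BYTE = Byte.MIN_VALUE;

    /** Hypothetical primitive-specialized replacement for Function<Byte, T>. */
    @FunctionalInterface
    interface ByteFunction<T> {
        T apply(byte value);
    }

    /** Applies the transform to every element, null sentinel included; inputs are never boxed. */
    static <T> List<T> transformAll(byte[] values, ByteFunction<T> transform) {
        List<T> out = new ArrayList<>(values.length);
        for (byte value : values) {
            out.add(transform.apply(value));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = {1, NULL_BYTE, 3};
        // The transform decides what a null looks like in the output; T may itself be null.
        List<String> formatted = transformAll(data, b -> b == NULL_BYTE ? null : "0x" + Integer.toHexString(b));
        System.out.println(formatted);
    }
}
```

Whether nulls should reach the transformer at all is the open question in the comments above; the sketch only shows that doing so does not require a boxed input type.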

nbauernfeind · 2024-07-15T15:35:53Z

nbauernfeind · 2024-07-15T15:45:34Z

...nsions/barrage/src/main/java/io/deephaven/extensions/barrage/util/ArrowToTableConverter.java

+        ByteBuffer original = message.getByteBuffer();
+        ByteBuffer copy = ByteBuffer.allocate(original.remaining()).put(original).rewind();
+        Schema schema = new Schema();
+        Message.getRootAsMessage(copy).header(schema);


We need a detailed comment as to why we need to copy this. I suspect the reason is that the converted Arrow schema references the new buffer? We may want to push the copying into BarrageUtil if that's the case, because it's super common to assume that the byte buffer is temporarily immutable.

It isn't quite about immutability, but the fact that the ByteBuffer is owned by Python: if Python frees the underlying buffer, we'll be reading garbage when trying to handle a later RecordBatch.
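
The ownership concern can be illustrated with a small, self-contained sketch. The `copyOf` helper is hypothetical (it is not a method in BarrageUtil), and overwriting a byte array is used here only to simulate a foreign owner releasing or reusing its memory.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class DefensiveCopySketch {
    /**
     * Copies the remaining bytes of source into a freshly allocated heap buffer.
     * Reading through duplicate() leaves the position of source untouched.
     */
    static ByteBuffer copyOf(ByteBuffer source) {
        ByteBuffer copy = ByteBuffer.allocate(source.remaining());
        copy.put(source.duplicate());
        copy.flip(); // make the copied bytes readable from position 0
        return copy;
    }

    public static void main(String[] args) {
        byte[] foreignMemory = {1, 2, 3, 4};
        ByteBuffer original = ByteBuffer.wrap(foreignMemory);
        ByteBuffer copy = copyOf(original);

        // Simulate the owner reusing the memory after handing us the buffer.
        Arrays.fill(foreignMemory, (byte) 0);

        System.out.println("original now reads " + original.get(0));
        System.out.println("copy still reads " + copy.get(0));
    }
}
```

Anything parsed from the copy (such as a schema that keeps a reference to its backing buffer) stays valid for as long as the copy is reachable, regardless of what the original owner does.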

nbauernfeind · 2024-07-15T15:47:36Z

extensions/barrage/src/main/java/io/deephaven/extensions/barrage/util/BarrageStreamReader.java

+                ByteBuffer copy = ByteBuffer.allocate(original.remaining()).put(original).rewind();
+                Schema schema = new Schema();
+                Message.getRootAsMessage(copy).header(schema);
+                header.header(schema);


If possible, I would like it to be more obvious why we need to copy here (e.g. a comment about which references get leaked).

I'm not 100% sure that we need to copy in this case; technically it appears not, since line 87 does a ByteBuffer.wrap(). Instead, this is an attempt to be defensive in case a future impl reads from a slice/etc. of the ByteBuffer that came in over the wire.

niloc132 added 11 commits June 27, 2024 14:36

Remove dead class

86a5ffb

Make assertion provide more info

8eea3a3

Remove unused BitSet param

1d36724

Move BYTES constant into each impl

8e2eb96

Make two reader methods public so they can be accessed from web

6fc2c6c

Commit #1 reading chunks, checkpoint to talk to nate, next will try r…

7eb176c

…eading schema at beginning of stream

Commit #2, mostly mechanical changes, splitting creation and reading

574b9fc

Commit #3, create vector/array chunk readers to do type lookups once

117c94f

Commit #4, replicate new chunk readers for primitives

cd2039f

Commit #5, also boolean chunk reader

5d51345

Better naming, docs

65aac86

niloc132 added the barrage, NoDocumentationNeeded, and NoReleaseNotesNeeded labels Jun 27, 2024

niloc132 added this to the June 2024 milestone Jun 27, 2024

niloc132 requested a review from nbauernfeind June 27, 2024 19:46

niloc132 self-assigned this Jun 27, 2024

niloc132 commented Jun 27, 2024

niloc132 added 2 commits June 27, 2024 15:33

Make use of Schema safe, only look at copied buffers

54d828e

Remove deprecated method, fix tests so they can have a Field

c981291

niloc132 commented Jun 28, 2024

Rewind bytebuffers after copying

f40a9a7

nbauernfeind reviewed Jul 15, 2024

niloc132 added 3 commits July 15, 2024 11:00

Merge branch 'main' into 188-prep

1b61cb9

Clean up unused imports across the diff

e376bb3

Rename/rearrange classes from review

1b5764e

niloc132 requested a review from nbauernfeind July 15, 2024 20:53

nbauernfeind approved these changes Jul 18, 2024

niloc132 merged commit 90b9283 into deephaven:main Jul 18, 2024
16 checks passed

github-actions bot locked and limited conversation to collaborators Jul 18, 2024
