FIX: Account for custom scalar types of incoming values #737

ivany4 · 2021-06-09T13:27:41Z

We are using JPype, which translates Java objects to Python. Recent versions of this library convert Java int, double and similar to JInt, JDouble, which are custom subclasses of corresponding Python types.

Now, when such value arrives on a channel, it results in broken logic, once its type is recorded with self.channeltype = type(value), namely:

Checks such as self.channeltype == float will evaluate to False, when JDouble is in play
PyQt signals may break, e.g. self.send_value_signal[self.channeltype].emit() will enumerate through all available subscripts, not finding an overload for JDouble, it will pick the last available, i.e. np.ndarray. The resulting exception is very confusing:

TypeError: PyDMWritableWidget.send_value_signal[numpy.ndarray].emit(): argument 1 has unexpected type 'float'

This PR tries to record self.channeltype as base types, where appropriate.

…of scalar subclass

codecov-commenter · 2021-06-09T13:47:18Z

Codecov Report

Merging #737 (d37a975) into master (00974e9) will decrease coverage by 1.77%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #737      +/-   ##
==========================================
- Coverage   58.65%   56.87%   -1.78%     
==========================================
  Files          88       88              
  Lines        9984     9550     -434     
==========================================
- Hits         5856     5432     -424     
+ Misses       4128     4118      -10

Impacted Files	Coverage Δ
pydm/widgets/base.py	`89.02% <100.00%> (-1.52%)`	⬇️
pydm/widgets/tab_bar.py	`33.54% <0.00%> (-8.77%)`	⬇️
pydm/widgets/image.py	`26.53% <0.00%> (-8.06%)`	⬇️
pydm/widgets/scale.py	`19.42% <0.00%> (-5.81%)`	⬇️
pydm/widgets/waveformtable.py	`20.00% <0.00%> (-5.00%)`	⬇️
pydm/widgets/embedded_display.py	`21.56% <0.00%> (-4.82%)`	⬇️
...ydm/connection_inspector/connection_table_model.py	`25.00% <0.00%> (-3.82%)`	⬇️
pydm/widgets/waveformplot.py	`22.79% <0.00%> (-3.81%)`	⬇️
pydm/widgets/scatterplot.py	`25.00% <0.00%> (-3.58%)`	⬇️
pydm/widgets/enum_button.py	`71.14% <0.00%> (-3.42%)`	⬇️
... and 37 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 00974e9...d37a975. Read the comment docs.

Thrameos

Minor suggestions.

Thrameos · 2021-06-09T17:27:15Z

pydm/widgets/base.py

+        for base_type in [int, float, str, np.ndarray]:
+            if self.channeltype != base_type and issubclass(self.channeltype, base_type):
+                # Leave bool and int separate, we're concerned about user-defined subclasses, not native Python ones
+                if self.channeltype != bool:
+                    self.channeltype = base_type
+                break


Not sure how often code gets called, but wouldn't it be more efficient to test for bool once and then in the loop it can just test if something is a subtype.

if self.channeltype !=bool: for base_type in [int, float, str, np.ndarray]: if issubclass(self.channeltype, base_type): self.channeltype = base_type break

Notice the check for self.channeltype!=base_type is unnecessary. If it is int then it would pass the subclass check and set channeltype to int which it already was and then break.

I agree with @Thrameos's sentiment here. value_changed gets called a lot, maybe more than any other method in all of PyDM, so it pays to make this check as lightweight as possible.

Maybe something like this?

if self.channeltype not in (int, float, bool, str, np.ndarray): for base_type in (int, float, str, np.ndarray): if issubclass(self.channeltype, base_type): self.channeltype = base_type break

If this is a critical section then skipping the for loop and the implicit loop from "not in" entirely would be the best policy. Instead use a global table cache which would be O(1).

value_type = type(value) self.channeltype = TYPE_TABLE.get(value_type, None) if self.channeltype is None: # assume we will use the type as is self.channeltype = value_type # check for user defined subtypes for base_type in (int, float, str, np.ndarray): if issubclass(self.channeltype, base_type): self.channeltype = base_type break # Cache the result so all future look ups are O(1) TABLE_TYPE[value_type] = self.channeltype

The table would be prepopulated with TYPE_TABLE[bool] = bool

Each time it is called we have one hash map look up and a comparison. It it is not found then the search occurs to find the best type. That result is cached so it never has to evaluate the loop for that type again.

hhslepicka · 2021-06-10T04:06:26Z

Wouldn't it be best to address this conversion over at the data plugin and let PyDM deal with the basic types?
You could handle that into your plugin when JPype give you the value before emitting and when you receive the write you can cast it back to the proper type via a reverse lookup dictionary.

ivany4 · 2021-06-10T08:01:56Z

Wouldn't it be best to address this conversion over at the data plugin and let PyDM deal with the basic types?
You could handle that into your plugin when JPype give you the value before emitting and when you receive the write you can cast it back to the proper type via a reverse lookup dictionary.

That depends on the perspective. If there were multiple plugins based on JPype, each of them would need to handle that separately.

ivany4 · 2021-07-05T09:42:08Z

@hhslepicka , what would be an acceptable performance hit in this method?
Compared to original PyDM code, I'm observing 2-8% slowdown in @Thrameos 's hashmap solution, which is by far the most performant.
This is perf on my machine with various values for 100000 value_changed calls each (averaged over 5 attempts):

Value	123	123.47	1.00E+02	0x1FF	0b100	TRUE	FALSE	np.array([123, 456])	"test"	FloatSubclass(0.5)	StrSubclass("test")
Orig Avg (ms)	1,282,931.51	1,322,600.19	1,288,669.65	1,350,867.69	1,318,794.36	1,309,725.72	1,323,938.08	4,508,809.83	954,742.51	1,371,229.86	955,796.24
HashMap Avg (ms)	1,351,577.09	1,361,390.72	1,358,056.51	1,357,154.96	1,350,078.21	1,373,699.45	1,375,790.97	4,567,240.99	1,002,908.61	1,420,907.02	1,041,290.06
Increase (%)	5.351%	2.933%	5.384%	0.465%	2.372%	4.885%	3.917%	1.296%	5.045%	3.623%	8.945%

Thrameos · 2021-07-06T16:12:37Z

@ivany4 Perhaps you can make up for the difference by removing the logic

            try:
                if self.channeltype == unicode:
                    # For Python 2.7, set the the channel type to str instead of unicode
                    self.channeltype = str
            except NameError:
                pass

Just add unicode to the cache mechanism as a prepopulated.

mattgibbs · 2021-07-13T00:24:24Z

Hi @ivany4, sorry for the silence. @hhslepicka left SLAC, and I have only been able to spend tiny amounts of time on PyDM lately.

I am still against including this in the base widget, I agree with @hhslepicka - this seems like it should happen at the plugin layer. I would guess you could have a common base class for a JPype-based plugin, and implement the conversion to a base type there.

Your most recent implementation seems reasonable from a performance standpoint, but I'm objecting for architectural reasons. If there's a strong reason why this can't be implemented in the data plugin, I'd be willing to merge.

ivany4 · 2021-07-19T12:47:03Z

Hi @ivany4, sorry for the silence. @hhslepicka left SLAC, and I have only been able to spend tiny amounts of time on PyDM lately.

I am still against including this in the base widget, I agree with @hhslepicka - this seems like it should happen at the plugin layer. I would guess you could have a common base class for a JPype-based plugin, and implement the conversion to a base type there.

Your most recent implementation seems reasonable from a performance standpoint, but I'm objecting for architectural reasons. If there's a strong reason why this can't be implemented in the data plugin, I'd be willing to merge.

Thanks for the feedback, @mattgibbs . Sad to see @hhslepicka leave :/
I understand your concern, and I would be happy to find another elegant solution (throw more ideas, if you have them). I was trying to avoid implementing conversion in the plugin for few reasons:

It means that incoming data will be copied into a native Python type, which sounds like a big performance hit to me (not measured though), because every incoming piece of data will be copied. My initial approach minimized the impact by playing with channeltype, which is used less frequently, mainly for writing (but unfortunately is reset on every incoming value). However I think using hash tables before updating channeltype should still be more performant than copying incoming data every single time.
PyDM implementation implies that native Python types cannot be subclassed, which may impact more than just JPype derivatives, if some other system takes the same subclassing approach. If you think this is perfectly valid solution, it's worth documenting that.

I'm currently researching the possibility to monkey-patch __eq__ for JPype types so that e.g. type(self.channeltype) == bool could succeed for JBoolean without the need for additional logic on PyDM level. Would be curious to hear @Thrameos 's opinion how bad of an idea that is. It is still insufficient though for cases like self.send_value_signal[self.channeltype].emit(num_value).

Thrameos · 2021-07-19T18:41:59Z

Monkey patching the equals for a type will undoubtedly create unexpected side effects in users code. I can't say what is would break only that I would certainly not add such a hack to JPype release nor recommend hacking to add a bad contract just to fix another libraries code. JPype is not doing anything special here as other toolkits such as numpy as doing the same. The existing code is special cases here for "np.array" and "unicode" already but I could see the exact same problem happen of "np.float32" or other external types. So this is not really a JPype issue by rather an issue with how pydm is using the type information. Using the "type" as a look up for an action as is being done here is a recipe for disaster as it violates the concept of duck typing. One should be able to define a type that meets the specification (or in this case merely derives from a base type) without breaking code.

I would recommend that the hash solution with a plugin function for constructing the channel type would be the best solution. The user has the option of controlling the channel type by installing a function and the hash solution would be faster than the existing logic as it already has several special cases. Placing a user defined plug in so that they can add a type conversion with known rules is better than requiring a monkey patch.

lets consider the original code

   def value_changed(self, new_val):
        """
        Callback invoked when the Channel value is changed.
        Parameters
        ----------
        new_val : str, int, float, bool or np.ndarray
            The new value from the channel. The type depends on the channel.
        """
        self.value = new_val
        self.channeltype = type(self.value)
        if self.channeltype == np.ndarray:
            self.subtype = self.value.dtype.type
        else:
            try:
                if self.channeltype == unicode:
                    # For Python 2.7, set the the channel type to str instead of unicode
                    self.channeltype = str
            except NameError:
                pass

In this the type of the channel is doing 2 special cases one of which will generate an exception which is clearly not a very great performance. Also any derived types for np.ndarray will also miss the if clause. So there is already problems with the existing code.

The alternative would be.

CHANNEL_CACHE = {}
CHANNEL_HANDLERS = ()

# Make unicode go to str if applicable
try:
     CHANNEL_CACHE[unicode] = str
except NameError:
     pass

# Define a hook for the user to handle derived types gracefully.
def add_channel_handler(method):
    HANDLERS.append(method)

   ...

   def value_changed(self, new_val):
        """
        Callback invoked when the Channel value is changed.
        Parameters
        ----------
        new_val : str, int, float, bool or np.ndarray
            The new value from the channel. The type depends on the channel.
        """
        self.value = new_val
        requested = type(self.value)
        self.channeltype = CHANNEL_CACHE.get(requested, None)

        # If this is a new type we need to search for it in the handlers
        if self.channeltype is None:
              self.channeltype = requested

              # first handler that knows the type will be used
              for handler in CHANNEL_HANDLERS:
                   tp = handler(requested)
                   if tp is not None:
                        self.channeltype = tp
                        break

              # cache so we never hit the lookup for this type again
              CHANNEL_CACHE[requested] = self.channeltype

        # Equals may be faster than isinstance though this will break for derived types.
        if isinstance(self.channeltype, np.ndarray):
            self.subtype = self.value.dtype.type

Now the user can install whatever hooks they want to ensure that derived types chose the appropriate base. This doesn't force any solution and is not particular to JPype nor is magical for ints or floats. The user has to install a handler to get the alternative behavior.

Ivan Sinkarenko added 2 commits June 9, 2021 15:18

FIX: Use base type as channeltype when incoming value is an instance …

41c3fa6

…of scalar subclass

TST: Add test for channeltype improvement

d37a975

ivany4 changed the title ~~Account for custom scalar subtypes in incoming values~~ Account for custom scalar types of incoming values Jun 9, 2021

ivany4 changed the title ~~Account for custom scalar types of incoming values~~ FIX: Account for custom scalar types of incoming values Jun 9, 2021

Thrameos reviewed Jun 9, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FIX: Account for custom scalar types of incoming values #737

FIX: Account for custom scalar types of incoming values #737

ivany4 commented Jun 9, 2021 •

edited

Loading

codecov-commenter commented Jun 9, 2021

Thrameos left a comment

Thrameos Jun 9, 2021

mattgibbs Jun 9, 2021

Thrameos Jun 9, 2021

hhslepicka commented Jun 10, 2021

ivany4 commented Jun 10, 2021

ivany4 commented Jul 5, 2021 •

edited

Loading

Thrameos commented Jul 6, 2021

mattgibbs commented Jul 13, 2021

ivany4 commented Jul 19, 2021 •

edited

Loading

Thrameos commented Jul 19, 2021

FIX: Account for custom scalar types of incoming values #737

Are you sure you want to change the base?

FIX: Account for custom scalar types of incoming values #737

Conversation

ivany4 commented Jun 9, 2021 • edited Loading

codecov-commenter commented Jun 9, 2021

Codecov Report

Thrameos left a comment

Choose a reason for hiding this comment

Thrameos Jun 9, 2021

Choose a reason for hiding this comment

mattgibbs Jun 9, 2021

Choose a reason for hiding this comment

Thrameos Jun 9, 2021

Choose a reason for hiding this comment

hhslepicka commented Jun 10, 2021

ivany4 commented Jun 10, 2021

ivany4 commented Jul 5, 2021 • edited Loading

Thrameos commented Jul 6, 2021

mattgibbs commented Jul 13, 2021

ivany4 commented Jul 19, 2021 • edited Loading

Thrameos commented Jul 19, 2021

ivany4 commented Jun 9, 2021 •

edited

Loading

ivany4 commented Jul 5, 2021 •

edited

Loading

ivany4 commented Jul 19, 2021 •

edited

Loading