Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: Account for custom scalar types of incoming values #737

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ivany4
Copy link
Contributor

@ivany4 ivany4 commented Jun 9, 2021

We are using JPype, which translates Java objects to Python. Recent versions of this library convert Java int, double and similar to JInt, JDouble, which are custom subclasses of corresponding Python types.

Now, when such value arrives on a channel, it results in broken logic, once its type is recorded with self.channeltype = type(value), namely:

  1. Checks such as self.channeltype == float will evaluate to False, when JDouble is in play
  2. PyQt signals may break, e.g. self.send_value_signal[self.channeltype].emit() will enumerate through all available subscripts, not finding an overload for JDouble, it will pick the last available, i.e. np.ndarray. The resulting exception is very confusing:
TypeError: PyDMWritableWidget.send_value_signal[numpy.ndarray].emit(): argument 1 has unexpected type 'float'

This PR tries to record self.channeltype as base types, where appropriate.

@ivany4 ivany4 changed the title Account for custom scalar subtypes in incoming values Account for custom scalar types of incoming values Jun 9, 2021
@ivany4 ivany4 changed the title Account for custom scalar types of incoming values FIX: Account for custom scalar types of incoming values Jun 9, 2021
@codecov-commenter
Copy link

Codecov Report

Merging #737 (d37a975) into master (00974e9) will decrease coverage by 1.77%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #737      +/-   ##
==========================================
- Coverage   58.65%   56.87%   -1.78%     
==========================================
  Files          88       88              
  Lines        9984     9550     -434     
==========================================
- Hits         5856     5432     -424     
+ Misses       4128     4118      -10     
Impacted Files Coverage Δ
pydm/widgets/base.py 89.02% <100.00%> (-1.52%) ⬇️
pydm/widgets/tab_bar.py 33.54% <0.00%> (-8.77%) ⬇️
pydm/widgets/image.py 26.53% <0.00%> (-8.06%) ⬇️
pydm/widgets/scale.py 19.42% <0.00%> (-5.81%) ⬇️
pydm/widgets/waveformtable.py 20.00% <0.00%> (-5.00%) ⬇️
pydm/widgets/embedded_display.py 21.56% <0.00%> (-4.82%) ⬇️
...ydm/connection_inspector/connection_table_model.py 25.00% <0.00%> (-3.82%) ⬇️
pydm/widgets/waveformplot.py 22.79% <0.00%> (-3.81%) ⬇️
pydm/widgets/scatterplot.py 25.00% <0.00%> (-3.58%) ⬇️
pydm/widgets/enum_button.py 71.14% <0.00%> (-3.42%) ⬇️
... and 37 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 00974e9...d37a975. Read the comment docs.

Copy link

@Thrameos Thrameos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestions.

Comment on lines +683 to +688
for base_type in [int, float, str, np.ndarray]:
if self.channeltype != base_type and issubclass(self.channeltype, base_type):
# Leave bool and int separate, we're concerned about user-defined subclasses, not native Python ones
if self.channeltype != bool:
self.channeltype = base_type
break
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how often code gets called, but wouldn't it be more efficient to test for bool once and then in the loop it can just test if something is a subtype.

     if self.channeltype !=bool:
         for base_type in [int, float, str, np.ndarray]:
              if issubclass(self.channeltype, base_type):
                 self.channeltype = base_type
              break

Notice the check for self.channeltype!=base_type is unnecessary. If it is int then it would pass the subclass check and set channeltype to int which it already was and then break.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @Thrameos's sentiment here. value_changed gets called a lot, maybe more than any other method in all of PyDM, so it pays to make this check as lightweight as possible.

Maybe something like this?

if self.channeltype not in (int, float, bool, str, np.ndarray):
    for base_type in (int, float, str, np.ndarray):
        if issubclass(self.channeltype, base_type): 
            self.channeltype = base_type
        break

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a critical section then skipping the for loop and the implicit loop from "not in" entirely would be the best policy. Instead use a global table cache which would be O(1).

         value_type = type(value)
         self.channeltype = TYPE_TABLE.get(value_type, None)
         if self.channeltype is None:
             # assume we will use the type as is
             self.channeltype = value_type

             # check for user defined subtypes
             for base_type in (int, float, str, np.ndarray):
                if issubclass(self.channeltype, base_type): 
                   self.channeltype = base_type
                   break

             # Cache the result so all future look ups are O(1)
             TABLE_TYPE[value_type] = self.channeltype

The table would be prepopulated with TYPE_TABLE[bool] = bool

Each time it is called we have one hash map look up and a comparison. It it is not found then the search occurs to find the best type. That result is cached so it never has to evaluate the loop for that type again.

@hhslepicka
Copy link
Contributor

Wouldn't it be best to address this conversion over at the data plugin and let PyDM deal with the basic types?
You could handle that into your plugin when JPype give you the value before emitting and when you receive the write you can cast it back to the proper type via a reverse lookup dictionary.

@ivany4
Copy link
Contributor Author

ivany4 commented Jun 10, 2021

Wouldn't it be best to address this conversion over at the data plugin and let PyDM deal with the basic types?
You could handle that into your plugin when JPype give you the value before emitting and when you receive the write you can cast it back to the proper type via a reverse lookup dictionary.

That depends on the perspective. If there were multiple plugins based on JPype, each of them would need to handle that separately.

@ivany4
Copy link
Contributor Author

ivany4 commented Jul 5, 2021

@hhslepicka , what would be an acceptable performance hit in this method?
Compared to original PyDM code, I'm observing 2-8% slowdown in @Thrameos 's hashmap solution, which is by far the most performant.
This is perf on my machine with various values for 100000 value_changed calls each (averaged over 5 attempts):

Value 123 123.47 1.00E+02 0x1FF 0b100 TRUE FALSE np.array([123, 456]) "test" FloatSubclass(0.5) StrSubclass("test")
Orig Avg (ms) 1,282,931.51 1,322,600.19 1,288,669.65 1,350,867.69 1,318,794.36 1,309,725.72 1,323,938.08 4,508,809.83 954,742.51 1,371,229.86 955,796.24
HashMap Avg (ms) 1,351,577.09 1,361,390.72 1,358,056.51 1,357,154.96 1,350,078.21 1,373,699.45 1,375,790.97 4,567,240.99 1,002,908.61 1,420,907.02 1,041,290.06
Increase (%) 5.351% 2.933% 5.384% 0.465% 2.372% 4.885% 3.917% 1.296% 5.045% 3.623% 8.945%

@Thrameos
Copy link

Thrameos commented Jul 6, 2021

@ivany4 Perhaps you can make up for the difference by removing the logic

            try:
                if self.channeltype == unicode:
                    # For Python 2.7, set the the channel type to str instead of unicode
                    self.channeltype = str
            except NameError:
                pass

Just add unicode to the cache mechanism as a prepopulated.

@mattgibbs
Copy link
Collaborator

Hi @ivany4, sorry for the silence. @hhslepicka left SLAC, and I have only been able to spend tiny amounts of time on PyDM lately.

I am still against including this in the base widget, I agree with @hhslepicka - this seems like it should happen at the plugin layer. I would guess you could have a common base class for a JPype-based plugin, and implement the conversion to a base type there.

Your most recent implementation seems reasonable from a performance standpoint, but I'm objecting for architectural reasons. If there's a strong reason why this can't be implemented in the data plugin, I'd be willing to merge.

@ivany4
Copy link
Contributor Author

ivany4 commented Jul 19, 2021

Hi @ivany4, sorry for the silence. @hhslepicka left SLAC, and I have only been able to spend tiny amounts of time on PyDM lately.

I am still against including this in the base widget, I agree with @hhslepicka - this seems like it should happen at the plugin layer. I would guess you could have a common base class for a JPype-based plugin, and implement the conversion to a base type there.

Your most recent implementation seems reasonable from a performance standpoint, but I'm objecting for architectural reasons. If there's a strong reason why this can't be implemented in the data plugin, I'd be willing to merge.

Thanks for the feedback, @mattgibbs . Sad to see @hhslepicka leave :/
I understand your concern, and I would be happy to find another elegant solution (throw more ideas, if you have them). I was trying to avoid implementing conversion in the plugin for few reasons:

  1. It means that incoming data will be copied into a native Python type, which sounds like a big performance hit to me (not measured though), because every incoming piece of data will be copied. My initial approach minimized the impact by playing with channeltype, which is used less frequently, mainly for writing (but unfortunately is reset on every incoming value). However I think using hash tables before updating channeltype should still be more performant than copying incoming data every single time.
  2. PyDM implementation implies that native Python types cannot be subclassed, which may impact more than just JPype derivatives, if some other system takes the same subclassing approach. If you think this is perfectly valid solution, it's worth documenting that.

I'm currently researching the possibility to monkey-patch __eq__ for JPype types so that e.g. type(self.channeltype) == bool could succeed for JBoolean without the need for additional logic on PyDM level. Would be curious to hear @Thrameos 's opinion how bad of an idea that is. It is still insufficient though for cases like self.send_value_signal[self.channeltype].emit(num_value).

@Thrameos
Copy link

Monkey patching the equals for a type will undoubtedly create unexpected side effects in users code. I can't say what is would break only that I would certainly not add such a hack to JPype release nor recommend hacking to add a bad contract just to fix another libraries code. JPype is not doing anything special here as other toolkits such as numpy as doing the same. The existing code is special cases here for "np.array" and "unicode" already but I could see the exact same problem happen of "np.float32" or other external types. So this is not really a JPype issue by rather an issue with how pydm is using the type information. Using the "type" as a look up for an action as is being done here is a recipe for disaster as it violates the concept of duck typing. One should be able to define a type that meets the specification (or in this case merely derives from a base type) without breaking code.

I would recommend that the hash solution with a plugin function for constructing the channel type would be the best solution. The user has the option of controlling the channel type by installing a function and the hash solution would be faster than the existing logic as it already has several special cases. Placing a user defined plug in so that they can add a type conversion with known rules is better than requiring a monkey patch.

lets consider the original code

   def value_changed(self, new_val):
        """
        Callback invoked when the Channel value is changed.
        Parameters
        ----------
        new_val : str, int, float, bool or np.ndarray
            The new value from the channel. The type depends on the channel.
        """
        self.value = new_val
        self.channeltype = type(self.value)
        if self.channeltype == np.ndarray:
            self.subtype = self.value.dtype.type
        else:
            try:
                if self.channeltype == unicode:
                    # For Python 2.7, set the the channel type to str instead of unicode
                    self.channeltype = str
            except NameError:
                pass

In this the type of the channel is doing 2 special cases one of which will generate an exception which is clearly not a very great performance. Also any derived types for np.ndarray will also miss the if clause. So there is already problems with the existing code.

The alternative would be.

CHANNEL_CACHE = {}
CHANNEL_HANDLERS = ()

# Make unicode go to str if applicable
try:
     CHANNEL_CACHE[unicode] = str
except NameError:
     pass

# Define a hook for the user to handle derived types gracefully.
def add_channel_handler(method):
    HANDLERS.append(method)

   ...

   def value_changed(self, new_val):
        """
        Callback invoked when the Channel value is changed.
        Parameters
        ----------
        new_val : str, int, float, bool or np.ndarray
            The new value from the channel. The type depends on the channel.
        """
        self.value = new_val
        requested = type(self.value)
        self.channeltype = CHANNEL_CACHE.get(requested, None)

        # If this is a new type we need to search for it in the handlers
        if self.channeltype is None:
              self.channeltype = requested

              # first handler that knows the type will be used
              for handler in CHANNEL_HANDLERS:
                   tp = handler(requested)
                   if tp is not None:
                        self.channeltype = tp
                        break

              # cache so we never hit the lookup for this type again
              CHANNEL_CACHE[requested] = self.channeltype

        # Equals may be faster than isinstance though this will break for derived types.
        if isinstance(self.channeltype, np.ndarray):
            self.subtype = self.value.dtype.type

Now the user can install whatever hooks they want to ensure that derived types chose the appropriate base. This doesn't force any solution and is not particular to JPype nor is magical for ints or floats. The user has to install a handler to get the alternative behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants