Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

substitute not working with SymPy #65

Open
Upabjojr opened this issue Dec 17, 2020 · 11 comments
Open

substitute not working with SymPy #65

Upabjojr opened this issue Dec 17, 2020 · 11 comments

Comments

@Upabjojr
Copy link
Contributor

Calling sorted( ) in

new_operands.extend(sorted(result))

fails with SymPy, as SymPy objects are not comparable with operators (<, <=, >, >=). Operators are overloaded in SymPy to create instances of inequality objects.

SymPy has a function called default_sort_key meant to deal with this problem:

from sympy import symbols
x, y, z = symbols("x y z")
from sympy.core.compatibility import default_sort_key
sorted([z, x, y], key=default_sort_key)

The question is, is it possible to modify MatchPy to specify a custom sorting key so that SymPy can specify the way its objects should be sorted?

Or is it an issue with Multiset?

@hbarthels
Copy link
Contributor

As a convention, operands in commutative functions are sorted in MatchPy. If I'm not mistaken, in that line we know that the operands come from a commutative functions because they are stored in a multiset. As a result, it's necessary to sort operands there.
For sure it is possible to modify MatchPy such that one can use a custom sorting key for expressions. However, I'm afraid that this could break things. Since there is this convention that operand in commutative function are sorted, it's possible that there are places where the code relies on operands being sorted according to the total ordering that is defined in MatchPy. With a custom sorting key, they could end being sorted in the "wrong" order.
Even if we are able to fix this problem, I suspect that you may run into a bunch of other problems when using SymPy expression in MatchPy. I mean, I don't know what you have tried already and what you did so far to make things compatible, but MatchPy is by no means built to be compatible with SymPy. Whatever else worked so far worked by chance. Have you considered translating SymPy expressions to MatchPy expressions? I believe to remember that someone did this during a Google Summer of Code.

@Upabjojr
Copy link
Contributor Author

Upabjojr commented Dec 18, 2020

In order to reproduce this bug, using the branch in sympy/sympy#20612

I have the following:

In [1]: from sympy.utilities.matchpy_connector import WildDot, WildPlus, WildStar

In [2]: from matchpy import substitute

In [3]: from sympy import *

In [4]: x, y, z = symbols("x y z")

In [5]: w = WildPlus("w")

In [6]: from matchpy import ManyToOneMatcher

In [7]: matcher = ManyToOneMatcher()

In [8]: from matchpy import Pattern

In [9]: matcher.add(Pattern(x + w))

In [10]: result = matcher.match(x + y + z)

In [11]: result
Out[11]: <matchpy.matching.many_to_one._MatchIter at 0x7f3022a3da10>

In [12]: result = next(iter(result))

In [13]: result
Out[13]: (Pattern(x + w), {'w': Multiset({y: 1, z: 1})})

In [14]: substitute(result[0], result[1])
---------------------------------------------------------------------------
TypeError: cannot determine truth value of Relational

ManyToOneMatcher correctly returns a Multiset. Now the problem just lies in substitute as you cannot call sorted( ) on a list of SymPy objects without specifying a sorting key.

Now, if I redefine the MatchPy substitute function to:

from matchpy import *
from matchpy.functions import *
from typing import *
from multiset import *
from sympy.core.compatibility import default_sort_key

def substitute(expression: Union[Expression, Pattern], substitution: Substitution) -> Replacement:
    if isinstance(expression, Pattern):
        expression = expression.expression
    return _substitute(expression, substitution)[0]


def _substitute(expression: Expression, substitution: Substitution) -> Tuple[Replacement, bool]:
    if getattr(expression, 'variable_name', False) and expression.variable_name in substitution:
        return substitution[expression.variable_name], True
    elif isinstance(expression, Operation):
        any_replaced = False
        new_operands = []
        for operand in op_iter(expression):
            result, replaced = _substitute(operand, substitution)
            if replaced:
                any_replaced = True
            if isinstance(result, (list, tuple)):
                new_operands.extend(result)
            elif isinstance(result, Multiset):
                new_operands.extend(sorted(result, key=default_sort_key))
            else:
                new_operands.append(result)
        if any_replaced:
            return create_operation_expression(expression, new_operands), True

    return expression, False

The code now works correctly:

In [24]: substitute(result[0], result[1])
Out[24]: x + y + z

So it looks like the usage of sorted( ) in the function _substitute( ) is the only cause of problems here (everything inside ManyToOneMatcher works fine).

@hbarthels is sorted( ) really necessary in _substitute( ) ? Can this function be modified in order to work with SymPy?

@hbarthels
Copy link
Contributor

In order to reproduce this bug, using the branch in sympy/sympy#20612

Ah, now it makes a lot more sense that all the other stuff works. 😄

is sorted( ) really necessary in _substitute( ) ?

To be honest, I don't know the code well enough to know for sure if it is necessary there. I guess it's not necessary if the expression that is returned by substitute is not used again as input to MatchPy. However, in a function like replace_all, the output of substitute will be used again. As a first step to investigate this, you could remove the sorted and run the tests.

Can this function be modified in order to work with SymPy?

I have the following idea to solve this problem: You add a function to MatchPy that allows to pass a custom sort key, which is stored in some global variable. Then, you implement something like a sorted_expressions function that is a wrapper around sorted. If a custom sort key was provided, it is used in sorted; otherwise, it calls sorted without a sort key. The sorted_expressions function then replaces all calls to sorted and sort that sort expressions. I don't know where else expressions are sorted, so you have to search for it. However, as I said above, a custom sort key could potentially break things, so I recommend that you test this thoroughly with SymPy expressions.

By the way, is it possible that in the code you posted, you accidentally used to original version of _substitute? I don't see any difference.

@Upabjojr
Copy link
Contributor Author

As a first step to investigate this, you could remove the sorted and run the tests.

Here it is: #66

I have the following idea to solve this problem: You add a function to MatchPy that allows to pass a custom sort key, which is stored in some global variable. Then, you implement something like a sorted_expressions function that is a wrapper around sorted

Removing sorted( ) seems to be enough.

However, as I said above, a custom sort key could potentially break things, so I recommend that you test this thoroughly with SymPy expressions.

Is it possible to add a unit test to MatchPy with a few checks for SymPy's compatibility? It would be great to keep SymPy and MatchPy compatible with each other in the future (there's been a lot of effort to make them compatible, so why not enforcing a check to make sure it won't break in the future?).

By the way, is it possible that in the code you posted, you accidentally used to original version of _substitute? I don't see any difference.

I copy-pasted the wrong code. Now it's fixed.

@wheerd
Copy link
Collaborator

wheerd commented Dec 20, 2020

You could change substitute() to take an optional argument for the sort key which it passes down to the sorted() function. That way it does not break existing uses but enables you to specify the sort key. If the sorting is used in more places, it could make sense to make it a global option.

@hbarthels
Copy link
Contributor

I don't think that removing or changing sorted() in substitute() is sufficient. I remember now that we sort operands in commutative functions to ensure that expressions that are equivalent modulo commutativity can easily be identified as equivalent by a syntactic comparison. If sorted() is removed in substitute(), or if the custom sort key is used in substitute() but nowhere else, it could happen that equality tests and hashing don't work the way they are supposed to for the output of substitute().

Is it possible to add a unit test to MatchPy with a few checks for SymPy's compatibility? It would be great to keep SymPy and MatchPy compatible with each other in the future (there's been a lot of effort to make them compatible, so why not enforcing a check to make sure it won't break in the future?).

Sure, that makes sense.

@Upabjojr
Copy link
Contributor Author

I don't think that removing or changing sorted() in substitute() is sufficient. I remember now that we sort operands in commutative functions to ensure that expressions that are equivalent modulo commutativity can easily be identified as equivalent by a syntactic comparison.

The other utilities of MatchPy appear to work well with SymPy... substitute( ) is the only one that causes sorting problems to me.

@hbarthels
Copy link
Contributor

The other utilities of MatchPy appear to work well with SymPy... substitute( ) is the only one that causes sorting problems to me.

I searched the code for calls to sorting functions. The call that makes sure that operands are sorted is part of the Expression class, so that shouldn't be an issue with SymPy expression. There are other calls to sorted(), for example here:

sorted_vars = tuple(sorted(pattern_vars.values(), key=lambda v: (v[0][0] or '', v[0][1], v[0][2], v[1])))

However, I can't tell if the comparison depends on comparing expressions just by looking at it. So maybe changing substitute( ) is indeed sufficient, but I wouldn't rely on it without a lot of testing.

Does SymPy automatically sort operands in commutative functions?

@Upabjojr
Copy link
Contributor Author

Does SymPy automatically sort operands in commutative functions?

Add and Mul have their own constructors that do indeed end up sorting the expressions. The transformations undergone in the constructors of commutative operators in SymPy are more complicated than just sorting and are customly written for the main operators.

@hbarthels
Copy link
Contributor

Is it important to you that we create a new release of MatchPy as soon as possible? If not, I would prefer to wait a bit in case more problems show up. I guess it would also make sense to add tests for the SymPy compatibility before the next release. Are you working on those tests?

@Upabjojr
Copy link
Contributor Author

I guess it would also make sense to add tests for the SymPy compatibility before the next release. Are you working on those tests?

We have these tests in SymPy. I have merged some extensions to sympy.utilities.matchpy_connector 15 days ago. We may need to wait for a new release of SymPy before we can add tests to MatchPy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants