eager map & filter? #483

Open
bijoythomas opened this issue Mar 18, 2020 · 6 comments

Comments

@bijoythomas

Hello, I'm new to toolz and am trying out the functions in the curried namespace. The code below

from toolz.curried import *
is_even = lambda n: n % 2 == 0
inc = lambda n: n + 1
compose(
    map(inc), 
    filter(is_even)
)([1,2,3,4])

returns a map object instead of a list (which I was expecting). However,

compose(
    groupby(lambda n: "A" if n < 2 else "B"), 
    map(lambda n: n + 1), 
    filter(lambda n: n %2 == 0)
)([1,2,3,4])

returns a dict with list values, as expected, instead of lazy sub-iterators (as itertools.groupby would produce).

Is there a reason for keeping the curried map & filter lazy like the native Python3 functions?

@groutr
Contributor

groutr commented Mar 21, 2020

toolz.groupby and itertools.groupby are not equivalent functions. itertools.groupby creates a new group every time the value of the key function changes. This effectively requires the input iterator to be sorted by the key function. toolz.groupby makes no such assumption. This is the reason why itertools.groupby is lazy and toolz.groupby is not.
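To illustrate the difference, here is a minimal stdlib-only sketch. Note that group_all is a hypothetical helper that only mimics the dict-of-lists result described above; it is an assumption for illustration, not toolz's actual implementation.

```python
from collections import defaultdict
from itertools import groupby as it_groupby


def group_all(key, seq):
    """Hypothetical stand-in for an eager groupby: one list per distinct key."""
    groups = defaultdict(list)
    for item in seq:
        groups[key(item)].append(item)
    return dict(groups)


is_even = lambda n: n % 2 == 0
data = [1, 2, 3, 4]  # deliberately not sorted by the key

# itertools.groupby starts a new group whenever the key value changes,
# so on unsorted input the same key shows up in several separate runs.
runs = [(k, list(g)) for k, g in it_groupby(data, key=is_even)]

# The eager version has to see every item before it can return anything,
# which is why it cannot be lazy.
merged = group_all(is_even, data)
```

Here runs holds four single-element groups, while merged collects the odd and even numbers into one list each.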

map and filter have always been lazy in toolz. When toolz supported Python 2, map was an alias for itertools.imap and filter was an alias for itertools.ifilter. In Python 3, they are simply their respective builtin functions.

@eriknw
Member

eriknw commented Jul 18, 2020

Good questions @bijoythomas and thanks for the quick, informative reply @groutr. I always like to hear experiences of new users. Since the questions have been answered, can we close this issue?

Btw, we have considered having a non-lazy namespace so one could do things like toolz.eager.map(func, data). I'm open to this idea. When teaching, learning, or exploring, it can be helpful to effortlessly see the data instead of a lazy object. One challenge is how to have a curried, eager namespace? Would it be toolz.eager.curried, toolz.curried.eager, both, or something else?
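As a rough sketch of what an eager variant could look like, the wrapper below simply collects the lazy result into a list. The names eager, eager_map and eager_filter are hypothetical and exist only for this illustration; nothing here is an existing toolz API.

```python
def eager(func):
    """Hypothetical wrapper: call func and collect its lazy result into a list."""
    def wrapper(*args, **kwargs):
        return list(func(*args, **kwargs))
    return wrapper


# Assumed names for illustration only.
eager_map = eager(map)
eager_filter = eager(filter)

inc = lambda n: n + 1
is_even = lambda n: n % 2 == 0

# Keep the even numbers, then increment them; both steps return lists.
result = eager_map(inc, eager_filter(is_even, [1, 2, 3, 4]))
```

With this input, result is a plain list that can be inspected directly, which is the convenience described above for teaching and exploring.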

@startakovsky

startakovsky commented Jan 21, 2021

Agree. I had these above questions myself. Good to know.

One thing I'd say is that since this library no longer differentiates map and filter from the builtins, the docs should not show them being imported from the itertoolz library or any other library. Seeing from toolz import map caused some confusion while I was reading the documentation.

@ruancomelli

@startakovsky there is a difference between the built-in map and toolz.curried.map since the second one is, of course, curried.

@eriknw I would suggest keeping everything lazy and just adding a consumer function that enforces eager evaluation. For instance, to eagerly evaluate a map object, you can just build a list out of it: map(f, it) is eagerly consumed by list(map(f, it)). This is usually what I do if I wish to retain the computed values. If the values are not important and can be safely discarded, I usually resort to more_itertools.consume, which is a lot faster and doesn't store anything. I think that this is also related to #445.

This way, there would be no need for an eager namespace. Everything would be lazy by default, and if you want eager evaluation you either build a list (if you wish to keep the values) or consume your iterable.
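A minimal stdlib-only sketch of both options follows. The local consume helper is an assumption written in the spirit of more_itertools.consume, not that library's function; it relies on a zero-length deque, which drains an iterable without storing anything.

```python
from collections import deque


def consume(iterable):
    """Exhaust an iterable and discard its values (a maxlen=0 deque keeps nothing)."""
    deque(iterable, maxlen=0)


seen = []
lazy = map(seen.append, [1, 2, 3])

before = list(seen)  # the map is lazy, so nothing has been appended yet
consume(lazy)        # forces evaluation purely for the side effects
after = list(seen)

# When the values matter, building a list both consumes and keeps them.
kept = list(map(lambda n: n + 1, [1, 2, 3]))
```

Here before is empty, after contains every input, and kept holds the incremented values.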

@mentalisttraceur

@ruancomelli note that toolz.last is already basically consume. (The only difference is that it returns the last value, whereas I'd expect consume to return nothing. Internally the current implementation builds on tail and thus stores a deque, but that O(1) overhead is modest and presents a lower bar of difficulty for being automatically optimized away by the Python implementation compared to the O(n) of list.)
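The idea can be sketched with the standard library alone. The helper last_item is hypothetical and only approximates the approach described above with a one-slot deque; it is not the toolz source.

```python
from collections import deque


def last_item(iterable):
    """Hypothetical helper: drain the iterable, keeping only its final element."""
    # A deque with maxlen=1 holds at most one item, so memory stays O(1).
    # Indexing raises IndexError if the iterable was empty.
    return deque(iterable, maxlen=1)[0]


calls = []


def record(n):
    """Record each call so the forced evaluation is observable."""
    calls.append(n)
    return n * 10


# Every element is evaluated, but only the last result is retained.
result = last_item(map(record, [1, 2, 3]))
```

After this runs, calls shows that every element was processed even though only the final value was returned.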

@mentalisttraceur

@eriknw I think the answer to curried.eager vs eager.curried is that curried is a higher-level/more-general operation (curry(f) makes sense for any f, even if f is eager.f, but eagerness is much more specific to just iteration functions), so it should go first: toolz.curried.eager.

"Both" might also make sense as a user/developer improvement, but on the other hand by only having one, that's more Pythonic ("there should only be one way to do it"), it kinda teaches the outer-name-scope-should-be-more-general pattern by example, and there's no breaking change in starting with just one and switching to both later if it proves to be a usability problem.
