experimental for nested #79

Draft
wants to merge 1 commit into base: main

martindurant
Member

martindurant commented Sep 26, 2024

@jpivarski: the query function here runs on the play data generated by nested-pandas roughly 8x faster than the typical approach we discussed (23.2 ms vs. 183 ms in the timings below), even with the UnmaskedArray PR.

Generate the play data:

from nested_pandas.datasets import generate_data
import awkward as ak
import akimbo.pandas
import akimbo.exp  # this PR, experimental

nf = generate_data(1000, 10000)  # 1000 rows, 10000 nested rows per row
arr = nf.ak.array
arr2 = akimbo.exp.rec_list_swap(arr, "nested")  # to list-of-records
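
To make the "to list-of-records" step concrete, here is a pure-Python illustration for a single row. It assumes the swap means turning a record whose fields are parallel lists into a list of records; the helper name and sample values are hypothetical, and this is not the implementation of akimbo.exp.rec_list_swap.

```python
# Illustration only: plain Python dicts and lists stand in for awkward
# records and lists. The function name is hypothetical.
def record_of_lists_to_list_of_records(rec):
    """Turn {"t": [1, 2], "flux": [3, 4]} into [{"t": 1, "flux": 3}, {"t": 2, "flux": 4}]."""
    fields = list(rec)
    lengths = {len(rec[f]) for f in fields}
    if len(lengths) > 1:
        raise ValueError("all fields must have the same length")
    # zip the parallel lists together, one tuple of values per nested row
    return [dict(zip(fields, values)) for values in zip(*(rec[f] for f in fields))]


row = {"t": [16.0, 18.5], "flux": [0.1, 0.2]}
swapped = record_of_lists_to_list_of_records(row)
```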

Times:

%timeit nf_g = nf.query("nested.t > 17.0");
83.8 ms ± 351 µs
%timeit arr["nested"][arr["nested", "t"] > 17]
183 ms ± 1.56 ms
%timeit akimbo.exp.query(arr2, "nested.t > 17")
23.2 ms ± 568 µs

Note that here we make a masked array, so the result has exactly the same structure as the original (swapped) array, but with None wherever the filter fails. Otherwise you would need ak.count, which takes about 50 ms.
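
The difference between masking and dropping can be sketched in plain Python. This is only an illustration with made-up values and hypothetical helper names, not what akimbo.exp.query does internally: masking keeps every per-row length unchanged, while dropping changes the lengths, so they have to be recomputed.

```python
# Illustration only: nested Python lists stand in for a jagged array of "t" values.
def mask_nested(rows, predicate):
    """Keep the shape; replace entries that fail the predicate with None."""
    return [[x if predicate(x) else None for x in row] for row in rows]


def drop_nested(rows, predicate):
    """Remove entries that fail the predicate; per-row lengths change."""
    return [[x for x in row if predicate(x)] for row in rows]


t = [[16.0, 18.5, 20.0], [], [17.0, 17.5]]
masked = mask_nested(t, lambda v: v > 17)
dropped = drop_nested(t, lambda v: v > 17)

masked_lengths = [len(row) for row in masked]    # same as the input lengths
dropped_lengths = [len(row) for row in dropped]  # must be recomputed
```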

It feels like it should be possible to do this really efficiently with ArrayBuilder and Numba? You would need a way to turn the "query" into something you can execute in the loop.
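
One possible way to turn a query string into something callable inside a loop is sketched below. It assumes queries of the simple form "<list field>.<value field> <op> <number>"; compile_query and run_query are hypothetical names, and the sketch does not address Numba compilation or the grammar that akimbo.exp.query actually accepts.

```python
import operator
import re

# Supported comparison operators (an assumption for this sketch).
_OPS = {
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}
_PATTERN = re.compile(r"^\s*(\w+)\.(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def compile_query(query):
    """Parse e.g. "nested.t > 17" into (list_field, value_field, predicate)."""
    match = _PATTERN.match(query)
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    list_field, value_field, op, number = match.groups()
    compare = _OPS[op]
    threshold = float(number)
    return list_field, value_field, lambda value: compare(value, threshold)


def run_query(rows, query):
    """Mask nested records that fail the query, keeping the structure."""
    list_field, value_field, keep = compile_query(query)
    return [
        [rec if keep(rec[value_field]) else None for rec in row[list_field]]
        for row in rows
    ]


rows = [{"nested": [{"t": 16.0}, {"t": 18.5}]}, {"nested": []}]
result = run_query(rows, "nested.t > 17")
```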

@jpivarski
Collaborator

If all of the functors are structured, like map, filter, reduce, then you can do better than ArrayBuilder by making the Numba-compiled function generate an index and then apply that index to the array as a slice.
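
The index-then-slice idea can be sketched for a filter as follows. The loop is written in pure Python over a flat buffer plus offsets so that it is self-contained; in practice the same loop would presumably be compiled with Numba and run over NumPy buffers. The function names, the sample data, and the offsets layout are assumptions for illustration.

```python
# Illustration only: a jagged array is represented as flat content plus
# offsets, so row i covers content[offsets[i]:offsets[i + 1]].
def filter_index(values, offsets, threshold):
    """Return (index, new_offsets) selecting values > threshold within each row."""
    index = []
    new_offsets = [0]
    for i in range(len(offsets) - 1):
        for j in range(offsets[i], offsets[i + 1]):
            if values[j] > threshold:
                index.append(j)
        # the filtered row ends wherever the index currently ends
        new_offsets.append(len(index))
    return index, new_offsets


def take(content, index):
    """Apply the index to any flat buffer that shares the same layout."""
    return [content[j] for j in index]


t = [16.0, 18.5, 20.0, 17.0, 17.5]
flux = [0.1, 0.2, 0.3, 0.4, 0.5]
offsets = [0, 3, 3, 5]  # three rows with lengths 3, 0, 2

index, new_offsets = filter_index(t, offsets, 17.0)
filtered_t = take(t, index)
filtered_flux = take(flux, index)
```

The index is computed once from the "t" field and then reused for every other field of the record, which is where this approach avoids rebuilding the output element by element.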

You could also add an axis argument and have the function apply at some depth using ak.transform. All structure above the depth where it is applied stays the same, but the transformation has to be length-preserving. That would solve a whole class of problems in which someone wants to take apart a structure, change something, and then rebuild everything above the changed part the same way.
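
As a rough analogy for applying a function at a given depth while leaving the outer structure alone, consider this pure-Python sketch. It uses nested lists rather than awkward layouts and does not use ak.transform; apply_at_axis is a hypothetical helper.

```python
# Illustration only: nested Python lists stand in for a jagged array.
def apply_at_axis(data, func, axis):
    """Apply func to every list found at depth `axis`; outer structure is untouched."""
    if axis == 0:
        result = func(data)
        if len(result) != len(data):
            raise ValueError("transformation must be length-preserving")
        return result
    # descend one level and rebuild this level exactly as it was
    return [apply_at_axis(item, func, axis - 1) for item in data]


data = [[[3, 1, 2], [5, 4]], [[9, 8]]]
sorted_inner = apply_at_axis(data, sorted, 2)
```

Here sorting is length-preserving, so everything above the innermost lists is rebuilt unchanged; a function that drops elements would trip the length check.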

2 participants