Use numba code for supported CAReduce cases #931

Draft: wants to merge 3 commits into main


ricardoV94 (Member)

Description

Apparently both our implementation and Numba's can be pretty dumb for reductions. This PR tries to default to Numba's implementation whenever possible. I am not sure this is actually better overall: in the comparisons below some cases improve a lot while others get worse, and the outcome seems to depend on which axes are being reduced:

Summary (ratios of mean times, before vs. after):

  • careduce_benchmark[Max-None] is ~3.2x faster
  • careduce_benchmark[Sum-0] is ~100x faster
  • careduce_benchmark[Sum-2] is ~6.2x SLOWER
  • logsumexp_benchmark[0-size2] is ~1.35x faster
  • logsumexp_benchmark[1-size2] is ~1.55x SLOWER
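For context on the logsumexp rows: a numerically stable logsumexp performs two reductions along the same axis, a Max and then a Sum, so it goes through the same CAReduce code paths as the careduce benchmarks. The NumPy sketch below only illustrates that structure. It assumes the first benchmark parameter (0 or 1) is the reduction axis, and it is not the benchmark's actual PyTensor graph.

```python
import numpy as np


def logsumexp(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log(sum(exp(x))) along a single axis (illustrative sketch)."""
    # First reduction: Max over `axis`, kept as a size-1 dim so it broadcasts against x.
    x_max = np.max(x, axis=axis, keepdims=True)
    # Second reduction: Sum over the same axis of the shifted exponentials.
    # Shifting by the max keeps exp() from overflowing for large inputs.
    shifted_sum = np.sum(np.exp(x - x_max), axis=axis, keepdims=True)
    # Undo the shift in log space and drop the reduced dimension.
    return np.squeeze(x_max + np.log(shifted_sum), axis=axis)


if __name__ == "__main__":
    x = np.array([[1000.0, 1000.0]])
    # A naive np.log(np.sum(np.exp(x), axis=1)) overflows to inf here; this stays finite.
    print(logsumexp(x, axis=1))
```

Because both reductions run over the same axis, a change in how Max or Sum is compiled for a given axis would plausibly show up in both logsumexp variants, which is consistent with axis 0 and axis 1 moving in opposite directions above.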
----------------------------------------------------- benchmark: 40 tests -----------------------------------------------------
Name (time in ms)                                 Min                   Max                  Mean              StdDev          
-------------------------------------------------------------------------------------------------------------------------------
careduce_benchmark[Any-0] (before)            74.6352               86.0826               78.2833              3.1784          
careduce_benchmark[Any-0] (after)             72.2949               88.1325               74.8778              3.9049          
careduce_benchmark[Any-2] (before)            19.8382               22.8631               20.4905              0.8476          
careduce_benchmark[Any-2] (after)             19.8045               40.1451               22.8281              3.5232          
careduce_benchmark[Any-None] (before)         47.1671               49.2407               48.1692              0.7547          
careduce_benchmark[Any-None] (after)          47.0192               64.6291               51.9837              5.0230          
careduce_benchmark[Any-axis2] (before)        45.6168               49.6917               46.6328              1.2423          
careduce_benchmark[Any-axis2] (after)         45.3280               54.8637               48.6796              2.1270          
careduce_benchmark[Max-0] (before)           823.1772            1,072.3205              918.1883            121.3394          
careduce_benchmark[Max-0] (after)            895.3045            1,069.0787              950.5921             72.8457          
careduce_benchmark[Max-2] (before)            18.3019               28.5969               20.2784              2.2818          
careduce_benchmark[Max-2] (after)             17.7778               19.0539               18.2724              0.4013          
careduce_benchmark[Max-None] (before)         18.1158               19.4919               18.6922              0.4872          
careduce_benchmark[Max-None] (after)           5.3857                7.3592                5.7938              0.4722          
careduce_benchmark[Max-axis2] (before)        18.1986               20.8342               18.8722              0.6357          
careduce_benchmark[Max-axis2] (after)         17.9239               20.6664               18.9036              0.4819          
careduce_benchmark[Sum-0] (before)           939.0658            1,121.5008            1,002.5721             77.1697          
careduce_benchmark[Sum-0] (after)              8.7608               11.6043                9.8752              0.4344          
careduce_benchmark[Sum-2] (before)            14.9317               17.0656               15.6297              0.5323          
careduce_benchmark[Sum-2] (after)             92.5869              110.1285               96.9401              5.1967          
careduce_benchmark[Sum-None] (before)          6.0364                8.7782                6.4707              0.5112          
careduce_benchmark[Sum-None] (after)           5.2403                7.5870                5.9621              0.5091          
careduce_benchmark[Sum-axis2] (before)        14.8038               19.7759               15.8386              1.1597          
careduce_benchmark[Sum-axis2] (after)         14.7872               56.1588               19.9778              7.8031          
elemwise_speed (before)                        0.6115                1.9634                0.8762              0.1988          
elemwise_speed (after)                         0.5949                0.9883                0.6228              0.0600          
fused_elemwise_benchmark (before)              0.0657                0.2225                0.0731              0.0109          
fused_elemwise_benchmark (after)               0.0567                0.3472                0.0875              0.0348          
logsumexp_benchmark[0-size0] (before)          0.0143                0.0665                0.0156              0.0019          
logsumexp_benchmark[0-size0] (after)           0.0141                0.0460                0.0153              0.0018          
logsumexp_benchmark[0-size1] (before)         10.0825               11.5155               10.3202              0.2845          
logsumexp_benchmark[0-size1] (after)           9.5071               12.6158                9.8153              0.5590          
logsumexp_benchmark[0-size2] (before)      2,644.3215            3,462.0411            3,016.7679            296.5158          
logsumexp_benchmark[0-size2] (after)       1,899.5727            3,078.1175            2,229.7692            480.8884          
logsumexp_benchmark[1-size0] (before)          0.0146                0.0863                0.0172              0.0038          
logsumexp_benchmark[1-size0] (after)           0.0145                0.0567                0.0154              0.0015          
logsumexp_benchmark[1-size1] (before)          9.6068               13.1668                9.7613              0.3543          
logsumexp_benchmark[1-size1] (after)           9.5730               21.5240               10.2803              1.7398          
logsumexp_benchmark[1-size2] (before)      1,130.4573            1,481.2448            1,277.5037            164.1396          
logsumexp_benchmark[1-size2] (after)       1,848.0962            2,219.1350            1,986.3702            145.0611          
-------------------------------------------------------------------------------------------------------------------------------
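The summary ratios come from the Mean column above. A minimal Python sketch that recomputes them: `MEANS` copies the five summarized rows from the table, and `describe_change` is a hypothetical helper, not part of the benchmark suite.

```python
# Mean times in ms copied from the benchmark table: (before, after).
MEANS = {
    "careduce_benchmark[Max-None]": (18.6922, 5.7938),
    "careduce_benchmark[Sum-0]": (1002.5721, 9.8752),
    "careduce_benchmark[Sum-2]": (15.6297, 96.9401),
    "logsumexp_benchmark[0-size2]": (3016.7679, 2229.7692),
    "logsumexp_benchmark[1-size2]": (1277.5037, 1986.3702),
}


def describe_change(before_ms: float, after_ms: float) -> str:
    """Express a before/after timing pair as 'Nx faster' or 'Nx SLOWER'."""
    if after_ms <= before_ms:
        # Speedup: how many times the old mean fits into the new one.
        return f"{before_ms / after_ms:.2f}x faster"
    # Slowdown: invert the ratio so it is always reported as a factor >= 1.
    return f"{after_ms / before_ms:.2f}x SLOWER"


if __name__ == "__main__":
    for name, (before, after) in MEANS.items():
        print(f"{name}: {describe_change(before, after)}")
```

Running it prints, for example, `careduce_benchmark[Sum-2]: 6.20x SLOWER`. Using means rather than minimums is a choice; the minimums give similar ratios for these rows, but the high StdDev on some "after" runs (e.g. Sum-axis2) means single-run numbers should be read with care.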

Related Issue

  • Closes #
  • Related to #

Checklist

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):
