Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement math.factorial(n) to calculate very large factorials with speed #2558

Closed
wants to merge 5 commits into from

Conversation

kmr-srbh
Copy link
Contributor

@kmr-srbh kmr-srbh commented Feb 27, 2024

Overview

  • The current implementation of math.factorial() is naive.
  • Returns 0 for large factorials due to integer overflow.

Screenshot from 2024-02-27 23-12-57

Solution

The industry standard for calculating factorial of a large number is complicated. It deals with Prime Number generation and Swing Numbers. These methods require BigInt for storing the large calculated value.

One fast method for going about calculating the calculation is to use an array to store the digits of the number as strings, in reverse, and do digit by digit multiplication. This method is used for implementing the algorithm.

I support the implementation of the Swing Number algorithms if one can understand and implement it correctly. The funny thing is that a Python implementation is available online.

Improvement

  • Works for large factorials
    Screenshot from 2024-02-27 23-17-56
  • Works for very large factorials
    Screenshot from 2024-02-27 23-19-13
  • Accurate
    Screenshot from 2024-02-27 23-22-05

Integration tests have not been added due to the prerequisite below.

Prerequisite

  1. While the function is complete, it depends on String to int conversion in ASR #2554 for getting merged. Currently it just prints the output to stdout and returns 0.
  2. As it currently stands, the implementation for handling values larger than 2^63 - 1 is handled by the BigInt module we have. It is broken and returns pointers to the number instead. A vector<char> is required for storing and working with very large values.

Note: The long assignment of digits is because LPython currently does not support list assignment like: my_list[0:4] = [1, 2, 3]. The assignments can be improved through the introduction of the above list assignment.

@kmr-srbh
Copy link
Contributor Author

I think the errors are okay for now because the factorial function returns only 0. This will be addressed.

for idx in range(f_size - 1, -1, -1):
result += str(f[idx])
print(result)
return i64(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do understand the approach, here the final result is printed out using concatenation.
I have not looked into the pre-requisites but even after a conversion from string to int, how will you return an integer here? The highest supported annotation we have is i64, which has a maximum value of 2^63 - 1.
This is not large enough to store anything over 20!, which means this will still encounter an overflow and return a garbage value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried something similar in another project, let me know if there are any other approaches. Thanks :)

Copy link
Contributor Author

@kmr-srbh kmr-srbh Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@faze-geek I understand your point. Internally in LPython, the method used for handling values larger than 2 ^ 63 - 1 is handled by the BigInt module we have. The part of the module which does the handling for large numbers is broken and returns a pointer to the value instead. I had worked on a related issue #1356 and proposed implementing the vector<char> method for handling large values. Only then will we be able to handle values larger than 2 ^ 63 - 1.

I had forgotten to mention this prerequisite. Thanks for reminding me. I am adding it above.

Copy link
Contributor Author

@kmr-srbh kmr-srbh Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried something similar in another project, let me know if there are any other approaches. Thanks :)

Oh yes there are! As I had mentioned above, the industry standard for scientific computing applications is to use the PrimeSwing algorithm. The Python implementation is available online, but I do not want to type something I do not understand. 😄

The Python math.factorial(n) uses this algorithm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Will surely explore this once the BigInt module gets working.

@kmr-srbh
Copy link
Contributor Author

@certik On further thought, I realized that to handle values large as shown above, we need to introduce a new data type BigInt where we store the number as digits in base 2 ^ 32 in a vector. The vector<string> method is not that fast. What is your take on this? If not going through BigInt, what else can we do? Please guide me.

If yes, please guide me on how can one go about creating a new data type?

@kmr-srbh kmr-srbh changed the title Implement math.factorial(n) to calculate very large factorials with speed and accuracy. Implement math.factorial(n) to calculate very large factorials with speed Mar 4, 2024
@kmr-srbh
Copy link
Contributor Author

kmr-srbh commented May 1, 2024

Closing this as not planned for now.

@kmr-srbh kmr-srbh closed this May 1, 2024
@kmr-srbh kmr-srbh deleted the modules branch May 1, 2024 05:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants