Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drotmg.f: Replace GAM**2 by existing GAMSQ #1000

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

SebWouters
Copy link

No description provided.

langou
langou previously approved these changes Mar 27, 2024
@langou
Copy link
Contributor

langou commented Mar 27, 2024

Hi Sebastian,

I am looking more into this. I only had time for a cursory review at this point.

I am reading in the comments of the routine that

THE VALUES OF GAMSQ AND RGAMSQ SET IN THE DATA STATEMENT MAY BE
INEXACT. THIS IS OK AS THEY ARE ONLY USED FOR TESTING THE SIZE
OF DD1 AND DD2. ALL ACTUAL SCALING OF DATA IS DONE USING GAM.

But I think in "double," so in drotmg, (what you did,) it might be OK to use GAMSQ. (Because GAMSQ is exactly GAM**2.) Whereas in "single," so in srotmg, it might not be OK to use GAMSQ. In the "single" precision case, according to the comment, using GAM twice would be better. (But then why DD1*GAM**2 and why not (DD1*GAM)*GAM.)

Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ.

I'll have to look more into this. I just wanted to point to this comment in the subroutine.

(Also there is a typo in the comment on line 153 of xrotmg that we might want to correct "This code path if [sic] here for safety." The "if" should be an "is".)

Julien

@langou langou dismissed their stale review March 27, 2024 14:20

see comment below

@Reference-LAPACK Reference-LAPACK deleted a comment from codecov bot Mar 27, 2024
@christoph-conrads
Copy link
Contributor

Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ.

In the double precision code, RGAMSQ would have to be updated with an accurate value. At the moment a single-precision value with 7 decimal digits is used.

@christoph-conrads
Copy link
Contributor

christoph-conrads commented Mar 29, 2024

Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ.

Does Fortran have ldexp(base, exp) which computes base · 2^exp? One could replace expressions X * GAM**2 with ldexp(x, 24) and X / GAM**2 with ldexp(x, -24). This would keep the code readable and possibly enable legitimate compiler optimizations.

edit Fortran77 has EXP(x). /edit

edit2 Fortran supports exponentiation in its syntax (try 2.0D0**(-24)). /edit2

@langou
Copy link
Contributor

langou commented Mar 30, 2024

"Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ." => In the double precision code, RGAMSQ would have to be updated with an accurate value. At the moment a single-precision value with 7 decimal digits is used.

Yes, of course, you are correct. This is a good point. Thanks. The 5.9604645D-8 is not a good approximation! In base 10, RGAMSQ is 5.9604644775390625D-08, I think that this base 10 representation will be mapped exactly to RGAMSQ (which is 2^(-24)) in the computer in double precision arithmetic. I did not check.

@christoph-conrads
Copy link
Contributor

christoph-conrads commented Apr 1, 2024

xROTMG was introduced in 1979 in ACM Algorithm 539. The linked page contains a freely available tarball with the Fortran77 source code (open Original/539.txt). In my opinion the code should either be kept in its current form to make its origins crystal clear or the code should be simplified as described in the next paragraph.

Fortran supports exponentiation in its syntax. My suggestion is to use this notation and to let the compiler take care of optimizing expressions like x / GAM, x * GAM**2 or x <= GAM**(-2) (i.e., GAMSQ and RGAMSQ are removed). I assume the variables GAMSQ and RGAMSQ were introduced to avoid deficiencies in compiler technologies 45 years ago. Moreover, a PARAMETER statement should be used because a missing variable initialization will cause compiler errors:

REAL, PARAMETER :: GAM = 2.0E0**12
DOUBLE PRECISION, PARAMETER :: GAM = 2.0D0**12

edit fixed the right-hand sides in the Fortran code above. Thanks, Julien. /edit

@christoph-conrads
Copy link
Contributor

christoph-conrads commented Apr 2, 2024

The longer that I look at this function, the longer I think it should be left as is: the change would not significantly improve performance, there are no callers of this function in LAPACK, and in general nobody seems to be using it.

The constant 4096 exists to avoid under- and overflows and the same value is used for single and double precision. Picking a larger value would avoid unnecessary re-scaling operations. Another optimization opportunity is the possibly dead code branch starting in line 153 (there is a comment above the branch saying that the authors do not expect the branch to be taken) which could be removed.

Finally, there exist clones of this implementation, e.g., a C clone created by f2c in BLIS or in golang. The commit for the golang code add DROTMG is 13 years old and reads Drotmg added (not tested!).

@angsch
Copy link
Collaborator

angsch commented Apr 2, 2024

See also #244, where a paper with a clearer implementation is referenced and a bug related to scaling is mentioned

@langou
Copy link
Contributor

langou commented Apr 2, 2024

I looked into this this morning. Here is a brain dump. I now see that @angsch added a comment. Thanks. I am copy-pasting my brain dump. I'll try to work more on this later in a few days. If anyone wants to make progress, please do.

  • The fact that we initialize RGAMSQ which is 2**(-24) to the approximate value of 5.9604645D-8 in double precision bothers me.
  • This seems to be related to the fix in gonum pointed by @angsch. (Thanks!)
  • It could be better to change with
DOUBLE PRECISION, PARAMETER :: GAM = 2.0D0**12
DOUBLE PRECISION, PARAMETER :: GAMSQ = 2.0D0**24
DOUBLE PRECISION, PARAMETER :: RGAMSQ = 2.0D0**(-24)
  • I understand that RGAMSQ and GAMSQ are not used in computation, they are used as threshold but still. I understand that the goal was to have the same code base for SINGLE and DOUBLE, but why not write the exact value ( 5.9604644775390625E-008 ) in both codes and let the compiler figure it out. Oh well.
  • Yes, we might want to replace all the DATA statements by PARAMETER statements
  • Note that there are also some DATA statements in ROTM
  • Note that GAM, GAMSQ, RGAMSQ are declared on line 106 of the current code. Then the DATA statement appears.
  • I think it would be better to replace DD1 = DD1*GAM**2 by DD1 = (DD1*GAM)*GAM, but if GAMSQ
  • RGAMSQ is inexact (as the current code is done), but GAMSQ is a floating-point number (so is exact). At least this is true in SINGLE (32-bit) and DOUBLE (64-bit). This would even be true in 16-bit floating-point arithmetic, GAMSQ should be a floating-point number. Using GAMSQ with an 8-bit floating-point arithmetic code is not going to cut it though!
  • I think that the statement "the values of GAMSQ and RGAMSQ set in the data statement may be inexact" might refer to non-IEEE arithmetics that were around in the seventies.

I'll have to read more the gonum thread.

@langou
Copy link
Contributor

langou commented Apr 2, 2024

Related. I was reading the code and the WHILE loop looked suspicious. I think you can get an infinite loop if you call the function with DD1 is +Infinity for example. See code below. I did not spend much time on it. Just looked suspicious and the code below seems to have an infinite loop.

      program test
      use iso_fortran_env, only: sp => real32, dp => real64
      IMPLICIT NONE
      DOUBLE PRECISION DPARAM(5)
      DOUBLE PRECISION DD1, DD2, DX1, DY1
      real(dp) :: pinf_dp = real(z'7ff0000000000000',dp)

      DD1 = pinf_dp
      DD2 = 1.0D+00
      DX1 = 1.0D+00
      DY1 = 1.0D+00
      CALL DROTMG(DD1,DD2,DX1,DY1,DPARAM)
      print *,DD1
      end program

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants