Wrong signature for spllt_solve() in drivers/spllt_omp_bench #1

Open
learning-chip opened this issue Sep 27, 2022 · 0 comments

Hi @cayrols @flipflapflop, thanks for releasing this code. I read the paper "Parallelization of the solve phase in a task-based Cholesky solver using a sequential task flow model" and want to reproduce its results. However, there seem to be errors in the driver routines. Below are my attempts and the steps to reproduce them.

Problem description

Compiling drivers/spllt_omp_bench.F90 gives the following error:

[ 96%] Building Fortran object CMakeFiles/spllt_omp_bench.dir/drivers/spllt_omp_bench.F90.o
/opt/SpLLT/drivers/spllt_omp_bench.F90:296:39:

  296 |     call spllt_compute_solve_dep(fkeep)
      |                                       1
Error: Missing actual argument for argument 'stat' at (1)
/opt/SpLLT/drivers/spllt_omp_bench.F90:333:55:

  333 |         workspace=workspace, task_manager=task_manager)
      |                                                       1
Error: There is no specific subroutine for the generic 'spllt_solve' at (1)
/opt/SpLLT/drivers/spllt_omp_bench.F90:355:55:

  355 |         workspace=workspace, task_manager=task_manager)
      |                                                       1
Error: There is no specific subroutine for the generic 'spllt_solve' at (1)
make[3]: *** [CMakeFiles/spllt_omp_bench.dir/build.make:63: CMakeFiles/spllt_omp_bench.dir/drivers/spllt_omp_bench.F90.o] Error 1
make[2]: *** [CMakeFiles/Makefile2:102: CMakeFiles/spllt_omp_bench.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:109: CMakeFiles/spllt_omp_bench.dir/rule] Error 2
make: *** [Makefile:118: spllt_omp_bench] Error 2

Attempted fix

The errors above are caused by calls that do not match the current signatures of spllt_compute_solve_dep() and spllt_solve(). The first error is easily fixed by changing call spllt_compute_solve_dep(fkeep) to call spllt_compute_solve_dep(fkeep, stat = st).

The second error is caused by invalid subroutine calls such as:

      call spllt_solve(fkeep, options, order, nrhs, sol_computed, info, job=1, &
        workspace=workspace, task_manager=task_manager)
...
      call spllt_solve(fkeep, options, order, nrhs, sol_computed, info, job=2, &
        workspace=workspace, task_manager=task_manager)

spllt_solve() is defined in src/spllt_solve_mod.F90 as:

   interface spllt_solve
      module procedure spllt_solve_one_double
      module procedure spllt_solve_mult_double
      module procedure spllt_solve_mult_double_worker
   end interface
...
subroutine spllt_solve_one_double(fkeep, options, x, job, info)
...
subroutine spllt_solve_mult_double(fkeep, options, nrhs, x, job, info)
...
subroutine spllt_solve_mult_double_worker(fkeep, options, nrhs, x, &
    job, task_manager, info)

Only spllt_solve_mult_double_worker() takes a task_manager argument, and none of them takes a workspace argument. They also do not take the order argument (the matrix permutation) that spllt_omp_bench.F90 passes.
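To compare a call site against every specific procedure behind a generic interface, it can help to print the subroutine headers on one line each. The following is a minimal sketch, not part of SpLLT: `list_signatures` is a hypothetical helper, and it assumes free-form Fortran where headers start with `subroutine` and continuation lines end in `&` with no trailing comment.

```shell
# Hypothetical helper: print every subroutine header whose name starts with
# a given prefix, joining free-form continuation lines (trailing '&').
# Assumes headers begin with "subroutine" (no "recursive"/"pure" prefixes).
list_signatures() {
    prefix="$1"
    file="$2"
    awk -v p="$prefix" '
        BEGIN { pat = "^[ \t]*subroutine[ \t]+" tolower(p) }
        {
            line = $0
            # Start collecting when a matching header begins.
            if (!collecting && tolower(line) ~ pat) { collecting = 1; buf = "" }
            if (collecting) {
                sub(/^[ \t]+/, "", line)            # drop leading indentation
                if (line ~ /&[ \t]*$/) {            # header continues on next line
                    sub(/&[ \t]*$/, "", line)
                    buf = buf line
                } else {                            # header is complete
                    print buf line
                    collecting = 0
                }
            }
        }' "$file"
}

# Example use from the repository root (path taken from the issue text):
# list_signatures spllt_solve src/spllt_solve_mod.F90
```

A keyword argument in the call that appears in none of the printed argument lists, such as workspace here, means no specific procedure can match the generic call.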

Another program, test/test_solve_phasis.F90, compiles correctly, so its calls should match the current signature:

call spllt_solve(fkeep, options, nrhs, sol_computed, &
1, task_manager, info)
call spllt_solve(fkeep, options, nrhs, sol_computed, &
2, task_manager, info)

I therefore changed the problematic calls in drivers/spllt_omp_bench.F90 to:

call spllt_solve_mult_double_worker(fkeep, options, nrhs, sol_computed, 1, task_manager, info)
...
call spllt_solve_mult_double_worker(fkeep, options, nrhs, sol_computed, 2, task_manager, info)

Error after fix

Now spllt_omp_bench compiles successfully, but it fails at run time with an out-of-bounds access to fkeep%sbc:

Matrix file                  = matrix.rb
Matrix format                = csc
Number of CPUs               =    1
Block size                   =   16
Supernode amalgamation nemin =   32
Reading...
ok
 [analysis][prune_tree] nth:            1
[>] [spllt_stf_factorize]   setup and activate nodes time:  7.000E-03 s
[>] [spllt_stf_factorize] task insert time:  7.000E-03 s
Allocation of a workspace of size   4.13E+05
 #Subtree :          249
At line 961 of file /opt/SpLLT/src/spllt_solve_dep_mod.F90
Fortran runtime error: Index '1' of dimension 1 of array 'fkeep%sbc' above upper bound of 0

Error termination. Backtrace:
#0  0x7feaafed1d21 in ???
#1  0x7feaafed2869 in ???
#2  0x7feaafed2ee6 in ???
#3  0x55a53008b148 in __spllt_solve_dep_mod_MOD_fwd_update_dependency
	at /opt/SpLLT/src/spllt_solve_dep_mod.F90:961
#4  0x55a53008bd27 in __spllt_solve_dep_mod_MOD_spllt_compute_blk_solve_dep
	at /opt/SpLLT/src/spllt_solve_dep_mod.F90:248
#5  0x55a53008dc70 in __spllt_solve_dep_mod_MOD_spllt_compute_solve_dep
	at /opt/SpLLT/src/spllt_solve_dep_mod.F90:271
#6  0x55a530071020 in MAIN__._omp_fn.1
	at /opt/SpLLT/drivers/spllt_omp_bench.F90:409
#7  0x7feaafd3878d in ???
#8  0x7feaafa6c608 in ???
#9  0x7feaafc30132 in ???
#10  0xffffffffffffffff in ???

Reproducible Dockerfile

To ease reproducibility, here is a Dockerfile that reproduces the compile error:

FROM ubuntu:20.04

RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    git wget vim \
    gcc g++ gfortran \
    libblas-dev liblapack-dev \
    libnuma-dev \
    libhwloc-dev \
    libmetis-dev \
    libudev-dev \
    make cmake \
    autoconf pkgconf \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /opt

# Build Spral
RUN git clone https://github.com/ralna/spral.git \
    && cd spral \
    && ./autogen.sh \
    && CC=gcc CXX=g++ FC=gfortran ./configure \
        --prefix=/opt/spral_install \
        --disable-openmp --disable-gpu \
        --with-metis="-L/usr/lib/x86_64-linux-gnu -lmetis" \
    && make \
    && make install \
    && cp *.mod /opt/spral_install/include/

# Build SpLLT
RUN git clone https://github.com/NLAFET/SpLLT \
    && cd SpLLT \
    && mkdir -p build/build_omp \
    && cd build/build_omp \
    && mkdir log \
    && CC=gcc CXX=g++ FC=gfortran cmake \
        -DRUNTIME=OMP \
        -DSPRAL_LIB=/opt/spral_install/lib \
        -DSPRAL_INC=/opt/spral_install/include \
        -DMETIS_LIB=/usr/lib/x86_64-linux-gnu \
        -DMETIS_INC=/usr/include \
        ../.. 2>&1 | tee log/cmake_spllt_omp.log \
    && make spllt 2>&1 | tee log/make_spllt.log
# `make` or `make all` leads to error at `spllt_omp_bench`

WORKDIR /opt/SpLLT/build/build_omp
RUN make test_solve_phasis 2>&1 | tee log/make_test.log

# prepare test matrix
RUN mkdir /opt/data \
    && cd /opt/data \
    && wget https://suitesparse-collection-website.herokuapp.com/RB/Schmid/thermal1.tar.gz \
    && tar zxvf thermal1.tar.gz

# run test script, success
RUN ln -s /opt/data/thermal1/thermal1.rb matrix.rb \
    && ./test_solve_phasis 2>&1 | tee log/run_test.log

# Build SpLLT driver, get compile error
RUN make spllt_omp_bench 2>&1 | tee log/make_driver.log

Run

docker build -t spllt_debug .
docker run --rm -it spllt_debug

The logs are then available in the log directory under /opt/SpLLT/build/build_omp inside the container.
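The Dockerfile pipes each build step through tee. In a shell without pipefail, which is the usual case for Docker's default `/bin/sh -c`, the exit status of such a pipeline is tee's, so a failing make step should not abort docker build and its output still ends up in the log. A minimal sketch of that behaviour, where `failing_step` is a hypothetical stand-in for the failing build command:

```shell
# Hypothetical stand-in for a failing build command such as
# `make spllt_omp_bench`: writes to stdout and stderr, then fails.
failing_step() {
    echo "building"
    echo "Error: compilation failed" >&2
    return 2
}

log=$(mktemp)

# Same shape as the RUN lines: merge stderr into stdout, then pipe into tee.
failing_step 2>&1 | tee "$log" > /dev/null
status=$?

# Without `set -o pipefail`, $status is tee's exit status, not failing_step's.
echo "pipeline status: $status"
echo "lines captured: $(grep -c '' "$log")"
```

If the image build should instead stop at the first failing step, one option is to run the RUN commands under bash with pipefail enabled, for example through a `SHELL ["/bin/bash", "-o", "pipefail", "-c"]` instruction. For this reproduction the current behaviour is convenient, because the image still builds and keeps the logs.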
