LBNL-42953
TRLAN is a program designed to find a small number of extreme eigenvalues and their corresponding eigenvectors of a real symmetric matrix. Denote the matrix as the eigenvalue as and the corresponding eigenvector as they are defined by the following equation,
There are a number of different implementations of the Lanczos algorithm available(1). Why another one? Our main motivation is to develop a specialized version that only target the case where one wants both eigenvalues and eigenvectors of a large real symmetric eigenvalue problems that can not use the shift-and-invert scheme. In this case the standard non-restarted Lanczos algorithm requires one to store a large number of Lanczos vectors which can cause storage problem and make each iteration of the method very expensive. The underlying algorithm of TRLAN is a dynamic thick-restart Lanczos algorithm. Like all restarted methods, the user can choose how many vectors can be generated at once. Typically, the user choose a moderate size so that all Lanczos vectors can be stored in core. This allows the restarted methods to execute efficiently. This implementation of the thick-restart Lanczos method also uses the latest restarting technique, it is very effective in reducing the time required to compute a desired solutions compared to similar restarted Lanczos schemes, e.g., ARPACK(2).
When solving most problems, the three most time-consuming procedures in
the Lanczos method are the matrix-vector multiplication,
re-orthogonalization and computation of the Ritz vectors. To make this
package as small as possible, we have delegated the task of performing
the matrix-vector multiplication to the user. This is a reasonable
approach because there is simply too many possible ways of performing
the operation and the user usually can construct a specialize version
that is better than a generic matrix-vector multiplication routine. In
addition, there are high quality matrix-vector multiplication routines
available as parts of larger packages, for example, P_SPARSLIB, AZTEC,
BLOCKSOLVE and PETSc, see section Operator interface, for details. To
reduce the amount of the time spent in re-orthogonalization, we only
perform re-orthogonalization if it is necessary. The Ritz vectors are
computed as during the restarting process, in TRLAN, we only compute
those that are determined to be needed. This reduces the number of Ritz
vectors computed. To compute them efficiently, we call the BLAS
library to perform the actual computation.
The program is implemented in Fortran 90
. The main advantages of
using Fortran 90
compared to Fortran 77
is that
Fortran 90
offers dynamic memory management which make it more
flexible in terms of allocating temporary work arrays. If there is an
array not used for other task, it can be passed into TRLAN, else the
user can simple let TRLAN allocate its own work arrays. TRLAN internal
allocate all the space it requires up front to avoid repeated call to
allocated small pieces of workspace.
Similar to other languages, such as C/C++, Fortran 90
offers data
encapsulation which makes it convenient to pass a significant amount of
information cleanly. TRLAN packages a large amount of information in a
single object to reduce the size of the external user interface.
See section TRL_INFO module, for details. A significant advantage of using
Fortran
compared to C/C++ is the ease of using computational
libraries such as BLAS
and LAPACK
. In fact, most
numerical computations of TRLAN are performed using those library
functions. Because most machines have vendor optimized BLAS
and
LAPACK
, being able to effectively use them is crucial to the
effectiveness of TRLAN package.
Fortran 90
also provides some utility functions such as query
function for the machine precision, random number generator and timing
functions. They make the program more portable across different
platforms.
Parts of this document contains details about the software package which
may not be of interest to every user. Here are some advice on how to
use this document. If you just want to get a feel of how TRLAN looks
like in a program, take a look at section A small example or the examples come
with the software package. Chapter 3 contains a short example that uses
mostly default parameters. To assert more control over TRLAN, see
section TRL_INFO module, and section Main function interfaces. If you are somewhat
puzzled about how to choose the parameters, see section Recommended parameter choices, for
our recommendations. If you don't use Fortran 90
, see section Calling from other languages, for what to do. For most users, there is no need to read
everything in section The elements of TRL_INFO_T
, and section TRLAN error code. If you read the
entire document and are still puzzled, contact the authors, see
section Contacting the authors.
The source code of the package is available at
https://code.lbl.gov/frs/download.php/15/trlan.tar.gz.
This document is distributed with the package and is also separately available at
https://sdm.lbl.gov/%7Ekewu/ps/trlan-ug.ps
.
The package may be unpacked by
tar -xzf trlan.tar.gz
If you tar
program does not recognize flag z
,
you can unpack it in two steps
gunzip trlan.tar.gz tar -xf trlan.tar
After this, the files in the package will be unpacked into a directory called `TRLan'.
To install the package, you will need a Fortran 90
compiler, the
BLAS
and LAPACK
libraries. On parallel
machines, MPI is also needed. The compiler name and the options used
are specified in the file called `Make.inc'. A number of
examples are provided in the file for different machines. If your
compiler name and library locations are same as one of the examples, you
can uncomment the section, comment out the default values, and use the
settings. If your compiler has a different name or the libraries are
located at a different place, you will need to modify the file to refer
to their correct values. The package may be compiled into one of the
two library files libtrlan.a
and libtrlan_mpi.a
where the former is the sequential
version of the package and the latter is the parallel version. To
generate them go to `TRLan' and type
make libtrlan.a or make libtrlan_mpi.a
To compile the examples, go to the appropriate subdirectory in
`examples'. If you are on a sequential or a shared memory
computer and there is no subdirectory that matches your computer, the
source code in the SUN
directory can be used. The examples
in `T3E' and
`psp' can be run on parallel machines that support MPI.
`Makefile' in the `T3E' and `psp' directories
are only tested on a Cray T3E. They will need modification in order to
be used elsewhere. The examples in directory `psp' also need a
special supporting library called P_SPARSLIB, read the
file `README' before try to use it.
There are three examples in most example directories (except
`psp'), simple
, simple77
and simplec
. These are three programs should be doing the
same thing using three different languages. The executables can be
generated by make
make simple simplec simple77
They should output the same eigenvalues, however the actual printout may differ slightly due floating-point round-off errors.
For further questions, consult the `README' files in the
directories. To report errors in the installation procedure or suggest
improvements, the authors can be reached
at kwulbl.gov
(Kesheng Wu)
and hdsimonlbl.gov
(Horst Simon).
This is a simple example in Fortran 90
. It is short because we
have used a very simple matrix and used default parameters wherever
possible. It uses MPI to handle data communication required by TRLAN.
This example comes with the distribution of the source code in directory
`examples/T3E'. The name of the file is `s1.f90' and on T3E
is can be compiled by make s1
which generates the executable
s1
.
!!! a really simple example of how to use TRLAN Program simple Use trl_info Use trl_interface Implicit None Include 'mpif.h' ! local variable declaration Integer, Parameter :: nrow=100, lohi=-1, ned=5, maxlan=40, mev=10 Double Precision :: eval(mev), evec(nrow, mev) Type(trl_info_t) :: info Integer :: i External diag_op ! name of the matrix-vector multiplication routine Call MPI_INIT(i) ! initialize MPI ! initialize info -- tell TRLAN to compute NED smallest eigenvalues Call trl_init_info(info, nrow, maxlan, lohi, ned) ! call TRLAN to compute the eigenvalues Call trlan(diag_op, info, nrow, mev, eval, evec, nrow) Call trl_print_info(info, nrow+nrow) If (info%my_pe .Eq. 0) Then write (6, FMT=100) (i, eval(i), i=1,info%nec) End If 100 Format('E(', I1, ') = ', 1PG25.17) Call MPI_finalize(i) End Program simple !!! ! a simple matrix-vector multiplications routine ! defines a diagonal matrix with value (1, 4, 9, 16, 25, 36, ...) !!! Subroutine diag_op(nrow, ncol, xin, ldx, yout, ldy) Implicit None Integer, Intent(in) :: nrow, ncol, ldx, ldy Double Precision, Dimension(ldx*ncol), Intent(in) :: xin Double Precision, Dimension(ldy*ncol), Intent(out) :: yout Include 'mpif.h' ! local variables Integer :: i, j, ioff, joff, doff Call MPI_COMM_RANK(MPI_COMM_WORLD, i, j) doff = nrow*i Do j = 1, ncol ioff = (j-1)*ldx joff = (j-1)*ldy Do i = 1, nrow yout(joff+i) = (doff+i)*(doff+i)*xin(ioff+i) End Do End Do End Subroutine diag_op
There are two parts in this example, the main program and the
matrix-vector multiplication subroutine. The main program sets up the
info
variable to carry information to and from TRLAN, calls
TRLAN, and prints the information carried out in info
and the
eigenvalues computed. Here is a short explanation of the arguments to
trl_init_info
.
call trl_init_info(info, ! the variable to be set nrow=100, ! there are 100 rows on each processor maxlan=40, ! maximum Lanczos basis size is 40 lohi=-1, ! compute the smallest eigenvalues ned=5) ! compute 5 eigenvalues
The calling sequence of TRLAN is fairly simple because all
the gory details are hidden inside info
. The following listing
describes the information required by TRLAN to solve an eigenvalue
problem.
call trlan(diag_op, ! matrix-vector multiplication routine info, ! what eigenvalues to compute, etc. nrow, ! 100 rows on this processor mev, ! number of eigenpairs can be stored in ! eval and evec eval, ! real(8) :: eval(mev) ! array to store eigenvalue evec, ! real(8) :: evec(lde,mev) ! array to store the eigenvectors lde) ! the leading dimension of evec
The content of info
and the eigenvalues are printed
separately. The content of info
is printed by calling
trl_print_info
which accepts two arguments, info
to be
printed and the number of floating-point operations used for one
matrix-vector multiplication on this processor. The second parameter is
needed because the matrix-vector multiplication is user-supplied. The
information is used to compute the speed of the matrix-vector
multiplication and the speed of the whole program. It can be ignored,
in which case trl_print_info
will leave the related fields blank.
The short matrix-vector multiplication routine, diag_op
, performs
multiplication with a very simple matrix, diag(1, 4, 9, ...)
.
The example tries to find 5 smallest eigenvalues of this matrix, 1, 4,
9, 16, 25. The following is the output from a run on a T3E/900 located
at the National Energy Research Supercomputing Center(3).
1998/09/24 18:37:16.834 (-07:00) TRLAN execution summary (exit status = 0 ) on PE 0 Number of SMALLEST eigenpairs: 6 (computed), 5 (wanted) Times the operator is applied: 847 (MAX: 2000 ) Problem size: 100 (PE: 0), 400 (Global) Convergence tolerance: 1.490E-08 (rel), 2.384E-03 (abs) Maximum basis size: 40 Restarting scheme: 0 Number of re-orthogonalizations 847 Number of (re)start loops: 36 Number of MPI processes: 4 Number of eigenpairs locked: 0 OP(MATVEC): 9.35440E-03 sec, 1.81091E+01 MFLOPS Re-Orthogonalization: 2.12016E-01 sec, 4.68742E+01 MFLOPS Restarting: 2.36795E-01 sec, 2.02369E+01 MFLOPS TRLAN on this PE: 5.29851E-01 sec, 3.00020E+01 MFLOPS -- Global summary -- Overall MATVEC Re-orth Restart Time(ave) 5.2985E-01 9.4129E-03 2.1143E-01 2.3685E-01 Rate(tot) 1.2001E+02 7.1990E+01 1.8801E+02 8.0930E+01 E(1) = 0.99999999997742750 E(2) = 3.9999999999816311 E(3) = 8.9999999999916049 E(4) = 16.000000000026944 E(5) = 25.000000000089663 E(6) = 36.000000000367905
In short, to use TRLAN to find some extreme eigenvalues, the user
defines a matrix-vector multiplication routine with the same interface
as diag_op
, calls trl_init_info
to specify what
eigenvalues to compute and calls trlan
to perform the bulk of the
computation. The remainder of this manual will explain the user
interface and how to control TRLAN in more detail. How to Write a
particular matrix-vector multiplication for an operator is beyond the
scope of this manual. Some packages containing distributed
matrix-vector multiplications routines are listed in section Operator interface.
The example in previous chapter uses two modules, TRL_INFO and
TRL_INTERFACE. As the name suggested, TRL_INTERFACE contains the user
interface for accessing TRLAN. The module TRL_INFO only contains the
definition of the Fortran 90
derived
type TRL_INFO_T. To make it easy to access, we have
provided six access functions,
trl_init_info
, trl_set_debug
,
trl_set_iguess
,
trl_set_checkpoint
, trl_print_info
, and
trl_terse_info
. The first four are for manipulate input
parameters to TRLAN and the last two are for printing the content of
TRL_INFO_T. We will discuss these access functions in
this chapter. The remaining interface functions are described in the
next Chapter. The last two sections of this chapter may be skipped if
the reader is only seeking information on how to use the package.
This initialization routine is equivalent to a generator function in
C++
. It is intended to be called before any other TRLAN
functions. Any parameter not explicitly set by the caller is set to its
default value and all internal counters are set to zero. Its
Fortran 90
interface
block is as follows,
Subroutine trl_init_info(info, nrow, mxlan, lohi, ned, tol,& & trestart, maxmv, mpicom) Use trl_info Integer, Intent(in) :: lohi, mxlan, ned, nrow Integer, Intent(in), Optional :: maxmv, mpicom, trestart Real(8), Intent(in), Optional :: tol Type(TRL_INFO_T), Intent(out) :: info End Subroutine trl_init_info
We have seen the mandatory arguments in the example, however, there are four optional arguments that were not used before. For completeness, we will give a short description of all arguments here.
info:
Fortran 90
derived type that will carry the information to
the trlan
subroutine. It is set by this subroutine. Any prior
content will be cleared.
nrow:
nrow
refers to the number of rows located on the current processor. It may
vary from processor to processor.
maxlan:
trlan
. The restarted Lanczos algorithm will
store up to maxlan
Lanczos vectors and one (1) residual
vector. An additional memory of size maxlan*(maxlan+10)
is
required to perform the Rayleigh-Ritz projection to compute the
approximate solutions. Generally, the larger maxlan
is,
the fewer matrix-vector multiplications are needed. See
section Selecting the maximum basis size
(maxlan
), for further discussion on this parameter.
lohi:
lohi
< 0), or the largest ones (lohi
> 0), or whatever
converges first (lohi
= 0).
ned:
nrow
, maxlan
, lohi
, and
ned
, are mandatory when calling trl_init_info
. The
following parameters are optional because a reasonable value can be
determined by trl_init_info
.
tol:
restart:
maxmv:
ned*ntot
where ntot
the global problem size.
mpicom:
trlan
. This parameter
is only meaningful if MPI is used. If the sequential version is used,
this variable is simply ignored internally. If MPI is used and this
variable is not set, trl_init_info
will
duplicate MPI_COMM_WORLD
and use the resulting communicator
for its internal communication operations.
There are cases we would like to monitor the progress of the restarted
Lanczos algorithm. We can do this by setting a few logging parameters.
Since the debug information may be voluminous, trlan
writes
them to files. Each MPI process will write its own debug information to
a separate file. The name of the file and how much debug information to
write is controlled by calling trl_set_debug
.
Subroutine trl_set_debug(info, msglvl, iou, file) Use trl_info implicit none Type(TRL_INFO_T), intent(inout) :: info integer, intent(in) :: msglvl, iou Character*(*), Optional :: file End Subroutine trl_set_debug
info:
TRL_INFO_T
type variable to be modified. The function
trl_init_info
should have been called before calling
trl_set_debug
.
msglvl:
trl_init_info
sets it to zero as the default value.
iou:
trlan
is being used.
Trl_init_info
sets it to 99 as the default value.
file:
trl_set_debug
. When it is not set, the corresponding
element of TRL_INFO_T
is not changed.
Trl_init_info
sets this elements to `TRL_LOG_' by default.
TRLAN program can either use a user-supplied initial guess, generate
an arbitrary initial guess or read a set of checkpoint file to get
starting vectors. The thick-restart Lanczos may start with arbitrary
number of vectors, however, the starting vectors have to satisfy a
strict relation. The simplest way to start the algorithm is to simply
provide one starting vector. If the function trl_set_iguess
is
not called, the starting vector is set to [1, ..., 1] by default. The
checkpoint option is implemented to enable a user to continue improve
the accuracy of the solutions progressively.
Subroutine trl_set_iguess(info, nec, iguess, oldcpf) Use trl_info Implicit None Type(TRL_INFO_T) :: info Integer, Intent(in) :: iguess, nec Character(*), Intent(in), Optional :: oldcpf End Subroutine trl_set_iguess
info:
TRL_INFO_T
type variable to be modified. The function
trl_init_trl
should have been called before calling
trl_set_iguess
.
nec:
nec
is greater than zero (0), the first nec
columns of
array evec
should contain eigenvectors of the operator and the
first nec
elements of eval
should contain the
corresponding eigenvalues. This is designed to allow the user to return
to trlan
to compute more eigenvalues and eigenvectors.
Trl_init_info
sets this value to zero to indicate no converged
eigenvalues.
iguess:
<1:
iguess
is less than zero, a random perturbations will be added to
this vector before it is taken as the starting vector.
1:
>1:
oldcpf:
trl_init_info
for this is `TRL_CHECKPOINT_'.
NOTE: Reading the checkpoint files are done through I/O unit
cpio
. Trl_init_info
sets this value to 98 by default. If
I/O unit is used for another task already, use trl_set_checkpoint
to set cpio
to an unused I/O unit number.
TRLAN has implemented a scheme of checkpointing to allow the user to
stop and restart. The checkpoint files of the thick-restart Lanczos
algorithm contains all the information necessary for it to continue the
Lanczos iterations. To minimize the size of the files, the checkpoint
files are written at the end of the restart process because the basis is
the smallest in size at this point. For efficiency reasons, each MPI
processor writes its own checkpoint file in FORTRAN
unformatted
form. These checkpoint files can only be read on the same type of
machines and using the same number of MPI processors. The function
trl_set_iguess
controls whether the checkpoint files are read.
The following function controls when to write the checkpoint files.
Subroutine trl_set_checkpoint(info, cpflag, cpio, file) Use trl_info Implicit None Type(TRL_INFO_T) :: info Integer, Intent(in) :: cpflag, cpio Character(*), Optional :: file End Subroutine trl_set_checkpoint
info:
TRL_INFO_T
type variable to be modified. The function
trl_init_trl
should have been called before calling
trl_set_checkpoint
.
cpflag:
cpflag
set of checkpointing files in maxmv
iterations. If it is less or
equal to zero, no checkpoint file is written. Checkpointing files are
only written if TRLAN runs correctly. If cpflag
is greater than
zero, at least one set of checkpoint files is written when TRLAN
completes successfully. To debug the program, turn on verbose printing
by using trl_set_debug
.
Trl_init_info
sets this value to zero.
cpio:
FORTRAN
I/O unit number to be used for writing checkpoint
files. The value of cpio
is set to 98 by default
(trl_init_info
).
file:
TRL_INFO_T
is not modified.
Trl_init_info
sets this variable to `TRL_CHECKPOINT_' by
default.
Upon returning from trlan
, the user may wish to exam the progress
of trlan
. One simple way to do this is to printout the content
of info
. There are two printing functions trl_print_info
and trl_terse_info
. Function trl_print_info
is the one
that printed the results in section A small example. The following is the same
information printed using trl_terse_info
.
NOTE: The eigenvalues are not stored in info
. The following
printout is from a different run of the same example, there is slight
difference in time.
MAXLAN: 40, Restart: 0, NED: - 5, NEC: 6 MATVEC: 847, Reorth: 847, Nloop: 36, Nlocked: 0 Ttotal:0.535910, T_op:0.008962, Torth:0.209496, Tstart:0.238200
The Fortran 90
interface
of the two subroutines are as
follows.
Subroutine trl_print_info(info, mvop) Use trl_info Implicit None Type(TRL_INFO_T), Intent(in) :: info Integer, Intent(in) :: mvop End Subroutine trl_print_info
Subroutine trl_terse_info(info, iou) Use trl_info Implicit None Type(TRL_INFO_T), Intent(in) :: info Integer, Intent(in) :: iou End Subroutine trl_terse_info
info:
TRL_INFO_T
variable to be printed. In addition of keeping
track of how many eigenvalues have converged and how many are wanted.
There are significant amount of information about how many matrix-vector
multiplications have been used, how much time is used in various parts
of the program, and so forth. The verbose version of the printing
function will printout most of the recorded information that are deemed
to be useful. The terse version only printout the 12 most important
fields.
mvop:
trl_print_info
to determine the speed of
the matrix-vector multiplication and the speed of the overall eigenvalue
program. If this information is not present, the relevant fields are
left blank in the printout. The variable info
contains timing
information and floating-point operations performed inside the
trlan
. Since the matrix-vector multiplication is a user-supplied
function, the user has to provide information about its complexity.
iou:
trl_terse_info
is allowed to print to
any valid FORTRAN
I/O unit. This is different from the verbose
printing function where the printout is always sent to the I/O unit
that is used for logging debugging information.
In addition to the differences mentioned already, the function
trl_print_info
requires every processor to participate but
trl_terse_info
can be called by each processor individually.
Because of this reason, trl_print_info
can provide global
performance information but not trl_terse_info
.
The verbose version of the printout is designed to be self-explanatory.
The rate fields for floating-point operations are in MFLOPS. The rate
of Read
and Write
refers to the speed of reading and
writing checkpoint files, they are in MegaBytes per second. Since the
simple example shown does not use checkpointing, no information
regarding Read
and Write
is presented in the printout.
The heading for the terse version of the printout is bit cryptic. They are
MAXLAN:
Restart:
NED:
+
, -
, or 0
to
indicate which end of the spectrum is being computed.
NEC:
MATVEC:
Reorth:
Nloop:
Nlocked:
epsilon
|| A ||
). Because the residual norms are so
small, we lock the Ritz pairs to reduce the among of arithmetic
operations needed in Rayleigh-Ritz projection.
Ttotal:
T_op:
Torth:
Tstart:
TRL_INFO_T
In some instances, it might be necessary to directly access the status
information in info
rather than print out the information.
Whether trlan
terminated because of some kind of error, the
two elements of TRL_INFO_T
that are most like to be useful
after returning from trlan
are stat
and nec
, where the first one is the error flag
of trlan
and the second one indicates how many eigenvalues
and eigenvectors have converged.
The input parameters to TRLAN have appeared in the calling sequence
of trl_init_info
, trl_set_debug
,
trl_set_iguess
and
trl_set_checkpoint
are described earlier in this chapter. The
following are elements of TRL_INFO_T
are counters set by
trlan
. To the user, they are part of the output from TRLAN.
my_pe
npes
ntot
matvec
maxmv
.
nloop
trlan
has reached the maximum size and restarted.
north
nrand
trlan
has generated random vectors in attempt
to produce an vector that is orthogonal to the current basis vectors.
locked
epsilon
|| A ||
). The accuracies of these
eigenpairs can not be further improve by more Rayleigh-Ritz projection,
therefore they are locked to reduce arithmetic operations, they
are only used to perform re-orthogonalization but nothing else. TRLAN
locks not only the wanted eigenpairs with small residual norm, it also
locks unwanted ones depending the restarting strategy. The rational is
that if they converge quickly, it is not good idea to throw them away
because they will reappear in the next basis built.
clk_rate
Fortran 90
intrinsic function
system_clock
.
clk_max
clk_tot tick_t clk_op tick_o clk_orth tick_h clk_res tick_r
clk_op
,
tick_o
), re-orthogonalization (clk_orth
, tick_h
),
restarting (clk_res
, tick_r
), and the whole trlan
(clk_tot
, tick_t
). The four integer counters,
clk_op
, clk_orth
, clk_res
, and clk_tot
, are
output from Fortran 90
intrinsic function system_clock
.
Since it is likely the integer counters will overflow in some cases,
each of them has a floating-point counterpart. If they become larger
than a quarter of the maximum counter value clk_max
, they are
added to the floating-point counters and reset to zero.
flop rflp flop_h rflp_h flop_r rflp_r
Flop
and rflp
are for counting
the total floating pointer operations excluding those used by
matrix-vector multiplications. Flop_h
and rflp_h
are for
counting the re-orthogonalization procedure. Flop_r
and
rflp_r
count operations used in restarting. The integer
counters are used initially until they become larger than
clk_max/4
. Once they become too large, their values are added to
the corresponding floating-point counter and they are reset to zero.
Since the matrix-vector multiplication routine is supplied by the user,
TRLAN can not account for the floating-point operations used in that
procedure.
crat tmv tres trgt
crat
is the convergence factor.
It is measured after tmv
number of matrix-vector multiplications
are used. The value of the target at tmv
was tres
.
The convergence factor is updated as follows. Among the Ritz values,
search for the one that is closest to trgt
. This Ritz value is
regarded as the updated version of the previous target Ritz value. Let
res
be the residual norm of the Ritz value, the convergence
factor is computed as crat
= Exp(Log(res / tres
) /
(matvec - tmv
)).
After crat
is updated, the current target value is identified as
the first Ritz value that is not converged yet. The value of tmv
and the corresponding residual norm tres
are recorded.
anrm
tol
,
all the Ritz pairs with residual norm less than tol * anrm
are
considered converged.
stat
This section lists all error numbers defined in TRLAN and discusses possible remedies to the errors. Currently (TRLAN version 1.0), defines the follow error numbers,
0
trlan
has not computed all wanted eigenpairs,
check the value of nec
to see exactly how many wanted eigenpairs
have converged. In case you have not computed all wanted eigenpairs,
the possible solutions are:
trl_set_checkpoint
to generate checkpoint files for future use.
maxlan
, See section Selecting the maximum basis size (maxlan
) for
more details.
maxmv
,
See section Selecting the maximum iterations for more details.
-1
nloc
) does not match
the value of nrow
used when calling trlan
. Most likely
the user has used the info
variable defined for a different
problem.
SOLUTION: Make sure trl_init_info
is called before
trlan
and the arguments to both functions are correct for the
intended eigenvalue problem.
-2
evec
is smaller
than local problem size, lde < nrow
. There isn't enough space in
evec
to store the eigenvectors correctly.
SOLUTION: Allocate the array evec
with leading
dimension larger or equal to nrow
.
-3
eval
is too small to store the eigenvalues,
mev < info%ned
. There isn't enough columns in evec
either.
SOLUTION: Increase the size of array eval
and
increase the number of columns in evec
.
-4
misc
) is
maxlan * (maxlan + 10)
. TRLAN tries to allocate its own
workspace if the user has not provided enough workspace to store the
Lanczos basis vectors and the projection matrix, et al.
SOLUTION:
maxlan
. This will decrease the among of
workspace required.
-5
base
) size required here is
(maxlan + 1 - mev) * nrow
.
SOLUTION: See solutions for error code -4
.
-11
trlanczos
with insufficient workspace.
SOLUTION: Increase workspace provided.
-12
trlanczos
with insufficient
workspace.
SOLUTION: Increase workspace provided.
-101
TRLAN
does not have enough
workspace. This should not happen unless trl_orth
is directly
called outside of TRLAN with insufficient workspace.
SOLUTION: Increase workspace provided.
-102
1E160
, this error code should not
be generated. If it is, normally it is an indication of other errors.
For example, the workspace given by the user is not as large as the user
indicated to trlan
, the array evec
is actually smaller
than (lde, mev)
, the first nec
columns of evec
are
not orthonormal vectors on input, or you have encounter a bug in TRLAN
or one of the libraries used by TRLAN.
SOLUTION:
trlan
is as large as
claimed, i.e., the actual size of wrk
is at least as large as
lwrk
. If wrk
is present but not lwrk
, the actual
size of wrk
should be no less than mev
.
trlan
is not going to compute all
the eigenvectors because trlan
uses the space in evec
to
store the Lanczos vectors during its computations.
nec
is not zero, make sure that the
known eigenvectors are stored in the first nec
columns of
evec
and the eigenvalues in the first nec
elements of
eval
.
-111
-112
LAPACK
routine
dsytrd
/ssytrd
has failed. This is extremely unlikely to
happen.
SOLUTION: Check to make sure you have a correct version
of the LAPACK
. If your LAPACK
is installed correctly, see
suggestions for error -102
.
-113
LAPACK
routine
dorgtr
/sorgtr
has failed. This is extremely unlikely to
happen.
SOLUTION: See solutions to error -102
.
-121
wrk
passed to trlan
is no less than lwrk
.
SOLUTION: See solutions to error -102
.
-122
-102
.
-131
-102
.
-132
LAPACK
routine dstein
/sstein
has
failed. Normally, if this happens, TRLAN will switch to a different
method of computing the eigenvectors. It is very unlikely the user will
see this error flag. If it does show up, it might be an indication of
error in the program.
SOLUTION: See solutions to error -102
.
-141
-102
.
-142
LAPACK
routine dsyev
/ssyev
has failed to compute the eigenvalues and eigenvectors of the projection
matrix.
SOLUTION: Check to make sure LAPACK
is installed
correctly. See solution to error -102
.
-143
-144
dsyev
/ssyev
.
SOLUTION: See solutions to error -102
.
-201
trl_cgs
directly, make sure workspace size lwrk
matches the actual size of wrk
when calling trlan
.
-202
-203
FORTRAN 90
random number generator to set each processor with a
different seed value. For example, the following code segment will
cause random_number
to generate different random numbers on each
processor the next time it is used and it also produces an random vector
as initial guess for the Lanczos iterations.
call random_number(evec(1:nrow,1)) do i = 1, info%my_pe call random_number(evec(1:nrow,1)) end doIf reseting the seed of the random number generator does not fix the problem, then there might be a more serious problem. See also solutions to error
-102
.
-204
-102
.
-211
-2
unless the checkpoint files are not for the
same problem or not produced with the same number of processors.
SOLUTION: Make sure the checkpoint files are generated on
the same type of machines and for the same problem using the same
number of processors.
-212
cpio
is not used for
something else already.
-213
-214
maxlan
.
SOLUTION: Increase the size of maxlan
.
-215
-216
-221
-222
k
Lanczos vectors to
be written out, the number of bytes is 8*(2+2*k+nrow*(k+1))
for
each processor. The maximum value for k
is maxlan
.
-223
This list represents all error code defined in TRLAN version
1.0. The function trlan
should never return an
error code not listed here. If you encountered one and you are sure
that all the arguments to TRLAN functions are correct, you have
uncovered a flaw in TRLAN, contact the authors with a description of
your problem. A short example is always welcome.
The majority of the arithmetic computations to solve an eigenvalue problem are executed in the main function of TRLAN package and the user's own matrix-vector multiplication function. This section gives a more detail description of the interfaces of these two functions.
trlan
The main computation kernel of TRLAN package is named trlan
. We
have see a short description of its arguments in section A small example.
To provide a different view of the interface, we show its Fortran
90
interface
block.
Subroutine trlan(op, info, nrow, mev, eval, evec, lde, wrk, lwrk) Use trl_info Implicit None Type(TRL_INFO_T) :: info Integer, Intent(in) :: lde, mev, nrow Integer, Intent(in), Optional :: lwrk Double Precision, Intent(inout) :: eval(mev), evec(lde,mev) Double Precision, Target, Dimension(:), Optional :: wrk Interface Subroutine OP(nrow, ncol, xin, ldx, yout, ldy) Integer, Intent(in) :: nrow, ncol, ldx, ldy Double Precision, Dimension(ldx*ncol), Intent(in) :: xin Double Precision, Dimension(ldy*ncol), Intent(out) :: yout End Subroutine OP End Interface End Subroutine trlan
Most of the arguments of this subroutine are explained before in section A small example. However for completeness, we will list all of them here.
op
xin
and stores
the resulting vectors in yout
. In linear algebra terms, this is
a matrix-vector multiplication routine. The vectors to be multiplied
are stored in xin
and the resulting vectors are stored in
yout
.
info
Fortran 90
derived type TRL_INFO_T
. It
carries the information to and from TRLAN. See section TRL_INFO module, for
more details.
nrow
mev
eval
and the number of columns in
array evec
. It denotes the maximum number of eigenpairs that can
be stored in eval
and evec
.
NOTE: Since the array evec
will be used internal to store
mev
Lanczos vectors, even if you do not think TRLAN is able to
compute mev
eigenvectors at the end, you still declare
evec
as large as (nrow, mev
).
eval
evec
eval
) and the
eigenvectors (evec
). On entry to trlan
, if
info%nec
is greater than zero (0), the first info%nec
elements of eval
shall contain the eigenvalues already known and
the first info%nec
columns of evec
shall contain the
corresponding eigenvectors. These eigenpairs are assumed to have zero
residual norms and will not be modified by TRLAN. On exit from
trlan
, the converged solutions are stored in the front of
eval
and evec
, i.e. the first info%nec
eigenvalues
in eval
contains the converged eigenvalues and the first
info%nec
columns of evec
are the corresponding
eigenvectors.
lde
evec
. It is expected to be no
less than nrow
, otherwise the eigenvectors can not be stored
properly and trlan
will abort with error code info%stat =
-2
.
wrk
lwrk
wrk
will
be used as workspace inside and lwrk
shall be the number of
elements in array wrk
. TRLAN will try to use this workspace if
it is large enough for either misc
(size maxlan * (maxlan +
10)
) or base
(size (maxlan + 1 - mev) * nrow
). If
wrk
is present but not lwrk
, the workspace size is assumed
to be mev
. It does not make sense to have only lwrk
without argument wrk
. If it is the case, lwrk
is ignored.
If argument wrk
is present and there is enough space to store the
residual norms of the solutions, the first info%nec
elements of
wrk
will contain the residual norms corresponding to the
info%nec
converged solutions.
TRLAN program require the user to provide his/her own matrix-vector multiplication routine. The matrix-vector multiplication routine needs to have the following interface.
Subroutine OP(nrow, ncol, xin, ldx, yout, ldy) Integer, Intent(in) :: nrow, ncol, ldx, ldy Double Precision, Dimension(ldx*ncol), Intent(in) :: xin Double Precision, Dimension(ldy*ncol), Intent(out) :: yout End Subroutine OP
nrow
ncol
xin
and yout
) to be
multiplied.
xin
Double Precision, Dimension(1:ldx,1:ncol), Intent(in)::xin Real*8 xin(1:ldx, 1:ncol)The
i
th column of xin
is xin((i-1)*ldx+1 :
(i-1)*ldx+nrow)
if xin
is declared as one-dimensional array. If
the user routine actually declare it as a two-dimensional array, the
i
th column should be xin(1:nrow, i)
. TRLAN calls OP using
Fortran 77 style argument matching, only starting address of xin
will be passed.
For those who are familiar with C/C++
: xin
is actually
passed as double *
that points to the first element of array.
Elements in a column are ordered consecutively and the i
th column
starts at (i-1)*ldx
.
ldx
xin
when it is declared as
two-dimensional array.
yout
Double Precision,Dimension(1:ldy,1:ncol),Intent(out)::yout Real*8 yout(1:ldy, 1:ncol)The usage notes on
xin
also apply to yout
.
ldy
yout
when it is declared as
two-dimensional array.
This simple interface only has enough information to describe the input
and output vectors. Here are some possible ways of passing the matrix
information to this subroutine. In Fortran 90
, we recommend
using a module to encapsulate information related to the matrix. If
Fortran 77
is used, a common block
may be used for the
same purpose. Normally, if another language like C
or C++
is used, the matrix can be packaged in a struct
or a
class
, and accessed through a global variable.
In case the user does not want to write his/her own matrix-vector multiplication routine. There are a number of packages out there that can be used. Useful software depots and information archives include
ACM TOMS
ACTS Toolkit
National HPCC Software Exchange
NETLIB
Scientific Application on Linux
Potentially useful packages include
Aztec
BlockSolve
P_SPARSLIB
PETSc
SPARSKIT
NOTE: All of the packages mentioned above have matrix-vector multiplications routines. However some of them are designed for solving linear systems or even larger granularity tasks, some effort may be required to directly using their matrix-vector multiplication routines.
Before calling trlan
, the user needs to decide a few parameters.
The most important parameters are arguments to function
trl_init_info
. The parameters like nrow
, lohi
and
ned
are determined by the problem to be solved, other parameters
to control the execution of TRLAN might not be familiar to casual users.
This part of the manual will give some recommendations on how to
determine those parameters.
maxlan
)
A few factors come into play when picking the maximum basis size
maxlan
, for example, the available computer memory size, the
number of eigenvalues wanted, and the separation of wanted eigenvalues
from the others. The first rule of thumb is that maxlan
should
at least as large as
ned + min(6, ned)
.
Generally, the larger it is, the better TRLAN will perform. The limitation on using a very large basis is that there might not be enough computer memory to store the basis in memory. Another concern regarding using a large basis is that the Gram-Schmidt orthogonalization process will be expensive. In addition, if the basis size is larger than 1000, then the time spent in finding the eigenvectors of the projection matrix may be a substantial portion of the overall execution time.
If the wanted eigenvalues are easier to compute compared to others, then
it does not matter how large the basis size is, the restarted Lanczos
method will find the solutions fairly quickly. If the wanted
eigenvalues converge slower than the unwanted ones, such as the example
in section A small example, then the above recommended minimum size is too small
to be effective. In this case, the user should look at how many
eigenvalues were locked and compare it with the number of eigenvalues
converged. In difficult case, it is not unusually to see a large number
of unwanted eigenpairs converge before the wanted one are finally
computed. In the previous example, the minimum recommended basis size
is 11. Since it is relatively small and we know the eigenvalue problem
is relatively hard, we first tried maxlan = 20
. After 2,000
matrix-vector multiplications, there are two wanted eigenvalues
converged, and six eigenvalues were locked. After this first test, we
use the following guidelines to choose the next basis size.
maxlan
for each locked eigenvalue.
ned / nec
.
The first rule suggests the new basis size of about 30 and the second suggest the next choice could be 50. The basis size used in the example is 40. We are able to find the five smallest eigenvalues with this choice. Further tests show that using basis size of 30 can compute the same 5 eigenvalues in 1056 matrix-vector multiplications, and using a basis size of 50 TRLAN only need 777 matrix-vector multiplications. However, in both cases, more time was used. This demonstrates the complexity of the choice. In this particular case, either 30, 40 or 50 is a reasonable choice.
The convergence test used in this program is r < tol * || A ||
.
Normally, if the matrix is stored, the accuracy of the matrix-vector
multiplication routine is on the order of epsilon * || A ||
. The
unit round-off error (epsilon
) of a 64-bit IEEE floating-point
number is approximately 2.2E-16
. The default value of tol
is about 1.49E-8
. Typically, if 5 digits of accuracy is desired
for the eigenvectors, tol
should be set to 1E-5
.
This is another parameter that can change the execution time dramatically. However, effective restarting schemes are still subject of active academic researches. On the example given before, schemes 1 and 2 uses about the same amount of matrix-vector multiplications which are more than the number of matrix-vector multiplications used with schemes 3 and 4. However, because schemes 3 and 4 perform more restarts and they save more basis vectors during restarting, their restarting procedures are more expensive. The actual execution time with schemes 3 and 4 are longer than those with schemes 1 and 2. Based on these observations, schemes 3 and 4 are better if the matrix-vector multiplication is very time-consuming, say, one matrix-vector multiplication takes more time than an average restart. If the matrix-vector multiplication is relatively inexpensive, then schemes 1 and 2 are preferred. Scheme 5 attempts to mimic the restarting strategy in ARPACK, in many cases, it has comparable performance as the scheme 1.
TRLAN is stopped usually after it has found all the wanted eigenvalues
and the corresponding eigenvectors. The other normal stopping condition
is to stop after maxmv
number of matrix-vector multiplications.
Typically, we would allow a fixed number of matrix-vector
multiplications for each eigenvalue, for example, 100 per eigenvalue.
When we are trying to find the correct value to use for maxlan
and restart
we may limit the number of matrix-vector
multiplications used to reduce the time consumed. The default value in
trl_init_info
is very large especially for large problems. An
more acceptable limit might be 1000 matrix-vector multiplications per
eigenvalue. This should be sufficient for most problems. If more than
1000 matrix-vector multiplications are used to compute one eigenvalue,
other means of computing eigenvalues should be tried. For example, the
shift-and-invert Lanczos method is often able to compute the desired
eigenvector in a few steps. The shift-and-invert scheme computes the
extreme eigenvalues of
first, then derive the actual eigenvalues of
To use this scheme, one need to either invert the matrix or at least
being able to solve linear systems,
If neither is feasible, then the Davidson method might be an alternative
to consider.
If TRLAN does not return with status 0, consult section Setting debug parameters, to setup a debugging session, and refer to section TRLAN error code, for error encountered and possible solutions.
Some of the issues related to workspace requirements have been mentioned through out this manual. This section provide a central location to collect all the information for ease of reference.
Inside of trlan
, there are three large chucks of workspace,
evec
, base
and misc
. The user always provides the
array evec
, since it is necessary to carry input and output
information for TRLAN. Its size is clearly defined in the calling
sequence by lde
and mev
. The array base
is used to
store the basis vectors if the array evec
can not store
maxlan+1
vectors. Given the maximum basis size maxlan
,
the size of base
is (maxlan + 1 - mev) * nrow)
. The array
misc
is used to store the projection matrix, the eigenvalues and
eigenvectors of the projection matrix, workspace required by all lower
level routines of TRLAN, library routines from LAPACK
and
BLAS
. Its size should be no less than maxlan * (maxlan +
10)
. If it is larger in size, some library routines might run faster.
Thus if there is large amount of computer memory, the user can let TRLAN
use it by pass in a large array wrk
.
If the user provides a workspace wrk
to trlan
, then its
size is checked to see either one of misc
or base
or both
of them can fit inside the workspace. If at lease one of them can fit
into wrk
, it would be used. If wrk
is large enough for
both base
and misc
, the array base
will use
(maxlan + 1 - mev) * nrow)
elements and the rest is given to
misc
. If trlan
can not use wrk
, it will allocate
workspace of appropriate size internally.
If wrk
is provided, its content is not used on input. However
before returning, trlan
will copy the residual norms of the
converged Ritz pairs in the first nec
elements of wrk
.
We have isolated the communication needs of TRLAN in four subroutines,
trl_init_info
, trl_sync_flag
, trl_g_sum
, and
trl_g_dot
. The four subroutines are located in a file called
`trl_comm_mpi.f90' for the MPI version and `trl_comm_none.f90'
for sequential version. If the the matrix-vector multiplication routine
and the main program are written for sequential machine, the user can
simply compile with `trl_comm_none.f90' to get the sequential
version of the program. This setup makes it easy to adopt TRLAN for
different types of eigenvalue problems.
The function trl_init_info
is used to initialize the
TRL_INFO_T
type variable to be used by trlan
. The
function trl_sync_flag
is used to synchronize the status flags
used inside TRLAN. In the current implementation, it computes the
minimum value of info%stat
on each processor and reset all
info%stat
to the minimum value. Given that the error flags are
all less than zero (0), if any processor has detected an error, all of
them will be set to indicate an error. Trl_g_sum
computes the
global sum of an input array and it returns the global sum in the same
array. The subroutine trl_g_dot
computes the dot-products among
the Lanczos vectors.
If desired, one can change these four routines to suit different
situations. For example, if the physical domain of the eigenvalue
problem has certain symmetry, usually the discretization does not
contain the whole domain but only a portion of it. Since not every
element of a vector is stored, the dot-product routine needs to be
modified. In this case, only trl_g_dot
and trl_g_sum
need
to be modified in order for TRLAN for function properly.
TRLAN program is implemented in Fortran 90
. Since Fortran
90
is backward compatible with previous versions of Fortran
.
There should be no problem to use it in any other Fortran
program. However, at the moment, the authors are not aware of a scheme
to reliably access Fortran 90
subroutine with optional arguments,
a subroutine with fixed arguments is created to get around this problem.
The fixed arguments subroutine has the following interface,
subroutine trlan77(op, ipar, nrow, mev, eval, evec, lde, & wrk, lwrk) integer ipar(32), nrow, mev, lde, lwrk double precision eval(mev), evec(lde, mev), wrk(lwrk) external op
The Fortran 90
derived type TRL_INFO_T
variable
is removed from this user interface since its primary access function
trl_init_info
contains optional arguments as well. Here is a
list showing how the integer array ipar
is mapped to the elements
of TRL_INFO_T
,
ipar(1) = stat
,
ipar(2) = lohi
,
ipar(3) = ned
,
ipar(4) = nec
,
ipar(5) = maxlan
,
ipar(6) = restart
,
ipar(7) = maxmv
,
ipar(8) = mpicom
,
ipar(9) = verbose
,
ipar(10) = log_io
,
ipar(11) = iguess
,
ipar(12) = cpflag
,
ipar(13) = cpio
,
ipar(14) = mvop
,
ipar(24) = locked
,
ipar(25) = matvec
,
ipar(26) = nloop
,
ipar(27) = north
,
ipar(28) = nrand
,
ipar(29) = total time in milliseconds
,
ipar(30) = MATVEC time in milliseconds
.
ipar(31) = re-orthogonalization time in milliseconds
.
ipar(32) = restarting time in milliseconds
.
Among the parameters, ipar(2 : 14)
are input parameters,
ipar(1)
, ipar(4)
and ipar(24 : 32)
are output
parameters.
There are two floating-point number elements in TRL_INFO_T
,
tol
and crat
. Before calling trlan77
, the first
element of wrk
should be set to the residual tolerance
tol
. Inside trlan77
, wrk(1)
is transfered to
tol
. On return from trlan77
, the first ipar(4)
elements of wrk
store the residual norms corresponding to the
converged eigenvalues and eigenvectors. Element ipar(4)+1
of
wrk
will store the last known value of crat
.
Caution: Fail to set wrk(1)
to a valid floating-point
number will cause TRLAN to produce floating-point exceptions!
The subroutine trlan77
is relative simple to call from
C/C++
. The file `simplec.c' in the distribution of TRLAN
version 1.0 has equivalent functionalities as
`simple.f90' and `simple.f77'.
Here are a few suggestions on what to watch out for when using TRLAN. Some of suggestions are simply good programming practices.
implicit none
in Fortran
programs. This is very
effective in catching typos. It is also a good practice to check the
programs with automated tools such as lint
and ftnchek
.
lwrk
is actually the size of array wrk
.
trlan
matches arguments to trl_init_info
.
trl_set_debug
prior to calling trlan
. Set the
ninth and tenth element of array ipar
to appropriate values when
trlan77
is used. Consult section Setting debug parameters to setup a
debugging session. Refer to section TRLAN error code for error encountered and
possible solutions.
The authors of TRLAN and this document can be contacted at the following
email addresses: kwulbl.gov
(Kesheng Wu),
hdsimonlbl.gov
(Horst Simon). Kesheng Wu can also be reached
at kwuieee.org
and kwucomputer.org
. The authors also
maintain their own research pages on the web at
https://sdm.lbl.gov/%7Ekewu and
http://www.nersc.gov/~simon. The updated software package
can also be found at both web addresses.
This is a list books and research papers on the Lanczos algorithm and restarting. See section Operator interface, for a list of software archives.