next up previous contents
Next: Results on Molecules Up: Optimization Methods Previous: Structural optimization   Contents

Hessian Optimization

The SR method generally performs very well whenever there is only one energy scale in the variational wave function. However, if there are several energy scales in the problem, some of the variational parameters, e.g. the ones defining the low-energy valence orbitals, converge very slowly with respect to the others, and the number of iterations required for equilibration becomes exceedingly large. Moreover, the time step $ \Delta t$ necessary for a stable convergence is dictated by the high-energy orbitals, whose dynamics cannot be accelerated beyond a certain threshold. Furthermore, the SR method is based on a first-order dynamics and, as will be illustrated in the following (see section 5.4.2), it is not suited to performing parameter optimization during an ion Langevin dynamics. In this thesis, to overcome these difficulties, we have used a very efficient optimization method: the Stochastic Reconfiguration with Hessian acceleration (SRH) (50).

The central point of the SRH method is to use not only the directions given by the generalized forces 2.11 to lower the energy expectation value, but also the information coming from the Hessian matrix, in order to accelerate the convergence. The idea of using the Hessian matrix is not new: Lin, Zhang and Rappe (45) had already proposed to use analytic derivatives to optimize the energy, but their implementation was inefficient and unstable.
We now review the SRH method and explain the reason for its efficiency.
Given a Hamiltonian $ H$ and a trial function $ \psi_\alpha(x) = \langle x \vert \psi_\alpha \rangle$ depending on a set of parameters $ \alpha = \alpha_1,\alpha_2,\ldots,\alpha_p$ , we want to minimize the expectation value of the energy on this wave function:

$\displaystyle E_\alpha = \frac{\langle \psi_\alpha \vert H \vert \psi_\alpha \rangle}{\langle \psi_\alpha \vert \psi_\alpha \rangle}$ (2.24)

with respect to this set of parameters.
To simplify the notation, henceforth the symbol $ \langle \cdot \rangle$ indicates the quantum expectation value over $ \psi_\alpha$ , so that $ E_\alpha = \langle H \rangle$ . In order to optimize the energy and find the new parameters $ \alpha' = \alpha + \gamma $ , we expand the trial function up to second order in $ \gamma$ :

$\displaystyle \vert\psi_{\alpha+\gamma} \rangle = \left \{ 1 + \sum_{k=1}^{p} \gamma_k \left ( O^{k} -\langle O^{k} \rangle \right ) + \frac{\beta}{2} \sum_{k,k'=1}^{p} \gamma_k \gamma_{k'} (O^k -\langle O^k \rangle)(O^{k'} -\langle O^{k'} \rangle) \right \} \vert \psi_{\alpha} \rangle$ (2.25)

with $ \beta=1$ , where $ O^k$ is the operator whose diagonal elements are (15):

$\displaystyle O_k(x)= \frac{\partial_{\alpha_k} \psi_\alpha(x)}{\psi_\alpha (x)}$ (2.26)
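As a simple illustration (our own example, not taken from the original references), for a one-parameter Gaussian trial function $ \psi_\alpha(x) = e^{-\alpha x^2}$ one has

$\displaystyle O^1(x) = \frac{\partial_{\alpha} \psi_\alpha(x)}{\psi_\alpha (x)} = -x^2 ,$

so that $ O^1$ is just the local observable $ -x^2$ , which can be accumulated along the VMC random walk together with the local energy.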

Here the constant $ \beta$ is kept explicit in order to highlight the various terms in the energy expansion. Using the fact that:
$\displaystyle \langle \psi_{\alpha} \vert\psi_{\alpha} \rangle$ $\displaystyle =$ $\displaystyle 1$  
$\displaystyle \langle \psi_{\alpha+\gamma} \vert\psi_{\alpha+\gamma} \rangle$ $\displaystyle =$ $\displaystyle 1 + \left ( 1+ \beta \right ) \sum_{k,k'=1}^{p} \gamma_k \gamma_{k'} \langle (O^k -\langle O^k \rangle)(O^{k'} -\langle O^{k'} \rangle)\rangle + O(\gamma^3)$  

we can expand the energy given by the new wave function $ \psi_{\alpha+\gamma}(x)$ up to second order and obtain:
$\displaystyle E_{\alpha+\gamma}$ $\displaystyle =$ $\displaystyle \frac{\langle \psi_{\alpha+\gamma} \vert H\vert\psi_{\alpha+\gamma} \rangle}{\langle \psi_{\alpha+\gamma} \vert\psi_{\alpha+\gamma} \rangle}$  
  $\displaystyle =$ $\displaystyle E_{\alpha} + 2 \sum_{k=1}^{p} \gamma_k \langle \left ( H - E_\alpha \right ) O^k \rangle$  
  $\displaystyle +$ $\displaystyle \left ( 1+ \beta \right ) \sum_{k,k'=1}^{p} \gamma_k \gamma_{k'} \langle \left ( H - E_\alpha \right ) (O^k -\langle O^k \rangle)(O^{k'} -\langle O^{k'} \rangle) \rangle$  
  $\displaystyle +$ $\displaystyle \frac{1}{2} \sum_{k,k'=1}^{p} \gamma_k \gamma_{k'} \langle \left [ O^{k} -\langle O^{k} \rangle , \left [ H -E_\alpha , O^{k'} -\langle O^{k'} \rangle \right ]\right] \rangle$  

We can define:
$\displaystyle S_h^{k,k'}$ $\displaystyle =$ $\displaystyle \langle \left [ O^{k} -\langle O^{k} \rangle , \left [ H -E_\alpha , O^{k'} -\langle O^{k'} \rangle \right ]\right] \rangle$ (2.27)
$\displaystyle G^{k,k'}$ $\displaystyle =$ $\displaystyle 2 \langle \left ( H - E_\alpha \right ) (O^k -\langle O^k \rangle)(O^{k'} -\langle O^{k'} \rangle) \rangle$ (2.28)
$\displaystyle f_k$ $\displaystyle =$ $\displaystyle -2 \langle \left ( H - E_\alpha \right ) O^k \rangle$ (2.29)

and so the expansion of the energy reads:

$\displaystyle \Delta E = - \sum_k \gamma_k f_k + \frac{1}{2} \sum_{k,k'} \gamma_k \gamma_{k'} \left [ S_h + ( 1 + \beta ) G \right ]^{k,k'}$ (2.30)

The wave-function parameters can then be iteratively changed so as to stay at the minimum of the above quadratic form whenever it is positive definite, in which case the minimum energy is obtained for:

$\displaystyle \vec{\gamma} = B^{-1} \vec{f}$ (2.31)


$\displaystyle B = S_h + ( 1 + \beta) G$ (2.32)

It can happen that the quadratic form is not positive definite, so that the energy expansion 2.30 is not bounded from below. This can be due to different reasons: non-quadratic corrections, statistical fluctuations of the Hessian matrix expectation value, or the fact that we are far from the minimum. In these cases the correction given by equation 2.31 may lead to a higher energy than $ E_\alpha$ . To overcome this problem the matrix $ B$ is modified in such a way as to be always positive definite, $ B' = B + \mu S $ , where $ S$ is the Stochastic Reconfiguration matrix 2.9. The use of the $ S$ matrix guarantees that we are moving in the direction of a well defined minimum when the change of the wave function is small; moreover, in the limit of large $ \mu$ we recover the Stochastic Reconfiguration optimization method. To make sure that the change of the wave function is small, we use a control parameter $ r$ to impose a constraint on the variation of the wave function $ \Delta WF$ by means of the inequality

$\displaystyle \vert\Delta WF\vert^2 \leq r^2$ (2.33)

where, using 2.9 and 2.25, $ \vert\Delta WF\vert^2 = \vert\vert \, \vert \psi_{\alpha + \gamma} \rangle - \vert \psi_\alpha \rangle \, \vert\vert^2 = \sum_{k,k'} \gamma_k \gamma_{k'} S^{k,k'} + O(\gamma^3)$ . This constraint always allows us to work with a positive definite matrix $ B'$ , and for small enough $ r$ the resulting energy is certainly lower than $ E_\alpha$ . We want to emphasize that the multiplier $ \mu \geq 0 $ is non-zero both when the matrix 2.32 is not positive definite and when the $ \vert\Delta WF\vert$ corresponding to eq. 2.31 exceeds $ r$ . This is equivalent to imposing a Lagrange multiplier on the energy minimization, namely minimizing $ \Delta E + \mu \vert\Delta WF\vert^2$ with the condition $ \vert\Delta WF\vert = r$ .
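The update step described above can be sketched numerically. The following is a minimal illustration, not the thesis implementation: the function names, the starting value of $\mu$ and the doubling schedule used to enforce both positive definiteness and the wave-function constraint are our own choices.

```python
import numpy as np

def srh_step(f, S_h, S, r, mu=0.0, mu_growth=2.0, mu_init=1e-3, max_iter=100):
    """One SRH parameter update: solve (S_h + mu*S) gamma = f,
    raising mu until the matrix is positive definite and the
    wave-function change gamma^T S gamma stays below r**2.
    Illustrative sketch only; the mu schedule is an assumption."""
    for _ in range(max_iter):
        B = S_h + mu * S
        try:
            np.linalg.cholesky(B)            # fails unless B is positive definite
        except np.linalg.LinAlgError:
            mu = max(mu * mu_growth, mu_init)
            continue
        gamma = np.linalg.solve(B, f)
        if gamma @ S @ gamma <= r**2:        # |Delta WF|^2 <= r^2, eq. 2.33
            return gamma, mu
        mu = max(mu * mu_growth, mu_init)    # step too large: stiffen B
    raise RuntimeError("could not satisfy the wave-function constraint")
```

For large $\mu$ the solution of $(S_h + \mu S)\gamma = f$ reduces to $\gamma \propto S^{-1}f/\mu$, i.e. a small SR move, which is the limiting behaviour described in the text.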

There is another important ingredient for an efficient implementation of the Hessian technique in QMC. As pointed out in Refs. (58,50), it is extremely important to evaluate the quantities appearing in the Hessian 2.32 in the form of correlation functions $ \langle AB \rangle - \langle A \rangle \langle B \rangle$ . This is because the fluctuations of operators written in this form are usually smaller than those of $ \langle AB \rangle$ , especially if $ A$ and $ B$ are correlated. Therefore, using the fact that the expectation value of the derivative of the local value $ O_L = \hat{O} \Psi/\Psi$ of a Hermitian operator $ \hat{O}$ with respect to any real parameter $ c$ of a real wave function $ \Psi$ is always zero (see for instance (45)), we can rearrange the Hessian terms in a more symmetric way, in the form of correlation functions:

$\displaystyle f_k$ $\displaystyle =$ $\displaystyle -2 \left [ \langle E_L(x) O^k(x)\rangle -\langle E_L(x)\rangle \langle O^k(x)\rangle \right ]$  
$\displaystyle S_h^{k,k'}$ $\displaystyle =$ $\displaystyle \langle \partial_{\alpha_k} E_L(x) O^{k'}(x)\rangle -\langle \partial_{\alpha_k} E_L(x)\rangle \langle O^{k'}(x)\rangle$  
  $\displaystyle +$ $\displaystyle \langle \partial_{\alpha_{k'}} E_L(x) O^k(x)\rangle -\langle \partial_{\alpha_{k'}} E_L(x)\rangle \langle O^k(x)\rangle$  
$\displaystyle S^{k,k'}$ $\displaystyle =$ $\displaystyle \langle O^{k'}(x) O^k(x)\rangle -\langle O^{k'}(x)\rangle \langle O^k(x)\rangle$  
$\displaystyle G^{k,k'}$ $\displaystyle =$ $\displaystyle 2 \langle \delta E_L(x)\, \delta O^{k'}(x)\, \delta O^k(x) \rangle$  

where $ \delta A = A - \langle A \rangle$ and $ E_L(x) = H\psi_\alpha(x)/\psi_\alpha(x)$ is the local energy.
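These correlation-function estimators are simple covariances over the Monte Carlo samples. The following numpy sketch shows one possible evaluation; the array names and layout are our own, and the overall factors follow the definitions 2.27-2.29.

```python
import numpy as np

def srh_estimators(E_L, O, dE_L):
    """Correlation-function estimators from M VMC samples.
    E_L:  (M,)   local energies E_L(x_i)
    O:    (M, p) values O^k(x_i)
    dE_L: (M, p) derivatives of the local energy w.r.t. alpha_k.
    Illustrative sketch, not the thesis code."""
    M = len(E_L)
    dO  = O - O.mean(axis=0)           # O^k - <O^k>
    dE  = E_L - E_L.mean()             # E_L - <E_L>
    dEd = dE_L - dE_L.mean(axis=0)     # dE_L - <dE_L>
    f   = -2.0 * dO.T @ dE / M                   # forces, eq. 2.29
    S   = dO.T @ dO / M                          # SR matrix
    S_h = (dEd.T @ dO + dO.T @ dEd) / M          # symmetrized Hessian part
    G   = 2.0 * np.einsum('i,ik,il->kl', dE, dO, dO) / M
    return f, S, S_h, G
```

Note that $G$ is the only three-point average here: $f$, $S$ and $S_h$ are all two-point covariances, which is exactly the zero-variance structure the text recommends.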

Because the $ G$ matrix 2.28 is zero for the exact ground state, and is therefore expected to be very small for a good variational wave function, it is possible, following the suggestion of Ref. (50), to choose $ \beta = -1$ , so that $ B = S_h + \mu S$ . As shown in Ref. (50), this choice can even lead to faster convergence than the use of the full Hessian matrix.
Moreover, the matrix $ G$ is the only part that is not in the form of a correlation function; it is therefore important that $ B$ does not depend on it, so as to reduce the fluctuations of the Hessian matrix, and this naively explains the suggestion of Ref. (50) to choose $ \beta = -1$ .
As in the SR method, the parameters are then iteratively updated using the equation:

$\displaystyle \vec{\gamma} = \left [ S_h + \mu S \right ]^{-1} \vec{f}$ (2.34)

where the forces $ \vec{f}$ and the matrix $ B = S_h + \mu S$ are evaluated by VMC.

Claudio Attaccalite 2005-11-07