
Stochastic Reconfiguration

The Stochastic Reconfiguration (SR) technique was initially developed to partially solve the sign problem in lattice Green function Monte Carlo (51) and was later used as an optimization method for a generic trial-function (49,15). An important advantage of this technique is that it uses more information about the trial-function than the simple steepest descent, allowing a faster optimization of the many-body wave-function.
Given a generic trial-function $ \Psi_T$ , not orthogonal to the ground state, it is possible to obtain a new one closer to the ground state by applying the operator $ (\Lambda-\hat{H})$ to this wave-function for a sufficiently large $ \Lambda$ . The idea of Stochastic Reconfiguration is to change the parameters of the original trial-function so that it becomes as close as possible to the projected one.
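To see why this projection helps, one can expand $ \Psi_T$ in the eigenstates $ \vert n \rangle$ of $ \hat{H}$ with energies $ E_n$ (a standard power-method argument, not spelled out in the text):

$\displaystyle \left( \Lambda - \hat{H} \right) \vert \Psi_T \rangle = \sum_n c_n \left( \Lambda - E_n \right) \vert n \rangle ,$

so that, for $ \Lambda$ large enough that $ \Lambda - E_0$ is the largest factor in absolute value, the weight of the ground state $ \vert 0 \rangle$ relative to the excited states is enhanced by each application of the operator.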
For this purpose we define:
$\displaystyle \vert \Psi_P \rangle$ $\displaystyle =$ $\displaystyle \left( \Lambda - \hat{H} \right) \vert \Psi_T (\alpha_1, \dots ,\alpha_p) \rangle$ (2.1)
$\displaystyle \vert \Psi_T' \rangle$ $\displaystyle =$ $\displaystyle \delta \alpha_0 \vert \Psi_T \rangle + \sum^p_{k=1} \delta \alpha_k \frac{\partial}{\partial \alpha_k} \vert \Psi_T \rangle$ (2.2)

where $ \Psi_P$ is the projected wave-function and $ \Psi_T'$ is the new trial-function obtained by changing the variational parameters. We can rewrite eq. 2.2 as:

$\displaystyle \Psi_T'= \sum^p_{k=0} \delta \alpha_k \hat{O}_k \Psi_T$ (2.3)

where

$\displaystyle \hat{O}_k \Psi_T(x)=\left[ \frac{\partial}{\partial \alpha_k} \ln {\Psi_T(x)} \right] \Psi_T(x)$    and $\displaystyle \hat{O}_0 = \hat{I}$ (2.4)
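As a concrete illustration of eq. 2.4, on a given configuration $ x$ the operators $ \hat{O}_k$ reduce to the logarithmic derivatives of the trial-function with respect to the variational parameters. A minimal numerical sketch, in which the Gaussian trial-function psi_T and its two parameters are hypothetical placeholders rather than the wave-functions used in this thesis:

import numpy as np

def psi_T(x, alpha):
    # Hypothetical trial-function (a simple Gaussian), used only to
    # illustrate the operators O_k; not the wave-function of this thesis.
    return np.exp(-alpha[0] * np.sum(x ** 2) + alpha[1] * np.sum(x))

def log_derivatives(x, alpha, eps=1.0e-6):
    # O_k(x) = d ln Psi_T(x) / d alpha_k, estimated by central finite differences.
    O = np.zeros(len(alpha))
    for k in range(len(alpha)):
        a_plus = alpha.copy()
        a_minus = alpha.copy()
        a_plus[k] += eps
        a_minus[k] -= eps
        O[k] = (np.log(psi_T(x, a_plus)) - np.log(psi_T(x, a_minus))) / (2.0 * eps)
    return O

# example: O_k on a random configuration of three coordinates
x = np.random.randn(3)
alpha = np.array([0.5, 0.1])
print(log_derivatives(x, alpha))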

Now we want to choose the new parameters in such a way that $ \Psi_T'$ is as close as possible to $ \Psi_P$ . Thus we require that a set of mixed-average correlation functions, corresponding to the two wave-functions of eqs. 2.2 and 2.1, be equal. More precisely we impose:

$\displaystyle \frac{\langle \Psi_T \vert\hat{O}_k\vert \Psi_T' \rangle}{\langle \Psi_T \vert \Psi_T' \rangle} = \frac{\langle \Psi_T \vert\hat{O}_k\vert \Psi_P \rangle}{\langle \Psi_T \vert \Psi_P \rangle}$ (2.5)

for $ k=0,\dots,p$ , where $ \langle \cdots \rangle$ denotes the variational expectation value $ \langle \Psi_T \vert \cdots \vert \Psi_T \rangle / \langle \Psi_T \vert \Psi_T \rangle$ . This is equivalent to the following system of equations:
$\displaystyle \delta \alpha_0 + \sum^p_{l=1} \delta \alpha_l \langle \hat{O}_l \rangle$ $\displaystyle =$ $\displaystyle \Lambda - \langle \hat{H} \rangle$ (2.6)
$\displaystyle \delta \alpha_0 \langle \hat{O}_k \rangle + \sum^p_{l=1} \delta \alpha_l \langle \hat{O}_k \hat{O}_l \rangle$ $\displaystyle =$ $\displaystyle \Lambda \langle \hat{O}_k \rangle - \langle \hat{O}_k \hat{H} \rangle$    for $\displaystyle k\neq0$ (2.7)

Because the equation for $ \alpha_0$ is related to the normalization of the trial-function, and this parameter does not affect any physical observable of the system, we can eliminate $ \delta \alpha_0$ by substituting the first equation into the others:

$\displaystyle \sum^p_{l=1} \delta \alpha_l s_{kl} = \langle \hat{O}_k \rangle \langle \hat{H} \rangle - \langle \hat{O}_k \hat{H} \rangle$ (2.8)

where

$\displaystyle s_{kl} = \langle ( \hat{O}_k - \langle \hat{O}_k \rangle ) ( \hat{O}_l -\langle \hat{O}_l \rangle )\rangle$ (2.9)
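In a variational Monte Carlo run, eq. 2.9 is nothing but the covariance of the sampled log-derivatives. A sketch, assuming an array O[i, k] = O_k(x_i) has already been collected over the sampled configurations x_i (the array name and the sampling driver are placeholders):

import numpy as np

def sr_matrix(O):
    # O has shape (n_samples, p): O[i, k] = O_k(x_i) over the sampled
    # configurations of a variational Monte Carlo run.
    # s_kl = <(O_k - <O_k>)(O_l - <O_l>)>, i.e. eq. 2.9 as a sample covariance.
    dO = O - O.mean(axis=0)
    return dO.T @ dO / O.shape[0]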

The solution of this system of equations defines a direction in parameter space. If we move the parameters along this direction by a sufficiently small step $ \Delta t$ we will decrease the energy.
The matrix $ s_{kl}$ is calculated at each iteration through a standard variational Monte Carlo sampling; each iteration constitutes a small simulation that will be referred to in the following as a ``bin''. After each bin the wave-function parameters are iteratively updated according to

$\displaystyle \alpha_k^{new} = \alpha_k^{old} + \Delta t \, \delta \alpha_k$ (2.10)
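Putting eqs. 2.8-2.10 together, each bin amounts to building $ s_{kl}$ and the right-hand side of eq. 2.8 from the sampled data, solving a small linear system and moving the parameters by $ \Delta t$ along the resulting direction. A sketch of one such iteration, assuming the log-derivatives O[i, k] and the local energies E_loc[i] are supplied by some VMC driver (both names are placeholders):

import numpy as np

def sr_bin_update(alpha, O, E_loc, dt):
    # One SR iteration ("bin"), given the sampled log-derivatives O[i, k]
    # and local energies E_loc[i] collected during the bin:
    #   s_kl   = <(O_k - <O_k>)(O_l - <O_l>)>        (eq. 2.9)
    #   f_k    = <O_k><H> - <O_k H>                  (r.h.s. of eq. 2.8)
    #   solve  sum_l s_kl dalpha_l = f_k             (eq. 2.8)
    #   update alpha_k -> alpha_k + dt * dalpha_k    (eq. 2.10)
    O_mean = O.mean(axis=0)
    dO = O - O_mean
    s = dO.T @ dO / O.shape[0]
    f = O_mean * E_loc.mean() - (O * E_loc[:, None]).mean(axis=0)
    dalpha = np.linalg.solve(s, f)
    return alpha + dt * dalpha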

SR is similar to a standard steepest descent (SD) calculation, where the expectation value of the energy $ E( \{ \alpha_k \} )= \frac{\langle \Psi \vert H \vert \Psi \rangle}{\langle \Psi \vert \Psi \rangle} $ is optimized by iteratively changing the parameters $ \alpha_k$ according to the corresponding derivatives of the energy (generalized forces):

$\displaystyle f_k = - \frac{\partial E}{\partial \alpha_k} = - \frac{ \langle \Psi \vert \hat{O}_k H + H \hat{O}_k \vert \Psi \rangle }{ \langle \Psi \vert \Psi \rangle } + 2 \, \frac{ \langle \Psi \vert \hat{O}_k \vert \Psi \rangle \, \langle \Psi \vert H \vert \Psi \rangle }{ \langle \Psi \vert \Psi \rangle^2 },$ (2.11)

namely:

$\displaystyle \alpha_k \to \alpha_k + \Delta t f_k,$ (2.12)

where $ \Delta t$ is a suitably small time step, which can be taken fixed or determined at each iteration by minimizing the energy expectation value.
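For comparison, the plain steepest-descent update of eqs. 2.11-2.12 uses the same Monte Carlo estimators but follows the bare forces, without the metric $ s$ . A sketch under the same assumptions as above, using the real-wave-function form of the forces:

import numpy as np

def sd_update(alpha, O, E_loc, dt):
    # Plain steepest descent, eqs. 2.11-2.12: move along the bare forces.
    # For a real wave-function the forces reduce to
    #   f_k = 2 ( <O_k><E_L> - <O_k E_L> ).
    O_mean = O.mean(axis=0)
    f = 2.0 * (O_mean * E_loc.mean() - (O * E_loc[:, None]).mean(axis=0))
    return alpha + dt * f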

Indeed the variation of the total energy $ \Delta E$ at each step is easily shown to be negative for small enough $ \Delta t$ because, in this limit

$\displaystyle \Delta E = - \Delta t \sum_i f_i^2 + O(\Delta t^2).$

Thus the method certainly converges to a minimum, where all the forces vanish. In the SR we have instead

$\displaystyle \alpha_i^{new} = \alpha_i^{old} + \Delta t \sum_k \bar s^{-1}_{i,k} f_k$ (2.13)

Using the analogy with the steepest descent, it is possible to show that convergence to the energy minimum is reached when the value of $ \Delta t$ is sufficiently small and is kept constant for each iteration. Indeed the energy variation for a small change of the parameters is:

$\displaystyle \Delta E = -\Delta t \sum_{i,j} \bar s^{-1}_{i,j} f_i f_j,$

and it is easily verified that the above term is always negative, because the reduced matrix $ s$ , as well as $ \bar s^{-1}$ , is positive definite, since $ s$ is an overlap matrix with all positive eigenvalues.
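This sign can be checked directly on a sampled $ s$ and $ f$ : being a covariance matrix, $ s$ admits a Cholesky factorization, and the quadratic form above is then manifestly negative. A small numerical check with synthetic data in place of an actual VMC sample:

import numpy as np

rng = np.random.default_rng(0)
O = rng.normal(size=(1000, 4))        # synthetic log-derivative samples O_k(x_i)
f = rng.normal(size=4)                # synthetic generalized forces f_k
dt = 0.05

dO = O - O.mean(axis=0)
s = dO.T @ dO / O.shape[0]            # sampled overlap (covariance) matrix, eq. 2.9

np.linalg.cholesky(s)                 # succeeds only if s is positive definite
dE = -dt * f @ np.linalg.solve(s, f)  # linear energy change of the SR move
assert dE < 0.0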
For a stable iterative method, such as SR or SD, a basic requirement is that at each iteration the new parameters $ \alpha^\prime$ are close to the previous $ \alpha $ according to a prescribed distance. The fundamental difference between the SR minimization and the standard steepest descent lies precisely in the definition of this distance. For the SD it is the usual one, defined by the Cartesian metric $ \Delta_\alpha = \sum_k \vert \alpha^\prime_k - \alpha_k\vert^2$ , whereas the SR works in the physical Hilbert-space metric of the wave function $ \Psi$ , yielding $ \Delta_\alpha= \sum_{i,j} s_{i,j} (\alpha^\prime_i-\alpha_i)( \alpha^\prime_j-\alpha_j)$ , namely the square distance between the two wave functions corresponding to the two different sets of variational parameters $ \{ \alpha^\prime_k \}$ and $ \{ \alpha_k \}$ . Therefore, from the knowledge of the generalized forces $ f_k$ , the most convenient change of the variational parameters minimizes the functional $ \Delta E +\bar \Lambda \Delta_\alpha $ , where $ \Delta E$ is the linear change in the energy, $ \Delta E = -\sum_{i} f_i (\alpha^\prime_i-\alpha_i) $ , and $ \bar \Lambda$ is a Lagrange multiplier that allows a stable minimization with a small change $ \Delta_\alpha$ of the wave function $ \Psi$ . The final iteration (2.13) is then easily obtained.
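To make the last step explicit, setting to zero the derivative of the functional $ \Delta E +\bar \Lambda \Delta_\alpha $ with respect to $ \alpha^\prime_k$ gives

$\displaystyle - f_k + 2 \bar \Lambda \sum_{j} s_{k,j} \left( \alpha^\prime_j - \alpha_j \right) = 0 \qquad \Rightarrow \qquad \alpha^\prime_k = \alpha_k + \frac{1}{2 \bar \Lambda} \sum_j \bar s^{-1}_{k,j} f_j ,$

which coincides with the iteration (2.13) once $ \Delta t$ is identified with $ 1/(2 \bar \Lambda)$ .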

The advantage of SR compared with SD is clear: sometimes a small change of the variational parameters corresponds to a large change of the wave function, and the SR takes this effect into account through eq. 2.13. In particular the method is useful when a non-orthogonal basis set is used, as is the case in this thesis. Moreover, by using the reduced matrix $ s$ it is also possible to remove from the calculation those parameters that imply some redundancy in the variational space, as shown in the following sections of this chapter.
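One possible way to handle such redundant directions (the criterion actually adopted is described in the following sections) is to diagonalize $ s$ and discard the eigen-directions with negligible eigenvalues; a sketch:

import numpy as np

def reduced_sr_direction(s, f, tol=1.0e-8):
    # Diagonalize s and drop the eigen-directions whose eigenvalues are
    # negligible with respect to the largest one: these correspond to
    # redundant combinations of variational parameters.
    w, V = np.linalg.eigh(s)
    keep = w > tol * w.max()
    return V[:, keep] @ ((V[:, keep].T @ f) / w[keep])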


