---
title: A Quick Demo of trustOptim
author: Michael Braun
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: trustOptim.bib
vignette: >
  %\VignetteIndexEntry{Quick demo}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#", message=FALSE)
options(digits=4, scipen=0)
```

This is a quick demo of how to use the *trustOptim* package. For this example, the objective function is the Rosenbrock function:

$$
f(x_{1:N},y_{1:N})=\sum_{i=1}^N \left[100\left(x_i^2-y_i\right)^2+\left(x_i-1\right)^2\right]
$$

The parameter vector contains $2N$ variables, ordered as $x_1, y_1, x_2, y_2, \ldots, x_N, y_N$. The function is minimized at a vector of ones, where its value is zero. The following functions return the objective, gradient, and Hessian (in sparse format) of this function.

```{r}
require(trustOptim)
require(Matrix)

f <- function(V) {
    N <- length(V)/2
    x <- V[seq(1, 2*N-1, by=2)]
    y <- V[seq(2, 2*N, by=2)]
    return(sum(100*(x^2-y)^2 + (x-1)^2))
}

df <- function(V) {
    N <- length(V)/2
    x <- V[seq(1, 2*N-1, by=2)]
    y <- V[seq(2, 2*N, by=2)]
    t <- x^2 - y
    dxi <- 400*t*x + 2*(x-1)
    dyi <- -200*t
    return(as.vector(rbind(dxi, dyi)))
}

hess <- function(V) {
    N <- length(V)/2
    x <- V[seq(1, 2*N-1, by=2)]
    y <- V[seq(2, 2*N, by=2)]
    d0 <- rep(200, N*2)
    d0[seq(1, 2*N-1, by=2)] <- 1200*x^2 - 400*y + 2
    d1 <- rep(0, 2*N-1)
    d1[seq(1, 2*N-1, by=2)] <- -400*x
    H <- bandSparse(2*N, k=c(-1, 0, 1), diagonals=list(d1, d0, d1),
                    symmetric=FALSE, repr='C')
    return(drop0(H))
}
```

For this demo, we start at a random vector.

```{r}
set.seed(1234)
N <- 3
start <- as.vector(rnorm(2*N, -1, 3))
```

Next, we call `trust.optim` with all default arguments.

```{r}
opt <- trust.optim(start, fn=f, gr=df, hs=hess, method="Sparse")
```

In the output above, `f` is the value of the objective function and `nrm_gr` is the norm of the gradient. The `status` messages show how the underlying trust region algorithm is progressing, and are useful mainly for debugging. Note that the objective value is non-increasing at each iteration, but the norm of the gradient is not. The algorithm continues until the norm of the gradient falls below the control parameter `prec`, the trust region radius falls below `stop.trust.radius`, or the iteration count exceeds `maxit`. See the package manual for details of the control parameters. We use the default control parameters for this demo (hence, there is no control list here).

The result contains the objective value, the minimizer, the gradient at the minimum (which should be numerically zero in every element), and the Hessian at the minimum.

```{r}
opt
```

Note that `opt$fval` and all elements of `opt$gradient` are zero, within machine precision. The solution is correct, and the Hessian is returned as a compressed sparse Matrix object (see the *Matrix* package for details).

One way to *potentially* speed up convergence (though not necessarily reduce compute time) is to apply a preconditioner. Other than the identity matrix (the default), the package currently supports only a modified Cholesky preconditioner, selected with the control parameter `preconditioner=1`. To save space, we report the optimizer status only every 10 iterations.

```{r}
opt1 <- trust.optim(start, fn=f, gr=df, hs=hess, method="Sparse",
                    control=list(preconditioner=1, report.freq=10))
```

Here, we see that adding the preconditioner actually increases the number of iterations. Sometimes preconditioners help a lot, and sometimes not at all.
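To see the difference directly, we can compare the iteration counts of the two runs (this assumes the returned list includes an `iterations` element; check the package manual for the full list of return values):

```{r, eval=FALSE}
## Iteration counts for the identity (default) and modified Cholesky
## preconditioners, taken from the two runs above.
c(identity = opt$iterations, modified.cholesky = opt1$iterations)
```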
## Quasi-Newton Methods

The `trust.optim` function also supports quasi-Newton approximations to the Hessian. The two options are BFGS and SR1 updates; see @NocedalWright2006 for details. You do not need to provide the Hessian for these methods, and they are often preferred when the Hessian is dense. However, they may take longer to converge, which is why we raise the `maxit` control parameter. To save space, we again report the status of the optimizer only every 10 iterations.

```{r}
opt.bfgs <- trust.optim(start, fn=f, gr=df, method="BFGS",
                        control=list(maxit=5000, report.freq=10))
opt.bfgs
```

And we can do the same thing with SR1 updates.

```{r}
opt.sr1 <- trust.optim(start, fn=f, gr=df, method="SR1",
                       control=list(maxit=5000, report.freq=10))
opt.sr1
```

Note that the quasi-Newton methods do not return a Hessian, and we do not think the final approximations from BFGS or SR1 updates are particularly reliable. If you need the Hessian at the solution, you can compute it with the *sparseHessianFD* package, as sketched below.
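The following is a minimal sketch, not a definitive recipe: it assumes the *sparseHessianFD* constructor `sparseHessianFD(x, fn, gr, rows, cols)` and its helper `Matrix.to.Coord`, and that `trust.optim` stores the optimum in the `solution` element of its returned list. Check that package's vignette for the exact interface.

```{r, eval=FALSE}
## Sketch only: build the Hessian at the BFGS solution from finite
## differences of the gradient.  The row/column indices of the non-zero
## elements of the lower triangle are taken here from our analytic
## Hessian; in practice they would come from the model's structure.
require(sparseHessianFD)
rc <- Matrix.to.Coord(tril(hess(opt.bfgs$solution)))
obj <- sparseHessianFD(opt.bfgs$solution, fn=f, gr=df,
                       rows=rc$rows, cols=rc$cols)
H.fd <- obj$hessian(opt.bfgs$solution)
```

# References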