F1Method.jl Documentation

This package provides an efficient tool to compute the gradient and Hessian matrix of an objective function that is implicitly defined by the solution of a steady-state problem.

Why the F-1 method?

When using Newton-type algorithms for optimization, computing the gradient and Hessian can be computationally expensive. A typical scientific application is to optimize the parameters of a model whose steady state is itself found by an iterative, Newton-like root-finding algorithm. In this case, there are a number of shortcuts that can be leveraged. This was the motivation for the work of Pasquier et al. (2019, in preparation), and for this package.
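In a nutshell (a sketch of the underlying idea, not a statement of the package's exact internals): if $\boldsymbol{s}(\boldsymbol{p})$ denotes the steady-state solution of $\boldsymbol{F}(\boldsymbol{x},\boldsymbol{p}) = 0$, the implicit function theorem gives $\nabla \boldsymbol{s}(\boldsymbol{p}) = -\left[\nabla_{\boldsymbol{x}}\boldsymbol{F}\right]^{-1} \nabla_{\boldsymbol{p}}\boldsymbol{F}$, so that the gradient of the reduced objective $\hat{f}(\boldsymbol{p}) = f(\boldsymbol{s}(\boldsymbol{p}),\boldsymbol{p})$ is $\nabla\hat{f}(\boldsymbol{p}) = \nabla_{\boldsymbol{x}} f \, \nabla\boldsymbol{s}(\boldsymbol{p}) + \nabla_{\boldsymbol{p}} f$. Once a factorization of $\nabla_{\boldsymbol{x}}\boldsymbol{F}$ from the inner Newton solve is available, this gradient (and, with similar bookkeeping, the Hessian) can be evaluated at little extra cost, without differentiating through the solver's iterations.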

Usage

This is an example use of the software. We define a state function, $\boldsymbol{F}(\boldsymbol{x},\boldsymbol{p})$, to which we apply a Newton-type root-finding solver to find the steady-state solution, $\boldsymbol{x}$, such that $\boldsymbol{F}(\boldsymbol{x},\boldsymbol{p}) = 0$. This defines the steady-state solution as an implicit function of the parameters, $\boldsymbol{p}$, which we denote by $\boldsymbol{s}(\boldsymbol{p})$. The Newton solver requires the Jacobian, $\nabla_{\boldsymbol{x}}\boldsymbol{F}$, to update the state iterates, so we start by creating the functions F(x,p) and ∇ₓF(x,p). As an example, we use a simple model with only two state variables and two parameters. (Note that, for simplicity, we use the ForwardDiff package to evaluate the Jacobian.)

using F1Method, ForwardDiff, DiffEqBase, LinearAlgebra

# State function F
F(x,p) = [
    -2 * (p[1] - x[1]) - 4 * p[2] * (x[2] - x[1]^2) * x[1]
    p[2] * (x[2] - x[1]^2)
]

# Jacobian of F wrt the state x
∇ₓF(x,p) = ForwardDiff.jacobian(x -> F(x,p), x)

# output

∇ₓF (generic function with 1 method)
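For instance (a quick illustrative check, with arbitrary two-element state and parameter vectors), these functions can be evaluated directly:

F([1.0, 2.0], [3.0, 4.0])   # returns the 2-element value of F at this (x, p)
∇ₓF([1.0, 2.0], [3.0, 4.0]) # returns the 2×2 Jacobian of F wrt x at this (x, p)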

We also define the cost function f(x,p) that we wish to minimize under the constraint that $\boldsymbol{F}(\boldsymbol{x},\boldsymbol{p}) = 0$. (The F-1 method requires that we provide the derivatives w.r.t. the state, x, hence the use of ForwardDiff again in this example.)

# Define mismatch function f(x,p) and its derivative ∇ₓf(x,p)
# (Note ∇ₓF and ∇ₓf are required by the F1 method)
function state_mismatch(x)
    δ(x) = x .- 1
    return 0.5δ(x)'δ(x)
end
function parameter_mismatch(p)
    δ(p) = log.(p)
    return 0.5δ(p)'δ(p)
end
f(x,p) = state_mismatch(x) + parameter_mismatch(p)
∇ₓf(x,p) = ForwardDiff.jacobian(x -> [f(x,p)], x)

# output

∇ₓf (generic function with 1 method)
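Note that f returns a scalar, so the closure passed to ForwardDiff.jacobian wraps it in a one-element vector; as a result, ∇ₓf(x,p) is returned as a 1×2 row matrix rather than a plain gradient vector. A quick illustrative check, with the same arbitrary test values as above:

f([1.0, 2.0], [3.0, 4.0])   # scalar cost at this (x, p)
∇ₓf([1.0, 2.0], [3.0, 4.0]) # 1×2 row matrix of partial derivatives wrt x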

Once these are set up, we need to tell the F-1 method how to solve for the steady state. We do this via the DiffEqBase API: we write a small Newton solver, overload the solve function from DiffEqBase, and overload the SteadyStateProblem constructor.

# Simple Newton solver: iterate x ← x - ∇ₓF(x)⁻¹ F(x) until ‖F(x)‖ < Ftol
function newton_solve(F, ∇ₓF, x; Ftol=1e-10)
    while norm(F(x)) ≥ Ftol
        x .-= ∇ₓF(x) \ F(x)
    end
    return x
end

# Create a type for the solver's algorithm
struct MyAlg <: DiffEqBase.AbstractSteadyStateAlgorithm end

# Overload DiffEqBase's solve function
function DiffEqBase.solve(prob::DiffEqBase.AbstractSteadyStateProblem,
                          alg::MyAlg;
                          Ftol=1e-10)
    # Define the functions according to DiffEqBase.SteadyStateProblem type
    p = prob.p
    t = 0
    x0 = copy(prob.u0)
    dx, df = copy(x0), copy(x0)
    F(x) = prob.f(dx, x, p, t)
    ∇ₓF(x) = prob.f(df, dx, x, p, t)
    # Compute `u_steady` and `resid` as per DiffEqBase using my algorithm
    x_steady = newton_solve(F, ∇ₓF, x0, Ftol=Ftol)
    resid = F(x_steady)
    # Return the common DiffEqBase solution type
    DiffEqBase.build_solution(prob, alg, x_steady, resid; retcode=:Success)
end

# Overload DiffEqBase's SteadyStateProblem constructor
function DiffEqBase.SteadyStateProblem(F, ∇ₓF, x, p)
    f(dx, x, p, t) = F(x, p)       # 4-argument method: returns F(x,p) (dx is unused)
    f(df, dx, x, p, t) = ∇ₓF(x, p) # 5-argument method: returns the Jacobian ∇ₓF(x,p)
    return DiffEqBase.SteadyStateProblem(f, x, p)
end

# output

We choose initial values for the state, x, and the parameters, p:

x₀, p₀ = [1.0, 2.0], [3.0, 4.0]

# output

([1.0, 2.0], [3.0, 4.0])
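As a quick sanity check (an illustrative step, not required by the F-1 method), the solver wrapper can be exercised directly by building and solving a steady-state problem:

prob = DiffEqBase.SteadyStateProblem(F, ∇ₓF, x₀, p₀) # uses the overloaded constructor
sol = DiffEqBase.solve(prob, MyAlg())                 # runs newton_solve under the hood
sol.u                                                 # the steady-state solution s(p₀)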

Finally, we wrap the objective, gradient, and Hessian functions defined by the F-1 method.

# Initialize the cache for storing reusable objects
mem = F1Method.initialize_mem(x₀, p₀)
# Define the functions via the F1 method
F1_objective(p) = F1Method.objective(f, F, ∇ₓF, mem, p, MyAlg())
F1_gradient(p) = F1Method.gradient(f, F, ∇ₓf, ∇ₓF, mem, p, MyAlg())
F1_Hessian(p) = F1Method.hessian(f, F, ∇ₓf, ∇ₓF, mem, p, MyAlg())

# output

F1_Hessian (generic function with 1 method)

We can now use these directly to compute the objective, gradient, and Hessian:

F1_objective(p₀)

# output

35.56438050824269

F1_gradient(p₀)

# output

1×2 Array{Float64,2}:
 50.3662  0.346574

F1_Hessian(p₀)

# output

2×2 Array{Float64,2}:
 52.989   0.0
  0.0    -0.0241434
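
These are exactly the ingredients a Newton-type optimizer needs. As a purely illustrative sketch (not part of the package API), a single Newton step in parameter space could be written as

p₁ = p₀ - F1_Hessian(p₀) \ vec(F1_gradient(p₀)) # vec turns the 1×2 gradient into a length-2 vector

In practice, F1_objective, F1_gradient, and F1_Hessian would be handed to an optimization package (e.g., Optim.jl), taking care to match the array shapes that the optimizer expects.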