| Title: | Derivative-Based Optimization with User-Defined Convergence Criteria |
|---|---|
| Description: | Provides a derivative-based optimization framework that allows users to combine eight convergence criteria. Unlike standard optimization functions, this package includes a built-in mechanism to verify the positive definiteness of the Hessian matrix at the point of convergence. This additional check helps prevent the solver from falsely identifying non-optimal solutions, such as saddle points, as valid minima. |
| Authors: | Eunseong Cho [aut, cre] |
| Maintainer: | Eunseong Cho <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.7 |
| Built: | 2026-05-31 13:26:18 UTC |
| Source: | https://github.com/eunscho/optimflex |
Implements the damped BFGS Quasi-Newton algorithm with a Strong Wolfe line search for non-linear optimization, specifically tailored for SEM.
bfgs( start, objective, gradient = NULL, hessian = NULL, lower = -Inf, upper = Inf, control = list(), ... )bfgs( start, objective, gradient = NULL, hessian = NULL, lower = -Inf, upper = Inf, control = list(), ... )
start |
Numeric vector. Starting values for the optimization parameters. |
objective |
Function. The objective function to minimize. |
gradient |
Function (optional). Gradient of the objective function. |
hessian |
Function (optional). Hessian matrix of the objective function. |
lower |
Numeric vector. Lower bounds for box constraints. |
upper |
Numeric vector. Upper bounds for box constraints. |
control |
List. Control parameters including convergence flags:
|
... |
Additional arguments passed to objective, gradient, and Hessian functions. |
bfgs is a Quasi-Newton method that maintains an approximation of the
inverse Hessian matrix. It is widely considered the most robust and
efficient member of the Broyden family of optimization methods.
BFGS vs. DFP:
While both bfgs and dfp update the inverse Hessian using
rank-two formulas, BFGS is generally more tolerant of inaccuracies in the
line search. This implementation uses the Sherman-Morrison formula to
update the inverse Hessian directly, avoiding the need for matrix inversion
at each step.
Strong Wolfe Line Search:
To maintain the positive definiteness of the Hessian approximation and
ensure global convergence, this algorithm employs a Strong Wolfe line search.
This search identifies a step length that satisfies both sufficient
decrease (Armijo condition) and the curvature condition.
Damping for Non-Convexity:
In Structural Equation Modeling (SEM), objective functions often exhibit
non-convex regions. When use_damped = TRUE, Powell's damping
strategy is applied to the update vectors to preserve the positive
definiteness of the Hessian approximation even when the curvature condition
is not naturally met.
A list containing optimization results and iteration metadata.
Nocedal, J., & Wright, S. J. (2006). Numerical Optimization. Springer.
Fletcher, R. (1987). Practical Methods of Optimization. Wiley.
quad <- function(x) (x[1] - 2)^2 + (x[2] + 1)^2 res <- bfgs(start = c(0, 0), objective = quad) print(res$par)quad <- function(x) (x[1] - 2)^2 + (x[2] + 1)^2 res <- bfgs(start = c(0, 0), objective = quad) print(res$par)
Implements the standard Powell's Dogleg Trust-Region algorithm for non-linear optimization.
dogleg( start, objective, gradient = NULL, hessian = NULL, lower = -Inf, upper = Inf, control = list(), ... )dogleg( start, objective, gradient = NULL, hessian = NULL, lower = -Inf, upper = Inf, control = list(), ... )
start |
Numeric vector. Starting values for the optimization parameters. |
objective |
Function. The objective function to minimize. |
gradient |
Function (optional). Gradient of the objective function. |
hessian |
Function (optional). Hessian matrix of the objective function. |
lower |
Numeric vector. Lower bounds for box constraints. |
upper |
Numeric vector. Upper bounds for box constraints. |
control |
List. Control parameters including convergence flags:
|
... |
Additional arguments passed to objective, gradient, and Hessian functions. |
This function implements the classic Dogleg method within a Trust-Region framework, based on the strategy proposed by Powell (1970).
Trust-Region vs. Line Search:
Trust-Region methods define a neighborhood around the current point (the trust region
with radius ) where a local quadratic model is assumed to be reliable.
Unlike Line Search methods that first determine a search direction and then
find an appropriate step length, this approach constrains the step size first
and then finds the optimal update within that boundary.
Powell's Dogleg Trajectory: The "Dogleg" trajectory is a piecewise linear path connecting:
The current point.
The Cauchy Point (): The minimizer of the quadratic model along
the steepest descent direction.
The Newton Point (): The unconstrained minimizer of the quadratic model .
The algorithm selects a step along this path such that it minimizes the quadratic
model while remaining within the radius .
Relationship to Double Dogleg:
While the double_dogleg algorithm (Dennis and Mei, 1979) introduces a bias
point to follow the Newton direction more closely, this standard Dogleg follows
the original two-segment trajectory.
A list containing optimization results and iteration metadata.
Powell, M. J. D. (1970). A Hybrid Method for Nonlinear Equations. Numerical Methods for Nonlinear Algebraic Equations.
Nocedal, J., & Wright, S. J. (2006). Numerical Optimization. Springer.
quad <- function(x) (x[1] - 2)^2 + (x[2] + 1)^2 res <- dogleg(start = c(0, 0), objective = quad) print(res$par)quad <- function(x) (x[1] - 2)^2 + (x[2] + 1)^2 res <- dogleg(start = c(0, 0), objective = quad) print(res$par)
Implements the Double Dogleg Trust-Region algorithm for non-linear optimization.
double_dogleg( start, objective, gradient = NULL, hessian = NULL, lower = -Inf, upper = Inf, control = list(), ... )double_dogleg( start, objective, gradient = NULL, hessian = NULL, lower = -Inf, upper = Inf, control = list(), ... )
start |
Numeric vector. Starting values for the optimization parameters. |
objective |
Function. The objective function to minimize. |
gradient |
Function (optional). Gradient of the objective function. |
hessian |
Function (optional). Hessian matrix of the objective function. |
lower |
Numeric vector. Lower bounds for box constraints. |
upper |
Numeric vector. Upper bounds for box constraints. |
control |
List. Control parameters including convergence flags starting with 'use_'.
|
... |
Additional arguments passed to objective, gradient, and Hessian functions. |
This function implements the Double Dogleg method within a Trust-Region framework, primarily based on the work of Dennis and Mei (1979).
Trust-Region vs. Line Search:
While Line Search methods (like BFGS) first determine a search direction and then
find an appropriate step length, Trust-Region methods define a neighborhood
around the current point (the trust region with radius ) where a local
quadratic model is assumed to be reliable. The algorithm then finds a step that
minimizes this model within the radius. This approach is generally more robust,
especially when the Hessian is not positive definite.
Powell's Dogleg vs. Double Dogleg:
Powell's original Dogleg method (1970) constructs a trajectory consisting of
two line segments: one from the current point to the Cauchy point, and another
from the Cauchy point to the Newton point. The "Double Dogleg" modification
by Dennis and Mei (1979) introduces an intermediate "bias" point ()
along the Newton direction.
Cauchy Point (): The minimizer of the quadratic model along
the steepest descent direction.
Newton Point (): The minimizer of the quadratic model .
Double Dogleg Point (): A point defined as ,
where is a scaling factor (bias) that ensures the path stays
closer to the Newton direction while maintaining monotonic descent in
the model.
This modification allows the algorithm to perform more like a Newton method earlier in the optimization process compared to the standard Dogleg.
A list containing optimization results and iteration metadata.
Dennis, J. E., & Mei, H. H. (1979). Two New Unconstrained Optimization Algorithms which use Function and Gradient Values. Journal of Optimization Theory and Applications, 28(4), 453-482.
Powell, M. J. D. (1970). A Hybrid Method for Nonlinear Equations. Numerical Methods for Nonlinear Algebraic Equations.
Nocedal, J., & Wright, S. J. (2006). Numerical Optimization. Springer.
quad <- function(x) (x[1] - 2)^2 + (x[2] + 1)^2 res <- double_dogleg(start = c(0, 0), objective = quad) print(res$par)quad <- function(x) (x[1] - 2)^2 + (x[2] + 1)^2 res <- double_dogleg(start = c(0, 0), objective = quad) print(res$par)
Provides a high-speed numerical gradient using forward or central differences.
fast_grad(f, x, diff_method = c("forward", "central"), ...)fast_grad(f, x, diff_method = c("forward", "central"), ...)
f |
Function. The objective function. |
x |
Numeric vector. Parameters at which to evaluate the gradient. |
diff_method |
String. Differentiation method: "forward" or "central". |
... |
Additional arguments passed to |
A numeric vector of gradients.
High-speed numerical Hessian calculation using finite differences.
fast_hess(f, x, diff_method = c("forward", "central"), ...)fast_hess(f, x, diff_method = c("forward", "central"), ...)
f |
Function. The objective function. |
x |
Numeric vector. Parameters at which to evaluate the Hessian. |
diff_method |
String. Differentiation method: "forward" or "central". |
... |
Additional arguments passed to |
A symmetric Hessian matrix.
Calculates the Jacobian matrix for a vector-valued function (e.g., residuals).
fast_jac(f_res, x, diff_method = c("forward", "central"), ...)fast_jac(f_res, x, diff_method = c("forward", "central"), ...)
f_res |
Function. A function returning a vector of residuals. |
x |
Numeric vector. Parameters at which to evaluate the Jacobian. |
diff_method |
String. Differentiation method: "forward" or "central". |
... |
Additional arguments passed to |
A Jacobian matrix of dimension (m x n).
Implements a full-featured Gauss-Newton algorithm for non-linear optimization, specifically optimized for Structural Equation Modeling (SEM).
gauss_newton( start, objective, residual = NULL, gradient = NULL, hessian = NULL, jac = NULL, lower = -Inf, upper = Inf, control = list(), ... )gauss_newton( start, objective, residual = NULL, gradient = NULL, hessian = NULL, jac = NULL, lower = -Inf, upper = Inf, control = list(), ... )
start |
Numeric vector. Starting values for the optimization parameters. |
objective |
Function. The objective function to minimize. |
residual |
Function (optional). Function that returns the residuals vector. |
gradient |
Function (optional). Gradient of the objective function. |
hessian |
Function (optional). Hessian matrix of the objective function. |
jac |
Function (optional). Jacobian matrix of the residuals. |
lower |
Numeric vector. Lower bounds for box constraints. |
upper |
Numeric vector. Upper bounds for box constraints. |
control |
List. Control parameters including convergence flags:
|
... |
Additional arguments passed to objective, gradient, and Hessian functions. |
gauss_newton is a specialized optimization algorithm for least-squares
and Maximum Likelihood problems where the objective function can be
expressed as a sum of squared residuals.
Scaling and SEM Consistency:
To ensure consistent simulation results and standard error (SE) calculations,
this implementation adjusts the Gradient and the Approximate
Hessian to match the scale of the Maximum Likelihood (ML)
fitting function . This alignment is critical when calculating
asymptotic covariance matrices using the formula .
Comparison with Newton-Raphson:
Unlike newton_raphson or modified_newton, which require the full
second-order Hessian, Gauss-Newton approximates the Hessian using the
Jacobian of the residuals. This is computationally more efficient and
provides a naturally positive-semidefinite approximation, though a ridge
adjustment is still provided for numerical stability.
Ridge Adjustment Strategy:
The function includes a "Ridge Rescue" mechanism. If the approximate Hessian
is singular or poorly conditioned for Cholesky decomposition, it iteratively
adds a diagonal ridge until numerical stability is achieved.
A list containing optimization results and iteration metadata.
Nocedal, J., & Wright, S. J. (2006). Numerical Optimization. Springer.
Bollen, K. A. (1989). Structural Equations with Latent Variables. Wiley.
quad <- function(x) (x[1] - 2)^2 + (x[2] + 1)^2 res <- gauss_newton(start = c(0, 0), objective = quad) print(res$par)quad <- function(x) (x[1] - 2)^2 + (x[2] + 1)^2 res <- gauss_newton(start = c(0, 0), objective = quad) print(res$par)
Efficiently checks if a matrix is positive definite using Cholesky decomposition.
is_pd_fast(M)is_pd_fast(M)
M |
Matrix. The matrix to verify. |
Logical. TRUE if positive definite, FALSE otherwise.
Performs bound-constrained minimization using the L-BFGS-B algorithm. This implementation handles box constraints via Generalized Cauchy Point (GCP) estimation and subspace minimization, featuring a limited-memory (two-loop recursion) inverse Hessian approximation.
l_bfgs_b( start, objective, gradient = NULL, hessian = NULL, lower = -Inf, upper = Inf, control = list(), ... )l_bfgs_b( start, objective, gradient = NULL, hessian = NULL, lower = -Inf, upper = Inf, control = list(), ... )
start |
Numeric vector. Initial values for the parameters. |
objective |
Function. The scalar objective function to be minimized. |
gradient |
Function (optional). Returns the gradient vector. If |
hessian |
Function (optional). Returns the Hessian matrix. Used for final
positive definiteness verification if |
lower, upper
|
Numeric vectors. Lower and upper bounds for the parameters. Can be scalars if all parameters share the same bounds. |
control |
A list of control parameters:
|
... |
Additional arguments passed to objective, gradient, and Hessian functions. |
A list containing optimization results and metadata.
This function adds three features for rigorous convergence control. First, it
applies an AND rule: all selected convergence criteria must be satisfied
simultaneously. Second, users can choose among eight distinct criteria
(e.g., changes in , , gradient, or predicted decrease) instead of relying
on fixed defaults. Third, it provides an optional verification using the
Hessian computed from derivatives (analytically when provided, or via numerical
differentiation). Checking the positive definiteness of this Hessian at the final
solution reduces the risk of declaring convergence at non-minimizing stationary
points, such as saddle points.
Byrd, R. H., Lu, P., Nocedal, J., & Zhu, C. (1995). A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5), 1190-1208.
Morales, J. L., & Nocedal, J. (2011). L-BFGS-B: Remark on algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Transactions on Mathematical Software, 38(1), 1-4.
quad <- function(x) (x[1] - 2)^2 + (x[2] + 1)^2 res <- l_bfgs_b(start = c(0, 0), objective = quad) print(res$par)quad <- function(x) (x[1] - 2)^2 + (x[2] + 1)^2 res <- l_bfgs_b(start = c(0, 0), objective = quad) print(res$par)
Implements an optimized Newton-Raphson algorithm for non-linear optimization featuring dynamic ridge adjustment and backtracking line search.
modified_newton( start, objective, gradient = NULL, hessian = NULL, lower = -Inf, upper = Inf, control = list(), ... )modified_newton( start, objective, gradient = NULL, hessian = NULL, lower = -Inf, upper = Inf, control = list(), ... )
start |
Numeric vector. Starting values for the optimization parameters. |
objective |
Function. The objective function to minimize. |
gradient |
Function (optional). Gradient of the objective function. |
hessian |
Function (optional). Hessian matrix of the objective function. |
lower |
Numeric vector. Lower bounds for box constraints. |
upper |
Numeric vector. Upper bounds for box constraints. |
control |
List. Control parameters including convergence flags:
|
... |
Additional arguments passed to objective, gradient, and Hessian functions. |
modified_newton is a line search optimization algorithm that utilizes
second-order curvature information (the Hessian matrix) to find the minimum
of an objective function.
Modified Newton vs. Trust-Region:
Unlike the dogleg and double_dogleg functions which use a
Trust-Region approach to constrain the step size, this function uses a
Line Search approach. It first determines the Newton direction
(the solution to ) and then performs a backtracking line
search to find a step length that satisfies the sufficient decrease
condition (Armijo condition).
Dynamic Ridge Adjustment:
If the Hessian matrix is not positive definite (making it unsuitable for
Cholesky decomposition), the algorithm applies a dynamic ridge adjustment.
A diagonal matrix is added to the Hessian, where is
increased until the matrix becomes positive definite. This ensures the
search direction always remains a descent direction.
Differentiation Methods: The function allows for independent selection of differentiation methods for the gradient and Hessian:
forward: Standard forward-difference numerical differentiation.
central: Central-difference (more accurate but slower).
complex: Complex-step differentiation (highly accurate for gradients).
richardson: Richardson extrapolation via the numDeriv package.
A list containing optimization results and iteration metadata.
quad <- function(x) (x[1] - 2)^2 + (x[2] + 1)^2 res <- modified_newton(start = c(0, 0), objective = quad) print(res$par)quad <- function(x) (x[1] - 2)^2 + (x[2] + 1)^2 res <- modified_newton(start = c(0, 0), objective = quad) print(res$par)
Implements the standard Newton-Raphson algorithm for non-linear optimization without Hessian modifications or ridge adjustments.
newton_raphson( start, objective, gradient = NULL, hessian = NULL, lower = -Inf, upper = Inf, control = list(), ... )newton_raphson( start, objective, gradient = NULL, hessian = NULL, lower = -Inf, upper = Inf, control = list(), ... )
start |
Numeric vector. Starting values for the optimization parameters. |
objective |
Function. The objective function to minimize. |
gradient |
Function (optional). Gradient of the objective function. |
hessian |
Function (optional). Hessian matrix of the objective function. |
lower |
Numeric vector. Lower bounds for box constraints. |
upper |
Numeric vector. Upper bounds for box constraints. |
control |
List. Control parameters including convergence flags:
|
... |
Additional arguments passed to objective, gradient, and Hessian functions. |
newton_raphson provides a classic second-order optimization approach.
Comparison with Modified Newton:
Unlike modified_newton, this function does not apply dynamic ridge
adjustments (Levenberg-Marquardt style) to the Hessian. If the Hessian is
singular or cannot be inverted via solve(), the algorithm will
terminate. This "pure" implementation is often preferred in simulation
studies where the behavior of the exact Newton step is of interest.
Predicted Decrease:
This function explicitly calculates the Predicted Decrease (),
which is the expected reduction in the objective function value based on
the local quadratic model:
Stability and Simulations:
All return values are explicitly cast to scalars (e.g., as.numeric,
as.logical) to ensure stability when the function is called within
large-scale simulation loops or packaged into data frames.
A list containing optimization results and iteration metadata.
Nocedal, J., & Wright, S. J. (2006). Numerical Optimization. Springer.
Bollen, K. A. (1989). Structural Equations with Latent Variables. Wiley.
quad <- function(x) (x[1] - 2)^2 + (x[2] + 1)^2 res <- newton_raphson(start = c(0, 0), objective = quad) print(res$par)quad <- function(x) (x[1] - 2)^2 + (x[2] + 1)^2 res <- newton_raphson(start = c(0, 0), objective = quad) print(res$par)