Nonmonotone and Perturbed Optimization
Abstract
The primary purpose of this research is the analysis of nonmonotone optimization algorithms to which standard convergence analysis techniques do not apply. We consider methods that are inherently nonmonotone, as well as nonmonotonicity induced by data perturbations or inexact subproblem solution. One of the principal applications of our results is the analysis of gradient-type methods that process the data incrementally. The computational significance of these algorithms is well documented in the neural networks literature. such algorithms are known to be particularly well-suited for large data sets, as well as for real time applications. One of the most important methods of this type is the classical online back propagation (BP) algorithm for training artificial neural networks. Neural networks constitute a large interdisciplinary area of research within the broader area of machine learning that has found applications in many branches of science and technology. However, much of the network in the area has been based on heuristic concepts and trial and error experimentation. This research fills some of the existing theoretical gaps. in particular, we obtain the first deterministic convergence results for the BP algorithm and its various practically important modifications.
We also investigate error-stability properties of the generalized gradient projection method. When specialized to neural network training, our general results allow us to establish stability of BP in the presence of noise, and give its precise characterization. We also outline applications to weight perturbation training. In a classical optimizations setting, some new results are derived for a perturbed generalize gradient projection method applied to convex and weakly sharp problems.
Next we develop a general approach to convergence analysis of feasible descent methods in the presence of perturbations. The important novel feature of our analysis is the perturbations need not tend to zero in the limit. in this case, standard convergence analysis techniques are not applicable, and we present a new approach. It is shown that a certain E-approximate solution can be obtained, where E depends on the level of perturbations linearly. Applications to the gradient projection, proximal minimization and extragradient algorithms and described.
We also consider a practical generalization of the parallel variable distribution algorithm of Ferris and Mangasarian. In particular, our generalization is twofold: we propose an asynchronous algorithm, and allow inexact subproblem solution. We show that the generalized method retains all the attractive properties of the original method and yet is more practical. We also derive some stronger convergence results for algorithms of this class.
Permanent Link
http://digital.library.wisc.edu/1793/65046Type
Technical Report
Citation
95-13